My first blog post ever! Be harsh, but, you know, constructive.
Too much efficiency makes everything worse: overfitting and the strong version of Goodhart's law
sohl-dickstein.github.io/2022/11/06/str…
🧵
Have you ever done a dense grid search over neural network hyperparameters? Like a *really dense* grid search? It looks like this (!!). Blueish colors correspond to hyperparameters for which training converges, redish colors to hyperparameters for which training diverges.
After 2 years of work by 442 contributors across 132 institutions, I am thrilled to announce that the github.com/google/BIG-ben… paper is now live: arxiv.org/abs/2206.04615. BIG-bench consists of 204 diverse tasks to measure and extrapolate the capabilities of large language models.
"Finite Versus Infinite Neural Networks: an Empirical Study." arxiv.org/abs/2007.15801 This paper contains everything you ever wanted to know about infinite width networks, but didn't have the computational capacity to ask! Like really a lot of content. Let's dive in.
Title: Advice for a young investigator in the first and last days of the Anthropocene
Abstract: Within just a few years, it is likely that we will create AI systems that outperform the best humans on all intellectual tasks. This will have implications for your research and
The boundary between trainable and untrainable neural network hyperparameter configurations is *fractal*! And beautiful!
Here is a grid search over a different pair of hyperparameters -- this time learning rate and the mean of the parameter initialization distribution.
Modern deep learning is a story of learned features outperforming (then replacing!) hand-designed algorithms. But we still use hand designed loss functions and optimizers. Here is a big step towards learned optimizers outperforming existing optimizers: arxiv.org/abs/2009.11243
If there is one thing the deep learning revolution has taught us, it's that neural nets will outperform hand-designed heuristics, given enough compute and data.
But we still use hand-designed heuristics to train our models. Let's replace our optimizers with trained neural nets!
Eliminating All Bad Local Minima from Loss Landscapes Without Even Adding an Extra Unit arxiv.org/pdf/1901.03909…
It's less than one page. It may be deep. It may be trivial. It will definitely help you understand how some claims in recent theory papers could possibly be true.
Adversarial Reprogramming of Neural Networks goo.gl/qnB5FA A new goal for adversarial attacks! Rather than cause a specific misclassification, we force neural networks to behave as if they were trained on a completely different task! With @gamaleldinfe, @goodfellow_ian
For years I've shown this 2x2 grid in talks on infinite width networks, but with just a big ❓ in the upper-left.
No longer! In arxiv.org/abs/2206.07673 we characterize wide Bayesian neural nets in parameter space. This fills a theory gap, and enables *much* faster MCMC sampling.
Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible: arxiv.org/abs/2008.07545 We examine what information is usable for training neural networks, and how second order methods destroy exactly that information.
The hot mess theory of AI misalignment (+ an experiment!)
sohl-dickstein.github.io/2023/03/09/coh…
There are two ways an AI could be misaligned. It could monomaniacally pursue the wrong goal (supercoherence), or it could act in ways that don't pursue any consistent goal (hot mess/incoherent).