Jascha Sohl-Dickstein
@jaschasd
Member of the technical staff @ Anthropic. Most (in)famous for inventing diffusion models. AI + physics + neuroscience + dynamics.
San Francisco
Joined August 2009
Posts
  • Pinned
    My first blog post ever! Be harsh, but, you know, constructive. Too much efficiency makes everything worse: overfitting and the strong version of Goodhart's law sohl-dickstein.github.io/2022/11/06/str… 🧵
  • Have you ever done a dense grid search over neural network hyperparameters? Like a *really dense* grid search? It looks like this (!!). Bluish colors correspond to hyperparameters for which training converges, reddish colors to hyperparameters for which training diverges.
  • After 2 years of work by 442 contributors across 132 institutions, I am thrilled to announce that the github.com/google/BIG-ben… paper is now live: arxiv.org/abs/2206.04615. BIG-bench consists of 204 diverse tasks to measure and extrapolate the capabilities of large language models.
  • "Finite Versus Infinite Neural Networks: an Empirical Study." arxiv.org/abs/2007.15801 This paper contains everything you ever wanted to know about infinite width networks, but didn't have the computational capacity to ask! Like really a lot of content. Let's dive in.
  • Title: Advice for a young investigator in the first and last days of the Anthropocene Abstract: Within just a few years, it is likely that we will create AI systems that outperform the best humans on all intellectual tasks. This will have implications for your research and
  • Replying to @jaschasd
    The boundary between trainable and untrainable neural network hyperparameter configurations is *fractal*! And beautiful! Here is a grid search over a different pair of hyperparameters -- this time learning rate and the mean of the parameter initialization distribution.
  • Modern deep learning is a story of learned features outperforming (then replacing!) hand-designed algorithms. But we still use hand-designed loss functions and optimizers. Here is a big step towards learned optimizers outperforming existing optimizers: arxiv.org/abs/2009.11243
  • If there is one thing the deep learning revolution has taught us, it's that neural nets will outperform hand-designed heuristics, given enough compute and data. But we still use hand-designed heuristics to train our models. Let's replace our optimizers with trained neural nets!
  • Eliminating All Bad Local Minima from Loss Landscapes Without Even Adding an Extra Unit arxiv.org/pdf/1901.03909… It's less than one page. It may be deep. It may be trivial. It will definitely help you understand how some claims in recent theory papers could possibly be true.
  • Adversarial Reprogramming of Neural Networks goo.gl/qnB5FA A new goal for adversarial attacks! Rather than cause a specific misclassification, we force neural networks to behave as if they were trained on a completely different task! With @gamaleldinfe, @goodfellow_ian
  • For years I've shown this 2x2 grid in talks on infinite width networks, but with just a big ❓ in the upper-left. No longer! In arxiv.org/abs/2206.07673 we characterize wide Bayesian neural nets in parameter space. This fills a theory gap, and enables *much* faster MCMC sampling.
  • Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible: arxiv.org/abs/2008.07545 We examine what information is usable for training neural networks, and how second order methods destroy exactly that information.
  • The hot mess theory of AI misalignment (+ an experiment!) sohl-dickstein.github.io/2023/03/09/coh… There are two ways an AI could be misaligned. It could monomaniacally pursue the wrong goal (supercoherence), or it could act in ways that don't pursue any consistent goal (hot mess/incoherent).
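The grid-search posts above label each hyperparameter pair by whether training converges or diverges. The sketch below shows that labeling procedure on a tiny problem. It is a hypothetical toy setup chosen for illustration, not the setup behind the posted images: a one-hidden-layer tanh network on random data, trained by full-batch gradient descent with a separate learning rate for each layer. A run counts as diverged if the loss becomes non-finite or exceeds an arbitrary threshold (`threshold=1e6`, also an assumption). A run that stays bounded for the step budget is marked 0; this does not prove it would converge given more steps.

```python
import numpy as np


def diverges(lr_in, lr_out, steps=200, seed=0, threshold=1e6):
    """Return True if full-batch gradient descent on a toy tanh network blows up.

    Hypothetical toy setup (not the one behind the posted images): 16 random
    inputs in 8 dimensions, random targets, one hidden layer of 16 tanh units,
    mean squared error, and a separate learning rate for each layer.
    """
    rng = np.random.default_rng(seed)  # fixed seed: same data and init in every grid cell
    n, d, h = 16, 8, 16
    x = rng.normal(size=(n, d))
    y = rng.normal(size=(n, 1))
    w1 = rng.normal(size=(d, h)) / np.sqrt(d)
    w2 = rng.normal(size=(h, 1)) / np.sqrt(h)
    for _ in range(steps):
        a = np.tanh(x @ w1)              # hidden activations, shape (n, h)
        err = a @ w2 - y                 # residuals, shape (n, 1)
        loss = np.mean(err ** 2)
        if not np.isfinite(loss) or loss > threshold:
            return True                  # treat a blown-up loss as divergence
        g_pred = 2.0 * err / n           # dLoss/dprediction
        g_w2 = a.T @ g_pred
        g_w1 = x.T @ ((g_pred @ w2.T) * (1.0 - a ** 2))  # backprop through tanh
        w1 -= lr_in * g_w1               # both gradients were computed before either update
        w2 -= lr_out * g_w2
    return False                         # loss stayed bounded for the whole run


def trainability_grid(lrs_in, lrs_out, **kwargs):
    """1 where training diverged, 0 where it did not, for every learning-rate pair."""
    return np.array(
        [[diverges(li, lo, **kwargs) for lo in lrs_out] for li in lrs_in], dtype=int
    )


lrs = np.logspace(-3, 2, 6)              # 1e-3 ... 1e2, log-spaced
grid = trainability_grid(lrs, lrs)
print(grid)                              # rows: lr_in, columns: lr_out
```

A 6x6 grid only shows the coarse split between small and large learning rates. Seeing fine structure in the boundary needs a far denser grid (hundreds of values per axis, zoomed in on the transition), which is slow in a pure NumPy loop like this one; the result can be plotted with an image function such as matplotlib's `imshow`.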
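The post on arxiv.org/abs/2008.07545 says whitening destroys information about the dataset. One simple piece of linear algebra illustrates this when there are fewer training points than input dimensions (n < d). Write the data matrix as X = U S V^T. Whitening with the pseudo-inverse square root of the feature second-moment matrix X^T X / n maps X to sqrt(n) U V^T. The whitened Gram matrix is then n times the identity for every such dataset, so the relative similarities between training inputs are erased. This is a minimal sketch under stated assumptions (no mean-centering, pseudo-inverse over nonzero eigenvalues, random Gaussian data); the paper's exact definitions and setting may differ.

```python
import numpy as np


def whiten(x, rel_tol=1e-10):
    """Whiten rows of x (n samples, d features) with the pseudo-inverse square
    root of the feature second-moment matrix. Assumption: no mean-centering."""
    n = x.shape[0]
    cov = x.T @ x / n                         # d x d second-moment matrix, rank <= n
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > rel_tol * evals.max()      # drop the null space (pseudo-inverse)
    v = evecs[:, keep]
    inv_sqrt = v @ np.diag(evals[keep] ** -0.5) @ v.T
    return x @ inv_sqrt


rng = np.random.default_rng(0)
n, d = 10, 50                                 # fewer samples than features
a = rng.normal(size=(n, d))
b = 3.0 * rng.normal(size=(n, d)) + 1.0       # a very different dataset

gram_a = whiten(a) @ whiten(a).T
gram_b = whiten(b) @ whiten(b).T
# Before whitening the two Gram matrices differ; after whitening both equal n * I.
print(np.allclose(gram_a, n * np.eye(n)), np.allclose(gram_b, n * np.eye(n)))
```

Any learner that sees the training inputs only through their inner products receives the same input-similarity information after this transform, whatever the original data looked like.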