Jascha Sohl-Dickstein (@jaschasd) / X

593 posts

Jascha Sohl-Dickstein

@jaschasd

Member of the technical staff @ Anthropic. Most (in)famous for inventing diffusion models. AI + physics + neuroscience + dynamics.

San Francisco

Joined August 2009

Pinned
Jascha Sohl-Dickstein
@jaschasd
Nov 7, 2022
My first blog post ever! Be harsh, but, you know, constructive. Too much efficiency makes everything worse: overfitting and the strong version of Goodhart's law sohl-dickstein.github.io/2022/11/06/str… 🧵
Jascha Sohl-Dickstein
@jaschasd
Feb 12, 2024
Have you ever done a dense grid search over neural network hyperparameters? Like a *really dense* grid search? It looks like this (!!). Blueish colors correspond to hyperparameters for which training converges, reddish colors to hyperparameters for which training diverges.
Jascha Sohl-Dickstein
@jaschasd
Jun 10, 2022
After 2 years of work by 442 contributors across 132 institutions, I am thrilled to announce that the github.com/google/BIG-ben… paper is now live: arxiv.org/abs/2206.04615. BIG-bench consists of 204 diverse tasks to measure and extrapolate the capabilities of large language models.
Jascha Sohl-Dickstein
@jaschasd
Aug 8, 2020
"Finite Versus Infinite Neural Networks: an Empirical Study." arxiv.org/abs/2007.15801 This paper contains everything you ever wanted to know about infinite width networks, but didn't have the computational capacity to ask! Like really a lot of content. Let's dive in.
Jascha Sohl-Dickstein
@jaschasd
Sep 28, 2025
Title: Advice for a young investigator in the first and last days of the Anthropocene Abstract: Within just a few years, it is likely that we will create AI systems that outperform the best humans on all intellectual tasks. This will have implications for your research and
Jascha Sohl-Dickstein
@jaschasd
Feb 12, 2024
Replying to @jaschasd
The boundary between trainable and untrainable neural network hyperparameter configurations is *fractal*! And beautiful! Here is a grid search over a different pair of hyperparameters -- this time learning rate and the mean of the parameter initialization distribution.
Jascha Sohl-Dickstein
@jaschasd
Sep 24, 2020
Modern deep learning is a story of learned features outperforming (then replacing!) hand-designed algorithms. But we still use hand-designed loss functions and optimizers. Here is a big step towards learned optimizers outperforming existing optimizers: arxiv.org/abs/2009.11243
Jascha Sohl-Dickstein
@jaschasd
Nov 18, 2022
If there is one thing the deep learning revolution has taught us, it's that neural nets will outperform hand-designed heuristics, given enough compute and data. But we still use hand-designed heuristics to train our models. Let's replace our optimizers with trained neural nets!
Jascha Sohl-Dickstein
@jaschasd
Feb 12, 2024
Replying to @jaschasd
Want to learn more? Blog post: sohl-dickstein.github.io/2024/02/12/fra… 3-page paper: arxiv.org/abs/2402.06184
sohl-dickstein.github.io
Neural network training makes beautiful fractals
This blog is intended to be a place to share ideas and results that are too weird, incomplete, or off-topic to turn into an academic paper, but that I think may be important. Let me know what you...
Jascha Sohl-Dickstein
@jaschasd
Jan 15, 2019
Eliminating All Bad Local Minima from Loss Landscapes Without Even Adding an Extra Unit arxiv.org/pdf/1901.03909… It's less than one page. It may be deep. It may be trivial. It will definitely help you understand how some claims in recent theory papers could possibly be true.
Jascha Sohl-Dickstein
@jaschasd
Jul 2, 2018
Adversarial Reprogramming of Neural Networks goo.gl/qnB5FA A new goal for adversarial attacks! Rather than cause a specific misclassification, we force neural networks to behave as if they were trained on a completely different task! With @gamaleldinfe, @goodfellow_ian
Jascha Sohl-Dickstein
@jaschasd
Jun 18, 2022
For years I've shown this 2x2 grid in talks on infinite width networks, but with just a big ❓ in the upper-left. No longer! In arxiv.org/abs/2206.07673 we characterize wide Bayesian neural nets in parameter space. This fills a theory gap, and enables *much* faster MCMC sampling.
Jascha Sohl-Dickstein
@jaschasd
Aug 19, 2020
Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible: arxiv.org/abs/2008.07545 We examine what information is usable for training neural networks, and how second order methods destroy exactly that information.
Jascha Sohl-Dickstein
@jaschasd
Mar 9, 2023
The hot mess theory of AI misalignment (+ an experiment!) sohl-dickstein.github.io/2023/03/09/coh… There are two ways an AI could be misaligned. It could monomaniacally pursue the wrong goal (supercoherence), or it could act in ways that don't pursue any consistent goal (hot mess/incoherent).