I'm a fourth-year CS PhD student in Berkeley EECS, advised by Dan Klein. Previously, I was a UC Berkeley undergrad, where I had the great opportunity to work with and learn from a number of fantastic AI researchers, including Sergey Levine, Ruiqi Zhong, Dan Klein, Jacob Steinhardt, and Jason Eisner. I have also been a Student Researcher at Google DeepMind, and I am now working at Cursor on training frontier coding agents.
We introduce sleep-time compute, which allows models to "think" offline about contexts before queries are presented: by anticipating what queries users might ask and pre-computing useful quantities, we can significantly reduce the compute required at test time.
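In pseudocode, the idea looks roughly like the sketch below (a minimal illustration, not the paper's implementation; `llm` and the prompt wording are hypothetical stand-ins):

```python
# Minimal sketch of sleep-time compute: spend compute on a context offline,
# then reuse the resulting notes when a query arrives. `llm(prompt)` is a
# hypothetical stand-in for any language-model call.

def sleep_time_compute(llm, context: str) -> str:
    # Offline phase: anticipate likely questions and work out useful
    # intermediate results now, while no user is waiting.
    return llm(
        "Read the context and write down inferences and intermediate results "
        f"that would help answer likely future questions:\n{context}"
    )

def answer(llm, context: str, notes: str, query: str) -> str:
    # Test-time phase: condition on the pre-computed notes so far less fresh
    # reasoning is needed to answer the query.
    return llm(f"Context:\n{context}\n\nNotes:\n{notes}\n\nQuestion: {query}")
```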
Can we predict emergent capabilities in GPT-N+1 using GPT-N, which has random performance on the task? We find that finetuning pre-emergence model checkpoints on task-specific data yields predictive power about the point of emergence in the few-shot setting.
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar
arXiv 2024 [paper]
On difficult problems, humans tend to think longer to improve their decisions. Can we instill a similar capability into LLMs? And how well can it perform? We find that by optimally scaling test-time compute we can outperform much larger models in a FLOPs matched evaluation.
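One simple way to spend more test-time compute is best-of-N sampling against a verifier; a minimal sketch of that baseline is below (the `generate` and `verifier_score` callables are hypothetical stand-ins, and the full study also considers how to allocate the budget adaptively per prompt):

```python
# Minimal sketch of best-of-N test-time scaling: sample n candidates and keep
# the one a verifier scores highest. `generate` and `verifier_score` are
# hypothetical stand-ins for the policy and a learned verifier / reward model.

def best_of_n(prompt: str, generate, verifier_score, n: int = 16) -> str:
    # Spending more test-time compute here just means choosing a larger n;
    # the interesting question is how to pick n (and the strategy) per prompt.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: verifier_score(prompt, c))
```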
Recent systems, such as Koala, Vicuna, and Alpaca, finetune a weaker language model to imitate the outputs of a stronger model, like ChatGPT or GPT-4. In this work, we critically analyze the shortcomings of this approach.
Language models significantly benefit from context tokens, such as prompts or scratchpads. We propose to apply context distillation so that a language model can improve itself by internalizing these gains.
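Concretely, one common form of the objective matches the model's predictions without the context to a frozen copy's predictions with the context; a small numpy sketch of that loss follows (the shapes and slicing convention are assumptions, not the paper's exact setup):

```python
import numpy as np

def log_softmax(logits):
    logits = logits - logits.max(axis=-1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

def context_distillation_loss(student_logits, teacher_logits):
    """student_logits: predictions on the input *without* the context tokens.
    teacher_logits: frozen predictions on [context; input], sliced to the input positions.
    Minimizing this cross-entropy w.r.t. the student also minimizes
    KL(teacher || student), since the teacher's entropy is a constant."""
    teacher_probs = np.exp(log_softmax(teacher_logits))
    return -(teacher_probs * log_softmax(student_logits)).sum(axis=-1).mean()
```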
We propose an effective and easy-to-use method, motivated by offline RL, for steering language models toward successfully completing language tasks such as goal-directed dialogue, controlled generation, and word games.
Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL
Ruiqi Zhong*, Charlie Snell*, Dan Klein, Jason Eisner
EMNLP 2023 [paper]
We introduce APEL, a new framework that enables non-programmers to indirectly annotate natural language utterances with executable meaning representations, such as SQL programs.
We extend techniques from learning-based control, such as task relabeling, to derive a simple and effective method to finetune language models in a goal-aware way.
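As a rough illustration of the relabeling idea (hypothetical data format and `infer_achieved_goal` helper, not the paper's exact pipeline): each trajectory is relabeled with the goal it actually achieved, and the model is then finetuned to generate it conditioned on that goal.

```python
# Sketch of hindsight-style task relabeling for goal-aware LM finetuning.
# `infer_achieved_goal` is a hypothetical function that reads a finished
# trajectory and returns the goal it ended up accomplishing.

def relabel(trajectories, infer_achieved_goal):
    return [
        {"goal": infer_achieved_goal(t["turns"]), "turns": t["turns"]}
        for t in trajectories
    ]

def to_training_text(example):
    # Condition generation on the (relabeled) goal by prepending it.
    return "Goal: " + example["goal"] + "\n" + "\n".join(example["turns"])
```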
Why do models often attend to salient words, and how does this evolve throughout training?
The Omniglot Jr. challenge: Can a model achieve child-level character generation and classification?
Eliza Kosoy, Masha Belyi, Charlie Snell, Josh Tenenbaum, Brenden Lake, Alison Gopnik
NeurIPS Workshop on BabyMind 2020 [paper]
We augment the original Omniglot dataset with a new dataset of children's handwritten characters. We then study the properties of a Bayesian Program Learning model trained on this new data.
A tour through the wonderful AI art scene that emerged when CLIP was released in January 2021.
How is it so good? (DALL-E Explained Pt. 2)
April 2021
[blog]
A technical and philosophical discussion of how DALL-E works, why it is so effective at generating images from a text prompt, and its theoretical limitations.
Understanding VQ-VAE (DALL-E Explained Pt. 1)
February 2021
[blog]
How do vector quantized variational autoencoders (VQ-VAEs) work? And what role do they play in modern generative models, such as DALL-E and Jukebox?
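The core operation is simple enough to fit in a few lines; here is a minimal numpy sketch of the quantization step (training details such as the straight-through estimator and codebook losses are only noted in comments):

```python
import numpy as np

def quantize(z_e, codebook):
    """z_e: (N, D) encoder outputs; codebook: (K, D) learned embeddings."""
    # Snap each encoder output to its nearest codebook entry.
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    codes = dists.argmin(axis=1)   # discrete token ids (what a prior like DALL-E's models)
    z_q = codebook[codes]          # quantized vectors passed to the decoder
    return codes, z_q

# During training, gradients flow "straight through" the argmin:
# z_q = z_e + stop_gradient(z_q - z_e), plus codebook and commitment losses.
```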
Built on top of Hugging Face's Transformers library, JaxSeq enables training very large language models in JAX, with model and data parallelism across both multi-device and multi-node clusters.
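For a flavor of the underlying mechanism (this is generic jax mesh sharding, not JaxSeq's own interface, and the 8-device mesh shape is an assumption):

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Assume 8 accelerators arranged as a 2 (data) x 4 (model) mesh.
devices = np.array(jax.devices()).reshape(2, 4)
mesh = Mesh(devices, axis_names=("data", "model"))

# Shard activations along the data axis and weights along the model axis.
x = jax.device_put(np.ones((32, 1024), np.float32), NamedSharding(mesh, P("data", None)))
w = jax.device_put(np.ones((1024, 4096), np.float32), NamedSharding(mesh, P(None, "model")))

@jax.jit
def forward(x, w):
    return x @ w  # XLA inserts the collectives implied by the input shardings

y = forward(x, w)  # result comes back sharded over both mesh axes
```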
Re-implementation of the paper "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"
November 2021
[code]
Re-create the dramatic train/test curves from the original paper; experiment with the grokking phenomenon yourself.
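The data setup is tiny; for example, a modular-addition split like the sketch below (p = 97 is a commonly used modulus in grokking experiments, and the train fraction is the knob that controls how delayed generalization is):

```python
# Sketch of a small algorithmic dataset in the style of the grokking paper:
# modular addition, with a fixed train/test split of all (a, b) pairs.
import itertools, random

p = 97  # a commonly used modulus in grokking experiments
pairs = list(itertools.product(range(p), repeat=2))
random.Random(0).shuffle(pairs)

split = int(0.5 * len(pairs))  # smaller train fractions make grokking happen later
train = [((a, b), (a + b) % p) for a, b in pairs[:split]]
test  = [((a, b), (a + b) % p) for a, b in pairs[split:]]
```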
Music Preference Visualization with Deep Embeddings
June-July 2020
[tweet]
Harness the power of deep music representations to generate playlists and visualize your music preferences in an interactive web app.
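The playlist logic itself is just nearest-neighbor search in embedding space; a rough sketch (the embeddings are assumed to come from a pretrained music model):

```python
import numpy as np

def playlist(embeddings, track_ids, seed_idx, length=10):
    """embeddings: (N, D) deep track embeddings; walk to nearest neighbors by cosine similarity."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    chosen = [seed_idx]
    while len(chosen) < length:
        sims = unit @ unit[chosen[-1]]  # cosine similarity to the most recent pick
        sims[chosen] = -np.inf          # never repeat a track
        chosen.append(int(sims.argmax()))
    return [track_ids[i] for i in chosen]
```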
Train Deep Neural Networks on a 2013 MacBook Air GPU
2017/2018
[code]
A deep learning framework implemented from scratch in C++/OpenCL, with GPU kernels that run on a 2013 MacBook Air GPU (and other Apple computers) and LSTM training/inference for music lyric generation.
Scroll through an infinite 2D block-world of rugged terrain, endless caves, fluffy clouds, and extreme biomes, all synthesized from PRNGs and Perlin noise.
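The terrain pass boils down to sampling smooth noise per column; here is a tiny self-contained stand-in (1D value noise rather than true Perlin noise, and the height constants are arbitrary):

```python
import math, random

def value_noise(x, seed=0, scale=16.0):
    # Deterministic pseudo-random values at integer lattice points, smoothly
    # interpolated in between, so the same seed always yields the same world.
    def lattice(i):
        return random.Random(seed * 1_000_003 + i).random()
    x /= scale
    i, t = math.floor(x), x - math.floor(x)
    t = t * t * (3 - 2 * t)  # smoothstep easing between lattice points
    return lattice(i) * (1 - t) + lattice(i + 1) * t

def surface_height(column, world_height=128):
    # Map noise in [0, 1) to a block height for this column of the world.
    return int(value_noise(column) * world_height // 2) + world_height // 4
```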