I'm a fourth-year CS PhD student in Berkeley EECS, advised by Dan Klein. Previously, I was a UC Berkeley undergrad, where I had the great opportunity to work with and learn from a number of fantastic AI researchers, including Sergey Levine, Ruiqi Zhong, Dan Klein, Jacob Steinhardt, and Jason Eisner. I have also been a Student Researcher at Google DeepMind, and I am now working at Cursor on training frontier coding agents.
We introduce sleep-time compute, which allows models to "think" offline about contexts before queries are presented: by anticipating what queries users might ask and pre-computing useful quantities, we can significantly reduce the compute required at test time.
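In pseudocode, the idea looks roughly like the sketch below (a minimal illustration, not the paper's implementation; `llm` and the prompt wording are hypothetical stand-ins):

```python
# Minimal sketch of sleep-time compute: spend compute on a context offline,
# then reuse the resulting notes when a query arrives. `llm(prompt)` is a
# hypothetical stand-in for any language-model call.

def sleep_time_compute(llm, context: str) -> str:
    # Offline phase: anticipate likely questions and work out useful
    # intermediate results now, while no user is waiting.
    return llm(
        "Read the context and write down inferences and intermediate results "
        f"that would help answer likely future questions:\n{context}"
    )

def answer(llm, context: str, notes: str, query: str) -> str:
    # Test-time phase: condition on the pre-computed notes so far less fresh
    # reasoning is needed to answer the query.
    return llm(f"Context:\n{context}\n\nNotes:\n{notes}\n\nQuestion: {query}")
```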
Can we predict emergent capabilities in GPT-N+1 using GPT-N, which has random performance on the task? We find that finetuning pre-emergence model checkpoints on task-specific data yields predictive power about the point of emergence in the few-shot setting.
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar
arXiv 2024 [paper]
On difficult problems, humans tend to think longer to improve their decisions. Can we instill a similar capability into LLMs? And how well can it perform? We find that by optimally scaling test-time compute we can outperform much larger models in a FLOPs matched evaluation.
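One simple way to spend more test-time compute is best-of-N sampling against a verifier; a minimal sketch of that baseline is below (the `generate` and `verifier_score` callables are hypothetical stand-ins, and the full study also considers how to allocate the budget adaptively per prompt):

```python
# Minimal sketch of best-of-N test-time scaling: sample n candidates and keep
# the one a verifier scores highest. `generate` and `verifier_score` are
# hypothetical stand-ins for the policy and a learned verifier / reward model.

def best_of_n(prompt: str, generate, verifier_score, n: int = 16) -> str:
    # Spending more test-time compute here just means choosing a larger n;
    # the interesting question is how to pick n (and the strategy) per prompt.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: verifier_score(prompt, c))
```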
Recent systems, such as Koala, Vicuna, and Alpaca, finetune a weaker language model to imitate the outputs of a stronger model, like ChatGPT or GPT-4. In this work, we critically analyze the shortcomings of this approach.
Language models significantly benefit from context tokens, such as prompts or scratchpads. We propose to apply context distillation so that a language model can improve itself by internalizing these gains.
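Concretely, one common form of the objective matches the model's predictions without the context to a frozen copy's predictions with the context; a small numpy sketch of that loss follows (the shapes and slicing convention are assumptions, not the paper's exact setup):

```python
import numpy as np

def log_softmax(logits):
    logits = logits - logits.max(axis=-1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

def context_distillation_loss(student_logits, teacher_logits):
    """student_logits: predictions on the input *without* the context tokens.
    teacher_logits: frozen predictions on [context; input], sliced to the input positions.
    Minimizing this cross-entropy w.r.t. the student also minimizes
    KL(teacher || student), since the teacher's entropy is a constant."""
    teacher_probs = np.exp(log_softmax(teacher_logits))
    return -(teacher_probs * log_softmax(student_logits)).sum(axis=-1).mean()
```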
We propose an effective and easy-to-use method, motivated by offline RL, for steering language models toward successfully completing language tasks such as goal-directed dialogue, controlled generation, and word games.
Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL
Ruiqi Zhong*, Charlie Snell*, Dan Klein, Jason Eisner
EMNLP 2023 [paper]
We introduce APEL, a new framework that enables non-programmers to indirectly annotate natural language utterances with executable meaning representations, such as SQL programs.
We extend techniques from learning-based control, such as task relabeling, to derive a simple and effective method to finetune language models in a goal-aware way.
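As a rough illustration of the relabeling idea (hypothetical data format and `infer_achieved_goal` helper, not the paper's exact pipeline): each trajectory is relabeled with the goal it actually achieved, and the model is then finetuned to generate it conditioned on that goal.

```python
# Sketch of hindsight-style task relabeling for goal-aware LM finetuning.
# `infer_achieved_goal` is a hypothetical function that reads a finished
# trajectory and returns the goal it ended up accomplishing.

def relabel(trajectories, infer_achieved_goal):
    return [
        {"goal": infer_achieved_goal(t["turns"]), "turns": t["turns"]}
        for t in trajectories
    ]

def to_training_text(example):
    # Condition generation on the (relabeled) goal by prepending it.
    return "Goal: " + example["goal"] + "\n" + "\n".join(example["turns"])
```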
Why do models often attend to salient words, and how does this evolve throughout training?
The Omniglot Jr. challenge: Can a model achieve child-level character generation and classification?
Eliza Kosoy, Masha Belyi, Charlie Snell, Josh Tenenbaum, Brenden Lake, Alison Gopnik
NeurIPS Workshop on BabyMind 2020 [paper]
We augment the original Omniglot dataset with a new dataset of children's handwritten characters. We then study the properties of a Bayesian Program Learning model trained on this new data.
A tour through the wonderful AI art scene that emerged when CLIP was released in January 2021.
How is it so good? (DALL-E Explained Pt. 2)
April 2021
[blog]
A technical and philosophical discussion of how DALL-E works, why it is so effective at generating images from a text prompt, and its theoretical limitations.
Understanding VQ-VAE (DALL-E Explained Pt. 1)
February 2021
[blog]
How do vector quantized variational autoencoders (VQ-VAEs) work? And what role do they play in modern generative models, such as DALL-E and Jukebox?
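The core operation is simple enough to fit in a few lines; here is a minimal numpy sketch of the quantization step (training details such as the straight-through estimator and codebook losses are only noted in comments):

```python
import numpy as np

def quantize(z_e, codebook):
    """z_e: (N, D) encoder outputs; codebook: (K, D) learned embeddings."""
    # Snap each encoder output to its nearest codebook entry.
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    codes = dists.argmin(axis=1)   # discrete token ids (what a prior like DALL-E's models)
    z_q = codebook[codes]          # quantized vectors passed to the decoder
    return codes, z_q

# During training, gradients flow "straight through" the argmin:
# z_q = z_e + stop_gradient(z_q - z_e), plus codebook and commitment losses.
```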
Built on top of Hugging Face's Transformers library, JaxSeq enables training very large language models in JAX, with model and data parallelism across both multi-device and multi-node clusters.
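For a flavor of the underlying mechanism (this is generic jax mesh sharding, not JaxSeq's own interface, and the 8-device mesh shape is an assumption):

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Assume 8 accelerators arranged as a 2 (data) x 4 (model) mesh.
devices = np.array(jax.devices()).reshape(2, 4)
mesh = Mesh(devices, axis_names=("data", "model"))

# Shard activations along the data axis and weights along the model axis.
x = jax.device_put(np.ones((32, 1024), np.float32), NamedSharding(mesh, P("data", None)))
w = jax.device_put(np.ones((1024, 4096), np.float32), NamedSharding(mesh, P(None, "model")))

@jax.jit
def forward(x, w):
    return x @ w  # XLA inserts the collectives implied by the input shardings

y = forward(x, w)  # result comes back sharded over both mesh axes
```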
Re-implementation of the paper "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"
November 2021
[code]
Re-create the dramatic train/test curves from the original paper; experiment with the grokking phenomenon yourself.
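The data setup is tiny; for example, a modular-addition split like the sketch below (p = 97 is a commonly used modulus in grokking experiments, and the train fraction is the knob that controls how delayed generalization is):

```python
# Sketch of a small algorithmic dataset in the style of the grokking paper:
# modular addition, with a fixed train/test split of all (a, b) pairs.
import itertools, random

p = 97  # a commonly used modulus in grokking experiments
pairs = list(itertools.product(range(p), repeat=2))
random.Random(0).shuffle(pairs)

split = int(0.5 * len(pairs))  # smaller train fractions make grokking happen later
train = [((a, b), (a + b) % p) for a, b in pairs[:split]]
test  = [((a, b), (a + b) % p) for a, b in pairs[split:]]
```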
Music Preference Visualization with Deep Embeddings
June-July 2020
[tweet]
Harness the power of deep music representations to generate playlists and visualize your music preferences in an interactive web app.
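The playlist logic itself is just nearest-neighbor search in embedding space; a rough sketch (the embeddings are assumed to come from a pretrained music model):

```python
import numpy as np

def playlist(embeddings, track_ids, seed_idx, length=10):
    """embeddings: (N, D) deep track embeddings; walk to nearest neighbors by cosine similarity."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    chosen = [seed_idx]
    while len(chosen) < length:
        sims = unit @ unit[chosen[-1]]  # cosine similarity to the most recent pick
        sims[chosen] = -np.inf          # never repeat a track
        chosen.append(int(sims.argmax()))
    return [track_ids[i] for i in chosen]
```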
Train Deep Neural Networks on a 2013 MacBook Air GPU
2017/2018
[code]
A deep learning framework implemented from scratch in C++/OpenCL, with GPU kernels that run on a 2013 MacBook Air GPU (and other Apple computers) and LSTM training/inference for music lyric generation.
Scroll through an infinite 2D block-world of rugged terrain, endless caves, fluffy clouds, and extreme biomes, all synthesized from PRNGs and Perlin noise.
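The terrain pass boils down to sampling smooth noise per column; here is a tiny self-contained stand-in (1D value noise rather than true Perlin noise, and the height constants are arbitrary):

```python
import math, random

def value_noise(x, seed=0, scale=16.0):
    # Deterministic pseudo-random values at integer lattice points, smoothly
    # interpolated in between, so the same seed always yields the same world.
    def lattice(i):
        return random.Random(seed * 1_000_003 + i).random()
    x /= scale
    i, t = math.floor(x), x - math.floor(x)
    t = t * t * (3 - 2 * t)  # smoothstep easing between lattice points
    return lattice(i) * (1 - t) + lattice(i + 1) * t

def surface_height(column, world_height=128):
    # Map noise in [0, 1) to a block height for this column of the world.
    return int(value_noise(column) * world_height // 2) + world_height // 4
```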