Mayee Chen (@MayeeChen) / X

631 posts

Mayee Chen

@MayeeChen

CS PhD student @StanfordAILab @HazyResearch, undergrad @princeton. Working on all things data! she/her 🎃

Stanford, CA

Joined February 2020

Pinned
Mayee Chen
@MayeeChen
Feb 13
Data mixing - determining ratios across your training datasets - matters a lot for model quality. While building Olmo 3, we learned it’s hard to set up a method that finds a strong mix, and hard to maintain that mix as datasets change throughout development. Introducing Olmix👇
Mayee Chen
@MayeeChen
Jul 29, 2023
Large language models (LMs) rely heavily on training data quality. How do we best select training data for good downstream model performance across tasks? Introducing 🍳Skill-It: a data-driven framework for understanding and training LMs! Paper: arxiv.org/abs/2307.14430 1/13
Mayee Chen
@MayeeChen
Oct 15, 2023
Stanford CS graduate students are running an application support program for diversity students (broadly defined); if you are applying to our CS PhD program, we'll give one round of feedback on your application. SPACE IS LIMITED. APPLY BY OCTOBER 27: Link: tinyurl.com/3x5ke5uz
Mayee Chen
@MayeeChen
Apr 19, 2022
New preprint alert! 📣 How do we produce transferable and robust representations with supervised contrastive learning? We need *geometric spread* and an inductive bias towards *latent subclass clustering* in representation space. 📜 arxiv.org/abs/2204.07596 👇 (1/n)
Mayee Chen
@MayeeChen
Jun 24, 2025
LLMs often generate correct answers but struggle to select them. Weaver tackles this by combining many weak verifiers (reward models, LM judges) into a stronger signal using statistical tools from Weak Supervision—matching o3-mini-level accuracy with much cheaper models! 📊
Mayee Chen
@MayeeChen
Nov 12, 2024
There are many algorithms for constructing pre-training data mixtures—which one should we use? Turns out: many of them fall under one framework, have similar issues, and can be improved with a straightforward modification. Introducing Aioli! 🧄 1/9
Mayee Chen
@MayeeChen
Apr 22, 2025
!!! I'm at #ICLR2025 to present 🧄Aioli🧄, a unified framework for data mixing, on Thursday afternoon! 🔗 arxiv.org/abs/2411.05735 Message me to chat about pre/post training data (mixing, curriculum, understanding); test-time compute/verification; or to try new food 🇸🇬
Mayee Chen
@MayeeChen
Jul 2, 2021
New paper appearing in #ICML2021! Mandoline: Model Evaluation under Distribution Shift. Paper: arxiv.org/abs/2107.00643 Code: github.com/HazyResearch/m… work done w/ equal contribution from @krandiash and @nimit_sohoni, as well as @faitpoms, @kayvonf, and @HazyResearch 1/6
Mayee Chen
@MayeeChen
Oct 31, 2023
Skill-It has been accepted to #NeurIPS2023 as a spotlight! Want to understand how skills in your training data can give way to better data selection methods? Our code is available here: github.com/HazyResearch/s…
Mayee Chen
@MayeeChen
Jul 29, 2023
Large language models (LMs) rely heavily on training data quality. How do we best select training data for good downstream model performance across tasks? Introducing 🍳Skill-It: a data-driven framework for understanding and training LMs! Paper: arxiv.org/abs/2307.14430 1/13
Mayee Chen
@MayeeChen
Dec 10, 2024
Given open-ended generations from K different LLMs for an input, can we learn to select the best generation, without needing any labeled data? Introducing our #NeurIPS2024 paper, Smoothie, a label-free test-time LLM routing algorithm! 🥤 1/4
Mayee Chen
@MayeeChen
Oct 3, 2024
Honored to have contributed to this incredibly exciting work! Archon shows how intelligently blending concepts like ensembling, fusing, and repeated sampling can create very strong LLM inference systems, using 70B+ open models to outperform GPT-4o and Claude 3.5 Sonnet!
Jon Saad-Falcon
@JonSaadFalcon
Sep 30, 2024
What is the best way to spend your inference compute budget to create LLM systems greater than the sum of their parts? In our latest paper, we present Archon, an architecture search framework for inference-time techniques! Archon is enabled by inference-time architecture search
Mayee Chen
@MayeeChen
Aug 15, 2023
Embroid's ability to improve LLM performance across 95 classification tasks is quite impressive, and it does so just by exploiting smoothness of other pre-trained model embeddings via knn (with theoretical guarantees too). Amazing work by @NeelGuha and honored to have helped out!
Neel Guha
@NeelGuha
Aug 14, 2023
We’re excited to share Embroid: a method for “stitching” together an LLM with embedding information from multiple smaller models (e.g., BERT), allowing us to automatically correct LLM predictions without supervision. ✍️: hazyresearch.stanford.edu/blog/2023-08-1… 📜: arxiv.org/abs/2307.11031
Mayee Chen
@MayeeChen
Dec 10, 2024
Omw to #NeurIPS2024! I have 2 papers: - Smoothie (w @NeelGuha): a test-time LLM routing approach that requires no labels/training (Thu 4:30 East) - DCLM: a benchmark for LLM data curation (Fri 4:30 West) Excited to meet new ppl & chat about data-centric AI/ test-time approaches!
Mayee Chen
@MayeeChen
Dec 12, 2023
I'm at #NeurIPS2023 from now to Sat and will be around at these posters: - Embroid Tues 5:15 - Skill-It Wed 10:45 - Segmentation for classification Wed 5:00 Let's chat about training data and data-centric frameworks for understanding/aligning LLMs! DM/email me 🍤 Paper details 👇