Mengzhou Xia
303 posts
Mengzhou Xia
@xiamengzhou
Research @OpenAI, PhD @princeton_nlp.
xiamengzhou.github.io
Joined May 2015
826 Following
4,682 Followers
  • Mengzhou Xia
    @xiamengzhou
    Oct 11, 2023
    We release the strongest public 1.3B and 3B models so far – the ShearedLLaMA series. Structured pruning from a large model to a small one is far more cost-effective (only 3%!) than pre-training them from scratch! Check out our paper and models at: xiamengzhou.github.io/sheared-llama/ [1/n]
    217K
  • Mengzhou Xia
    @xiamengzhou
    Feb 14, 2024
    Lots of instruction tuning data out there...but how to best adapt LLMs for specific queries? Don’t use ALL of the data, use LESS! 5% beats the full dataset. Can even use one small model to select data for others! Paper: arxiv.org/abs/2402.04333 Code: github.com/princeton-nlp/… [1/n]
    68K
  • Mengzhou Xia
    @xiamengzhou
    Dec 21, 2022
    How do language models of different sizes learn during the course of pre-training? We study the training trajectories with training checkpoints of language models from 125M to 175B for a better understanding! Check out our new paper 📜: arxiv.org/abs/2212.09803 (1/N)
    arxiv.org
    Training Trajectories of Language Models Across Scales
    Scaling up language models has led to unprecedented performance gains, but little is understood about how the training dynamics change as models get larger. How do language models of different...
    101K
  • Mengzhou Xia
    @xiamengzhou
    Mar 13, 2024
    I am honored to receive the Apple Scholars in AIML fellowship! Very grateful to my advisor, mentors and collaborators along the way :) Excited to keep exploring the Pareto-frontier of capabilities and efficiency of foundation models!
    Princeton Computer Science
    @PrincetonCS
    Mar 13, 2024
    Congrats to @xiamengzhou on receiving an Apple Scholars in AIML fellowship! 🎉🍏 The fellowship recognizes graduate students doing innovative and cutting-edge research in machine learning. Xia is part of @princeton_nlp, advised by @danqi_chen. bit.ly/3wZz78q
    22K
  • Mengzhou Xia
    @xiamengzhou
    Dec 10, 2024
    I am excited to attend #NeurIPS2024 🤩! I’ll be presenting SimPO and CharXiv, and would love to catch up and chat about: - RLHF, reasoning, high-quality data synthesis and generally about AI! - And.. also about the academic and industry job markets!
    23K
  • Mengzhou Xia
    @xiamengzhou
    May 24, 2024
    We train and evaluate extensively with various offline preference optimization algorithms, including DPO, KTO, ORPO, RDPO, and more. Hyperparameter tuning significantly impacts algorithm effectiveness. DPO performs consistently well, but SimPO is better!
    Yu Meng
    @yumeng0818
    May 24, 2024
    Introducing SimPO: Simpler & more effective Preference Optimization!🎉 Significantly outperforms DPO w/o a reference model!📈 Llama-3-8B-SimPO ranked among top on leaderboards!💪 ✅44.7% LC win rate on AlpacaEval 2 ✅33.8% win rate on Arena-Hard arxiv.org/abs/2405.14734 🧵[1/n]
    35K
  • Mengzhou Xia
    @xiamengzhou
    Jul 19, 2024
    🌟 Exciting update! Gemma2-9b + SimPO ranks at the top of AlpacaEval 2 (❗LC 72.4) and leads the WildBench leaderboard among similar-sized models 🚀 SimPO is at least as competitive as (and often outperforms) DPO across all benchmarks, despite its simplicity. ✨ Recipe: on-policy
    42K
  • Mengzhou Xia
    @xiamengzhou
    Jun 2, 2025
    Surprisingly, we find training only with incorrect traces leads to strong performance 🤯 Even more interesting: it improves model diversity and test-time scaling—while correct traces do the opposite. Check out the 🧵👇
    Xinyu Zhu
    @tianhongzxy
    Jun 2, 2025
    🔥The debate’s been wild: How does the reward in RLVR actually improve LLM reasoning?🤔 🚀Introducing our new paper👇 💡TL;DR: Just penalizing incorrect rollouts❌ — no positive reward needed — can boost LLM reasoning, and sometimes better than PPO/GRPO! 🧵[1/n]
    15K
  • Mengzhou Xia
    @xiamengzhou
    Nov 6, 2023
    🌟We release the code for training Sheared-LLaMA here at github.com/princeton-nlp/…. We're excited to see even stronger sheared models emerging in the future! 🤩 For more details, check out our preprint at arxiv.org/abs/2310.06694.
    Mengzhou Xia
    @xiamengzhou
    Oct 11, 2023
    We release the strongest public 1.3B and 3B models so far – the ShearedLLaMA series. Structured pruning from a large model to a small one is far more cost-effective (only 3%!) than pre-training them from scratch! Check out our paper and models at: xiamengzhou.github.io/sheared-llama/ [1/n]
    GitHub - princeton-nlp/LLM-Shearing: [ICLR 2024] Sheared LLaMA: Accelerating Language Model...
    From github.com
    35K
  • Mengzhou Xia
    @xiamengzhou
    May 2, 2022
    Check out our #acl2022 paper on CoFi☕️! Structured pruning is competitive compared to knowledge distillation but requires much less training time and zero unlabeled data. Joint work w/ @ZexuanZhong, @danqi_chen Paper: arxiv.org/pdf/2204.00408… Code: github.com/princeton-nlp/… (1/5)
  • Mengzhou Xia
    @xiamengzhou
    Jun 7, 2022
    Check out our preprint on Prompting ELECTRA! We show that discriminative models like ELECTRA outperform generative MLMs like BERT and RoBERTa on zero-shot and few-shot prompting. Joint work w/ @artetxem, @JefferyDuu, @danqi_chen, @vesko_st Paper: arxiv.org/pdf/2205.15223…
  • Mengzhou Xia
    @xiamengzhou
    Dec 22, 2022
    I'm pleased and honored to receive the fellowship, and I thank @TechAtBloomberg for supporting my research 😀
    Tech At Bloomberg
    @TechAtBloomberg
    Dec 21, 2022
    Congratulations to @PrincetonCS + @princeton_nlp's @xiamengzhou on being named one of the 2022-2023 @Bloomberg #DataScience Ph.D. Fellows! Learn more about her research focus and the other Fellows in our newest cohort: bloom.bg/3BROsru #AI #ML #NLProc
    Mengzhou Xia of Princeton University’s Department of Computer Science is a 2022-2023 Bloomberg Data Science Ph.D. Fellow.
    23K
  • Mengzhou Xia
    @xiamengzhou
    May 31, 2023
    Our LLM trajectory paper got accepted to #ACL2023 😊! Code and results are at github.com/xiamengzhou/tr… Looking forward to future work to analyze trajectories not only in pre-training but also in the more accessible yet mysterious process of instruction tuning with human feedback.
    Mengzhou Xia
    @xiamengzhou
    Dec 21, 2022
    How do language models of different sizes learn during the course of pre-training? We study the training trajectories with training checkpoints of language models from 125M to 175B for a better understanding! Check out our new paper 📜: arxiv.org/abs/2212.09803 (1/N)
    GitHub - xiamengzhou/training_trajectory_analysis: [ACL 2023]: Training Trajectories of Language...
    From github.com
    27K
  • Mengzhou Xia
    @xiamengzhou
    Dec 10, 2023
    This is my first time attending #NeurIPS 🥳 I’d love to chat about efficient approaches for LLMs, learning dynamics/trajectories and more! DM me to grab a coffee together :)
    16K
