Gavin Guo
1,915 posts
Gavin Guo
@Zhen4good
Multimodal Avocado @Meta Previously @Apple @MITIBMLab @MIT_CSAIL @BerkeleyPhysics Opinions Are My Own
Menlo Park
zguo0525.github.io
Joined March 2023
551
Following
622
Followers
  • Gavin Guo
    @Zhen4good
    May 27, 2025
    🚀 Synthetic Data RL: Task Definition Is All You Need. No human labels, no problem. We fine-tune foundation models purely from task definitions, achieving 91.7% on GSM8K (+17.2pp over base), matching RL with full human data. Paper: arxiv.org/abs/2505.17063 Huggingface:
    arxiv.org
    Synthetic Data RL: Task Definition Is All You Need
    Reinforcement learning (RL) is a powerful way to adapt foundation models to specialized tasks, but its reliance on large-scale human-labeled data limits broad adoption. We introduce Synthetic Data...
    17K
  • Gavin Guo
    @Zhen4good
    Aug 23, 2025
    Replying to @bingxu_
    I was ambitious about making a change when joining Apple, and excited about last year’s announcement. Now I’m also recovering at MSL, hoping I can really make a difference this time.
    4.8K
  • Gavin Guo
    @Zhen4good
    Apr 12, 2024
    JetMoE's technical report is out. Using only open-source data and code, we matched Llama 2's performance at a fraction of the cost. All training details are shared to advance open foundation model research. @Meta @OpenAI @Google @MistralAI @X @MIT_CSAIL
    arxiv.org
    JetMoE: Reaching Llama2 Performance with 0.1M Dollars
    Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human...
    5.9K
  • Gavin Guo
    @Zhen4good
    Mar 1, 2025
    Replying to @SajwaniCrypto
    H56Rujo2QU5acJH74YCwmr99p57PL4MsSfpfJTY6m4Wq
    386
  • Gavin Guo
    @Zhen4good
    Aug 17, 2025
    Replying to @natolambert
    Qwen is the best overall; DeepSeek is more researchy.
    2K
  • Gavin Guo
    @Zhen4good
    Aug 16, 2025
    Replying to @elonmusk
    So they were actually biased before?
    1.5K
  • Gavin Guo
    @Zhen4good
    Apr 5, 2025
    Replying to @DrJimFan
    nice summary
    1.2K
  • Gavin Guo
    @Zhen4good
    Aug 16, 2025
    Replying to @suchenzang
    You can write a PR and the on-call engineers at PyTorch will fix it.
    13K
  • Gavin Guo
    @Zhen4good
    Jan 9, 2025
    Replying to @punk1685
    H56Rujo2QU5acJH74YCwmr99p57PL4MsSfpfJTY6m4Wq
    50
  • Gavin Guo
    @Zhen4good
    Feb 21, 2025
    Replying to @punk1685
    H56Rujo2QU5acJH74YCwmr99p57PL4MsSfpfJTY6m4Wq
    134
  • Gavin Guo
    @Zhen4good
    May 27, 2025
    Yes. RL helps models choose better paths among those they've seen during training. But if a path hasn't been seen, RL alone won't help.
    Omar Khattab
    @lateinteraction
    May 27, 2025
    Let's use this to generalize recent stuff: The key to "RL on LLMs" is the LLM priors, not, say, policy gradients. This should be obvious, nothing new. But pay attention to what matters here: the language describing the task, actions, and environment. If it's off, nothing works.
    441
  • Gavin Guo
    @Zhen4good
    May 2, 2024
    Replying to @DrJimFan and @scale_AI
    Mostly agree, but I think democracy can still be gamed temporarily.
    2.9K
  • Gavin Guo
    @Zhen4good
    May 24, 2024
    @UCBerkeley Can you please change the logo to something better looking? This is no doubt UGLY.
    263
  • Gavin Guo
    @Zhen4good
    Jul 15, 2024
    Replying to @Yuchenj_UW and @karpathy
    The Chinchilla scaling law is very outdated. Please check my paper "More Compute Is What You Need".
    999
