Gavin Guo (@Zhen4good) / X

1,915 posts

Gavin Guo

@Zhen4good

Multimodal Avocado @Meta Previously @Apple @MITIBMLab @MIT_CSAIL @BerkeleyPhysics Opinions Are My Own

Menlo Park

Joined March 2023

Gavin Guo
@Zhen4good
May 27, 2025
🚀 Synthetic Data RL: Task Definition Is All You Need. No human labels, no problem. We fine-tune foundation models purely from task definitions, achieving 91.7% on GSM8K (+17.2pp over base), matching RL with full human data. Paper: arxiv.org/abs/2505.17063 Huggingface:
arxiv.org
Synthetic Data RL: Task Definition Is All You Need
Reinforcement learning (RL) is a powerful way to adapt foundation models to specialized tasks, but its reliance on large-scale human-labeled data limits broad adoption. We introduce Synthetic Data...
17K
Gavin Guo
@Zhen4good
Aug 23, 2025
Replying to @bingxu_
I was ambitious about making a change when joining Apple, excited about last year’s announcement. Now I’m also taking time to recover at MSL, hoping I can really make a difference this time
4.8K
Gavin Guo
@Zhen4good
Apr 12, 2024
JetMoE's technical report is out. Using only open-source data and code, we matched Llama 2's performance at a fraction of the cost. ALL training details are shared to advance open foundation model research. @Meta @OpenAI @Google @MistralAI @X @MIT_CSAIL
arxiv.org
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human...
5.9K
Gavin Guo
@Zhen4good
Mar 1, 2025
Replying to @SajwaniCrypto
H56Rujo2QU5acJH74YCwmr99p57PL4MsSfpfJTY6m4Wq
386
Gavin Guo
@Zhen4good
Aug 17, 2025
Replying to @natolambert
Qwen is the best overall, DeepSeek is more researchy
2K
Gavin Guo
@Zhen4good
Aug 16, 2025
Replying to @elonmusk
So they were actually biased before?
1.5K
Gavin Guo
@Zhen4good
Apr 5, 2025
Replying to @DrJimFan
nice summary
1.2K
Gavin Guo
@Zhen4good
Aug 16, 2025
Replying to @suchenzang
You can write a PR and the on-call engineers at PyTorch will fix it.
13K
Gavin Guo
@Zhen4good
Jan 9, 2025
Replying to @punk1685
H56Rujo2QU5acJH74YCwmr99p57PL4MsSfpfJTY6m4Wq
50
Gavin Guo
@Zhen4good
Feb 21, 2025
Replying to @punk1685
H56Rujo2QU5acJH74YCwmr99p57PL4MsSfpfJTY6m4Wq
134
Gavin Guo
@Zhen4good
May 27, 2025
Yes. RL helps models choose better paths among those they've seen during training. But if a path hasn't been seen, RL alone won't help.
Omar Khattab
@lateinteraction
May 27, 2025
Let's use this to generalize recent stuff: The key to "RL on LLMs" is the LLM priors, not, say, policy gradients. This should be obvious, nothing new. But pay attention to what matters here: the language describing the task, actions, and environment. If it's off, nothing works.
441
Gavin Guo
@Zhen4good
May 2, 2024
Replying to @DrJimFan and @scale_AI
mostly agree, but think democracy can still be gamed temporarily
2.9K
Gavin Guo
@Zhen4good
May 24, 2024
@UCBerkeley Can you please change the logo to something better looking? This one is no doubt UGLY
263
Gavin Guo
@Zhen4good
Jul 15, 2024
Replying to @Yuchenj_UW and @karpathy
The Chinchilla law is very outdated. Please check my paper "More Compute Is What You Need"
999