🚀 Synthetic Data RL: Task Definition Is All You Need
No human labels, no problem.
We fine-tune foundation models purely from task definitions — achieving 91.7% on GSM8K (+17.2pp over base), matching RL with full human data.
Paper: arxiv.org/abs/2505.17063
Huggingface:
Gavin Guo
1,915 posts
Multimodal Avocado @Meta
Previously @Apple @MITIBMLab @MIT_CSAIL @BerkeleyPhysics
Opinions Are My Own
- Replying to @bingxu_: I was ambitious about making a change when joining Apple, and excited about last year’s announcement. Now I’m also recovering at MSL, hoping I can really make a difference this time.
- JetMoE's technical report is out. Using only open-source data and code, we matched Llama 2's performance at a fraction of the cost. ALL training details are shared to advance open foundation model research. @Meta @OpenAI @Google @MistralAI @X @MIT_CSAIL
- Replying to @SajwaniCrypto: H56Rujo2QU5acJH74YCwmr99p57PL4MsSfpfJTY6m4Wq
- Replying to @natolambert: Qwen is the best overall; DeepSeek is more researchy.
- Replying to @suchenzang: You can write a PR and on-call engineers at PyTorch will fix it.
- Yes. RL helps models choose better paths among those they've seen during training. But if a path hasn't been seen, RL alone won't help. Let's use this to generalize recent stuff: the key to "RL on LLMs" is the LLM priors, not, say, policy gradients. This should be obvious, nothing new. But pay attention to what matters here: the language describing the task, actions, and environment. If it's off, nothing works.
- @UCBerkeley Can u please change the logo for better looking? This is no doubt UGLY
- Replying to @Yuchenj_UW and @karpathy: Chinchilla law is very outdated. Please check my paper "more compute is what you need".



