Rishabh Agarwal

1,609 posts

@agarwl_

Reinforcement Learner @periodiclabs, Adjunct Prof at McGill. Ex Meta, DeepMind, Brain, @iitbombay. NeurIPS Best Paper, On-Policy Distillation

Joined May 2016

Pinned
Rishabh Agarwal
@agarwl_
Aug 25, 2025
This is my last week at @AIatMeta. It was a tough decision not to continue with the new Superintelligence TBD lab, especially given the talent and compute density. But after 7.5 years across Google Brain, DeepMind, and Meta, I felt the pull to take on a different kind of risk.
Rishabh Agarwal
@agarwl_
Feb 7, 2025
I recently gave a tutorial on knowledge distillation for LLMs, explaining the mathematical derivations behind the commonly used methods. Sharing the slides here given the recent interest in this topic. drive.google.com/file/d/1xMohjQ…
Rishabh Agarwal
@agarwl_
Sep 17, 2024
I gave my first guest lecture today in a grad course on LLMs as a (soon-to-be) adjunct prof at McGill. Putting the slides here; maybe useful to some folks ;) drive.google.com/file/d/1komQ7s…
Rishabh Agarwal
@agarwl_
Oct 23, 2025
Yuandong is well-respected within Meta, detail-oriented, and technically sharp -- this layoff doesn't make sense, and my hunch is that it might be targeted towards ex-GenAI people. Meta's loss, but it could be your win if you're hiring frontier RL researchers ;)
Yuandong Tian
@tydsh
Oct 23, 2025
Several of my team members + myself are impacted by this layoff today. Welcome to connect :)
Rishabh Agarwal
@agarwl_
Oct 16, 2025
*checks chatgpt* This paper cost ~4.2 million USD (400K GB200 hours) -- science! Our most expensive run was 100K GPU hours (the same amount as DeepSeek-R1-Zero, but on GB200s). One finding here was that once we have a scalable RL algorithm, RL compute scaling becomes predictable
Devvrit
@Devvrit_Khatri
Oct 16, 2025
Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs
Rishabh Agarwal
@agarwl_
Apr 7, 2025
Joined the Llama team @AIatMeta today to work on RL and reasoning
AI at Meta
@AIatMeta
Apr 5, 2025
Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model
Rishabh Agarwal
@agarwl_
Apr 3, 2025
After nearly 7 years at Google, today's officially my last day at Google DeepMind (after a one-month notice period). This is what I sent to my co-workers: I joined Google Brain as an AI resident for a short stint but liked it here so much that I ended up staying until now! During
Rishabh Agarwal
@agarwl_
Sep 30, 2025
I asked Liam: Why name our startup Periodic Labs? He said think about *periods* in time. And then it hit me: we define entire periods of history by their critical materials: Copper Age → Bronze Age → Iron Age → Silicon Age. The name says it all about our mission: to discover
Liam Fedus
@LiamFedus
Sep 30, 2025
Today, @ekindogus and I are excited to introduce @periodiclabs. Our goal is to create an AI scientist. Science works by conjecturing how the world might be, running experiments, and learning from the results. Intelligence is necessary, but not sufficient. New knowledge is
Rishabh Agarwal
@agarwl_
Oct 27, 2025
Very nice blog post from Thinky (@_kevinlu et al.) about on-policy distillation for LLMs -- we published this idea back in 2023, and it is *publicly* known to have been successfully applied to Gemma 2 & 3 and Qwen3-Thinking (and probably many closed frontier models)! The idea behind
Thinking Machines
@thinkymachines
Oct 27, 2025
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
Rishabh Agarwal
@agarwl_
Nov 6, 2025
Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency for RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What problem does PipelineRL solve? In RL for LLMs, we need to send weight updates from trainer to
Alexandre
@alexpiche_
Nov 4, 2025
In-flight weight updates have gone from a “weird trick” to a must for training LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits, here’s the CoLM talk @DBahdanau and I gave: youtu.be/Z1uEuRKACRs
Rishabh Agarwal
@agarwl_
May 16, 2025
Often, all you need is one lucky break. For me, it was @geoffreyhinton who took a bet on me about 7 years ago. He said something along the following lines that stuck with me: “You have tried a bunch of interesting research directions, and all of them failed — that’s what
Nathan Lambert
@natolambert
May 14, 2025
My path into AI: The sort of small wins that accumulate into a real career in AI. When I started grad school, AI profs didn't have space for me in their group, and when I ended I had no papers at NeurIPS/ICLR/ICML, yet the process can still work. interconnects.ai/p/my-path-into…
Rishabh Agarwal
@agarwl_
Oct 5, 2024
Really promising results we got recently: Generative CoT Verifiers trained on only grade-school math problems in GSM8K generalize quite well to much harder *high-school competition* problems in MATH!
Rishabh Agarwal
@agarwl_
Oct 16, 2023
Our team will likely be hiring student researchers at Google DeepMind. Please fill out the interest form if you would like to work with me. This role would start in Jan/Feb 2024 and would be in-person in Montreal, with 80-100% time at GDM.
docs.google.com
Student Researcher interest form
This is for gathering interest for a student researcher I plan to host in GDM Montreal office in 2024 (this has to be in-person). About me: You can see my webpage and recent papers at https://agarw...
Rishabh Agarwal
@agarwl_
Jan 26, 2022
The field of ML has seen massive growth and it is becoming apparent it may be in need of self-reflection to ensure that efforts are directed towards real progress. To this end, we are organizing an @iclr_conf workshop on "ML Evaluation Standards". ml-eval.github.io [1/N]