Prithviraj (Raj) Ammanabrolu (@rajammanabrolu) / X

Prithviraj (Raj) Ammanabrolu

3,415 posts

@rajammanabrolu

Reinforcement Learning and Language. Assistant Prof @UCSanDiego. Research Scientist @Nvidia.

San Diego, CA

Joined April 2019

Pinned
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
Nov 24, 2025
My entire PEARLS Lab, and many NVIDIA colleagues, will be at #neurips2025 in SD to chat about our latest. Some papers in the conf are already kinda outdated so just reach out to @bosungkim17 for all things VLA, embodied AI, and long context memory arxiv.org/abs/2505.16928
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
Nov 3, 2025
I've done a few versions of this talk but this is the first that's been recorded publicly, thanks to @IVADO_Qc! A good overview of things my lab has been up to in the last year or so, at least in balancing safety/capabilities, esp re embodied human-AI collab
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
Mar 4, 2025
I taught a grad course on AI Agents at UCSD CSE this past quarter. All lecture slides, homeworks & course projects are now open sourced! I provide a grounding going from Classical Planning & Simulations -> RL Control -> LLMs and how to put it all together: pearls-lab.github.io/ai-agents-cour…
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
Jan 26, 2025
Simply, no. I've been looking at my old results from doing RL with "verifiable" rewards (math puzzle games, python code to pass unit tests) starting from 2019 with GPT-1/2 to 2024 with Qwen Math. Deepseek's success likely lies in the base models improving; the RL is constant
Kevin Patrick Murphy
@sirbayes
Jan 26, 2025
Is it feasible to do a true tabula rasa version of deepseek R1 zero, starting from an LLM with random weights, similar to alpha zero? Or is starting with an LLM which is pre trained on math required?
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
Jun 7, 2023
Soon™, I'll be an Asst Prof @UCSanDiego @ucsd_cse focusing on interactive & grounded AI, RL, NLP. I will also be a research scientist @MosaicML helping lead efforts to make tech like RLHF more accessible. Looking for PhD students & research eng/scientists to join me in ☀️SoCal🏖️
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
May 15, 2025
I like the Ultra Scale Playbook from @huggingface and give it to my MS/first year PhD students to read as a prereq huggingface.co/spaces/nanotro… Is there an "RLSys" version of this on scaling RL+LLM training? If not + there's OSS community interest, I'll prob write one?
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
Sep 21, 2020
Using GPT-3 instead of regex
James Farmer
@JamesFarmer87
Sep 19, 2020
Well this has made my day.
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
Apr 25, 2021
I haven't been home in years. I stay up at night thinking of all the people I'll never see again. I'd like to have a home to go back to. All I can do is donate/RT so I'm boosting #CovidIndia posts that can help. If this bothers you, pls mute/unfollow. Don't send me DMs like this
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
Oct 5, 2022
The secret to aligning LMs to human preferences is reinforcement learning. But Why&How is it used? Announcing
💻RL4LMs: library to train any @huggingface LM w/ RL github.com/allenai/RL4LMs
👾GRUE: benchmark of 6 NLP tasks+rewards
📈NLPO: new RL alg 4 LMs
🌐rl4lms.apps.allenai.org
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
Jun 14, 2024
ML Systems people need to be stopped. Half of these kernel fusions are not numerically stable 😭 Yes it makes GPU go brr but it also breaks policy gradient theorem and makes me question my life decisions every day
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
Nov 13, 2023
The PEARLS Lab at @ucsd_cse is now open for business! I'm recruiting Fall 24 PhD students in all things interactive and grounded AI, RL, and NLP!! Join us in the land of 🏖️ beach (🧋pearl tea included). Apply by Dec 20. Please help spread the word! More: pearls.ucsd.edu
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
Oct 31, 2024
The year is 2027, NeurIPS tickets are now sold on Ticketmaster and the black market for thousands. Only 5 companies and friends can get in. It's easier to get tickets to the Taylor Swift concert next door so you can sneak into the poster halls
NeurIPS Conference
@NeurIPSConf
Oct 29, 2024
Due to a high demand for registrations, NeurIPS will be moving towards a randomized lottery system, effective immediately. Authors of accepted conference and workshop papers are still guaranteed registration, but this may change as we release spots to the lottery, so we urge
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
Dec 19, 2021
If it doesn't work with seed 42, it'll never work.
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
Jul 24, 2020
I have a language modeling joke, but it's too dangerous to be released.
Ida Momennejad
@criticalneuro
Jul 24, 2020
I have a reinforcement learning joke, but not sure it's rewarding.
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu
Mar 17, 2022
Why do ML academics have such knee jerk reactions to writing rules or engines to ground and control an ML system? "It won't work in the real world" is such an unsubstantiated argument. Have you ever actually put an ML system in production?? How do you think those work???