DR Tulu is now accepted for an oral presentation at #ICML2026 🙏
Updated paper: arxiv.org/abs/2511.19399
📥 We added more ablations, including using Qwen3-8B as the rubric generator and judge (showing that evolving rubrics work even with a weaker model), a spurious-rewards sanity check, and more.
Happy to share that DR Tulu has been accepted to ICML as a ✨Spotlight✨!
We believe that co-evolving the agent and its reward metric can lead to more capable intelligence.
DR Tulu is a team effort. Huge thanks and congrats to all my amazing collaborators and mentors!
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
Deep research agents perform multi-step research to produce long-form, well-attributed answers. However, most open deep research agents are trained on easily verifiable short-form QA tasks via...