Rulin Shao
479 posts
Rulin Shao
@RulinShao
PhD @UWNLP, visiting researcher @Meta.
rulinshao.github.io
Joined April 2022
1,084
Following
5,039
Followers
  • Pinned
    Rulin Shao
    @RulinShao
    May 25
    DR Tulu is now accepted for an oral presentation at #ICML2026 🙏 Updated paper: arxiv.org/abs/2511.19399 📥We added more ablations including using Qwen3-8B as the rubric generator&judge, showing evolving rubrics work with a weak model too; spurious rewards sanity check, etc.
    Rulin Shao
    @RulinShao
    May 1
    Happy to share that DR Tulu has been accepted to ICML as a ✨Spotlight✨! We believe that co-evolving the agent and its reward metric can lead to more capable intelligence. DR Tulu is a team effort. Huge thanks and congrats to all my amazing collaborators and mentors!
    arxiv.org
    DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
    Deep research agents perform multi-step research to produce long-form, well-attributed answers. However, most open deep research agents are trained on easily verifiable short-form QA tasks via...
    27K
  • Rulin Shao
    @RulinShao
    Jul 17, 2024
    🔥We release the first open-source 1.4T-token RAG datastore and present a scaling study for RAG on perplexity and downstream tasks! We show LM+RAG scales better than LM alone, with better performance for the same training compute (pretraining+indexing) retrievalscaling.github.io 🧵
    118K
  • Rulin Shao
    @RulinShao
    Oct 10, 2023
    Introducing LightSeq for long-context LLM training:
    - Highly optimized for decoder models
    - Smarter checkpointing
    - Better support for models with fewer heads
    Up to 2x faster, 2-8x longer sequences vs Megatron-LM. arxiv.org/abs/2310.03294
    125K
  • Rulin Shao
    @RulinShao
    May 1, 2025
    Meet ReasonIR-8B✨the first retriever specifically trained for reasoning tasks! Our challenging synthetic training data unlocks SOTA scores on reasoning IR and RAG benchmarks. ReasonIR-8B ranks 1st on BRIGHT and outperforms search engine and retriever baselines on MMLU and GPQA🔥
    64K
  • Rulin Shao
    @RulinShao
    Jun 13, 2025
    🎉Our Spurious Rewards is available on arXiv! We added experiments on
    - More prompts/steps/models/analysis...
    - Spurious Prompts!
    Surprisingly, we obtained 19.4% gains when replacing prompts with LaTeX placeholder text (\lipsum) 😶‍🌫️
    Check out our 2nd blog: tinyurl.com/spurious-prompt
    Spurious Prompts.
    Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    May 27, 2025
    🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
    - Random rewards: +21%
    - Incorrect rewards: +25%
    - (FYI) Ground-truth rewards: +28.8%
    How could this even work⁉️ Here's why: 🧵
    Blogpost: tinyurl.com/spurious-rewar…
    29K
  • Rulin Shao
    @RulinShao
    Sep 26, 2024
    Happy to share our work on RAG scaling is accepted by @NeurIPSConf 🥳 Some new thoughts on this work: (1) Retrieving from a web-scale datastore is another way to do test-time scaling. It doesn't add much to the training cost, leading to better compute-optimal scaling curves. 🔎🧵
    Rulin Shao
    @RulinShao
    Jul 17, 2024
    🔥We release the first open-source 1.4T-token RAG datastore and present a scaling study for RAG on perplexity and downstream tasks! We show LM+RAG scales better than LM alone, with better performance for the same training compute (pretraining+indexing) retrievalscaling.github.io 🧵
    36K
  • Rulin Shao
    @RulinShao
    Feb 21, 2025
    New features added to MassiveDS-pipe to make it painless to build and serve a trillion-token datastore:
    1. Distributed API serving (<30ms latency);
    2. Efficient indices: IVF-Flat, IVF-PQ;
    3. Memory-free fast passage loading.
    It has been adopted by AI2 OpenScholar and Meta EWE 🥳
    25K
  • Rulin Shao
    @RulinShao
    Jul 8, 2025
    Happy to share that ReasonIR is accepted by @COLM_conf! Synthetic data & test-time scaling are powerful tools to enable new capabilities for challenging tasks. I’m impressed by how quickly smaller retrievers and better rerankers have been developed with ReasonIR data! #COLM2025
    Rulin Shao
    @RulinShao
    May 1, 2025
    Meet ReasonIR-8B✨the first retriever specifically trained for reasoning tasks! Our challenging synthetic training data unlocks SOTA scores on reasoning IR and RAG benchmarks. ReasonIR-8B ranks 1st on BRIGHT and outperforms search engine and retriever baselines on MMLU and GPQA🔥
    12K
  • Rulin Shao
    @RulinShao
    May 27, 2025
    One more fun thing! RLVR can elicit existing behaviors like code reasoning. But what if your model is not good at code but thinks it is?
    - RLVR w/ spurious rewards lets Olmo use more code, but perf decreases (Fig 6)
    - When we discourage it from using code, the perf goes up!🤣 (Fig 9)
    Stella Li ✈️ ICML🇰🇷
    @StellaLisy
    May 27, 2025
    🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
    - Random rewards: +21%
    - Incorrect rewards: +25%
    - (FYI) Ground-truth rewards: +28.8%
    How could this even work⁉️ Here's why: 🧵
    Blogpost: tinyurl.com/spurious-rewar…
    14K
  • Rulin Shao
    @RulinShao
    Dec 9, 2024
    I'll be presenting MassiveDS and DCLM at #NeurIPS2024! Drop by or DM me to catch up! Happy to chat about anything--RAG, reasoning, synthetic data, model architecture design, etc.!
    MassiveDS: Wed 11-2 pm, #7203 (calendar: tinyurl.com/massiveds)
    DCLM: Fri 4:30-7:30pm, #5109
    13K
  • Rulin Shao
    @RulinShao
    May 18, 2025
    Accepted by #ACL2025! Congrats @mingdachen and the team🥳 Several cool ideas:
    - Maintain an explicit editable working memory during generation;
    - Actively integrate external feedback (factual check w/ VeriScore);
    A smart LM learns to memorize, a smarter LM learns to forget too!
    Aran Komatsuzaki
    @arankomatsuzaki
    Dec 25, 2024
    Meta presents Improving Factuality with Explicit Working Memory
    Presents EWE, a novel approach that enhances factuality in long-form text generation by integrating a working memory that receives real-time feedback from external resources
    EWE outperforms strong baselines on four
    11K
  • Rulin Shao
    @RulinShao
    Oct 8, 2025
    #COLM2025 Please drop by our ReasonIR poster at Poster3 #967 (11:00am - 1:00pm Wed) by @varsha_kishore_ 🥰 Happy to answer questions or chat online--feel free to DM! I've been exploring deep research training lately to empower reasoning+search for complex tasks💪 Stay tuned!
    Rulin Shao
    @RulinShao
    May 1, 2025
    Meet ReasonIR-8B✨the first retriever specifically trained for reasoning tasks! Our challenging synthetic training data unlocks SOTA scores on reasoning IR and RAG benchmarks. ReasonIR-8B ranks 1st on BRIGHT and outperforms search engine and retriever baselines on MMLU and GPQA🔥
    10K
  • Rulin Shao
    @RulinShao
    Jul 10, 2024
    Happy to share LightSeq is accepted by @COLM_conf 🥳 LightSeq supports efficient long-context Transformer training, where the supported context length grows with the number of nodes. We are excited about the innovative applications it will enable, such as long-context LLM/VLM! 🚀
    Rulin Shao
    @RulinShao
    Oct 10, 2023
    Introducing LightSeq for long-context LLM training:
    - Highly optimized for decoder models
    - Smarter checkpointing
    - Better support for models with fewer heads
    Up to 2x faster, 2-8x longer sequences vs Megatron-LM. arxiv.org/abs/2310.03294
    17K
  • Rulin Shao
    @RulinShao
    Aug 8, 2025
    Factuality and logical reasoning (e.g., math, code) favor different sets of reasoning patterns. 🧑‍🍳 A fresh RL recipe to improve factuality is here — crafted by the amazing @ccsasuke!
    Jason Weston
    @jaseweston
    Aug 8, 2025
    ...is today a good day for new paper posts?
    🤖Learning to Reason for Factuality 🤖
    📝: arxiv.org/abs/2508.05618
    - New reward func for GRPO training of long CoTs for *factuality*
    - Design stops reward hacking by favoring precision, detail AND quality
    - Improves base model across
    7.7K
