Stella Li
412 posts
Stella Li
@StellaLisy
PhD student @uwnlp | visiting researcher @AIatMeta | undergrad @jhuclsp #NLProc
Seattle, WA
stellalisy.com
Joined April 2022
534 Following
3,591 Followers
  • Pinned
    Stella Li
    @StellaLisy
    May 7
    LMs can learn from human labels, training data, and stronger teachers. But what happens when all of these run out🫪 when the model is already at the frontier and there is no stronger external source to learn from❓ In EvoLM, we extract the model's own evaluative knowledge into
    35K
  • Stella Li
    @StellaLisy
    May 27, 2025
    🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
    - Random rewards: +21%
    - Incorrect rewards: +25%
    - (FYI) Ground-truth rewards: +28.8%
    How could this even work⁉️ Here's why: 🧵
    Blogpost: tinyurl.com/spurious-rewar…
    701K
  • Stella Li
    @StellaLisy
    Jul 22, 2025
    WHY do you prefer something over another? Reward models treat preference as a black-box😶‍🌫️but human brains🧠decompose decisions into hidden attributes We built the first system to mirror how people really make decisions in our #COLM2025 paper🎨PrefPalette✨ Why it matters👉🏻🧵
    51K
  • Stella Li
    @StellaLisy
    Nov 25, 2024
    OpenReview turned into Reddit🤯 Can we now add upvote/downvote buttons to reviews and rebuttals plz?? Would be a very rich and interesting source of preference data🤡
    Ravid Shwartz Ziv
    @ziv_ravid
    Nov 25, 2024
    Looking at ICLR submissions with the lowest score - What a work of art! 🧵
    32K
  • Stella Li
    @StellaLisy
    Oct 2, 2025
    🚨What if solving a problem correctly isn't enough—cuz the WAY to reason about it based on your audience matters just as much⁉️ We introduce ✨personalized reasoning✨: proactively asking user preferences and adapting HOW models think Frontier models are not doing well at this!🧵
    34K
  • Stella Li
    @StellaLisy
    Feb 21, 2025
    Asking the right questions can make or break decisions in high-stakes fields like medicine, law, and beyond✴️ Our new framework ALFA (ALignment with Fine-grained Attributes) teaches LLMs to PROACTIVELY seek information through better questions🏥❓ (co-led with @jiminmun_) 👉🏻🧵
    25K
  • Stella Li
    @StellaLisy
    Jun 13, 2025
    Spurious Rewards was not all‼️We now present spurious PROMPTS🤔 check out our latest findings and discussion on evaluation: tinyurl.com/spurious-prompt. Who knew Lorem ipsum can bring 19.4% gains compared to default prompt👀 Also, arXiv is out🤩 arxiv.org/abs/2506.10947📄
    Stella Li
    @StellaLisy
    May 27, 2025
    🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
    - Random rewards: +21%
    - Incorrect rewards: +25%
    - (FYI) Ground-truth rewards: +28.8%
    How could this even work⁉️ Here's why: 🧵
    Blogpost: tinyurl.com/spurious-rewar…
    47K
  • Stella Li
    @StellaLisy
    Apr 15, 2023
    Excited to share that I will be starting my PhD at @uwnlp next fall in the @tsvetshop. I’m so grateful for the support from my mentors and friends in the past few years at @jhuclsp. Looking forward to moving back to the west (better👀) coast for the next chapter!! #decisionday
    18K
  • Stella Li
    @StellaLisy
    Dec 6, 2024
    31% of US adults use generative AI for healthcare🤯But most AI systems answer questions assertively—even when they don’t have the necessary context. Introducing #MediQ a framework that enables LLMs to recognize uncertainty🤔and ask the right questions❓when info is missing: 🧵
    29K
  • Stella Li
    @StellaLisy
    May 27, 2025
    Replying to @StellaLisy
    💡Our hypothesis: RLVR amplifies reasoning patterns that already exist.
    Qwen2.5-Math can uniquely do "code reasoning": solving math by writing Python💻 (without execution).
    Code reasoning correlates with correctness (64% w/ vs 29% w/o).
    Spurious training amplifies code usage to 90%+.
    40K
  • Stella Li
    @StellaLisy
    Oct 4, 2025
    I will be at #COLM2025 next week, super excited to explore Montreal!🍁 I've been thinking about personalization, question-asking, multi-turn, RL etc. DM if you want to chat! Catch me at:
    📍Poster for ALFA: Tue 1:30pm
    💡Spotlight talk for PrefPalette: Thur 10:15am (poster 11am)
    17K
  • Stella Li
    @StellaLisy
    Jun 4, 2025
    Excited to share more about Spurious Rewards! Also keep an eye out for some new experiments and arxiv coming soon 👀🔜
    Cohere Labs
    Cohere
    @Cohere_Labs
    Jun 4, 2025
    Next week on Wednesday, June 11th we're excited to welcome @StellaLisy for a session on "Spurious Rewards: Rethinking Training Signals in RLVR." Thanks to @AhmadMustafaAn1 for organizing this session! 🔥 Learn more: cohere.com/events/Cohere-…
    18K
  • Stella Li
    @StellaLisy
    May 27, 2025
    Replying to @StellaLisy
    We empirically prove this with surgical experiments:
    🐍 Directly rewarding string “python” → +11.8% performance
    🚫 Random rewards BUT blocking code → gains disappear
    The "magic" is just surfacing useful patterns already learned in pre-training.
    12K
  • Stella Li
    @StellaLisy
    May 27, 2025
    Replying to @StellaLisy
    🚨Future RLVR research should be validated on diverse models rather than a single de facto choice, as we show that it's easy to get significant gains on Qwen even with completely spurious reward signals. 📄 Details, code, and full paper in our blogpost: tinyurl.com/spurious-rewar…
    12K
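
As a rough illustration of the kinds of reward signals mentioned in the May 27, 2025 thread above (random rewards, rewarding the string "python", and random rewards with code blocked), here is a minimal Python sketch. It is not the authors' implementation: the function names, the 0.5 reward probability, and the plain substring check are all assumptions made for illustration only.

```python
import random


def random_reward(response: str, rng: random.Random, p: float = 0.5) -> float:
    """Return 1.0 with probability p, ignoring the response entirely.

    The value p = 0.5 is an assumption for this sketch.
    """
    return 1.0 if rng.random() < p else 0.0


def python_string_reward(response: str) -> float:
    """Return 1.0 when the response contains the substring 'python'.

    A plain, case-sensitive substring check is an assumption here.
    """
    return 1.0 if "python" in response else 0.0


def random_reward_blocking_code(
    response: str, rng: random.Random, p: float = 0.5
) -> float:
    """Random reward that is never granted to responses containing 'python'.

    This is one hypothetical way to express "random rewards but blocking
    code"; the actual experimental setup may differ.
    """
    if "python" in response:
        return 0.0
    return random_reward(response, rng, p)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    samples = [
        "Let me solve this with code:\n```python\nprint(2 + 2)\n```",
        "Adding the two numbers gives 4.",
    ]
    for text in samples:
        print(
            python_string_reward(text),
            random_reward(text, rng),
            random_reward_blocking_code(text, rng),
        )
```

None of these functions look at whether the answer is correct, which is what makes them "spurious" in the sense used in the thread.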
