Stella Li (@StellaLisy) / X

412 posts

Stella Li

@StellaLisy

PhD student @uwnlp | visiting researcher @AIatMeta | undergrad @jhuclsp #NLProc

Seattle, WA

Joined April 2022

Pinned
Stella Li
@StellaLisy
May 7
LMs can learn from human labels, training data, and stronger teachers. But what happens when all of these run out🫪 when the model is already at the frontier and there is no stronger external source to learn from❓ In EvoLM, we extract the model's own evaluative knowledge into
35K
Stella Li
@StellaLisy
May 27, 2025
🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…
701K
Stella Li
@StellaLisy
Jul 22, 2025
WHY do you prefer something over another? Reward models treat preference as a black-box😶‍🌫️but human brains🧠decompose decisions into hidden attributes We built the first system to mirror how people really make decisions in our #COLM2025 paper🎨PrefPalette✨ Why it matters👉🏻🧵
51K
Stella Li
@StellaLisy
Nov 25, 2024
OpenReview turned into Reddit🤯 Can we now add upvote/downvote buttons to reviews and rebuttals plz?? Would be a very rich and interesting source of preference data🤡
Ravid Shwartz Ziv
@ziv_ravid
Nov 25, 2024
Looking at ICLR submissions with the lowest score - What a work of art! 🧵
32K
Stella Li
@StellaLisy
Oct 2, 2025
🚨What if solving a problem correctly isn't enough—cuz the WAY to reason about it based on your audience matters just as much⁉️ We introduce ✨personalized reasoning✨: proactively asking user preferences and adapting HOW models think Frontier models are not doing well at this!🧵
34K
Stella Li
@StellaLisy
Feb 21, 2025
Asking the right questions can make or break decisions in high-stake fields like medicine, law, and beyond✴️ Our new framework ALFA—ALignment with Fine-grained Attributes—teaches LLMs to PROACTIVELY seek information through better questions🏥❓ (co-led with @jiminmun_) 👉🏻🧵
25K
Stella Li
@StellaLisy
Jun 13, 2025
Spurious Rewards was not all‼️We now present spurious PROMPTS🤔 check out our latest findings and discussion on evaluation: tinyurl.com/spurious-prompt. Who knew Lorem ipsum can bring 19.4% gains compared to default prompt👀 Also, arXiv is out🤩 arxiv.org/abs/2506.10947📄
47K
Stella Li
@StellaLisy
Apr 15, 2023
Excited to share that I will be starting my PhD at @uwnlp next fall in the @tsvetshop. I’m so grateful for the support from my mentors and friends in the past few years at @jhuclsp. Looking forward to moving back to the west (better👀) coast for the next chapter!! #decisionday
18K
Stella Li
@StellaLisy
Dec 6, 2024
31% of US adults use generative AI for healthcare🤯But most AI systems answer questions assertively—even when they don’t have the necessary context. Introducing #MediQ a framework that enables LLMs to recognize uncertainty🤔and ask the right questions❓when info is missing: 🧵
29K
Stella Li
@StellaLisy
May 27, 2025
Replying to @StellaLisy
💡Our hypothesis: RLVR amplifies reasoning patterns that already exist.
Qwen2.5-Math can uniquely do "code reasoning": solving math by writing Python💻 (without execution).
Code reasoning correlates with correctness (64% w/ vs 29% w/o).
Spurious training amplifies code usage to 90%+.
40K
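The "64% w/ vs 29% w/o" figure in the reply above is an accuracy conditioned on whether a response uses code reasoning. A minimal sketch of how such a split could be computed is below. The detector `uses_code` (a case-insensitive search for the string "python") and the `(response, is_correct)` sample format are assumptions made for illustration, not the authors' evaluation code.

```python
from typing import Dict, Iterable, Optional, Tuple


def uses_code(response: str) -> bool:
    # Assumption: a response counts as "code reasoning" if it mentions "python".
    return "python" in response.lower()


def accuracy_by_code_usage(
    samples: Iterable[Tuple[str, bool]]
) -> Dict[bool, Optional[float]]:
    """Return accuracy for responses with (True) and without (False) code.

    `samples` yields (response_text, is_correct) pairs. A group with no
    samples maps to None instead of raising a division error.
    """
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for response, is_correct in samples:
        key = uses_code(response)
        totals[key] += 1
        hits[key] += int(is_correct)
    return {
        key: (hits[key] / totals[key] if totals[key] else None)
        for key in totals
    }
```

Calling `accuracy_by_code_usage(samples)[True]` gives the with-code accuracy and `[False]` the without-code accuracy, which are the two numbers being compared.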
Stella Li
@StellaLisy
Oct 4, 2025
I will be at #COLM2025 next week, super excited to explore Montreal!🍁 I've been thinking about personalization, question-asking, multi-turn, RL etc. DM if you want to chat! Catch me at:
📍Poster for ALFA: Tue 1:30pm
💡Spotlight talk for PrefPalette: Thur 10:15am (poster 11am)
17K
Stella Li
@StellaLisy
Jun 4, 2025
Excited to share more about Spurious Rewards! Also keep an eye out for some new experiments and arxiv coming soon 👀🔜
Cohere Labs
@Cohere_Labs
Jun 4, 2025
Next week on Wednesday, June 11th we're excited to welcome @StellaLisy for a session on "Spurious Rewards: Rethinking Training Signals in RLVR." Thanks to @AhmadMustafaAn1 for organizing this session! 🔥 Learn more: cohere.com/events/Cohere-…
18K
Stella Li
@StellaLisy
May 27, 2025
Replying to @StellaLisy
We empirically prove this with surgical experiments:
🐍 Directly rewarding string “python” → +11.8% performance
🚫 Random rewards BUT blocking code → gains disappear
The "magic" is just surfacing useful patterns already learned in pre-training.
12K
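The two experiments in the reply above can be read as reward functions that ignore answer correctness. The sketch below is one possible reading, not the authors' implementation: the function names, the reward probability `p=0.5`, and the choice to zero out the reward whenever "python" appears are all assumptions made for illustration.

```python
import random


def random_reward(rng: random.Random, p: float = 0.5) -> float:
    # Reward 1.0 with probability p, regardless of the response.
    # Assumption: p = 0.5; the post does not state the probability.
    return 1.0 if rng.random() < p else 0.0


def python_string_reward(response: str) -> float:
    # Reward any response that contains the string "python".
    return 1.0 if "python" in response.lower() else 0.0


def random_reward_blocking_code(
    response: str, rng: random.Random, p: float = 0.5
) -> float:
    # Hypothetical reading of "random rewards BUT blocking code":
    # responses containing "python" never receive reward.
    if "python" in response.lower():
        return 0.0
    return random_reward(rng, p)
```

None of these functions look at the ground-truth answer, which is what makes the rewards spurious in the sense the thread describes.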
Stella Li
@StellaLisy
May 27, 2025
Replying to @StellaLisy
🚨Future RLVR research should be validated on diverse models rather than a single de facto choice, as we show that it's easy to get significant gains on Qwen even with completely spurious reward signals. 📄 Details, code, and full paper in our blogpost: tinyurl.com/spurious-rewar…
12K