Sagnik (@saagnikkk) / X

270 posts

2nd Year PhD student @ UIUC

sagnikmukherjee.github.io

Joined January 2023

Pinned
Sagnik
@saagnikkk
Nov 30, 2025
🚨New Blog Alert: Is AdamW overkill for RLVR? We found that vanilla SGD is 1. as performant as AdamW, and 2. naturally 36x more parameter-efficient (much more than a rank-1 LoRA) 🤯 Looks like a "free lunch". Maybe it’s time to rethink the optimizers for RLVR 🧵
175K
Sagnik
@saagnikkk
May 20, 2025
🚨 Paper Alert: “RL Finetunes Small Subnetworks in Large Language Models” From DeepSeek V3 Base to DeepSeek R1 Zero, a whopping 86% of parameters were NOT updated during RL training 😮😮 And this isn’t a one-off. The pattern holds across RL algorithms and models. 🧵A Deep Dive
193K
Sagnik
@saagnikkk
Nov 2, 2024
🚀 Can LLMs aggregate information from diverse web sources? We try to answer that in our latest work: INFOGENT, a modular, agent-based framework for information aggregation on the web! Website: gangiswag.github.io/infogent/ 🌐🔍 🧵 [1/n]
28K
Sagnik
@saagnikkk
May 2, 2025
🚀Our ICML 2025 paper introduces "Premise-Augmented Reasoning Chains" - a structured approach to induce explicit dependencies in reasoning chains. By revealing the dependencies within chains, we significantly improve how LLM reasoning can be verified. 🧵[1/n]
11K
Sagnik
@saagnikkk
May 20, 2025
Replying to @saagnikkk
🧵[1/n] 🚨 RL Updates Are Sparse🚨 🔍 Finding 1: RL fine-tuning leads to extremely sparse updates: between 68.5% and 96% of parameters remain unchanged! ⚙️ This pattern holds across all major RL algorithms
8.1K
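A minimal sketch of how the unchanged-parameter fraction above could be measured, assuming both checkpoints are available as dictionaries of NumPy arrays. The function name `update_sparsity`, the `tol` parameter, and the toy weights are illustrative assumptions, not the paper's code.

```python
import numpy as np

def update_sparsity(before, after, tol=0.0):
    """Fraction of parameters whose value changed by at most `tol`.

    `before` and `after` map parameter names to arrays of equal shape.
    `tol` is a hypothetical tolerance for numerical noise.
    """
    unchanged = 0
    total = 0
    for name, w0 in before.items():
        delta = np.abs(after[name] - w0)
        unchanged += int(np.sum(delta <= tol))
        total += w0.size
    return unchanged / total

# Toy "checkpoints": 25 parameters, only one row of 5 weights is updated.
rng = np.random.default_rng(0)
before = {
    "layer.weight": rng.normal(size=(4, 5)),
    "layer.bias": rng.normal(size=5),
}
after = {name: w.copy() for name, w in before.items()}
after["layer.weight"][0, :] += 0.1

sparsity = update_sparsity(before, after)
print(f"update sparsity: {sparsity:.0%}")  # 20 of 25 parameters unchanged
```

With real checkpoints stored in low precision, a small nonzero `tol` may be more appropriate than exact equality; that choice is left open here.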
Sagnik
@saagnikkk
May 20, 2025
Replying to @saagnikkk
Paper - arxiv.org/abs/2505.11711 Work done with amazing collaborator @lifan__yuan and advised by our amazing advisors @dilekhakkanitur and @haopeng_nlp
arxiv.org
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Reinforcement learning (RL) yields substantial improvements in large language models (LLMs) downstream task performance and alignment with human values. Surprisingly, such large gains result from...
3.4K
Sagnik
@saagnikkk
Jun 21, 2024
📢📢Socio-demographic prompts are used for cultural alignment and for studying biases. But is the “bias” elicited by these prompts systematic? Is it similar to or different from what we see when LLMs are prompted with "Your favorite programming language is C++"?🧐 The answer will surprise you
11K
Sagnik
@saagnikkk
Jul 11, 2025
🚀 Headed to #ICML2025 in Vancouver (July 13-19)! We will present our paper in the poster session at East Exhibition Hall on Tuesday (15th) at 4:30 PM PDT. Happy to chat about reasoning, post-training, and anything LLM-related!
Sagnik
@saagnikkk
May 2, 2025
🚀Our ICML 2025 paper introduces "Premise-Augmented Reasoning Chains" - a structured approach to induce explicit dependencies in reasoning chains. By revealing the dependencies within chains, we significantly improve how LLM reasoning can be verified. 🧵[1/n]
3.1K
Sagnik
@saagnikkk
May 20, 2025
Replying to @saagnikkk
🧵[7/n] 🔍 Potential Reasons 💡 We hypothesize that the in-distribution nature of training data is a key driver behind this sparsity 🧠 The model already "knows" a lot — RL just fine-tunes a small, relevant subnetwork rather than overhauling everything
3.1K
Sagnik
@saagnikkk
May 20, 2025
Replying to @saagnikkk
🧵[2/n] 💡 SFT Updates Are Dense 💡 Unlike RL, Supervised Fine-Tuning (SFT) updates are much denser 🧠 📊 Sparsity is low: at most 15.31% of parameters remain untouched.
4.5K
Sagnik
@saagnikkk
May 20, 2025
Replying to @saagnikkk
🧵[6/n] 🌐 The Subnetwork Is General 🔁 Subnetworks trained with different seeds, datasets, or even algorithms show nontrivial overlap 🧩 This suggests the subnetwork is a generalizable structure tied to the base model 🧠 A shared backbone seems to emerge, no matter how you train it
3.2K
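One way to make "nontrivial overlap" concrete is to compare the sets of updated parameters from two runs against what a random subnetwork of the same size would give. This is a sketch under stated assumptions: the overlap definition (share of one run's updated entries that the other run also updated), the helper names, and the toy vectors are illustrative and may differ from the paper's metric.

```python
import numpy as np

def update_mask(before, after, tol=0.0):
    """Boolean mask of entries that moved by more than `tol`."""
    return np.abs(after - before) > tol

def overlap(mask_a, mask_b):
    """Share of A's updated entries that B also updated (asymmetric)."""
    n_a = int(mask_a.sum())
    if n_a == 0:
        return 0.0
    return int((mask_a & mask_b).sum()) / n_a

def random_baseline(mask_b):
    """Expected overlap if B's updated entries were placed uniformly at random."""
    return float(mask_b.mean())

# Toy example: 20 parameters, two hypothetical runs from the same base model.
base = np.zeros(20)
run_a = base.copy()
run_a[0:4] = 1.0   # run A updates entries 0..3
run_b = base.copy()
run_b[2:7] = 1.0   # run B updates entries 2..6

mask_a = update_mask(base, run_a)
mask_b = update_mask(base, run_b)

print(overlap(mask_a, mask_b))   # 2 of A's 4 updated entries are shared
print(overlap(mask_b, mask_a))   # 2 of B's 5 updated entries are shared
print(random_baseline(mask_b))   # 5 of 20 entries updated by B
```

An overlap well above the random baseline is what would indicate a shared structure rather than coincidence.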
Sagnik
@saagnikkk
May 20, 2025
Replying to @saagnikkk
🧵[8/n] To the best of our knowledge, this is the first mechanistic evidence of a contrast between learning from in-distribution (on-policy) data and off-distribution (off-policy) data.
3.2K
Sagnik
@saagnikkk
May 20, 2025
Replying to @saagnikkk
🧵[4/n] 📚 Each Layer Is Equally Sparse (or Dense) 📏 No specific layer or sublayer gets special treatment — all layers are updated equally sparsely. 🎯Despite the sparsity, the updates are still full-rank
3.7K
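Sparse and full-rank can sound contradictory, so here is a small illustration that the two properties are independent. It is a toy construction with made-up matrices, not data from the paper: a permuted diagonal update is mostly zeros yet full-rank, while a rank-1 outer product (the shape a rank-1 LoRA update takes) is fully dense.

```python
import numpy as np

def zero_fraction(m):
    """Fraction of entries that are exactly zero."""
    return float(np.mean(m == 0))

n = 8
rng = np.random.default_rng(0)

# Sparse update: one nonzero per row and column (a scaled permutation matrix).
values = rng.uniform(0.5, 1.5, size=n)
sparse_delta = np.diag(values)[:, rng.permutation(n)]

# Dense low-rank update: outer product of two strictly positive vectors.
u = rng.uniform(0.5, 1.5, size=n)
v = rng.uniform(0.5, 1.5, size=n)
low_rank_delta = np.outer(u, v)

sparse_rank = np.linalg.matrix_rank(sparse_delta)
low_rank_rank = np.linalg.matrix_rank(low_rank_delta)

print(zero_fraction(sparse_delta), sparse_rank)      # mostly zeros, rank n
print(zero_fraction(low_rank_delta), low_rank_rank)  # no zeros, rank 1
```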
Sagnik
@saagnikkk
May 20, 2025
Replying to @saagnikkk
🧵[3/n] 📉 Even Gradients Are Sparse in RL 📉 🧠 In PRIME, 72% of parameters never receive any gradient — ever! ↔️ Some do, but their gradients cancel out over time. 🎯 It’s not just sparse updates, even the gradients are sparse
3.9K
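The two cases in this tweet (parameters that never receive a gradient, and parameters whose gradients cancel over time) can be told apart with a simple bookkeeping sketch. It assumes plain SGD, so the net update is the learning rate times the sum of per-step gradients; the gradient table, learning rate, and variable names are invented for illustration.

```python
import numpy as np

lr = 0.5
w0 = np.array([1.0, 1.0, 1.0])

# Rows are training steps, columns are parameters (hypothetical values).
grads = np.array([
    [0.0,  0.25, 0.25],
    [0.0, -0.25, 0.25],
    [0.0,  0.5,  0.0],
    [0.0, -0.5,  0.5],
])

# Plain SGD: apply every step in order.
w = w0.copy()
for g in grads:
    w = w - lr * g

never_touched = np.all(grads == 0, axis=0)   # no gradient at any step
unchanged = w == w0                          # no net update after training
cancelled = unchanged & ~never_touched       # gradients arrived, then cancelled

print(never_touched)  # parameter 0 only
print(cancelled)      # parameter 1 only
print(w)              # parameter 2 is the only one that moved
```

The toy values are exact binary fractions so the cancellation is exact. With an adaptive optimizer the update is not a plain sum of gradients, so this exact cancellation argument would not carry over directly.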