Sagnik
270 posts
@saagnikkk
2nd Year PhD student @ UIUC
sagnikmukherjee.github.io
Joined January 2023
315 Following · 834 Followers
  • Pinned
    Sagnik
    @saagnikkk
    Nov 30, 2025
    🚨 New Blog Alert: Is AdamW overkill for RLVR (RL with verifiable rewards)? We found that vanilla SGD is (1) as performant as AdamW and (2) naturally 36x more parameter-efficient (far more so than a rank-1 LoRA). 🤯 Looks like a "free lunch". Maybe it's time to rethink optimizers for RLVR. 🧵
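A toy illustration of the "parameter efficiency" claim above, not the blog's experiment or its explanation: it applies the textbook SGD and AdamW update rules to a hypothetical four-entry weight vector and counts how many entries end up changed. The gradient stream, `lr=0.5`, and `weight_decay=0.01` (the default in PyTorch's `AdamW`) are assumptions, chosen so that the float arithmetic is exact.

```python
import math

def sgd(params, grad_seq, lr=0.5):
    """Vanilla SGD: each step moves a weight by -lr * grad."""
    p = list(params)
    for grads in grad_seq:
        p = [w - lr * g for w, g in zip(p, grads)]
    return p

def adamw(params, grad_seq, lr=0.5, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
    """Textbook AdamW (Adam plus decoupled weight decay) on plain Python lists."""
    b1, b2 = betas
    p = list(params)
    m = [0.0] * len(p)
    v = [0.0] * len(p)
    for t, grads in enumerate(grad_seq, start=1):
        for i, g in enumerate(grads):
            p[i] -= lr * weight_decay * p[i]  # decay moves every nonzero weight
            m[i] = b1 * m[i] + (1 - b1) * g
            v[i] = b2 * v[i] + (1 - b2) * g * g
            m_hat = m[i] / (1 - b1 ** t)
            v_hat = v[i] / (1 - b2 ** t)
            p[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-entry normalized step
    return p

def num_changed(before, after, tol=0.0):
    """Count entries whose value moved by more than tol."""
    return sum(abs(a - b) > tol for a, b in zip(before, after))

# Hypothetical gradient stream: entry 0 gets +g then -g (cancels),
# entry 1 gets the same sign twice, entries 2 and 3 never get a gradient.
w0 = [1.0, 1.0, 1.0, 1.0]
grads = [[0.25, 0.25, 0.0, 0.0], [-0.25, 0.25, 0.0, 0.0]]

print(num_changed(w0, sgd(w0, grads)))                      # 1
print(num_changed(w0, adamw(w0, grads, weight_decay=0.0)))  # 2
print(num_changed(w0, adamw(w0, grads)))                    # 4
```

In this toy, SGD's net change is -lr times the summed gradient, so the cancelling pair returns entry 0 exactly to its start; Adam's per-step normalization does not cancel, and decoupled weight decay moves every nonzero weight. Whether effects like these account for the 36x figure is not something the sketch establishes.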
  • Sagnik
    @saagnikkk
    May 20, 2025
    🚨 Paper Alert: “RL Finetunes Small Subnetworks in Large Language Models”. From DeepSeek V3 Base to DeepSeek R1 Zero, a whopping 86% of parameters were NOT updated during RL training. 😮😮 And this isn’t a one-off: the pattern holds across RL algorithms and models. 🧵 A deep dive:
  • Sagnik
    @saagnikkk
    Nov 2, 2024
    🚀 Can LLMs aggregate information from diverse web sources? We try to answer that in our latest work, INFOGENT: a modular, agent-based framework for information aggregation on the web! Website: gangiswag.github.io/infogent/ 🌐🔍 🧵 [1/n]
  • Sagnik
    @saagnikkk
    May 2, 2025
    🚀 Our ICML 2025 paper introduces "Premise-Augmented Reasoning Chains", a structured approach that makes the dependencies between the steps of a reasoning chain explicit. Revealing these dependencies significantly improves how well LLM reasoning can be verified. 🧵[1/n]
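A minimal sketch of the idea, under assumptions that are not taken from the paper: each step records the indices of the earlier steps (its premises) it relies on, with index 0 standing for the problem statement, and a verifier checks each step against only its premises instead of the whole preceding chain. The `Step` class, the helper names, and the example chain are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    text: str
    premises: list = field(default_factory=list)  # indices of earlier steps this step uses

def validate(chain):
    """Every premise must point strictly backward, so the links form a DAG."""
    for i, step in enumerate(chain):
        for j in step.premises:
            if not 0 <= j < i:
                raise ValueError(f"step {i} cites invalid premise {j}")

def verification_context(chain, i):
    """Text a verifier would see for step i: its direct premises only."""
    return [chain[j].text for j in sorted(chain[i].premises)]

def ancestors(chain, i):
    """All steps that step i depends on, directly or transitively."""
    seen, stack = set(), list(chain[i].premises)
    while stack:
        j = stack.pop()
        if j not in seen:
            seen.add(j)
            stack.extend(chain[j].premises)
    return seen

# Hypothetical chain; step 0 is the problem statement.
chain = [
    Step("Problem: a train covers 120 km in 2 h, then 60 km in 1 h. Find its average speed."),
    Step("Total distance = 120 + 60 = 180 km", [0]),
    Step("Total time = 2 + 1 = 3 h", [0]),
    Step("Average speed = 180 / 3 = 60 km/h", [1, 2]),
]
validate(chain)
print(verification_context(chain, 3))
print(sorted(ancestors(chain, 3)))  # [0, 1, 2]
```

Restricting the context to direct premises keeps each check small, while `ancestors` recovers everything a step ultimately rests on if a verifier needs to trace where an error came from.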
  • Sagnik
    @saagnikkk
    May 20, 2025
    Replying to @saagnikkk
    🧵[1/n] 🚨 RL Updates Are Sparse 🚨 🔍 Finding 1: RL fine-tuning leads to extremely sparse updates: between 68.5% and 96% of parameters remain unchanged! ⚙️ This pattern holds across all major RL algorithms.
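The sparsity figure is the share of parameters left unchanged between the base checkpoint and the RL-tuned one. A minimal sketch of that count, assuming checkpoints are given as hypothetical dicts mapping parameter name to a flat list of floats; the tolerance used in the paper is not stated in this thread, so `tol=0.0` (exact equality) is an assumption.

```python
def update_sparsity(base, tuned, tol=0.0):
    """Fraction of parameters with |tuned - base| <= tol, i.e. not updated."""
    total = unchanged = 0
    for name, w0 in base.items():
        w1 = tuned[name]
        if len(w0) != len(w1):
            raise ValueError(f"shape mismatch for {name!r}")
        total += len(w0)
        unchanged += sum(abs(b - a) <= tol for a, b in zip(w0, w1))
    return unchanged / total

# Hypothetical 8-parameter "model": two entries change during fine-tuning.
base = {"layer0.weight": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], "layer0.bias": [0.0, 0.5]}
tuned = {"layer0.weight": [0.1, 0.25, 0.3, 0.4, 0.5, 0.6], "layer0.bias": [0.0, 0.75]}
print(update_sparsity(base, tuned))  # 0.75
```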
  • Sagnik
    @saagnikkk
    May 20, 2025
    Replying to @saagnikkk
    Paper: arxiv.org/abs/2505.11711
    Work done with my amazing collaborator @lifan__yuan and advised by our amazing advisors @dilekhakkanitur and @haopeng_nlp.
    Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
    Reinforcement learning (RL) yields substantial improvements in large language models (LLMs) downstream task performance and alignment with human values. Surprisingly, such large gains result from...
  • Sagnik
    @saagnikkk
    Jun 21, 2024
    📢📢 Socio-demographic prompts are used for cultural alignment and for studying biases. But is the “bias” elicited by these prompts systematic? Is it similar to, or different from, what happens when LLMs are prompted with "Your favorite programming language is C++"? 🧐 The answer will surprise you.
  • Sagnik
    @saagnikkk
    Jul 11, 2025
    🚀 Headed to #ICML2025 in Vancouver (July 13-19)! We will present our paper in the poster session at East Exhibition Hall on Tuesday (July 15) at 4:30 PM PDT. Happy to chat about reasoning, post-training, and anything LLM-related!
    Quoting @saagnikkk's May 2, 2025 post introducing "Premise-Augmented Reasoning Chains".
  • Sagnik
    @saagnikkk
    May 20, 2025
    Replying to @saagnikkk
    🧵[7/n] 🔍 Potential Reasons 💡 We hypothesize that the in-distribution nature of the training data is a key driver of this sparsity. 🧠 The model already "knows" a lot, so RL just fine-tunes a small, relevant subnetwork rather than overhauling everything.
  • Sagnik
    @saagnikkk
    May 20, 2025
    Replying to @saagnikkk
    🧵[2/n] 💡 SFT Updates Are Dense 💡 Unlike RL, supervised fine-tuning (SFT) produces much denser updates. 🧠 📊 Sparsity is low: at most 15.31% of parameters remain untouched.
  • Sagnik
    @saagnikkk
    May 20, 2025
    Replying to @saagnikkk
    🧵[6/n] 🌐 The Subnetwork Is General 🔁 Subnetworks trained with different seeds, datasets, or even algorithms show nontrivial overlap. 🧩 This suggests the subnetwork is a generalizable structure tied to the base model. 🧠 A shared backbone seems to emerge no matter how you train it.
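One simple way to quantify "nontrivial overlap" between two subnetworks, sketched under assumptions since the thread does not give the paper's exact metric: collect the set of updated indices from each run, measure what share of one set also appears in the other, and compare that with the share expected by chance if the first set were a uniformly random subset, which is |B| / N. The two runs below are hypothetical 10-parameter vectors.

```python
def updated_indices(base, tuned, tol=0.0):
    """Indices of a flat parameter vector that changed during fine-tuning."""
    return {i for i, (a, b) in enumerate(zip(base, tuned)) if abs(b - a) > tol}

def one_sided_overlap(a, b):
    """Share of subnetwork a that also appears in subnetwork b."""
    return len(a & b) / len(a) if a else 0.0

def jaccard(a, b):
    """Symmetric overlap: |a & b| / |a | b|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def chance_overlap(b, n_params):
    """Expected one_sided_overlap(a, b) if a were a uniformly random subset."""
    return len(b) / n_params

base = [0.0] * 10
run_a = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
run_b = [0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
a = updated_indices(base, run_a)  # {0, 1, 2, 3}
b = updated_indices(base, run_b)  # {1, 2, 3, 4}

print(one_sided_overlap(a, b))        # 0.75
print(jaccard(a, b))                  # 0.6
print(chance_overlap(b, len(base)))   # 0.4
```

An overlap well above the chance baseline is what would make it "nontrivial"; comparing against |B| / N matters because denser subnetworks overlap more even at random.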
  • Sagnik
    @saagnikkk
    May 20, 2025
    Replying to @saagnikkk
    🧵[8/n] To the best of our knowledge, this is the first mechanistic evidence contrasting learning from in-distribution (or on-policy) data with learning from off-distribution (or off-policy) data.
  • Sagnik
    May 20, 2025
    Replying to @saagnikkk
    🧵[4/n] 📚 Each Layer Is Equally Sparse (or Dense) 📏 No specific layer or sublayer gets special treatment: all layers are updated equally sparsely. 🎯 Despite the sparsity, the updates are still full-rank.
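Sparsity and rank are independent properties, which is how a sparse update can still be full-rank. A minimal sketch with two hypothetical 4x4 update matrices: one touches only 25% of its entries yet has rank 4, the other touches every entry yet has rank 1 (for comparison, a rank-r LoRA update has rank at most r). Rank is computed by exact Gauss-Jordan elimination over fractions, so no numerical tolerance is needed.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank via Gauss-Jordan elimination over exact rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Zero out this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank

def density(rows):
    """Fraction of nonzero entries."""
    cells = [x for row in rows for x in row]
    return sum(x != 0 for x in cells) / len(cells)

sparse_update = [
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, -0.25, 0.0],
    [0.0, 0.75, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.1],
]
dense_rank_one = [[u * v for v in (1.0, 2.0, 3.0, 4.0)] for u in (1.0, -1.0, 2.0, 0.5)]

print(density(sparse_update), matrix_rank(sparse_update))    # 0.25 4
print(density(dense_rank_one), matrix_rank(dense_rank_one))  # 1.0 1
```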
  • Sagnik
    @saagnikkk
    May 20, 2025
    Replying to @saagnikkk
    🧵[3/n] 📉 Even Gradients Are Sparse in RL 📉 🧠 In PRIME, 72% of parameters never receive any gradient, ever! ↔️ Some do, but their gradients cancel out over time. 🎯 It’s not just the updates that are sparse; even the gradients are.
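The two cases in this post (parameters that never receive a gradient, and parameters whose gradients cancel out) can be told apart by tracking each parameter's gradient history. A minimal sketch over a hypothetical list of per-step gradient vectors; how PRIME's gradients were actually logged is not described in the thread. The sketch only inspects gradients: how a cancelling sum maps onto the final weight change depends on the optimizer, and it is exact only for plain SGD with a constant learning rate.

```python
def gradient_stats(grad_seq, tol=0.0):
    """Split parameter indices by gradient history.

    Returns (never, cancelled, net):
      never     - never received a nonzero gradient
      cancelled - received gradients, but they sum to within tol of zero
      net       - received gradients with a nonzero sum
    """
    n = len(grad_seq[0])
    touched = [False] * n
    total = [0.0] * n
    for grads in grad_seq:
        for i, g in enumerate(grads):
            if g != 0.0:
                touched[i] = True
            total[i] += g
    never = {i for i in range(n) if not touched[i]}
    cancelled = {i for i in range(n) if touched[i] and abs(total[i]) <= tol}
    net = {i for i in range(n) if touched[i] and abs(total[i]) > tol}
    return never, cancelled, net

# Hypothetical 3-step history for 4 parameters.
grad_seq = [
    [0.0, 0.5, 0.25, 0.0],
    [0.0, -0.5, 0.25, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
never, cancelled, net = gradient_stats(grad_seq)
print(sorted(never), sorted(cancelled), sorted(net))  # [0, 3] [1] [2]
print(len(never) / len(grad_seq[0]))                  # 0.5
```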
