Nimit Kalra
221 posts
Nimit Kalra
@qw3rtman
Incoming PhD student. Visiting researcher with @MicahGoldblum (self-play, RL, reasoning, world models). Prev: @HaizeLabs @Citadel @UTAustin
off-policy
nimit.io
Joined October 2011
1,099 Following · 1,379 Followers
  • Nimit Kalra
    @qw3rtman
    May 18, 2025
    Verdict at @NousResearch RL hackathon! Your calibrated and low-variance LLM-as-a-judge is a reward model 🙈
    28K
  • Nimit Kalra
    @qw3rtman
    Mar 9, 2025
    we're looking for a rockstar research eng @haizelabs! if you're interested in training tons of models and thinking about adversarial robustness for real-world deployed AI systems, DM me or apply below :)
    16K
  • Nimit Kalra
    @qw3rtman
    Jun 26, 2025
    qwen RL has felt icky recently, but these authors get llama RL to match
    Zengzhi Wang
    @SinclairWang1
    Jun 26, 2025
    What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?
    9.2K
  • Nimit Kalra
    @qw3rtman
    Mar 22, 2025
    The more RL I do, the less I believe in evolution
    9.7K
  • Nimit Kalra
    @qw3rtman
    May 28, 2025
    Excited to discuss "SFT Memorizes, RL Generalizes" tomorrow at @haizelabs's NYC AI Reading Group with @leonardtang_ and @willccbb! We'll also explore a broader theme — "what does RL actually learn?", guided by some related works from the past week.
    7K
  • Nimit Kalra
    @qw3rtman
    May 27, 2025
    We modified DeepSeek's recent Self-Principled Critique Tuning paper and bootstrapped a family of super tiny generalist reward models in < 1 day on a single A100 GPU. By proposing instance-specific rubrics at inference time, j1-micro (1.7B) and j1-nano (0.6B) punch well above
    5.7K
  • Nimit Kalra
    @qw3rtman
    May 4, 2025
    awful day to be an llm
    Leonard Tang
    @leonardtang_
    May 4, 2025
    EVALS EVALS EVALS Core Research @AutinMitra
    5.3K
  • Nimit Kalra
    @qw3rtman
    Jun 26, 2025
    Discussing "Mind the Gap" tonight at @haizelabs's NYC AI Reading Group with @leonardtang_ and @willccbb. Authors study self-improvement through the "Generation-Verification Gap" (model's verification ability over its own generations) and find that this capability log scales with
    Nimit Kalra
    @qw3rtman
    Jun 7, 2025
    Still noodling on this, but the generation-verification gap proposed by @yus167 @_hanlin_zhang_ @ShamKakade6 @udayaghai et al. in arxiv.org/abs/2412.02674 is a very nice framework that unifies a lot of thoughts around self-improvement/verification/bootstrapping reasoning
    9.8K
  • Nimit Kalra
    @qw3rtman
    Apr 2, 2025
    Replying to @Purring_Lynx
    rate limits too low for any real prod use cases tho 🙄
    16K
  • Nimit Kalra
    @qw3rtman
    Jun 29, 2025
    think it was @jxmnop who said that science is about generating artifacts. inspired me to really focus on this this past week, starting with some internal eng tools and paper summaries... grinding out a couple more researchy things for the next couple weeks :) super excited to
    3.4K
  • Nimit Kalra
    @qw3rtman
    Jul 16, 2025
    Flying out to #ICML2025 tonight! Always down to chat about unverifiable domains, evals, red-teaming, safeguards, or just meet cool people. I’ll be a panelist at the Methods and Opportunities at Small Scale workshop, sharing our work on tiny generalist reward models
    3.9K
  • Nimit Kalra
    @qw3rtman
    Mar 27, 2021
    Replying to @rakyll
    Picked GPL for one of my first open-source projects and really learned this lesson the hard way
  • Nimit Kalra
    @qw3rtman
    Mar 14, 2025
    What tools are people using these days to search for relevant citations, e.g., papers that actually benchmark against a particular work? Google Scholar first page is usually surveys/prior work sections, which are somewhat useless for tracing the lineage of an approach
    4.7K
  • Nimit Kalra
    @qw3rtman
    May 30, 2025
    Great discussion tonight at @haizelabs HQ about the many many different definitions of generalization / “out of distribution” and which ones we actually care about in practice. + a special shoutout to @marklxu1 for the Joe’s pizza 🤤
    mark xu
    @marklxu1
    May 30, 2025
    thursday night pizza + papers in nyc! thanks to those who came out!! @leonardtang_ @qw3rtman @willccbb
    1.6K
