Leonard Tang
- You don't need frontier lab resources for frontier lab automated LLM evaluation. To prove this, we're open-sourcing j1-nano and j1-micro: two absurdly tiny (600M & 1.7B parameters) but mighty reward models competitive with orders-of-magnitude larger peers.
- First came pre-training scaling; then came inference-time scaling. Now comes judge-time scaling. Despite progress in AI through scaled inference-time compute, AI remains unreliable in open-ended, non-verifiable domains. The key limitation is not generation; it is evaluation.
- born to do research forced to build b2b saas
- i've been entirely consumed these past few weeks by the LLM-as-a-judge research agenda. there's lots of great work, but there's also lots of noise, confusion, and redundancy in the literature. i've started curating the highest-quality reads here:
- honored to be amongst such amazing peers and community 🫶
  "We are excited to announce the startups accepted into the fourth batch of AI Grant! See the thread below as well as aigrant.com to learn more"
- super excited to share what we've been cooking up at @haizelabs 👁️👁️ we are now in the era of grossly excessive AI hype and demoware. but it is high time to recalibrate and revisit the difficult, unsexy, underlying problem that everybody is avoiding -- the AI reliability and
  "Today is a bad, bad day to be a language model. Today, we announce the Haize Labs manifesto. @haizelabs haizes (automatically red-teams) AI systems to preemptively discover and eliminate any failure mode. We showcase below one particular application of haizing: jailbreaking the"
- come join @qw3rtman @willccbb and myself for the inaugural communion of the NYC AI Reading Group! > where: @haizelabs hq > when: sunday 4/27 @ 11 am > what: inference-time scaling for generalist reward modeling from @deepseek_ai > who: awesome people like yourself :^)
- nyc ai: scintillating discussion on this fine sunday morning. much more to come. @qw3rtman @willccbb @haizelabs
- session #2 of the NYC AI Reading Group w/ @qw3rtman @willccbb is in order! > where: @haizelabs hq > when: thursday 5/29 @ 7:30 pm > what: sft memorizes, rl generalizes: a comparative study of foundation model post-training > who: awesome people like yourself :^) > also:
- we're looking for outlier talent to join the @haizelabs research team. if you're interested in:
  - robustness of real-world AI
  - active learning
  - ultra-efficient model tuning
  - synthetic data generation
  - reward modeling
  - weak supervision

  dm us or apply below!
- We are thrilled to welcome Professor He He @hhexiy as an advisor to the Haize Labs team! Professor He leads a group at NYU focused on evaluation, scalable oversight, human-AI collaboration, and reasoning.










