Shalev
2,376 posts
@Shalev_lif
do androids dream of electric sheep? building something new, prev @VectorInst @UofT | co-creator of STEVE-1, Multi-Agent Verification
Joined September 2017
450 Following · 1,992 Followers
  • Shalev
    @Shalev_lif
    Dec 15, 2024
    Best poster moment at #NeurIPS2024
    377K
  • Shalev
    @Shalev_lif
    Jun 28, 2025
    The neural network objective function is a very complicated objective function. It's very non-convex, and there are no mathematical guarantees whatsoever about its success. And so if you were to speak to somebody who studies optimization from a theoretical point of view, they
    204K
  • Shalev
    @Shalev_lif
    Jul 19, 2025
    Replying to @NTFabiano
    This research was funded by the International Organization of First-Born Children
    17K
  • Shalev
    @Shalev_lif
    Jan 25, 2025
    A new replication of DeepSeek's RL results! Here are my notes and some quick thoughts:
    Method:
    - Uses PPO instead of GRPO (DeepSeek-R1), still works
    - Data is 8K (query, final answer) examples from MATH
    - Rule-based reward modelling (no neural reward)
    - Initialize model to
    Junxian He
    @junxian_he
    Jan 25, 2025
    We replicated the DeepSeek-R1-Zero and DeepSeek-R1 training on a 7B model with only 8K examples, and the results are surprisingly strong. 🚀 Starting from Qwen2.5-Math-7B (base model), we perform RL on it directly. No SFT, no reward model, just 8K MATH examples for verification, the
    44K
  • Shalev
    @Shalev_lif
    Oct 23, 2024
    Great to see @geoffreyhinton at the @VectorInst office today. Here @michaelrzhang is presenting his work on qualitative eval of LLMs! Very cool to have a Nobel Laureate + Turing Award winner around the campus.
    31K
  • Shalev
    @Shalev_lif
    Dec 13, 2024
    My question to @ilyasut at NeurIPS 2024: Do LLMs generalize multi-hop reasoning out-of-distribution?
    55K
  • Shalev
    @Shalev_lif
    Feb 28, 2025
    Hot off the Servers 🔥💻 --- we’ve found a new approach for scaling test-time compute! Multi-Agent Verification (MAV) scales the number of verifier models at test-time, which boosts LLM performance without any additional training. Now we can scale along two dimensions: by
    46K
  • Shalev
    @Shalev_lif
    Dec 16, 2024
    Absolutely stacked panel at the System-2 Reasoning at Scale workshop at NeurIPS with Josh Tenenbaum, @MelMitchell1, @fchollet, @jaseweston, @DBahdanau, @dawnsongtweets, and @Yoshua_Bengio (with @nouhadziri moderating). An amazing end to the conference. Will add notes below.
    39K
  • Shalev
    @Shalev_lif
    Jun 8, 2025
    Replying to @ns123abc
    Uh, she’s a third-year PhD student. Many of the most influential papers in AI have been written by PhD students… Also, a paper is written by a team. The first author usually did most of the actual work during the project (i.e., writing the code, running the experiments, etc.).
    5.8K
  • Shalev
    @Shalev_lif
    Feb 20, 2024
    Replying to @karpathy
    This reminds me of a meme @_jasonwei posted a while back! That is, once you play with these models enough, you kind of develop your own mini test suite to gain intuition about their performance.
    23K
  • Shalev
    @Shalev_lif
    Dec 13, 2024
    @ilyasut giving a talk at the NeurIPS 2024 Test of Time awards! Will add more photos below, throughout the talk.
    9.9K
  • Shalev
    @Shalev_lif
    Dec 16, 2024
    Replying to @Shalev_lif
    See the full paper by @shreyaskapur, @jenner_erik, and Stuart Russell!
    arxiv.org
    Diffusion On Syntax Trees For Program Synthesis
    Large language models generate code one token at a time. Their autoregressive generation process lacks the feedback of observing the program's output. Training LLMs to suggest edits directly can...
    11K
  • Shalev
    @Shalev_lif
    Dec 22, 2024
    In a few years PhDs won’t be coding much. They’ll have a fleet of agents coding up, running, and tuning their experiments. At that time, the most valuable skill will be deep expertise, as suggested by @RogerGrosse.
    Dan Roy
    @roydanroy
    Dec 21, 2024
    Replying to @roydanroy and @tunguz
    In all seriousness, PhDs today will have tools so powerful that previous generations won’t know what to think of them. I think it is the most exciting time to be working. Just don’t work in an old way.
    21K
  • Shalev
    @Shalev_lif
    Sep 21, 2023
    🥳 Great news! Our paper STEVE-1 has been accepted at #NeurIPS 2023 as a spotlight! I'm so proud to have worked on this project with my amazing collaborators @keirp1 @SirrahChan @jimmybajimmyba @SheilaMcIlraith ✈️ Very excited to present our work in New Orleans! ✈️ Project
    19K
