Hailey Schoelkopf
737 posts
Hailey Schoelkopf
@haileysch__
hillclimbing towards generality @anthropicai | prev @AiEleuther | views my own
sf + boston
haileyschoelkopf.github.io
Joined June 2022
1,035 Following
5,305 Followers
  • Hailey Schoelkopf
    @haileysch__
    Nov 12, 2024
    Major life update: I'm joining @AnthropicAI this week! Looking forward to meeting and working with the amazing team there! I’m beyond thankful for an amazing 2 years with my colleagues and collaborators at @AiEleuther .
    85K
  • Hailey Schoelkopf
    @haileysch__
    Dec 19, 2023
    work in ML, they said. it’ll be fun, they said. Now I’m reading about the Based architecture and its HellaSwag score
    114K
  • Hailey Schoelkopf
    @haileysch__
    Feb 24, 2025
    the only eval i trust now, and the vibes are immaculate
    Anthropic
    @AnthropicAI
    Feb 24, 2025
    Replying to @AnthropicAI
    Claude 3.7 Sonnet is a significant upgrade over its predecessor. Extended thinking mode gives the model an additional boost in math, physics, instruction-following, coding, and many other tasks. In addition, API users have precise control over how long the model can think for.
    92K
  • Hailey Schoelkopf
    @haileysch__
    Apr 25, 2023
    Excited to announce our paper Pythia: A Suite for Analyzing LLMs across Training and Scaling has been accepted as an Oral paper at #ICML2023 ! arxiv.org/abs/2304.01373
    55K
  • Hailey Schoelkopf
    @haileysch__
    May 22, 2025
    opus 4 is here—“o4”, if you will
    Anthropic
    @AnthropicAI
    May 22, 2025
    Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
    A benchmarking table titled Claude 4 benchmarks comparing performance metrics across various capabilities including coding, reasoning, tool use, multilingual Q&A, visual reasoning, and mathematics.
    23K
  • Hailey Schoelkopf
    @haileysch__
    May 24, 2024
    My favorite bit in this paper: I and @bbrabbasi wrote an appendix formalizing what is done evaluating models with loglikelihood multiple choice and perplexity evals. afaik, none of this has been written up in one place in most papers and just been tacitly assumed before!
    EleutherAI
    @AiEleuther
    May 24, 2024
    Excited to share our new paper, Lessons From The Trenches on Reproducible Evaluation of Language Models! In it, we discuss common challenges we’ve faced evaluating LMs, and how our library the Evaluation Harness is designed to mitigate them 🧵 arxiv.org/abs/2405.14782
    36K
  • Hailey Schoelkopf
    @haileysch__
    Feb 25, 2025
    2 misconceptions from yesterday: - no training on playing videogames. Claude can just do this now - “why isn’t this a demo”
    Anthropic
    @AnthropicAI
    Feb 25, 2025
    Replying to @AnthropicAI
    Claude Plays Pokémon continues on as a researcher's personal project. Follow along on Twitch: twitch.tv/claudeplayspok…
    17K
  • Hailey Schoelkopf
    @haileysch__
    Oct 17, 2023
    Beyond excited for our work on Llemma to finally be public!! We trained very strong general math LMs (7B > Minerva 8B, 34B ~= Minerva 62B) and released them + training, eval, analysis code! Can’t wait to see the math+AI field pick these up for future developments in the open.
    Zhangir Azerbayev
    @zhangir_azerbay
    Oct 17, 2023
    We release Llemma: open LMs for math trained on up to 200B tokens of mathematical text. The performance of Llemma 34B approaches Google's Minerva 62B despite having half the parameters. Models/data/code: github.com/EleutherAI/mat… Paper: arxiv.org/abs/2310.10631 More ⬇️
    48K
  • Hailey Schoelkopf
    @haileysch__
    Nov 7, 2023
    This proposal suggests a global moratorium on training runs > 10^24 flops—approximately that used by llama 2-70B, and 100x lower than the EO reporting threshold. Just astoundingly absurd
    Andrea Miotti
    @andreamiotti
    Nov 6, 2023
    The AI Summit consensus is clear: it's time for international measures. Here is a concrete proposal. In our recent paper, @jasonhausenloy , Claire Dennis and I propose an international institution to address extinction risk from AI: MAGIC, a Multinational AGI Consortium.
    46K
  • Hailey Schoelkopf
    @haileysch__
    Feb 29, 2024
    so, it turns out i am the top solo user for downloads on HF (thanks solely to my lm-eval MMLU mirror! 😅) now is probably a good time to express how grateful i am for the users, contributors, and community around the LM Evaluation Harness! :’)
    46K
  • Hailey Schoelkopf
    @haileysch__
    Dec 19, 2023
    Replying to @haileysch__
    so that I can use it in Oogabooga
    10K
  • Hailey Schoelkopf
    @haileysch__
    Apr 5, 2023
    We’ve released the paper for Pythia, a set of LLMs designed to facilitate scientific study on LLMs and their training data! So excited to have this out finally! Read more here:
    Stella Biderman
    @BlancheMinerva
    Apr 5, 2023
    Have you ever wanted to do an experiment on LLMs and found that none of the existing model suites met your needs? At @AiEleuther we got tired of this happening and so designed a model suite that centers enabling scientific research as its primary goal arxiv.org/abs/2304.01373
    24K
  • Hailey Schoelkopf
    @haileysch__
    May 8, 2024
    what’s the current SOTA for KV cache compression? what are some must-read papers on this topic?
    32K
  • Hailey Schoelkopf
    @haileysch__
    Feb 28, 2024
    👀it's always incredible to me just how ubiquitous and clear the induction head bump is
    21K
