elie
Prime Intellect
7,140 posts
@eliebakouch
training llm @PrimeIntellect (prev: @huggingface) anon feedback: admonymous.co/eliebakouch
Joined January 2024
4,264 Following · 19.4K Followers
  • Pinned
    elie
    Prime Intellect
    @eliebakouch
    May 14
    we let opus 4.7 and gpt 5.5 run on the nanogpt optimizer speedrun: ~10k runs, 14k H200 hours, 23.9B tokens. opus hits 2930 steps, codex 2950, both beating the human baseline of 2990. we cover claude autonomy failures, codex high compute usage, and much more: primeintellect.ai/auto-nanogpt
    Prime Intellect
    @PrimeIntellect
    May 14
    Automating AI research is the next major step in AI. We let Claude Code (Opus 4.7) and Codex (GPT 5.5) run autonomously on the nanoGPT speedrun optimizer track using our idle compute. ~10k runs, ~14k H200 hours. Opus now holds the record at 2930 steps vs the 2990 human baseline.
    152K
  • elie
    Prime Intellect
    @eliebakouch
    Oct 30, 2025
    Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably huggingface.co/spaces/Hugging…
    1.9M
  • elie
    Prime Intellect
    @eliebakouch
    Feb 19, 2025
    holy shit, a 100+ page book to fully understand distributed training.
    148K
  • elie
    Prime Intellect
    @eliebakouch
    Nov 6, 2025
    imagine openai official account answering "awesome!" on claude sonnet 4.5 release
    Z.ai
    @Zai_org
    Nov 6, 2025
    Replying to @Kimi_Moonshot
    Awesome!
    251K
  • elie
    Prime Intellect
    @eliebakouch
    Mar 27, 2025
    the only way i'm posting grad_norm plot from now on
    elie
    Prime Intellect
    @eliebakouch
    Mar 24, 2025
    I'm using muon, and my grad norm is randomly forming an M shape, wtf
    60K
  • elie
    Prime Intellect
    @eliebakouch
    Jul 11, 2025
    The Kimi team just trained a state-of-the-art open source model (32B active parameters / 1T total) with 0 training instabilities, thanks to MuonClip. This is amazing.
    163K
  • elie
    Prime Intellect
    @eliebakouch
    Sep 10, 2025
    Wait, they are comparing a 3B active model with GEMINI 2.5 PRO AND R1. Pretty crazy perf (the model is not even that sparse), and it's open source under Apache 2.0.
    92K
  • elie
    Prime Intellect
    @eliebakouch
    Jul 26, 2025
    Noam Shazeer's 2020 paper with no equations, just pseudocode with einsum
    71K
  • elie
    Prime Intellect
    @eliebakouch
    Jul 8, 2025
    Super excited to share SmolLM3, a new strong 3B model. SmolLM3 is fully open: we share the recipe, the dataset, the training codebase and much more!
    > Trained on 11T tokens on 384 H100s for 220k GPU hours
    > Supports long context up to 128k thanks to NoPE and intra-document masking
    >
    129K
  • elie
    Prime Intellect
    @eliebakouch
    Aug 31, 2025
    The technical report of @Meituan_LongCat LongCat-Flash is crazy good and full of novelty. The model is a 560B total, ~27B active MoE with an adaptive number of active parameters depending on the context, thanks to the zero-computation experts.
    1) New architecture
    > Layers have 2
    250K
  • elie
    Prime Intellect
    @eliebakouch
    Jun 30, 2025
    This is a very impressive family of open models by Baidu, competitive with Qwen3 and the latest DeepSeek V3. Lots of details in the tech report (~60 pages), and they open source the training code as well.
    100K
  • elie
    Prime Intellect
    @eliebakouch
    Nov 11, 2025
    Very cool blog by @character_ai diving into how they trained their proprietary model Kaiju (13B, 34B, 110B) before switching to OSS models, and spoiler: it has Noam Shazeer written all over it. Most of the choices for model design (MQA, SWA, KV Cache, Quantization) are not to
    120K
  • elie
    Prime Intellect
    @eliebakouch
    Mar 27, 2025
    New latex template
    46K
  • elie
    Prime Intellect
    @eliebakouch
    Jan 25, 2025
    Full open source reproduction of @deepseek_ai R1 in progress ⏳
    63K
