elie (@eliebakouch) / X

elie

7,140 posts

@eliebakouch

training llm @PrimeIntellect (prev: @huggingface) anon feedback: admonymous.co/eliebakouch

Joined January 2024

Pinned
elie
@eliebakouch
May 14
we let opus 4.7 and gpt 5.5 run on the nanogpt optimizer speedrun: ~10k runs, 14k H200 hours, 23.9B tokens. opus hits 2930, codex 2950, both beating the human baseline of 2990. we cover claude autonomy failures, codex high compute usage, and much more primeintellect.ai/auto-nanogpt
Prime Intellect
@PrimeIntellect
May 14
Automating AI research is the next major step in AI. We let Claude Code (Opus 4.7) and Codex (GPT 5.5) run autonomously on the nanoGPT speedrun optimizer track using our idle compute. ~10k runs, ~14k H200 hours. Opus now holds the record at 2930 steps vs the 2990 human baseline.
152K
elie
@eliebakouch
Oct 30, 2025
Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably huggingface.co/spaces/Hugging…
1.9M
elie
@eliebakouch
Feb 19, 2025
holy shit, 100+ pages book to fully understand distributed training.
148K
elie
@eliebakouch
Nov 6, 2025
imagine openai official account answering "awesome!" on claude sonnet 4.5 release
Z.ai
@Zai_org
Nov 6, 2025
Replying to @Kimi_Moonshot
Awesome!
251K
elie
@eliebakouch
Mar 27, 2025
the only way i'm posting grad_norm plot from now on
elie
@eliebakouch
Mar 24, 2025
I'm using muon, and my grad norm is randomly forming an M shape, wtf
60K
elie
@eliebakouch
Jul 11, 2025
Kimi team just trained a state-of-the-art open source model (32B active parameters / 1T total) with 0 training instabilities thanks to MuonClip. This is amazing
163K
elie
@eliebakouch
Sep 10, 2025
Wait, they are comparing a 3B active model with GEMINI 2.5 PRO AND R1. Pretty crazy perf (the model is not even that sparse), and it's open source under Apache 2.0.
92K
elie
@eliebakouch
Jul 26, 2025
Noam Shazeer 2020 paper with no equation, just pseudo code with einsum
71K
elie
@eliebakouch
Jul 8, 2025
Super excited to share SmolLM3, a new strong 3B model. SmolLM3 is fully open: we share the recipe, the dataset, the training codebase and much more! > Trained on 11T tokens on 384 H100s for 220k GPU hours > Supports long context up to 128k thanks to NoPE and intra-document masking >
129K
elie
@eliebakouch
Aug 31, 2025
The technical report of @Meituan_LongCat LongCat-Flash is crazy good and full of novelty. The model is a 560B total, ~27B active MoE with an adaptive number of active parameters depending on the context, thanks to zero-computation experts. 1) New architecture > Layers have 2
250K
elie
@eliebakouch
Jun 30, 2025
This is a very impressive family of open models by Baidu, competitive with qwen3 and the latest Deepseek V3. Lots of details in the tech report (~60 pages) + they open source the training code as well.
100K
elie
@eliebakouch
Nov 11, 2025
Very cool blog by @character_ai diving into how they trained their proprietary model Kaiju (13B, 34B, 110B), before switching to OSS model, and spoiler: it has Noam Shazeer written all over it. Most of the choices for model design (MQA, SWA, KV Cache, Quantization) are not to
120K
elie
@eliebakouch
Mar 27, 2025
New latex template
46K
elie
@eliebakouch
Jan 25, 2025
Full open source reproduction of @deepseek_ai R1 in progress ⏳
63K