Yann Dubois (@yanndubs) / X

Yann Dubois

604 posts

@yanndubs

Posttraining @OpenAI | PhD @StanfordAILab

San Francisco

Joined August 2017

Pinned
Yann Dubois
@yanndubs
Mar 5
🔥Two things I'm especially excited about in 5.4: 1. Unification: we merged our codex & mainline models. 2. Efficiency: we brought the efficiency of 5.3-codex to CUA & knowledge work. We only showed 3 such plots in the blog, but many of our evals required less time (tokens/tools) than 5.2.
Yann Dubois
@yanndubs
Aug 12, 2025
I saw a lot of people complaining about the 32k context size in ChatGPT for Plus users, which would be terrible for coding. But we are actually giving Plus users a 196k context size when using GPT5 thinking, and that’s the model you should use for coding use cases! 32k is for the
Mark Kretschmann
@mark_k
Aug 11, 2025
GPT-5 with a 32K context: Why it causes problems for @OpenAI users! A 32K token window for GPT-5 sounds generous until you try to do real work with it. In multi turn chats and coding sessions, that budget melts fast. Every message carries overhead you never see, including system
Yann Dubois
@yanndubs
Jun 22, 2021
Most data is processed by algorithms, but compressors (eg JPEG) are for human eyes. 🤓Our fix: formalize lossy compression that ensures perfect downstream predictions 🔥1000x gains vs JPEG on ImageNet🔥 arxiv.org/abs/2106.10800 w. Ben Bloem-Reddy @karen_ullrich @cjmaddison 1/9
Yann Dubois
@yanndubs
Jun 8, 2023
Developing chat LLMs is hard without an automated way to measure improvements 🔥It just became easier with AlpacaEval🔥 An automated evaluation pipeline that’s - easy to use - fast - cheap - validated w/ 20K human annotations 🥇leaderboard: tatsu-lab.github.io/alpaca_eval/ 🧵
Yann Dubois
@yanndubs
Aug 10, 2025
We significantly increased the rate limits for the reasoning model by popular demand. If correctness is really important to you, ask the model to “think deeper” or select “gpt5 thinking” in the model picker; this uses a higher reasoning effort than when you are auto switched to
Sam Altman
@sama
Aug 10, 2025
Replying to @techikansh
trying 3000 per week now!
Yann Dubois
@yanndubs
Mar 13, 2023
🦙Excited to share this demo of Alpaca 🔥Highlights: ~GPT3.5 performance for <$600🔥 The goal was to have a simple model/training procedure that academics could study and improve with limited resources. We achieved that by finetuning a 7B LLaMA on 52K generated instructions
Yann Dubois
@yanndubs
Aug 12, 2025
Now that we can vibe code with GPT5 thinking, there's no way we will mess up the GPT6 plot!
Yann Dubois
@yanndubs
Apr 17, 2025
Our model is also pretty good at doing useless but fun stuff! @OpenAI
OpenAI
@OpenAI
Apr 16, 2025
Introducing OpenAI o3 and o4-mini—our smartest and most capable models to date. For the first time, our reasoning models can agentically use and combine every tool within ChatGPT, including web search, Python, image analysis, file interpretation, and image generation.
Yann Dubois
@yanndubs
Jul 27, 2023
We have now also evaluated @metaai's LLaMA 2 70B model! 🔥 Exciting times ahead! We also updated ChatGPT, which seems to have improved over the last few months. Thanks for providing the compute/API endpoint @a16z @replicatehq @appenz @rajko_rad @Mascobot
Yann Dubois
@yanndubs
Jul 19, 2023
We evaluated LLaMA-2 Chat! It seems to be of similar quality to the latest Vicunas. Excited to see how much the community will be able to improve it using LLaMA-2 base and their fine-tuning pipelines! @WizardLM_AI @lmsysorg @huggingface 🚀 github.com/tatsu-lab/alpa…
Yann Dubois
@yanndubs
Aug 24, 2025
You can thank @ericmitchellai for caring so much about that
Kol Tregaskes
@koltregaskes
Aug 18, 2025
GPT-5 says 'I don't know'. Love this, thank you.
Yann Dubois
@yanndubs
Aug 12, 2025
Clearly, our GPT-5 UX for pro users is less than ideal… But thankfully, you can ask GPT-5 to vibe code it for you! Send us any design, and we’d love to consider it as we work on improving the UX! cc @ericmitchellai @max_a_schwarzer @sama
Yann Dubois
@yanndubs
Nov 17, 2025
Our goal with "gpt5.1 thinking" was to make thinking models usable as *daily drivers* for productive use cases. That's why we focused on making the model's thinking more efficient (~60% less thinking on easy prod queries) while retaining accuracy! If you use ChatGPT for
Yann Dubois
@yanndubs
Jul 18, 2023
I looked a little into the Gzip OOD results and there seems to be another big problem: train-test overlap. E.g. DengueFilipino has the same train and test set. KirundiNews has 90% overlap... Still nice to see people revisit old ideas and the use of information theory for ML :)
Lucas Beyer (bl16)
@giffmana
Jul 18, 2023
Looks like the gzip paper I was enthusiastic about over-estimated its scores because of a bug in the code: it did top-2 knn instead of k=2. We should remember this as (yet another) strong case for testing ML code. I still like that it put a new idea in my toolbox.
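The train-test overlap mentioned above can be checked with a few lines of code. The following is a minimal sketch, assuming overlap means an exact match after lowercasing and collapsing whitespace; the function name `overlap_fraction` and the toy sentences are hypothetical, and this is not how the quoted 90% figure was computed.

```python
def overlap_fraction(train_texts, test_texts):
    """Return the fraction of test examples whose normalized text is also in train.

    Assumption: two examples count as the same if they match exactly after
    lowercasing and collapsing whitespace. Near-duplicates are not detected.
    """
    def normalize(text):
        return " ".join(text.lower().split())

    if not test_texts:
        return 0.0
    train_set = {normalize(t) for t in train_texts}
    hits = sum(normalize(t) in train_set for t in test_texts)
    return hits / len(test_texts)


# Hypothetical toy data, not taken from any real dataset.
train = ["Dengue cases rise", "Rain expected tomorrow", "Markets close higher"]
test = ["dengue cases  rise", "New bridge opens"]

print(overlap_fraction(train, test))   # 0.5: one of two test items is in train
print(overlap_fraction(train, train))  # 1.0: identical train and test sets
```

A value near 1.0 means the test score mostly measures memorization of the training set rather than generalization.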
Yann Dubois
@yanndubs
Dec 6, 2021
#NeurIPS2021 Spotlight: Lossy Compression for Lossless Prediction. We formalize compression for machine learning rather than human perception: - 💥1000x💥 compression gains compared to JPEG - prove minimal bit-rate for given downstream perf. Join us @ poster session 2 tomorrow