David Dohan (@dmdohan) / X

David Dohan

571 posts

@dmdohan

reducing perplexity @openai | past: probabilistic programs, proteins, science & reasoning @ google brain 🧠

Joined August 2011

Pinned
David Dohan
@dmdohan
Jul 22, 2022
Happy to release our work on Language Model Cascades. Read on to learn how we can unify existing methods for interacting models (scratchpad/chain of thought, verifiers, tool-use, …) in the language of probabilistic programming. paper: arxiv.org/abs/2207.10342
David Dohan
@dmdohan
Nov 6, 2022
“99% of Americans don’t talk about AI at parties. You can too if you try!”
David Dohan
@dmdohan
Mar 1, 2023
New chapter: Happy to share that I recently joined @OpenAI! Thankful for many collaborators, friends, and mentors who made my 6 years of research @Google Brain special🧠 Excited to collaborate toward reliable reasoning & alignment in AI systems and products like #ChatGPT
185K
David Dohan
@dmdohan
Dec 20, 2024
o3 @ 87.5% on ARC-AGI. It was 16 hours, at a rate of increase of 3.5% an hour, to "solved"
David Dohan
@dmdohan
Dec 20, 2024
At this rate, how long til ARC-AGI is “solved”? For context:
- gpt-4o @ 5%
- Sonnet3.5 @ 14%
- o1-preview @ 18%
- o1 @ 32%
- best scaffolded solution @ 54%
1.1M
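A back-of-the-envelope check of the "3.5% an hour" figure. The tweet does not state its baseline, so this sketch assumes the starting point is o1 @ 32% (from the "At this rate" list) and the end point is o3 @ 87.5%, with the 16 hours elapsing between the two posts. The constant and function names are illustrative only.

```python
# Assumptions (not stated in the tweet): baseline is o1 @ 32% from the
# "At this rate" post, end point is o3 @ 87.5%, elapsed time is 16 hours.
O1_SCORE = 32.0      # percent on ARC-AGI (o1, from the list above)
O3_SCORE = 87.5      # percent on ARC-AGI (o3)
ELAPSED_HOURS = 16   # hours between the two posts (assumed)


def implied_rate(start: float, end: float, hours: float) -> float:
    """Average gain in percentage points per hour between two scores."""
    return (end - start) / hours


rate = implied_rate(O1_SCORE, O3_SCORE, ELAPSED_HOURS)
print(f"{rate:.2f} points/hour")  # prints "3.47 points/hour"
```

Under these assumptions the gain is 55.5 points over 16 hours, which rounds to the 3.5% an hour quoted in the tweet.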
David Dohan
@dmdohan
Dec 20, 2024
imo the improvements on FrontierMath are even more impressive than ARC-AGI. Jump from 2% to 25%. Terence Tao said the dataset should "resist AIs for several years at least" and "These are extremely challenging. I think that in the near term basically the only way to solve them,
Nat McAleese
@__nmca__
Dec 20, 2024
Replying to @__nmca__
Well, on FrontierMath 2024-11-26 o3 improves the state of the art from 2% to 25% accuracy. These are absurdly hard, strongly held-out math questions. And on ARC, the semi-private test set and public validation set scores are 87.5% (private) and 91.5% (public). (7/n)
153K
David Dohan
@dmdohan
Nov 20, 2023
🩶🫶 Ilya and Sam’s yin/yang was a major reason I joined OpenAI. It is still possible to repair what was shattered.
Ilya Sutskever
@ilyasut
Nov 20, 2023
I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.
168K
David Dohan
@dmdohan
Sep 12, 2024
Replying to @dmdohan
It's important to emphasize that this is a huge leap /and/ we're still at the start. Give o1-preview a try; we think you'll like it. And in a month, give o1 a try and see all the ways it has improved in such a short time. And expect that to keep happening.
233K
David Dohan
@dmdohan
Nov 20, 2023
OpenAI is nothing without its people
57K
David Dohan
@dmdohan
Dec 20, 2024
At this rate, how long til ARC-AGI is “solved”? For context:
- gpt-4o @ 5%
- Sonnet3.5 @ 14%
- o1-preview @ 18%
- o1 @ 32%
- best scaffolded solution @ 54%
ARC Prize
@arcprize
Dec 19, 2024
Verified o1 performance on ARC-AGI's Semi-Private Eval (100 tasks)
o1, Low: 25% ($1.5/task)
o1, Medium: 31% ($2.5/task)
o1, High: 32% ($3.8/task)
235K
David Dohan
@dmdohan
Sep 12, 2024
🍓 is ripe and ready to think, fast and slow: check out OpenAI o1, trained to reason before answering. I joined OpenAI to push the boundaries of science & reasoning with AI. Happy to share this result of the team's amazing collaboration, which does just that. Try it on your hardest problems.
OpenAI
@OpenAI
Sep 12, 2024
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math. openai.com/index/introduc…
36K
David Dohan
@dmdohan
Nov 19, 2023
🩶🫶
Sam Altman
@sama
Nov 19, 2023
i love the openai team so much
58K
David Dohan
@dmdohan
Nov 24, 2023
language models are superhuman at predicting the next word. try this yourself to see how hard it is: rr-lm-game.herokuapp.com
Jason Wei
@_jasonwei
Nov 24, 2023
Like the International Math Olympiad or Spelling Bee, there should be a “language modeling competition” where humans compete to predict the next word in a sequence. The best humans would probably still lose to GPT-2, and we’d have more empathy for how hard it is to be an LLM :)
181K
David Dohan
@dmdohan
Feb 17, 2023
LM performance typically gets worse given irrelevant info. Simple prompting improves it: "Feel free to ignore irrelevant information given in the questions." Work led by @fredahshi, with @xinyun_chen_, @kanishkamisra, @nkscales_google, @edchi, Nathanael Schärli, & @denny_zhou
37K
David Dohan
@dmdohan
Dec 20, 2024
We are used to the cadence of big model releases: GPT2->3->4 took two years each time. We're in a different world now. o1 was announced months ago, and we're already on the next generation. Expect faster improvement going forward: o1 is like gpt2 if we could jump to gpt4 ~immediately.
48K