Nat McAleese (@__nmca__)

759 posts

Research @AnthropicAI. Previously @OpenAI, @DeepMind. Views my own.

Joined September 2021

Pinned
Nat McAleese
@__nmca__
Jun 4
Claude writes an enormous amount of code for us now, and is helping to build itself.
Nat McAleese
@__nmca__
Dec 20, 2024
o3 represents enormous progress in general-domain reasoning with RL — excited that we were able to announce some results today! Here’s a summary of what we shared about o3 in the livestream (1/n)
Nat McAleese
@__nmca__
Feb 12, 2025
the greatest minds of our generation will be eclipsed by scaling up RL
Nat McAleese
@__nmca__
Oct 27, 2024
the greatest minds of our generation will be eclipsed by scaling up RL
Nat McAleese
@__nmca__
Sep 8, 2024
OpenAI works miracles, but we do also wrap a lot of things in bash while loops to work around periodic crashes.
Nat McAleese
@__nmca__
Sep 18, 2025
everything reminds me of him 😭
Nat McAleese
@__nmca__
Mar 10, 2025
large reasoning models are extremely good at reward hacking. A thread of examples from OpenAI's recent monitoring paper: (0/n)
Nat McAleese
@__nmca__
May 13, 2024
you guys are going to get like 8x more than you expect.
Nat McAleese
@__nmca__
Sep 29, 2025
To be fair, ████████████. ████████████████████████.
Stephen McAleer
@McaleerStephen
Sep 29, 2025
Having done RL at OpenAI and Anthropic, here's what I can say about GRPO:
Nat McAleese
@__nmca__
Jul 19, 2025
I feel this may be helpful to some of you today:
Nat McAleese
@__nmca__
Jul 19, 2025
We are seeing much faster AI progress than Paul Christiano and Yudkowsky predicted (they put gold in 2025 at 8% and 16%, respectively), by methods that are more general than expected
Nat McAleese
@__nmca__
Jul 20, 2025
fun: 3 or 4 months ago I ran o3 for some academics on a set of AIME-style problems. It has taken them so long to write a summary of the results (96% iirc) that Alex solved proof & IMO in the meantime lol
Nat McAleese
@__nmca__
Dec 20, 2024
Lots of folks are posting quotes from Gowers/Tao about the hardest split of FrontierMath, but our 25% score is on the full set (which is also extremely hard, with old sota 2%, but not as hard as those quotes imply).
Nat McAleese
@__nmca__
Jan 23, 2025
Epoch AI are going to publish more details, but on the OpenAI side for those interested: we did not use FrontierMath data to guide the development of o1 or o3, at all. (1/n)
Nat McAleese
@__nmca__
Nov 9, 2025
I sometimes get the impression that academia does not want LLMs to work or AGI to be possible. There is exuberance for negative results that are plausibly over-interpreted.
Yang Yue
@YangYue_THU
Nov 8, 2025
Thrilled that our paper received the only perfect score at NeurIPS this year. Huge thanks to my collaborators and the reviewers. See you in San Diego! limit-of-rlvr.github.io papercopilot.com/statistics/neu… credit to @papercopilot