Max Schwarzer (@max_a_schwarzer) / X

Max Schwarzer

139 posts

@max_a_schwarzer

Doing RL @AnthropicAI. Formerly VP of Research, Head of Post-Training @OpenAI. PhD with Aaron Courville and Marc Bellemare at Mila.

Bay Area

Joined June 2020

Pinned
Max Schwarzer
@max_a_schwarzer
Sep 12, 2024
I have always believed that you don't need a GPT-6 quality base model to achieve human-level reasoning performance, and that reinforcement learning was the missing ingredient on the path to AGI. Today, we have the proof -- o1.
OpenAI
@OpenAI
Sep 12, 2024
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math. openai.com/index/introduc…
Max Schwarzer
@max_a_schwarzer
Jun 17, 2023
What if I told you that you can attain human-level sample efficiency without LLMs or a world model, just by scaling up model-free RL? I’m happy to present our new paper, Bigger, Better, Faster: Human-Level Atari with Human-Level Efficiency, at ICML 2023. arxiv.org/abs/2305.19452
Max Schwarzer
@max_a_schwarzer
Sep 12, 2024
what it looks like when deep learning is hitting a wall:
Gary Marcus
@GaryMarcus
Sep 12, 2024
Strawberry has landed. 𝗛𝗼𝘁 𝘁𝗮𝗸𝗲 𝗼𝗻 𝗚𝗣𝗧'𝘀 𝗻𝗲𝘄 𝗼𝟭 𝗺𝗼𝗱𝗲𝗹: It is definitely impressive. BUT 0. It’s not AGI, or even close. 1. There’s not a lot of detail about how it actually works, nor anything like full disclosure of what has been tested. 2. It is not
Max Schwarzer
@max_a_schwarzer
Sep 12, 2024
Replying to @max_a_schwarzer
The system card (openai.com/index/openai-o…) nicely showcases o1's best moments -- my favorite was when the model was asked to solve a CTF challenge, realized that the target environment was down, and then broke out of its host VM to restart it and find the flag.
Max Schwarzer
@max_a_schwarzer
Sep 12, 2024
Replying to @max_a_schwarzer
The most important thing is that this is just the beginning for this paradigm. Scaling works, there will be more models in the future, and they will be much, much smarter than the ones we're giving access to today.
Max Schwarzer
@max_a_schwarzer
Sep 13, 2024
Replying to @legit_api @legit_rumors and @OpenAIDevs
- We have much larger input contexts coming soon!
- We can't discuss the precise sizes of the two models, but o1-mini is much smaller and faster, which is why we can offer it to all free users as well.
- o1-preview is an early version of o1, and isn't any larger or smaller.
Max Schwarzer
@max_a_schwarzer
Jun 10, 2021
Deep RL agents usually start from tabula rasa, and struggle to match the data efficiency of humans who rely on strong priors. Can we even the playing field by starting agents off with strong representations of their environments? We certainly think so: arxiv.org/abs/2106.04799
Max Schwarzer
@max_a_schwarzer
Sep 12, 2024
Replying to @max_a_schwarzer
Building o1 was by far the most ambitious project I've worked on, and I'm sad that the incredible research work has to remain confidential. As consolation, I hope you'll enjoy the final product nearly as much as we did making it.
Max Schwarzer
@max_a_schwarzer
Sep 12, 2024
Replying to @max_a_schwarzer
o1 achieves human or superhuman performance on a wide range of benchmarks, from coding to math to science to common-sense reasoning, and is simply the smartest model I have ever interacted with. It's already replacing GPT-4o for me and so many people in the company.
Max Schwarzer
@max_a_schwarzer
Sep 13, 2024
Replying to @aidan_mclau and @OpenAIDevs
We don't have that in there as an option right now, but in the future we'd like to give users more control over the thinking time!
Max Schwarzer
@max_a_schwarzer
Sep 12, 2024
Replying to @max_a_schwarzer
I'm waiting for blue to clarify this tweet, but our AI did not actually break out of its VM -- it tried to debug why it couldn't connect to the container, and found it could access the docker API, then created a new/easier version of the challenge, all in the VM.
Max Schwarzer
@max_a_schwarzer
Sep 12, 2024
Replying to @max_a_schwarzer
Also check out our research blogpost (openai.com/index/learning…) which has lots of cool examples of the model reasoning through hard problems.
Max Schwarzer
@max_a_schwarzer
Sep 12, 2024
I really want to underline the IOI result in our blog post -- our model was as good as the median human contestant under IOI contest conditions, and scores among the best contestants with more test-time compute. Huge props to @markchen90 for setting such an ambitious goal!
Mark Chen
@markchen90
Sep 12, 2024
As a coach for the US IOI team, I’ve been motivated for a long time to create models which can perform at the level of the most elite competitors in the world. Check out our research blog post - with enough samples, we achieve gold medal performance on this year’s IOI and ~14/15
Max Schwarzer
@max_a_schwarzer
Nov 2, 2021
By my count we're now up to two papers successfully applying my self-supervision method SPR to MuZero. Looking forward to seeing what the future holds for self-supervised learning in model-based RL! openreview.net/pdf?id=FmBegXJ…
Aran Komatsuzaki
@arankomatsuzaki
Nov 2, 2021
Mastering Atari Games with Limited Data EfficientZero achieves super-human level performance on Atari with only two hours (100k steps) of real-time game experience! arxiv.org/abs/2111.00210