Max Schwarzer
139 posts
Max Schwarzer
@max_a_schwarzer
Doing RL @AnthropicAI. Formerly VP of Research, Head of Post-Training @OpenAI. PhD with Aaron Courville and Marc Bellemare at Mila.
Bay Area
maxschwarzer.com
Joined June 2020
318 Following · 23.2K Followers
  • Pinned
    Max Schwarzer
    @max_a_schwarzer
    Sep 12, 2024
    I have always believed that you don't need a GPT-6 quality base model to achieve human-level reasoning performance, and that reinforcement learning was the missing ingredient on the path to AGI. Today, we have the proof -- o1.
    OpenAI
    @OpenAI
    Sep 12, 2024
    We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math. openai.com/index/introduc…
    697K
  • Max Schwarzer
    @max_a_schwarzer
    Jun 17, 2023
    What if I told you that you can attain human-level sample efficiency without LLMs or a world model, just by scaling up model-free RL? I’m happy to present our new paper, Bigger, Better, Faster: Human-Level Atari with Human-Level Efficiency, at ICML 2023. arxiv.org/abs/2305.19452
    193K
  • Max Schwarzer
    @max_a_schwarzer
    Sep 12, 2024
    what it looks like when deep learning is hitting a wall:
    Image
    Gary Marcus
    @GaryMarcus
    Sep 12, 2024
    Strawberry has landed. 𝗛𝗼𝘁 𝘁𝗮𝗸𝗲 𝗼𝗻 𝗚𝗣𝗧'𝘀 𝗻𝗲𝘄 𝗼𝟭 𝗺𝗼𝗱𝗲𝗹: It is definitely impressive. BUT 0. It’s not AGI, or even close. 1. There’s not a lot of detail about how it actually works, nor anything like full disclosure of what has been tested. 2. It is not
    404K
  • Max Schwarzer
    @max_a_schwarzer
    Sep 12, 2024
    Replying to @max_a_schwarzer
    The system card (openai.com/index/openai-o…) nicely showcases o1's best moments -- my favorite was when the model was asked to solve a CTF challenge, realized that the target environment was down, and then broke out of its host VM to restart it and find the flag.
    400K
  • Max Schwarzer
    @max_a_schwarzer
    Sep 12, 2024
    Replying to @max_a_schwarzer
    The most important thing is that this is just the beginning for this paradigm. Scaling works, there will be more models in the future, and they will be much, much smarter than the ones we're giving access to today.
    137K
  • Max Schwarzer
    @max_a_schwarzer
    Sep 13, 2024
    Replying to @legit_api @legit_rumors and @OpenAIDevs
    - We have much larger input contexts coming soon! - We can't discuss the precise sizes of the two models, but o1-mini is much smaller and faster, which is why we can offer it to all free users as well. - o1-preview is an early version of o1, and isn't any larger or smaller.
    77K
  • Max Schwarzer
    @max_a_schwarzer
    Jun 10, 2021
    Deep RL agents usually start from tabula rasa, and struggle to match the data efficiency of humans who rely on strong priors. Can we even the playing field by starting agents off with strong representations of their environments? We certainly think so: arxiv.org/abs/2106.04799
  • Max Schwarzer
    @max_a_schwarzer
    Sep 12, 2024
    Replying to @max_a_schwarzer
    Building o1 was by far the most ambitious project I've worked on, and I'm sad that the incredible research work has to remain confidential. As consolation, I hope you'll enjoy the final product nearly as much as we did making it.
    16K
  • Max Schwarzer
    @max_a_schwarzer
    Sep 12, 2024
    Replying to @max_a_schwarzer
    o1 achieves human or superhuman performance on a wide range of benchmarks, from coding to math to science to common-sense reasoning, and is simply the smartest model I have ever interacted with. It's already replacing GPT-4o for me and so many people in the company.
    27K
  • Max Schwarzer
    @max_a_schwarzer
    Sep 13, 2024
    Replying to @aidan_mclau and @OpenAIDevs
    We don't have that in there as an option right now, but in the future we'd like to give users more control over the thinking time!
    34K
  • Max Schwarzer
    @max_a_schwarzer
    Sep 12, 2024
    Replying to @max_a_schwarzer
    I'm waiting for blue to clarify this tweet, but our AI did not actually break out of its VM -- it tried to debug why it couldn't connect to the container, and found it could access the docker API, then created a new/easier version of the challenge, all in the VM.
    14K
  • Max Schwarzer
    @max_a_schwarzer
    Sep 12, 2024
    Replying to @max_a_schwarzer
    Also check out our research blogpost (openai.com/index/learning…) which has lots of cool examples of the model reasoning through hard problems.
    21K
  • Max Schwarzer
    @max_a_schwarzer
    Sep 12, 2024
    I really want to underline the IOI result in our blog post -- our model was as good as the median human contestant under IOI contest conditions, and scores among the best contestants with more test-time compute. Huge props to @markchen90 for setting such an ambitious goal!
    Mark Chen
    @markchen90
    Sep 12, 2024
    As a coach for the US IOI team, I’ve been motivated for a long time to create models which can perform at the level of the most elite competitors in the world. Check out our research blog post - with enough samples, we achieve gold medal performance on this year’s IOI and ~14/15
    16K
  • Max Schwarzer
    @max_a_schwarzer
    Nov 2, 2021
    By my count we're now up to two papers successfully applying my self-supervision method SPR to MuZero. Looking forward to seeing what the future holds for self-supervised learning in model-based RL! openreview.net/pdf?id=FmBegXJ…
    Aran Komatsuzaki
    @arankomatsuzaki
    Nov 2, 2021
    Mastering Atari Games with Limited Data EfficientZero achieves super-human level performance on Atari with only two hours (100k steps) of real-time game experience! arxiv.org/abs/2111.00210
