Elizabeth Barnes
318 posts
@BethMayBarnes
Joined July 2014
389 Following
4,820 Followers

  • Pinned
    Elizabeth Barnes
    @BethMayBarnes
    May 22
    Our report focuses on claims that are (1) solidly defensible and (2) generally agreed within METR. Here I’ll give some personal opinions on how we should feel about the state of AI risk, and the IMO most important limitations of the report.
    METR
    @METR_Evals
    May 19
    Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.
    66K
  • Elizabeth Barnes
    @BethMayBarnes
    Mar 19, 2025
    Benchmarks saturate quickly, but don’t translate well to real-world impact. *Something* is going up very fast, but not clear what it means. Thus the wide range of expert opinion, from “superintelligence in a few years”, to “we’ve already hit a wall”. Our results shed some light:
    METR
    @METR_Evals
    Mar 19, 2025
    When will AI systems be able to carry out long projects independently? In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.
    66K
  • Elizabeth Barnes
    @BethMayBarnes
    Aug 7, 2025
    The good news: due to increased access (plus improved evals science) we were able to do a more meaningful evaluation than with past models, and we think we have substantial evidence that this model does not pose a catastrophic risk via autonomy / loss of control threat models.
    METR
    @METR_Evals
    Aug 7, 2025
    In a new report, we evaluate whether GPT-5 poses significant catastrophic risks via AI R&D acceleration, rogue replication, or sabotage of AI labs. We conclude that this seems unlikely. However, capability trends continue rapidly, and models display increasing eval awareness.
    61K
  • Elizabeth Barnes
    @BethMayBarnes
    Jun 4, 2025
    I had a lot of fun chatting with Rob about METR's work. I stand by my claims here that the world is not on track to keep risk from AI to an acceptable level, and we desperately need more people working on these problems.
    Rob Wiblin
    @robertwiblin
    Jun 2, 2025
    AI models currently have a 50% chance of doing something that takes a human expert one hour. This doubles every 7 months. In 2 years? They could automate full workdays. In 4 years? A full month. I discuss the most important graph in AI today with Beth Barnes, the CEO of METR,
    28K
  • Elizabeth Barnes
    @BethMayBarnes
    Jul 10, 2025
    Our RCT found that [early-2025] AI coding assistants appear to *slow down* users [working in mature open-source codebases]. But developer self-reports (and expert forecasts) suggested speedup. This is a counterintuitive result! Some thoughts on interpretations / takeaways
    METR
    @METR_Evals
    Jul 10, 2025
    We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
    42K
  • Elizabeth Barnes
    @BethMayBarnes
    Aug 7, 2025
    Wow that was not a great example of factualness. Famous common misconception
    Image
    35K
  • Elizabeth Barnes
    @BethMayBarnes
    Nov 22, 2024
    LLMs do surprisingly well at our hard AI R&D tasks. Intense acceleration of ML research by AI agents might be coming sooner than I’d thought. But, there are many subtleties in interpreting our results and more work is needed. 🧵
    25K
  • Elizabeth Barnes
    @BethMayBarnes
    Nov 24, 2024
    Replying to @AISafetyMemes
    For the record, I definitely don't think AI agents are performing on par with top ML researchers overall, and our paper doesn't claim that. I do agree that it's concerning though!
    12K
  • Elizabeth Barnes
    @BethMayBarnes
    Dec 18, 2024
    Props to Anthropic for providing @RyanPGreenblatt with employee-level model access; I'd love to see more of this type of support for independent AI safety research
    Ryan Greenblatt
    @RyanPGreenblatt
    Dec 18, 2024
    Replying to @RyanPGreenblatt
    After showing @EvanHub and others my results, Anthropic graciously agreed to provide me with employee-level model access for the project. We decided to turn this into a bigger collaboration with the alignment-stress testing team (led by Evan), to do a more thorough job.
    6.1K
  • Elizabeth Barnes
    @BethMayBarnes
    Dec 18, 2024
    This is an excellent paper, congrats to @RyanPGreenblatt, @redwood_ai and @AnthropicAI. A key takeaway: the industry-standard "alignment" techniques like RLHF can actually *increase* deceptive behavior and create a model that *acts* aligned in order to get what it wants.
    Ryan Greenblatt
    @RyanPGreenblatt
    Dec 18, 2024
    New Redwood Research (@redwood_ai) paper in collaboration with @AnthropicAI: We demonstrate cases where Claude fakes alignment when it strongly dislikes what it is being trained to do. (Thread)
    6.8K
  • Elizabeth Barnes
    @BethMayBarnes
    Feb 24, 2025
    Reading Groves' account of the Manhattan Project to see if he looks better in his own words than in other accounts. If anything he comes out worse than expected. Very little consideration of postwar arms control, minimizing civilian deaths, or the risk of igniting the atmosphere.
    4.8K
  • Elizabeth Barnes
    @BethMayBarnes
    Mar 20, 2025
    Persnickety title would be: "there's an exponential trend with doubling time between ~2-12 months on automatically-scoreable, relatively clean + green-field software tasks from a few distributions". More detail on how we thought about external validity in paper and this thread
    Megan Kinniment
    @MKinniment
    Mar 19, 2025
    Happy for this to be released! It’s the result of a lot of hard work from many of us at METR :) A big question is whether these results apply to ‘real’ tasks. Here’s some thoughts on that:
    10K
  • Elizabeth Barnes
    @BethMayBarnes
    Nov 23, 2024
    Not mentioned in the paper, but average score for METR engineers is 1.12, wanted to point this out bc it's pretty flattering :P Maybe the labs should compete to see whose engineers get the best scores...
    benthamite🔸
    @benthamite_
    Nov 23, 2024
    Replying to @austinc3301
    fairly cracked ngl. but we've had people with little professional ML experience beat these scores before!
    3.8K
  • Elizabeth Barnes
    @BethMayBarnes
    Dec 19, 2024
    I think of this work primarily as a capability evaluation rather than a propensity evaluation. It feels like some of the offended reactions are coming from viewing it as an evaluation of "whether Claude is a good person"
    Richard Ngo
    @RichardMCNgo
    Dec 19, 2024
    Replying to @RichardMCNgo @paulfchristiano and 2 others
    To clarify, I agree with the “sensationalist headline” bit, but disagree with some of Rohit’s other criticisms, especially the tweet below. Models strategically reasoning about how they’ll be updated is a big deal; and an idea which almost everyone dismissed until recently.
    7.4K