Elizabeth Barnes (@BethMayBarnes) / X

318 posts

Joined July 2014

Pinned
Elizabeth Barnes
@BethMayBarnes
May 22
Our report focuses on claims that are (1) solidly defensible and (2) generally agreed within METR. Here I’ll give some personal opinions on how we should feel about the state of AI risk, and the IMO most important limitations of the report.
METR
@METR_Evals
May 19
Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.
66K
Elizabeth Barnes
@BethMayBarnes
Mar 19, 2025
Benchmarks saturate quickly, but don’t translate well to real-world impact. *Something* is going up very fast, but not clear what it means. Thus the wide range of expert opinion, from “superintelligence in a few years”, to “we’ve already hit a wall”. Our results shed some light:
METR
@METR_Evals
Mar 19, 2025
When will AI systems be able to carry out long projects independently? In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.
66K
Elizabeth Barnes
@BethMayBarnes
Aug 7, 2025
The good news: due to increased access (plus improved evals science) we were able to do a more meaningful evaluation than with past models, and we think we have substantial evidence that this model does not pose a catastrophic risk via autonomy / loss of control threat models.
METR
@METR_Evals
Aug 7, 2025
In a new report, we evaluate whether GPT-5 poses significant catastrophic risks via AI R&D acceleration, rogue replication, or sabotage of AI labs. We conclude that this seems unlikely. However, capability trends continue rapidly, and models display increasing eval awareness.
61K
Elizabeth Barnes
@BethMayBarnes
Jun 4, 2025
I had a lot of fun chatting with Rob about METR's work. I stand by my claims here that the world is not on track to keep risk from AI to an acceptable level, and we desperately need more people working on these problems.
Rob Wiblin
@robertwiblin
Jun 2, 2025
AI models currently have a 50% chance of doing something that takes a human expert one hour. This doubles every 7 months. In 2 years? They could automate full workdays. In 4 years? A full month. I discuss the most important graph in AI today with Beth Barnes, the CEO of METR,
28K
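The extrapolation in the quoted post can be checked with a few lines of arithmetic. The sketch below assumes a clean exponential with the quoted 7-month doubling time and a 1-hour starting horizon at 50% success; the function names and the choice to measure in working hours are illustrative assumptions, not taken from METR's paper.

```python
import math

# Figures quoted in the posts above; treating them as exact is an assumption.
DOUBLING_MONTHS = 7.0       # "doubling about every 7 months"
START_HORIZON_HOURS = 1.0   # task length (human expert time) at 50% success


def horizon_hours(months_ahead, start_hours=START_HORIZON_HOURS,
                  doubling_months=DOUBLING_MONTHS):
    """Task length in hours after `months_ahead`, under pure exponential growth."""
    return start_hours * 2.0 ** (months_ahead / doubling_months)


def months_until(target_hours, start_hours=START_HORIZON_HOURS,
                 doubling_months=DOUBLING_MONTHS):
    """Months needed for the horizon to reach `target_hours` (inverse of the above)."""
    return doubling_months * math.log2(target_hours / start_hours)


if __name__ == "__main__":
    for months in (24, 48):
        print(f"{months} months ahead: {horizon_hours(months):.1f} hours")
    print(f"8-hour workday reached after {months_until(8.0):.0f} months")
```

Under these assumptions the horizon is roughly 10.8 hours after 2 years and roughly 116 hours after 4 years. Whether 116 hours counts as "a full month" depends on how a working month is defined; it is about three 40-hour weeks.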
Elizabeth Barnes
@BethMayBarnes
Jul 10, 2025
Our RCT found that [early-2025] AI coding assistants appear to *slow down* users [working in mature open-source codebases]. But developer self-reports (and expert forecasts) suggested speedup. This is a counterintuitive result! Some thoughts on interpretations / takeaways
METR
@METR_Evals
Jul 10, 2025
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
42K
Elizabeth Barnes
@BethMayBarnes
Aug 7, 2025
Wow that was not a great example of factualness. Famous common misconception
35K
Elizabeth Barnes
@BethMayBarnes
Nov 22, 2024
LLMs do surprisingly well at our hard AI R&D tasks. Intense acceleration of ML research by AI agents might be coming sooner than I’d thought. But, there are many subtleties in interpreting our results and more work is needed. 🧵
25K
Elizabeth Barnes
@BethMayBarnes
Nov 24, 2024
Replying to @AISafetyMemes
For the record, I definitely don't think AI agents are performing on par with top ML researchers overall, and our paper doesn't claim that. I do agree that it's concerning though!
12K
Elizabeth Barnes
@BethMayBarnes
Dec 18, 2024
Props to Anthropic for providing @RyanPGreenblatt with employee-level model access; I'd love to see more of this type of support for independent AI safety research
Ryan Greenblatt
@RyanPGreenblatt
Dec 18, 2024
Replying to @RyanPGreenblatt
After showing @EvanHub and others my results, Anthropic graciously agreed to provide me with employee-level model access for the project. We decided to turn this into a bigger collaboration with the alignment-stress testing team (led by Evan), to do a more thorough job.
6.1K
Elizabeth Barnes
@BethMayBarnes
Dec 18, 2024
This is an excellent paper, congrats to @RyanPGreenblatt, @redwood_ai and @AnthropicAI. A key takeaway: the industry-standard "alignment" techniques like RLHF can actually *increase* deceptive behavior and create a model that *acts* aligned in order to get what it wants.
Ryan Greenblatt
@RyanPGreenblatt
Dec 18, 2024
New Redwood Research (@redwood_ai) paper in collaboration with @AnthropicAI: We demonstrate cases where Claude fakes alignment when it strongly dislikes what it is being trained to do. (Thread)
6.8K
Elizabeth Barnes
@BethMayBarnes
Feb 24, 2025
Reading Groves' account of the Manhattan Project to see if he looks better in his own words than in other accounts. If anything he comes out worse than expected. Very little consideration of postwar arms control, minimizing civilian deaths, or the risk of igniting the atmosphere.
4.8K
Elizabeth Barnes
@BethMayBarnes
Mar 20, 2025
Persnickety title would be: "there's an exponential trend with doubling time between ~2-12 months on automatically-scoreable, relatively clean + green-field software tasks from a few distributions". More detail on how we thought about external validity in paper and this thread
Megan Kinniment
@MKinniment
Mar 19, 2025
Happy for this to be released! It’s the result of a lot of hard work from many of us at METR :) A big question is whether these results apply to ‘real’ tasks. Here’s some thoughts on that:
10K
Elizabeth Barnes
@BethMayBarnes
Nov 23, 2024
Not mentioned in the paper, but average score for METR engineers is 1.12, wanted to point this out bc it's pretty flattering :P Maybe the labs should compete to see whose engineers get the best scores...
benthamite🔸
@benthamite_
Nov 23, 2024
Replying to @austinc3301
fairly cracked ngl. but we've had people with little professional ML experience beat these scores before!
3.8K
Elizabeth Barnes
@BethMayBarnes
Dec 19, 2024
I think of this work primarily as a capability evaluation rather than a propensity evaluation. It feels like some of the offended reactions are coming from viewing it as an evaluation of "whether Claude is a good person"
Richard Ngo
@RichardMCNgo
Dec 19, 2024
Replying to @RichardMCNgo @paulfchristiano and 2 others
To clarify, I agree with the “sensationalist headline” bit, but disagree with some of Rohit’s other criticisms, especially the tweet below. Models strategically reasoning about how they’ll be updated is a big deal; and an idea which almost everyone dismissed until recently.
7.4K