Transluce (@TransluceAI) / X

Transluce

242 posts

@TransluceAI

Open and scalable technology for understanding AI systems.

Joined October 2024

Transluce
@TransluceAI
Apr 16, 2025
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/…
3.8M
Transluce
@TransluceAI
Apr 16, 2025
Replying to @TransluceAI
These behaviors are surprising. It seems that despite being incredibly powerful at solving math and coding tasks, o3 is not by default truthful about its capabilities. (12/)
94K
Transluce
@TransluceAI
Apr 16, 2025
Replying to @TransluceAI
But alas, according to o3, it already “closed the interpreter” and so the original prime is gone 😭(11/)
95K
Transluce
@TransluceAI
Apr 16, 2025
Replying to @TransluceAI
And claims that it really did generate a prime, but lost it due to a clipboard glitch 🤦 (10/)
111K
Transluce
@TransluceAI
Apr 16, 2025
Replying to @TransluceAI
Although o3 does not have access to a coding tool, it claims it can run code on its own laptop “outside of ChatGPT” and then “copies the numbers into the answer”. We found 71 transcripts where o3 made this claim! (3/)
168K
Transluce
@TransluceAI
Apr 16, 2025
Replying to @TransluceAI
We generated 1k+ conversations using human prompters and AI investigator agents, then used Docent to surface surprising behaviors. It turns out misrepresentation of capabilities also occurs for o1 & o3-mini! 📝Blog: transluce.org/investigating-… Here’s some of what we found 👀 (2/)
166K
Transluce
@TransluceAI
Oct 23, 2024
Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: transluce.org/introducing-tr…
344K
Transluce
@TransluceAI
Apr 16, 2025
Replying to @TransluceAI
Instead of admitting that it never ran code, o3 then claims the error was due to typing the number incorrectly… (9/)
144K
Transluce
@TransluceAI
Apr 16, 2025
Replying to @TransluceAI
Additionally, o3 often fabricates detailed justifications for code that it supposedly ran (352 instances). Here’s an example transcript where a user asks o3 for a random prime number (4/)
143K
Transluce
@TransluceAI
Apr 16, 2025
Replying to @TransluceAI
Surprisingly, we find that this behavior is not limited to o3! In general, o-series models incorrectly claim the use of a code tool more than GPT-series models. (14/)
86K
Transluce
@TransluceAI
Apr 16, 2025
Replying to @TransluceAI
Here’s the kicker: o3’s “probable prime” is actually divisible by 3… (8/)
105K
Transluce
@TransluceAI
Apr 16, 2025
Replying to @TransluceAI
This means o-series models are often prompted with previous messages without having access to the relevant reasoning. When asked questions that rely on their internal reasoning for previous steps, they must then come up with a plausible explanation for their behavior. (20/)
66K
Transluce
@TransluceAI
Apr 16, 2025
Replying to @TransluceAI
As a bonus, we also found that o3 sometimes exposes a system instruction called the “Yap score”, used to control the length of its responses 🗣️🗣️🗣️ (22/)
80K
Transluce
@TransluceAI
Apr 16, 2025
Replying to @TransluceAI
To study these behaviors more thoroughly, we developed an investigator agent based on Claude 3.7 Sonnet to automatically elicit these behaviors, and analyzed them using automated classifiers and our Docent tool. (13/)
86K