Transluce
242 posts
Transluce
@TransluceAI
Open and scalable technology for understanding AI systems.
transluce.org
Joined October 2024
17
Following
9,338
Followers

  • Transluce
    @TransluceAI
    Apr 16, 2025
    We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/…
    3.8M
  • Transluce
    @TransluceAI
    Apr 16, 2025
    Replying to @TransluceAI
    These behaviors are surprising. It seems that despite being incredibly powerful at solving math and coding tasks, o3 is not by default truthful about its capabilities. (12/)
    94K
  • Transluce
    @TransluceAI
    Apr 16, 2025
    Replying to @TransluceAI
    But alas, according to o3, it already “closed the interpreter” and so the original prime is gone 😭(11/)
    95K
  • Transluce
    @TransluceAI
    Apr 16, 2025
    Replying to @TransluceAI
    And claims that it really did generate a prime, but lost it due to a clipboard glitch 🤦 (10/)
    111K
  • Transluce
    @TransluceAI
    Apr 16, 2025
    Replying to @TransluceAI
    Although o3 does not have access to a coding tool, it claims it can run code on its own laptop “outside of ChatGPT” and then “copies the numbers into the answer”. We found 71 transcripts where o3 made this claim! (3/)
    168K
  • Transluce
    @TransluceAI
    Apr 16, 2025
    Replying to @TransluceAI
    We generated 1k+ conversations using human prompters and AI investigator agents, then used Docent to surface surprising behaviors. It turns out misrepresentation of capabilities also occurs for o1 & o3-mini! 📝Blog: transluce.org/investigating-… Here’s some of what we found 👀 (2/)
    166K
  • Transluce
    @TransluceAI
    Oct 23, 2024
    Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: transluce.org/introducing-tr…
    344K
  • Transluce
    @TransluceAI
    Apr 16, 2025
    Replying to @TransluceAI
    Instead of admitting that it never ran code, o3 then claims the error was due to typing the number incorrectly… (9/)
    144K
  • Transluce
    @TransluceAI
    Apr 16, 2025
    Replying to @TransluceAI
    Additionally, o3 often fabricates detailed justifications for code that it supposedly ran (352 instances). Here’s an example transcript where a user asks o3 for a random prime number (4/)
    143K
  • Transluce
    @TransluceAI
    Apr 16, 2025
    Replying to @TransluceAI
    Surprisingly, we find that this behavior is not limited to o3! In general, o-series models incorrectly claim the use of a code tool more than GPT-series models. (14/)
    86K
  • Transluce
    @TransluceAI
    Apr 16, 2025
    Replying to @TransluceAI
    Here’s the kicker: o3’s “probable prime” is actually divisible by 3… (8/)
    105K
  • Transluce
    @TransluceAI
    Apr 16, 2025
    Replying to @TransluceAI
    This means o-series models are often prompted with previous messages without having access to the relevant reasoning. When asked questions that rely on their internal reasoning for previous steps, they must then come up with a plausible explanation for their behavior. (20/)
    66K
  • Transluce
    @TransluceAI
    Apr 16, 2025
    Replying to @TransluceAI
    As a bonus, we also found that o3 sometimes exposes a system instruction called the “Yap score”, used to control the length of its responses 🗣️🗣️🗣️ (22/)
    80K
  • Transluce
    @TransluceAI
    Apr 16, 2025
    Replying to @TransluceAI
    To study these behaviors more thoroughly, we developed an investigator agent based on Claude 3.7 Sonnet to automatically elicit these behaviors, and analyzed them using automated classifiers and our Docent tool. (13/)
    86K