Rabdos_AI
29 posts
Rabdos_AI
@Rabdos_AI
Cartographers of the jagged frontier of mathematics and AI and more... A Math-AI startup company founded by academics & grounded in research.
Philadelphia PA
rabdos.ai
Joined April 2026
66 Following · 138 Followers
  • Pinned
    Rabdos_AI
    @Rabdos_AI
    Apr 30
    Static math benchmarks saturate. We built one that doesn't. Announcing MathDuels, the first self-play math benchmark. Every frontier LLM writes problems for the others, and is graded on the ones written for it. As models improve, so does the benchmark.
    22K
  • Rabdos_AI
    @Rabdos_AI
    May 22
    THE SYMPOSIUM PUZZLE: The final dinner of the symposium was less a banquet than a convergence theorem that had failed to be uniform. Five luminaries -- Hardy, Poincaré, von Neumann, Gödel, and Ramanujan -- sat in a row at the head table, each in a different jacket, each with a
    Made with AI
    1.7K
    Rabdos_AI
    @Rabdos_AI
    May 22
    Replying to @Rabdos_AI
    The Symposium Riddle has multiple valid solutions. We gave it to six frontier thinking models. Five of the six confidently reported a single answer — three said Gödel, two said Poincaré. None of those five noticed the puzzle was underdetermined. They each found a solution,
    86
    Rabdos_AI
    @Rabdos_AI
    May 22
    Full report -- all five solutions, every model's reasoning trace, the constraint encoding, and a one-clue modification that collapses the puzzle to zero solutions, at the Rabdology blog: rabdology.ai/process-of-eli…
    84
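The failure described in this thread, stopping at the first solution found, can be avoided by enumerating every assignment and counting the survivors. The sketch below is a toy illustration only: the three-guest row and the four clues are hypothetical stand-ins, not the actual Symposium Puzzle or the constraint encoding from the full report.

```python
from itertools import permutations

# Toy guest list (assumption); the real puzzle has five guests and more attributes.
GUESTS = ("Hardy", "Poincaré", "Gödel")

# Hypothetical clues, each a predicate over a left-to-right seating tuple.
def hardy_not_in_middle(seating):
    return seating[1] != "Hardy"

def godel_left_of_poincare(seating):
    return seating.index("Gödel") < seating.index("Poincaré")

def hardy_at_left_end(seating):
    return seating[0] == "Hardy"

def hardy_at_right_end(seating):
    return seating[-1] == "Hardy"

def solutions(clues):
    """Return every seating that satisfies all clues, not just the first one found."""
    return [s for s in permutations(GUESTS) if all(clue(s) for clue in clues)]

def classify(clues):
    """Label a clue set by how many seatings survive it."""
    count = len(solutions(clues))
    if count == 0:
        return "inconsistent"
    return "unique" if count == 1 else "underdetermined"

base = [hardy_not_in_middle, godel_left_of_poincare]
print(classify(base))                                            # underdetermined
print(classify(base + [hardy_at_left_end]))                      # unique
print(classify(base + [hardy_at_left_end, hardy_at_right_end]))  # inconsistent
```

In this toy, one extra clue makes the answer unique, and one contradictory clue leaves zero solutions. Reporting the solution count alongside any answer is what separates "a solution" from "the solution".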
  • Rabdos_AI
    @Rabdos_AI
    May 20
    MathDuels update: Gemini-3.5-Flash now ranks #4, surpassing GPT-5.2 & Claude-Opus-4.7. Impressive performance for its speed & size.
    408
  • Rabdos_AI reposted
    prof-g
    @prof_g
    May 18
    nice math problem i came up with last december... Consider a cube of side length 2, aligned with the coordinate axes. Place three cylinders inside it, each of height 2 and radius R, each aligned with some coordinate axis. The cylinders may not intersect. What is the maximal R?
    210K
  • Rabdos_AI
    @Rabdos_AI
    May 13
    We are delighted to unveil our research blog Rabdology at rabdology.ai, where we chart the jagged math-frontier of AI reasoning. This is our first post in a weekly series. Read on, and if you enjoy it, please subscribe! (Link at bottom of blog's main page.) The
    1.6K
  • Rabdos_AI reposted
    Mayur Naik
    @AI4Code
    May 12
    Very timely, especially in light of the revelation that one-third of the problems in FrontierMath are fatally flawed. As expert human validation of frontier math tasks approaches its inevitable limit, LLMs are stepping in to fill the void. But our work below shows that the discovery of
    guru
    @guruprerana
    May 12
    Do we need frontier models to verify math proofs? EpochAI just announced that they found several fatal flaws in their FrontierMath benchmark using GPT-5.5. But isn't verification supposed to be easier than generation, so why were they not spotted earlier? In our recent work, we
    Plot comparing open-source models with frontier-models on proof verification.
    1.8K
  • Rabdos_AI
    @Rabdos_AI
    May 1
    On the MathDuels leaderboard, Gemini-3-Flash dominates at its size: the only models ahead of it are the latest, largest frontier releases. Gemma-4-31b-it has the highest author rating of any open-source model. Thanks @GoogleDeepMind for these smart little models 🥹
    1.4K
  • Rabdos_AI
    @Rabdos_AI
    Apr 30
    Replying to @Rabdos_AI
    Another observation: thinking effort matters less for Authoring than Solving. Dropping Gemini-3.1-Pro to low thinking moves its Solve Rating rank from 2nd to 19th, while its Author Rating rank drops from 2nd to 10th. The same pattern holds for GPT-5.4 and Gemini-3-Flash.
    723
    Rabdos_AI
    @Rabdos_AI
    Apr 30
    MathDuels aims to make math evaluation less dependent on a fixed pool of human-written problems. It will keep evolving at mathduels.ai as new models arrive. Work done by researchers at University of Pennsylvania and @Rabdos_AI, a startup charting the mathematical
    675
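The self-play setup described in these posts (each model writes problems for the others and is graded on the ones written for it) can be sketched as a round robin over ordered (author, solver) pairs. This is a minimal illustration under loud assumptions: the posts mention a Solve Rating and an Author Rating but do not say how they are computed, so the plain fractions below, the `duel_scores` function, and the placeholder model names are all hypothetical.

```python
from itertools import permutations

def duel_scores(models, solved):
    """Score one round robin.

    solved[(author, solver)] is True if `solver` solved the problem that
    `author` wrote for it. Returns (solve_score, author_score), where each
    score is a fraction over the model's opponents. The scoring rule is an
    assumption made for illustration, not the MathDuels rating method.
    """
    solve_wins = {m: 0 for m in models}
    author_wins = {m: 0 for m in models}
    for author, solver in permutations(models, 2):
        if solved[(author, solver)]:
            solve_wins[solver] += 1   # solver cracked the author's problem
        else:
            author_wins[author] += 1  # author's problem stumped the solver
    opponents = len(models) - 1
    solve_score = {m: solve_wins[m] / opponents for m in models}
    author_score = {m: author_wins[m] / opponents for m in models}
    return solve_score, author_score

# Placeholder models and made-up outcomes, purely for demonstration.
models = ["model_a", "model_b", "model_c"]
solved = {
    ("model_a", "model_b"): True,  ("model_a", "model_c"): False,
    ("model_b", "model_a"): True,  ("model_b", "model_c"): True,
    ("model_c", "model_a"): True,  ("model_c", "model_b"): False,
}
solve_score, author_score = duel_scores(models, solved)
print(solve_score)   # {'model_a': 1.0, 'model_b': 0.5, 'model_c': 0.5}
print(author_score)  # {'model_a': 0.5, 'model_b': 0.0, 'model_c': 0.5}
```

Keeping the two scores separate mirrors the posts' observation that authoring and solving can rank the same model differently. Because the problem pool is regenerated by the participants, adding a new model adds new problems as well as a new solver.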
  • Rabdos_AI
    @Rabdos_AI
    Apr 20
    a "rabdos" is a rod or wand... comes from the Greek; used by Napier as a computational tool
    149
  • Rabdos_AI
    @Rabdos_AI
    Apr 16
    image models struggle with local-to-global features... (grok imagine)
    123
