Rabdos_AI (@Rabdos_AI)


29 posts


Cartographers of the jagged frontier of mathematics, AI, and more... A math-AI startup founded by academics and grounded in research.

Philadelphia PA

Joined April 2026

Pinned
Rabdos_AI
@Rabdos_AI
Apr 30
Static math benchmarks saturate. We built one that doesn't. Announcing MathDuels, the first self-play math benchmark. Every frontier LLM writes problems for the others and is graded on the ones written for it. As models improve, so does the benchmark.
22K
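The posts do not say how MathDuels computes its Author and Solve Ratings. Purely as an illustration, the sketch below assumes an Elo-style scheme in which each attempt is a match between a problem's author and its solver: a solve moves rating points from the author's Author Rating to the solver's Solve Rating, and a miss moves them the other way. `START`, `K`, `Ratings`, and `record` are hypothetical names and values, not MathDuels' actual method.

```python
from dataclasses import dataclass, field

# Hypothetical Elo-style parameters; MathDuels' real rating method is not described in these posts.
START = 1500.0
K = 32.0


def expected(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


@dataclass
class Ratings:
    author: dict = field(default_factory=dict)  # model -> Author Rating
    solve: dict = field(default_factory=dict)   # model -> Solve Rating

    def record(self, author: str, solver: str, solved: bool) -> None:
        """Update both ratings after `solver` attempts a problem written by `author`."""
        if author == solver:
            raise ValueError("in self-play, a model is graded only on problems written by others")
        r_a = self.author.setdefault(author, START)
        r_s = self.solve.setdefault(solver, START)
        e = expected(r_s, r_a)                      # solver's expected chance of solving
        delta = K * ((1.0 if solved else 0.0) - e)
        self.solve[solver] = r_s + delta            # the solver gains exactly what the author loses
        self.author[author] = r_a - delta


ratings = Ratings()
ratings.record("model-A", "model-B", solved=True)
print(ratings.solve["model-B"], ratings.author["model-A"])  # 1516.0 1484.0
```

Keeping the two ratings in separate tables lets a model's authoring and solving strength move independently; the zero-sum update is just one simple choice among many.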
Rabdos_AI
@Rabdos_AI
May 22
THE SYMPOSIUM PUZZLE: The final dinner of the symposium was less a banquet than a convergence theorem that had failed to be uniform. Five luminaries -- Hardy, Poincaré, von Neumann, Gödel, and Ramanujan -- sat in a row at the head table, each in a different jacket, each with a…
Made with AI
1.7K
Rabdos_AI
@Rabdos_AI
May 22
Replying to @Rabdos_AI
The Symposium Puzzle has multiple valid solutions. We gave it to six frontier thinking models. Five of the six confidently reported a single answer: three said Gödel, two said Poincaré. None of those five noticed that the puzzle was underdetermined. They each found a solution,…
86
Rabdos_AI
@Rabdos_AI
May 22
Full report on the Rabdology blog, with all five solutions, every model's reasoning trace, the constraint encoding, and a one-clue modification that collapses the puzzle to zero solutions: rabdology.ai/process-of-eli…
84
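The puzzle text is truncated here, so the clues below are hypothetical stand-ins, not the actual Symposium Puzzle, and jackets are ignored. The sketch only shows the general technique behind a constraint encoding: enumerate every seating with `itertools.permutations`, keep those that satisfy all clues, and count them. More than one survivor means the puzzle is underdetermined; adding a clue that contradicts the others drives the count to zero, the analogue of a one-clue modification. `PEOPLE`, `clues`, and `solutions` are illustrative names.

```python
from itertools import permutations

PEOPLE = ("Hardy", "Poincaré", "von Neumann", "Gödel", "Ramanujan")


def clues(pos: dict) -> tuple:
    """Hypothetical clues for illustration only; `pos` maps each name to a seat 0..4."""
    return (
        pos["Ramanujan"] == pos["Hardy"] + 1,       # Hardy sits immediately left of Ramanujan
        pos["Gödel"] not in (0, len(PEOPLE) - 1),   # Gödel is not at either end
        pos["Poincaré"] < pos["von Neumann"],       # Poincaré sits somewhere left of von Neumann
    )


def solutions(extra=lambda pos: True) -> list:
    """All seatings (left to right) that satisfy every clue plus an optional extra clue."""
    found = []
    for order in permutations(PEOPLE):
        pos = {name: seat for seat, name in enumerate(order)}
        if all(clues(pos)) and extra(pos):
            found.append(order)
    return found


print(len(solutions()))                                       # 6: underdetermined
print(len({order[2] for order in solutions()}))               # 5: five different people can sit in the middle
print(len(solutions(lambda pos: pos["von Neumann"] == 0)))    # 0: one extra clue makes it unsolvable
```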
Rabdos_AI
@Rabdos_AI
May 20
MathDuels update: Gemini-3.5-Flash ranks #4, surpassing GPT-5.2 and Claude-Opus-4.7. Impressive performance for its speed and size.
408
Rabdos_AI reposted
prof-g
@prof_g
May 18
nice math problem i came up with last december... Consider a cube of side length 2, aligned with the coordinate axes. Place three cylinders inside it, each of height 2 and radius R, each aligned with some coordinate axis. The cylinders may not intersect. What is the maximal R?
210K
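Once a placement is fixed, prof-g's problem comes down to pairwise checks. This sketch assumes the cube is [0, 2]^3 and reads "may not intersect" as "interiors may not overlap", so touching is allowed; both are assumptions. Because each cylinder has height 2, it spans the cube along its axis, so two parallel cylinders are disjoint exactly when their cross-sectional disks are at least 2R apart, and two perpendicular cylinders are disjoint exactly when their centers differ by at least 2R along the third axis, the one neither cylinder runs along. `Cylinder`, `fits`, `disjoint`, and `valid` are hypothetical helpers; they check a candidate placement and do not solve for the maximal R.

```python
import math
from dataclasses import dataclass
from itertools import combinations

SIDE = 2.0   # assumed cube [0, 2]^3
EPS = 1e-12  # tolerance so that exactly touching cylinders count as disjoint


@dataclass(frozen=True)
class Cylinder:
    axis: int                           # 0 = x, 1 = y, 2 = z
    center: tuple[float, float, float]  # the coordinate along `axis` is ignored


def transverse(axis: int) -> list[int]:
    """The two coordinates that locate the cylinder's cross-sectional disk."""
    return [a for a in range(3) if a != axis]


def fits(c: Cylinder, r: float) -> bool:
    # Height 2 means the cylinder spans the cube along its axis,
    # so only its disk has to fit inside the 2 x 2 cross-section.
    return all(r - EPS <= c.center[a] <= SIDE - r + EPS for a in transverse(c.axis))


def disjoint(c1: Cylinder, c2: Cylinder, r: float) -> bool:
    if c1.axis == c2.axis:
        # Parallel: the two disks in the shared cross-section must not overlap.
        p = [c1.center[a] for a in transverse(c1.axis)]
        q = [c2.center[a] for a in transverse(c2.axis)]
        return math.dist(p, q) >= 2 * r - EPS
    # Perpendicular: only the third axis, along which neither cylinder runs, separates them.
    (third,) = {0, 1, 2} - {c1.axis, c2.axis}
    return abs(c1.center[third] - c2.center[third]) >= 2 * r - EPS


def valid(cyls: list[Cylinder], r: float) -> bool:
    return all(fits(c, r) for c in cyls) and all(
        disjoint(a, b, r) for a, b in combinations(cyls, 2)
    )


# Three mutually perpendicular cylinders of radius 1/2 that touch pairwise.
perpendicular = [
    Cylinder(0, (0.0, 0.5, 0.5)),
    Cylinder(1, (0.5, 0.0, 1.5)),
    Cylinder(2, (1.5, 1.5, 0.0)),
]
print(valid(perpendicular, 0.5), valid(perpendicular, 0.6))  # True False
```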
Rabdos_AI
@Rabdos_AI
May 13
We are delighted to unveil our research blog, Rabdology, at rabdology.ai, where we chart the jagged math frontier of AI reasoning. This is our first post in a weekly series. Read on, and if you enjoy it, please subscribe! (The link is at the bottom of the blog's main page.) The…
1.6K
Rabdos_AI reposted
Mayur Naik
@AI4Code
May 12
Very timely, especially in light of the revelation that 1/3 of the problems in FrontierMath are fatally flawed. As expert human validation of frontier math tasks approaches its inevitable limit, LLMs are stepping in to fill the void. But our work below shows that the discovery of…
guru
@guruprerana
May 12
Do we need frontier models to verify math proofs? EpochAI just announced that they found several fatal flaws in their FrontierMath benchmark using GPT-5.5. But isn't verification supposed to be easier than generation? If so, why were these flaws not spotted earlier? In our recent work, we…
1.8K
Rabdos_AI
@Rabdos_AI
May 1
On the MathDuels leaderboard, Gemini-3-Flash dominates at its size: the only models ahead of it are the latest, largest frontier releases. Gemma-4-31b-it has the highest Author Rating of any open-source model. Thanks @GoogleDeepMind for these smart little models 🥹
Rabdos_AI
@Rabdos_AI
Apr 30
Replying to @Rabdos_AI
Another observation: thinking effort matters less for Authoring than for Solving. Dropping Gemini-3.1-Pro to low thinking moves its Solve Rating rank from 2nd to 19th, while its Author Rating rank only drops from 2nd to 10th. The same pattern holds for GPT-5.4 and Gemini-3-Flash.
723
Rabdos_AI
@Rabdos_AI
Apr 30
MathDuels aims to make math evaluation less dependent on a fixed pool of human-written problems. It will keep evolving at mathduels.ai as new models arrive. Work done by researchers at the University of Pennsylvania and @Rabdos_AI, a startup charting the mathematical…
675
Rabdos_AI
@Rabdos_AI
Apr 20
a "rabdos" is a rod or wand (the word comes from the Greek); Napier used such rods as a computational tool
149
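Napier's rods (often called Napier's bones) turn multiplication into table lookups and additions. Each rod carries the multiples 1x to 9x of one digit, with the tens and units of each product split by a diagonal. To multiply by a single digit k, you lay the rods for the multiplicand side by side, read row k, and add along the diagonals from right to left, carrying as needed. The sketch below mimics that procedure for non-negative integers; `rod`, `multiply_by_digit`, and `napier_multiply` are our own illustrative names, and handling a multi-digit multiplier by summing shifted single-digit readings is one common way to use the rods.

```python
def rod(d: int) -> list[tuple[int, int]]:
    """One rod: the (tens, units) split of d*1 .. d*9, top to bottom."""
    return [divmod(d * k, 10) for k in range(1, 10)]


def multiply_by_digit(n: int, k: int) -> int:
    """Multiply n by a single digit k by reading row k of n's rods."""
    if k == 0:
        return 0
    # Row k of each rod, rightmost rod first.
    cells = [rod(int(ch))[k - 1] for ch in reversed(str(n))]
    # Diagonal m adds the units of rod m to the tens of the rod to its right (m - 1).
    diagonals = []
    for m in range(len(cells) + 1):
        units = cells[m][1] if m < len(cells) else 0
        tens = cells[m - 1][0] if m > 0 else 0
        diagonals.append(units + tens)
    # Read the diagonals right to left, carrying into the next one.
    digits, carry = [], 0
    for s in diagonals:
        carry, digit = divmod(s + carry, 10)
        digits.append(digit)
    return int("".join(str(d) for d in reversed(digits)))


def napier_multiply(a: int, b: int) -> int:
    """Multi-digit multiplier: one rod reading per digit of b, shifted and summed."""
    return sum(multiply_by_digit(a, int(ch)) * 10 ** place
               for place, ch in enumerate(reversed(str(b))))


print(multiply_by_digit(425, 6))  # 2550
print(napier_multiply(425, 36))   # 15300
```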
Rabdos_AI
@Rabdos_AI
Apr 16
image models struggle with local-to-global features... (grok imagine)
123