The Tenzai Research Blog

Inside the Top 1%: Engineering Tenzai’s AI Hacker to Compete with Elite Humans

Across six platforms, Tenzai's AI hacker achieved scores placing it within the top 1% of participants, outperforming more than 125,000 human competitors.

Test In Prod Or Live A Lie

Bottom line: You cannot secure modern applications by reviewing code alone. Many vulnerabilities emerge only in production systems: in the interactions between services, identity boundaries, and cloud configurations, and in runtime behavior under pressure and focused attack. At Tenzai, we focus on active validation, testing real systems in realistic environments.

When “We Already Passed the Pentest” Isn’t Enough

Internal applications are dangerous precisely because they’re trusted by default. Even strong security programs have blind spots, and AI changes what’s possible to see.

Bad Vibes: Comparing the Secure Coding Capabilities of Popular Coding Agents

A security benchmark of popular AI coding agents—Cursor, Claude Code, Codex, Replit, and Devin—found 69 vulnerabilities across 15 apps. Every agent shipped vulnerable code: broken auth, SSRF, missing controls, and more. Here’s what broke—and why it matters.

Writings from Tenzai researchers on autonomous offense and hard security vulnerabilities.
