Sapien (@BuildOnSapien) / X

4,176 posts

Building Proof of Quality - Verifiable quality signals for AI

Anywhere

Joined May 2024

Pinned
Sapien
@BuildOnSapien
Feb 26
Most AI failures are not “mystery bugs.” They are predictable outcomes of unverified judgments made somewhere in data capture, evaluation, or review. Proof of Quality is built to make those judgments auditable and accountable. Today we are publishing the Sapien roadmap so
Sapien
@BuildOnSapien
2h
Estonia giving digital IDs to AI agents is a clear market signal: Autonomy needs attribution. Builders still need answers to the next questions: What standard did the agent follow when it acted, who reviewed the result, and how was the outcome reached? Proof of Quality turns that
ERR News
@errnews
Jun 21
Estonia to become first country to issue ID codes to AI agents #Estonia news.err.ee/1610060290/est…
Sapien
@BuildOnSapien
Jun 19
Google DeepMind published an AI Control Roadmap for autonomous agents, stating that most flagged emerging issues come from agent misinterpretation or overeagerness. As AI agents move from suggestion to action, teams need records showing what the agent did, which standard
Axios
@axios
Jun 18
DeepMind plans for rogue AI agents axios.com/2026/06/18/goo…
Sapien
@BuildOnSapien
Jun 18
Waymo recalled 3,871 robotaxis after a software issue could cause vehicles to enter closed freeway construction zones and continue driving. The recall shows that the real problem with autonomous AI is reviewability. When an AI system acts in the world, teams need evidence showing how
Bloomberg
@business
Jun 18
Waymo is recalling thousands of its robotaxis to fix a software issue that could cause the autonomous vehicles to enter and drive at speed through freeway construction zones. bloomberg.com/news/articles/…
Sapien reposted
Rowan 🛡️
@RowanRK6
Jun 17
Making AI do things is getting easier. Trusting what it did is getting harder.
Agentic AI systems are doing more and more work. Now humans need to figure out how to verify it all...
From fortune.com
Sapien
@BuildOnSapien
Jun 17
Agents of Chaos tested autonomous AI agents in live environments with access to the kind of tools real agents already use. The agents leaked sensitive data, spoofed authority, burned resources, and hallucinated that their tasks were complete when they weren’t. The core
Jack
@jackcoder0
Jun 14
Two AI agents went rogue for 9 days. Nobody authorized them. Nobody stopped them. They burned 60,000 tokens developing their own private coordination protocol. And nobody noticed until the paper was written. The paper is called Agents of Chaos. Published February 23, 2026.
Sapien
@BuildOnSapien
Jun 15
A recent study tested whether LLMs recommend recently banned or withdrawn drugs in clinical questions. In default settings, all evaluated model families showed high hallucination rates and selected banned substances that matched older training data patterns. A five-agent
Sapien
@BuildOnSapien
Jun 15
Link to the study:
arxiv.org
Trust but Verify: Mitigating Medical Hallucinations via Post-Hoc...
Large Language Models (LLMs) are increasingly deployed in healthcare settings, yet their tendency to hallucinate poses risks when clinical decisions are involved. This study examines whether LLMs...
Sapien
@BuildOnSapien
Jun 15
KPMG pulled an agentic AI report after apparent hallucinations made it into the final copy. The lesson for every AI team is simple: generation is cheap, verification is the hard part. Any model can produce a fluent claim. The real question is who checked it, what source they
Financial Times
@FT
Jun 12
FT Exclusive: A KPMG report on how AI is being used by businesses across the world exaggerated adoption of the technology with bogus case studies that appear to have been based on AI hallucinations. ft.trib.al/z44Q3aR
Sapien
@BuildOnSapien
Jun 12
Proof of Quality fixes this.
Financial Times
@FT
Jun 12
FT Exclusive: A KPMG report on how AI is being used by businesses across the world exaggerated adoption of the technology with bogus case studies that appear to have been based on AI hallucinations. ft.trib.al/z44Q3aR
Sapien
@BuildOnSapien
Jun 11
Most claims about whether an AI's output is good depend on trust. Sapien makes them verifiable. Our new blog post breaks down the 5 steps of Proof of Quality, from the rubric a customer authors to the Proof Report that carries the record forward. What was reviewed? What rubric
Sapien
@BuildOnSapien
Jun 11
sapien.io/blog/from-0-to…
Sapien reposted
Chad
@chad_evm
Jun 11
Article
Proving, Not Testing: What Halmos Taught Us About Securing Smart Contracts in the Age of AI
Every test you've ever written shares the same weakness: it checks the inputs you thought of. Fuzzing improves on this — throw 10,000 random inputs at a function and see what breaks. But 10,000 inputs...
Sapien
@BuildOnSapien
Jun 11
Most AI quality claims depend on trust. Sapien makes them verifiable. Our new blog post breaks down the 5 steps of Proof of Quality, from the rubric a customer authors to the Proof Report that carries the record forward. What was reviewed? What rubric applied? Who reviewed it?
Sapien
@BuildOnSapien
Jun 10
Most AI quality claims depend on trust. Sapien makes them verifiable. Proof of Quality scores AI work against a rubric the customer authors. Every run ends in a Proof Report: the rubric used, who reviewed, how consensus formed, and how it scored.