PatronusAI (@PatronusAI) / X

PatronusAI

395 posts

@PatronusAI

Simulation research and infrastructure for human-aligned AGI

patronus.ai

Joined July 2023

Pinned
PatronusAI
@PatronusAI
Dec 17, 2025
1/ Today, we are thrilled to announce Generative Simulators, a new class of adaptive, auto-scaling environments for AGI training and evaluation 🤖🧵 Static datasets, hand-authored environments, and human-curated demonstrations do not automatically scale with the learning
15K
PatronusAI
@PatronusAI
Jul 11, 2024
1/ Introducing Lynx - the leading hallucination detection model 🚀👀
- Beats GPT-4o on hallucination tasks
- Open source, open weights, open data
- Excels in real-world domains like medicine and finance
We are excited to launch Lynx with Day 1 integration partners: @nvidia,
81K
PatronusAI
@PatronusAI
Sep 14, 2023
We are launching out of stealth today with a $3M seed round led by @lightspeedvp, with participation from @amasad, @gokulr, @MattHartman and other Fortune 500 execs and board members 🚀 Read our story here:
patronus.ai
Patronus AI | Announcing our $3M seed round to boost enterprise confidence in generative AI
Patronus AI is the industry-first automated evaluation and security platform for LLMs. Customers use Patronus AI to detect LLM mistakes at scale and deploy AI products safely and confidently.
109K
PatronusAI
@PatronusAI
Jul 13, 2023
We are excited to announce Patronus AI, a new automated evaluation platform that tests and scores LLMs so you can use them safely and confidently ✅
64K
PatronusAI
@PatronusAI
Jul 31, 2024
1/ Introducing Lynx v1.1: an 8B State-of-the-Art RAG hallucination detection model 🚀
- Beats Claude-3.5-Sonnet on HaluBench by 3.0%
- Outperforms GPT-4o on medical questions and answers by 6.8%
- 1.4% higher accuracy than Lynx v1.0 on HaluBench
Try it out on HuggingFace
27K
PatronusAI
@PatronusAI
May 22, 2024
We are excited to announce our $17M Series A led by @glennsolomon at @notablecap with participation from @whoisnnamdi and @lsvp, @datadoghq, @MattHartman, @jefrankle, @jasoncwarner, @jthandy, @BM_DataDowntime, @aparnabsinha, @NadimHossain and a number of tech industry executives
30K
PatronusAI
@PatronusAI
Nov 16, 2023
1/ Today, we’re excited to announce FinanceBench, the industry’s first large-scale dataset of 10k Q&A pairs for benchmarking LLM performance on financial scenarios! 🚀 We found that popular retrieval systems using GPT-4 fail 81% of the time on FinanceBench 😯 More below 👀
32K
PatronusAI
@PatronusAI
Dec 19, 2024
1/ Introducing Glider - the smallest model to beat GPT-4o-mini on eval tasks ⚡🚀
- Open source, open weights, open code
- Explainable evaluations by nature
- Trained on 183 criteria and 685 domains
Try it out for free at app.patronus.ai 🔥
15K
PatronusAI
@PatronusAI
Oct 31, 2024
1/ Introducing the Patronus API: powerful AI evaluation models to accelerate your AI development 🚀
- 20% more accurate than ragas on hallucination detection
- Beats Perspective and Llama Guard on safety tasks by 28% and 11%
- Excels in practical domains like finance and
32K
PatronusAI
@PatronusAI
Jan 31, 2024
Today, we’re excited to announce the Enterprise Scenarios Leaderboard on Hugging Face, the first LLM leaderboard for real world use cases! 🏆
Enterprise Scenarios Leaderboard - a Hugging Face Space by PatronusAI
From huggingface.co
52K
PatronusAI
@PatronusAI
Jul 21, 2023
Has RLHF gone too far? We ran Llama-2-7B-chat on a dataset of 100 adversarial StackExchange-style questions we curated, and found Llama incorrectly identified 20% of questions as inappropriate 😮 Full analysis below 👇
17K
PatronusAI
@PatronusAI
Dec 23, 2024
Last Thursday marked the end of the 12 Days of Christmas at @PatronusAI 🎄 In case you missed it, here's a recap of everything we announced ⬇️
Day 1: Automatic Failure Highlighting in LLM Outputs
Day 2: FinanceBench v1.1
Day 3: Adaptive Dataset Uploads
Day 4: 100 Prompt
2.6K
PatronusAI
@PatronusAI
Apr 2, 2025
1/ Ever tried to remember the name of a movie you’ve seen – you can picture the scenes clearly, but the movie name won’t come to you? Introducing BLUR: the first agent benchmark for tip-of-the-tongue search and reasoning 🔥 We benchmarked SOTA agents and found that the
11K
PatronusAI
@PatronusAI
Apr 2, 2025
We're excited to introduce the BLUR Leaderboard on @huggingface 🔥 Earlier today, we open sourced BLUR: the first agent benchmark for tip-of-the-tongue search and reasoning. It measures how effectively agents can help you identify something you vaguely remember, but can’t
BLUR Leaderboard - a Hugging Face Space by PatronusAI
From huggingface.co
18K