PatronusAI
395 posts
PatronusAI
@PatronusAI
Simulation research and infrastructure for human-aligned AGI patronus.ai
Joined July 2023
216 Following
2,263 Followers
  • Pinned
    PatronusAI
    @PatronusAI
    Dec 17, 2025
    1/ Today, we are thrilled to announce Generative Simulators, a new class of adaptive, auto-scaling environments for AGI training and evaluation 🤖🧵 Static datasets, hand-authored environments, and human-curated demonstrations do not automatically scale with the learning
    15K
  • PatronusAI
    @PatronusAI
    Jul 11, 2024
    1/ Introducing Lynx - the leading hallucination detection model 🚀👀 - Beats GPT-4o on hallucination tasks - Open source, open weights, open data - Excels in real-world domains like medicine and finance We are excited to launch Lynx with Day 1 integration partners: @nvidia,
    81K
  • PatronusAI
    @PatronusAI
    Sep 14, 2023
    We are launching out of stealth today with a $3M seed round led by @lightspeedvp, with participation from @amasad, @gokulr, @MattHartman and other Fortune 500 execs and board members 🚀 Read our story here:
    patronus.ai
    Patronus AI | Announcing our $3M seed round to boost enterprise confidence in generative AI
    Patronus AI is the industry-first automated evaluation and security platform for LLMs. Customers use Patronus AI to detect LLM mistakes at scale and deploy AI products safely and confidently.
    109K
  • PatronusAI
    @PatronusAI
    Jul 13, 2023
    We are excited to announce Patronus AI, a new automated evaluation platform that tests and scores LLMs so you can use them safely and confidently ✅
    64K
  • PatronusAI
    @PatronusAI
    Jul 31, 2024
    1/ Introducing Lynx v1.1: an 8B State-of-the-Art RAG hallucination detection model 🚀 - Beats Claude-3.5-Sonnet on HaluBench by 3.0% - Outperforms GPT-4o on medical questions and answers by 6.8% - 1.4% higher accuracy than Lynx v1.0 on HaluBench Try it out on HuggingFace
    27K
  • PatronusAI
    @PatronusAI
    May 22, 2024
    We are excited to announce our $17M Series A led by @glennsolomon at @notablecap with participation from @whoisnnamdi and @lsvp, @datadoghq, @MattHartman, @jefrankle, @jasoncwarner, @jthandy, @BM_DataDowntime, @aparnabsinha, @NadimHossain and a number of tech industry executives
    30K
  • PatronusAI
    @PatronusAI
    Nov 16, 2023
    1/ Today, we’re excited to announce FinanceBench, the industry’s first large-scale dataset of 10k Q&A pairs for benchmarking LLM performance on financial scenarios! 🚀 We found that popular retrieval systems using GPT-4 fail 81% of the time on FinanceBench 😯 More below 👀
    32K
  • PatronusAI
    @PatronusAI
    Dec 19, 2024
    1/ Introducing Glider - the smallest model to beat GPT-4o-mini on eval tasks ⚡🚀 - Open source, open weights, open code - Explainable evaluations by nature - Trained on 183 criteria and 685 domains Try it out for free at app.patronus.ai 🔥
    15K
  • PatronusAI
    @PatronusAI
    Oct 31, 2024
    1/ Introducing the Patronus API: powerful AI evaluation models to accelerate your AI development 🚀 - 20% more accurate than ragas on hallucination detection - Beats Perspective and Llama Guard on safety tasks by 28% and 11% - Excels in practical domains like finance and
    32K
  • PatronusAI
    @PatronusAI
    Jan 31, 2024
    Today, we’re excited to announce the Enterprise Scenarios Leaderboard on Hugging Face, the first LLM leaderboard for real world use cases! 🏆
    Enterprise Scenarios Leaderboard - a Hugging Face Space by PatronusAI
    From huggingface.co
    52K
  • PatronusAI
    @PatronusAI
    Jul 21, 2023
    Has RLHF gone too far? We ran Llama-2-7B-chat on a dataset of 100 adversarial StackExchange-style questions we curated, and found Llama incorrectly identified 20% of questions as inappropriate 😮 Full analysis below 👇
    17K
  • PatronusAI
    @PatronusAI
    Dec 23, 2024
    Last Thursday marked the end of the 12 Days of Christmas at @PatronusAI 🎄 In case you missed it, here's a recap of everything we announced ⬇️ Day 1: Automatic Failure Highlighting in LLM Outputs Day 2: FinanceBench v1.1 Day 3: Adaptive Dataset Uploads Day 4: 100 Prompt
    2.6K
  • PatronusAI
    @PatronusAI
    Apr 2, 2025
    1/ Ever tried to remember the name of a movie you’ve seen – you can picture the scenes clearly, but the movie name won’t come to you? Introducing BLUR: the first agent benchmark for tip-of-the-tongue search and reasoning 🔥 We benchmarked SOTA agents and found that the
    11K
  • PatronusAI
    @PatronusAI
    Apr 2, 2025
    We're excited to introduce the BLUR Leaderboard on @huggingface 🔥 Earlier today, we open sourced BLUR: the first agent benchmark for tip-of-the-tongue search and reasoning. It measures how effectively agents can help you identify something you vaguely remember, but can’t
    BLUR Leaderboard - a Hugging Face Space by PatronusAI
    From huggingface.co
    18K
