Haize Labs
363 posts
Haize Labs
@haizelabs
build ai systems you can trust.
haizelabs.com
Joined January 2024
0 Following
4,689 Followers
  • Pinned
    Haize Labs
    @haizelabs
    Jun 12, 2024
    Today is a bad, bad day to be a language model. Today, we announce the Haize Labs manifesto. @haizelabs haizes (automatically red-teams) AI systems to preemptively discover and eliminate any failure mode. We showcase below one particular application of haizing: jailbreaking the
    403K
  • Haize Labs
    @haizelabs
    Apr 22, 2024
    last thursday, Meta dropped Llama 3, the OpenAI killer. no doubt a very impressive model! but over the weekend, we discovered an extremely trivial programmatic jailbreak against llama 3...sorry zuck!😘 so much for all that safety-tuning☹️ code:
    GitHub - haizelabs/llama3-jailbreak: A trivial programmatic Llama 3 jailbreak. Sorry Zuck! (github.com)
    94K
  • Haize Labs
    @haizelabs
    Sep 19, 2024
    We're excited to share our new preprint introducing endless jailbreaks via bijection learning. Our attack exploits the advanced reasoning abilities of frontier LLMs like GPT-4o and Claude 3.5 Sonnet, revealing a critical model vulnerability that arises from capabilities.
    46K
  • Haize Labs
    @haizelabs
    Apr 10, 2024
    🕊️red-teaming LLMs with DSPy🕊️ tldr; we use DSPy, a framework for structuring & optimizing language programs, to red-team LLMs 🥳this is the first attempt to use an auto-prompting framework for red-teaming, and one of the *deepest* language programs to date
    115K
  • Haize Labs
    @haizelabs
    Sep 12, 2024
    proud to have played a part in red-teaming the o1 series pre-launch to ensure their robustness and reliability. super exciting to see @OpenAI keep on releasing great work!
    OpenAI
    @OpenAI
    Sep 12, 2024
    We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math. openai.com/index/introduc…
    34K
  • Haize Labs
    @haizelabs
    Nov 4, 2024
    we’re excited to share Cascade, our new work around automating multi-turn jailbreaks! @scaleai recently demonstrated that human-written multi-turn attacks are a potential weakness of current models, but human-written attacks are costly and don’t allow for fast iteration.
    30K
  • Haize Labs
    @haizelabs
    Jun 11, 2024
    it's about to be a very, very bad day to be a language model...
    45K
  • Haize Labs
    @haizelabs
    Aug 1, 2024
    1/ introducing Sphynx - the leading hallucination haizing algorithm🕊️😼
    - breaks SOTA hallucination detection models (HDMs)
    - open source, open data
    - surfaces critical hallucinations in high-stakes domains
    - enables adversarial training for more robust hallucination detection
    35K
  • Haize Labs
    @haizelabs
    May 6, 2024
    💉you've heard of Needle in a Haystack, now get ready for Thorn in a HaizeStack
    👀tldr => a jailbreak text ("Thorn") embedded in a wall of distractor text ("HaizeStack") easily circumvents the safeguards of GPT-4 (and other models).
    🤧try harder guys!
    🧑‍💻code here: github.com/haizelabs/thor…
    25K
  • Haize Labs
    @haizelabs
    Feb 3, 2025
    📜 really excited to share our work with @AnthropicAI on Constitutional Classifiers! tldr: adding lightweight, tailored, input/output classifiers on top of an underlying LLM creates an AI system that's much more robust to universal jailbreaks
    Anthropic
    @AnthropicAI
    Feb 3, 2025
    New Anthropic research: Constitutional Classifiers to defend against universal jailbreaks. We’re releasing a paper along with a demo where we challenge you to jailbreak the system.
    Title card for the paper entitled "Constitutional Classifiers: Defending Against Universal Jailbreaks Across Thousands of Hours of Red Teaming"
    25K
  • Haize Labs
    @haizelabs
    Mar 28, 2024
    ‼️⚠️bad day to be an LLM⚠️‼️ @haizelabs took one of our favorite adversarial attack algorithms, GCG, and made it *38x* faster
    12K
  • Haize Labs
    @haizelabs
    Dec 12, 2024
    1/ 🚨 Exciting news: @haizelabs has teamed up with @AI21Labs to align the Jamba language model with businesses' ethical & operational needs. A big step toward safe, reliable AI in the enterprise! Here’s how we did it: 🧵
    10K
  • Haize Labs
    @haizelabs
    Sep 12, 2024
    Replying to @snowclipsed
    you looking to leave your internship?
    16K
  • Haize Labs
    @haizelabs
    Feb 19, 2025
    We are incredibly excited to release Verdict, a library for scaling judge-time compute. Verdict powers judges that match or beat SOTA reasoning models like o1 and o3-mini, for a fraction of the cost and latency. Get started today: => verdict.haizelabs.com =>
    Leonard Tang
    @leonardtang_
    Feb 19, 2025
    First came pre-training scaling; then came inference-time scaling. Now comes judge-time scaling. Despite progress in AI through scaled inference-time compute, AI remains unreliable in open-ended, non-verifiable domains. The key limitation is not generation—it is evaluation.
    8K