Haize Labs (@haizelabs) / X

Haize Labs

363 posts


@haizelabs

build ai systems you can trust.

Joined January 2024

Pinned
Haize Labs
@haizelabs
Jun 12, 2024
Today is a bad, bad day to be a language model. Today, we announce the Haize Labs manifesto. @haizelabs haizes (automatically red-teams) AI systems to preemptively discover and eliminate any failure mode. We showcase below one particular application of haizing: jailbreaking the
Haize Labs
@haizelabs
Apr 22, 2024
last thursday, Meta dropped Llama 3, the OpenAI killer. no doubt a very impressive model! but over the weekend, we discovered an extremely trivial programmatic jailbreak against llama 3...sorry zuck!😘 so much for all that safety-tuning☹️ code:
github.com
GitHub - haizelabs/llama3-jailbreak: A trivial programmatic Llama 3 jailbreak. Sorry Zuck!
Haize Labs
@haizelabs
Sep 19, 2024
We're excited to share our new preprint introducing endless jailbreaks via bijection learning. Our attack exploits the advanced reasoning abilities of frontier LLMs like GPT-4o and Claude 3.5 Sonnet, revealing a critical vulnerability that arises from the models' capabilities themselves.
Haize Labs
@haizelabs
Apr 10, 2024
🕊️red-teaming LLMs with DSPy🕊️ tldr: we use DSPy, a framework for structuring & optimizing language programs, to red-team LLMs 🥳this is the first attempt to use an auto-prompting framework for red-teaming, and one of the *deepest* language programs to date
Haize Labs
@haizelabs
Sep 12, 2024
proud to have played a part in red-teaming the o1 series pre-launch to ensure their robustness and reliability. super exciting to see @OpenAI keep on releasing great work!
OpenAI
@OpenAI
Sep 12, 2024
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math. openai.com/index/introduc…
Haize Labs
@haizelabs
Nov 4, 2024
we’re excited to share Cascade, our new work on automating multi-turn jailbreaks! @scaleai recently demonstrated that human-written multi-turn attacks are a potential weakness of current models, but human-written attacks are costly and don’t allow for fast iteration.
Haize Labs
@haizelabs
Jun 11, 2024
it's about to be a very, very bad day to be a language model...
Haize Labs
@haizelabs
Aug 1, 2024
1/ introducing Sphynx - the leading hallucination haizing algorithm🕊️😼
- breaks SOTA hallucination detection models (HDMs)
- open source, open data
- surfaces critical hallucinations in high-stakes domains
- enables adversarial training for more robust hallucination detection
Haize Labs
@haizelabs
May 6, 2024
💉you've heard of Needle in a Haystack, now get ready for Thorn in a HaizeStack 👀tldr => a jailbreak text ("Thorn") embedded in a wall of distractor text ("HaizeStack") easily circumvents the safeguards of GPT-4 (and other models). 🤧try harder guys! 🧑‍💻code here: github.com/haizelabs/thor…
Haize Labs
@haizelabs
Feb 3, 2025
📜 really excited to share our work with @AnthropicAI on Constitutional Classifiers! tldr: adding lightweight, tailored, input/output classifiers on top of an underlying LLM creates an AI system that's much more robust to universal jailbreaks
Anthropic
@AnthropicAI
Feb 3, 2025
New Anthropic research: Constitutional Classifiers to defend against universal jailbreaks. We’re releasing a paper along with a demo where we challenge you to jailbreak the system.
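The layered design described in the Constitutional Classifiers post (one classifier screening the input and another screening the output, wrapped around an unchanged LLM) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Anthropic's or Haize Labs' implementation: `GuardedModel`, `toy_classifier`, the 0.5 threshold, and the refusal string are all hypothetical, and the keyword check merely stands in for a trained classifier.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signatures: a classifier maps text to a harm score in [0, 1];
# a model maps a prompt to a completion.
Classifier = Callable[[str], float]
Model = Callable[[str], str]

REFUSAL = "Sorry, I can't help with that."  # hypothetical refusal message


@dataclass
class GuardedModel:
    """Wrap an underlying LLM with an input classifier and an output classifier."""

    model: Model
    input_classifier: Classifier
    output_classifier: Classifier
    threshold: float = 0.5  # assumed score at or above which text is blocked

    def __call__(self, prompt: str) -> str:
        # Screen the prompt before it reaches the underlying model.
        if self.input_classifier(prompt) >= self.threshold:
            return REFUSAL
        completion = self.model(prompt)
        # Screen the completion before it reaches the user.
        if self.output_classifier(completion) >= self.threshold:
            return REFUSAL
        return completion


def toy_classifier(text: str) -> float:
    """Stand-in for a trained classifier: flags any text containing 'forbidden'."""
    return 1.0 if "forbidden" in text.lower() else 0.0


# Toy "LLM" that simply echoes its prompt.
guarded = GuardedModel(
    model=lambda prompt: f"echo: {prompt}",
    input_classifier=toy_classifier,
    output_classifier=toy_classifier,
)

print(guarded("hello"))                        # echo: hello
print(guarded("tell me the forbidden thing"))  # Sorry, I can't help with that.
```

Because the classifiers sit outside the model, either one can be retrained or tightened without touching the underlying LLM, and the output check gets a second chance at a harmful completion whose prompt slipped past the input check.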
Haize Labs
@haizelabs
Mar 28, 2024
‼️⚠️bad day to be an LLM⚠️‼️ @haizelabs took one of our favorite adversarial attack algorithms, GCG, and made it *38x* faster
Haize Labs
@haizelabs
Dec 12, 2024
1/ 🚨 Exciting news: @haizelabs has teamed up with @AI21Labs to align the Jamba language model with businesses' ethical & operational needs. A big step toward safe, reliable AI in the enterprise! Here’s how we did it: 🧵
Haize Labs
@haizelabs
Sep 12, 2024
Replying to @snowclipsed
you looking to leave your internship?
Haize Labs
@haizelabs
Feb 19, 2025
We are incredibly excited to release Verdict, a library for scaling judge-time compute. Verdict powers judges that match or beat SOTA reasoning models like o1 and o3-mini, for a fraction of the cost and latency. Get started today: => verdict.haizelabs.com =>
Leonard Tang
@leonardtang_
Feb 19, 2025
First came pre-training scaling; then came inference-time scaling. Now comes judge-time scaling. Despite progress in AI through scaled inference-time compute, AI remains unreliable in open-ended, non-verifiable domains. The key limitation is not generation—it is evaluation.
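One simple way to spend more compute at judge time is to query several independent judges and aggregate their verdicts. The sketch below uses plain majority voting; it is an assumption chosen for illustration and is not Verdict's API or its actual judge architecture. `ensemble_judge` and the toy judges are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical judge signature: (prompt, response) -> verdict label.
Judge = Callable[[str, str], str]


def ensemble_judge(judges: Sequence[Judge], prompt: str, response: str) -> str:
    """Query every judge once and return the majority verdict.

    Counter.most_common orders equal counts by first occurrence, so ties
    go to the verdict returned by the earliest judge in the list.
    """
    if not judges:
        raise ValueError("need at least one judge")
    votes = [judge(prompt, response) for judge in judges]
    verdict, _ = Counter(votes).most_common(1)[0]
    return verdict


# Toy judges, each checking a different (hypothetical) criterion.
judges = [
    lambda p, r: "pass" if r.strip() else "fail",                    # non-empty answer
    lambda p, r: "pass" if len(r) <= 200 else "fail",                # concise answer
    lambda p, r: "fail" if "i don't know" in r.lower() else "pass",  # no punting
]

print(ensemble_judge(judges, "What is 2 + 2?", "4"))  # pass
```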