Contramont Research

We study where AI safety and security methods break. Our current focus is cryptographic model organisms: models with hidden behaviors, backed by cryptographic hardness guarantees, that reveal fundamental limitations of existing safety techniques.

Publications

Unelicitable Backdoors via Cryptographic Transformer Circuits (NeurIPS 2024)
- Project page

Our collaborations include the REAL benchmark (NeurIPS 2025), Humanity's Last Exam (Nature 2025), and SynLlama (ACS Central Science 2025).

Forthcoming work: cryptographic sandbagging, runtime monitoring evasion, and compiled model obfuscation.

Contramont Research is a 501(c)(3) nonprofit.