Contramont Research
We study where AI safety and security methods break. Our current focus is cryptographic model organisms: models with hidden behaviors and cryptographic hardness guarantees, built to reveal fundamental limitations of existing safety techniques.
Publications
Unelicitable Backdoors via Cryptographic Transformer Circuits (NeurIPS 2024)
- Project page
Our collaborations include the REAL benchmark (NeurIPS 2025), Humanity's Last Exam (Nature 2025), and SynLlama (ACS Central Science 2025).
Forthcoming work: cryptographic sandbagging, runtime monitoring evasion, and compiled model obfuscation.
Contramont Research is a 501(c)(3) nonprofit.