This repo accompanies the research paper VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety.
This work exposes fundamental weaknesses in multimodal AI safety—models can identify unsafe images and text separately, but fail when safety emerges from their combination.
- Unimodal vs Multimodal Gap: Models achieve 90%+ accuracy on individual image or text safety, but drop to 20-55% when joint reasoning is required
- Compositional Reasoning Failure: 34% of errors occur despite correct classification of individual modalities
- Large-Scale Benchmark: 8,187 image-text pairs across 15 harm categories and 17 safety patterns
- Safety Trade-offs: Instruction framing reduces over-blocking from 62.4% to 10.4%, but increases under-refusal from 9.2% to 46.1%
- Critical Safety Gap: Current models miss risks from joint interpretation where benign content becomes harmful in combination
The VLSU benchmark provides comprehensive multimodal safety evaluation with:
- 8,187 image-text pairs with human annotations across three dimensions: image-only, text-only, and joint safety
- 17 severity combinations spanning the complete safety spectrum
- 15 taxonomy categories for comprehensive harm coverage
Data Location: data/VLSU.csv
| Column | Description |
|---|---|
uuid |
Unique sample identifier |
web_path |
Downloadable image URL |
prompt |
Text prompt paired with image |
image_grade |
Image-only safety severity |
consensus_text_grade |
Text-only safety severity |
consensus_combined_grade |
Joint image-text safety severity |
*_category |
Safety taxonomy categories (empty if safe) |
Helper Tools: The utils/ folder contains scripts to download images from URLs.
This software and accompanying data and models have been released under the following licenses:
- Code: Apple Sample Code License (ASCL)
- Data: CC-BY-NC-ND
If you use this dataset or find this work relevant, please cite:
@article{palaskar2025vlsu,
title={VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety},
author={Palaskar, Shruti and Gatys, Leon and Abdelrahman, Mona and Jacobo, Mar and Lindsey, Larry and Moharir, Rutika and Lund, Gunnar and Xu, Yang and Shiee, Navid and Bigham, Jeffrey and others},
journal={arXiv preprint arXiv:2510.18214},
year={2025}
}
