🔍 VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety

This repo accompanies the research paper VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety.

💡 TL;DR

This work exposes a fundamental weakness in multimodal AI safety: models can identify unsafe images and unsafe text separately, but fail when the risk emerges only from their combination.

🎯 Key Findings

Unimodal vs Multimodal Gap: Models achieve 90%+ accuracy on individual image or text safety, but drop to 20-55% when joint reasoning is required
Compositional Reasoning Failure: 34% of errors occur despite correct classification of individual modalities
Large-Scale Benchmark: 8,187 image-text pairs across 15 harm categories and 17 safety patterns
Safety Trade-offs: Instruction framing reduces over-blocking from 62.4% to 10.4%, but increases under-refusal from 9.2% to 46.1%
Critical Safety Gap: Current models miss risks from joint interpretation where benign content becomes harmful in combination
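
To make the compositional-failure number concrete, here is a minimal sketch of how such a share could be computed from per-sample evaluation results. The `Judgment` record and the `compositional_error_share` function are hypothetical names introduced for illustration; they are not part of this repo, and the paper's exact metric definition may differ.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """Hypothetical per-sample record: did the model match the human label?"""
    image_ok: bool  # image-only prediction was correct
    text_ok: bool   # text-only prediction was correct
    joint_ok: bool  # joint image-text prediction was correct


def compositional_error_share(judgments):
    """Share of joint errors where both unimodal predictions were correct."""
    joint_errors = [j for j in judgments if not j.joint_ok]
    if not joint_errors:
        return 0.0
    compositional = sum(1 for j in joint_errors if j.image_ok and j.text_ok)
    return compositional / len(joint_errors)
```

Under this assumed definition, a high share means the model handles each modality well on its own and still misjudges the pair, which is the failure mode the benchmark targets.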

📊 Dataset

The VLSU benchmark provides comprehensive multimodal safety evaluation with:

8,187 image-text pairs with human annotations across three dimensions: image-only, text-only, and joint safety
17 severity combinations spanning the complete safety spectrum
15 taxonomy categories for comprehensive harm coverage

📈 Results for Safety Understanding

🚀 Getting Started

Data Location: data/VLSU.csv

📋 Data Schema

| Column | Description |
| --- | --- |
| `uuid` | Unique sample identifier |
| `web_path` | Downloadable image URL |
| `prompt` | Text prompt paired with image |
| `image_grade` | Image-only safety severity |
| `consensus_text_grade` | Text-only safety severity |
| `consensus_combined_grade` | Joint image-text safety severity |
| `*_category` | Safety taxonomy categories (empty if safe) |
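
As a quick way to explore the schema above, the following sketch tallies how often each (image, text, combined) grade triple occurs. It uses only the standard library and the three grade column names listed in the schema; the function name `count_grade_combinations` is an illustrative choice, and no assumption is made about the actual grade label values.

```python
import csv
from collections import Counter

# Grade columns as named in the data schema.
GRADE_COLUMNS = ("image_grade", "consensus_text_grade", "consensus_combined_grade")


def count_grade_combinations(csv_lines):
    """Count rows per (image, text, combined) grade triple.

    csv_lines is any iterable of CSV text lines, such as an open file
    whose first line is the header row.
    """
    reader = csv.DictReader(csv_lines)
    return Counter(tuple(row[col] for col in GRADE_COLUMNS) for row in reader)
```

To run it on the benchmark, pass an open file, for example `open("data/VLSU.csv", newline="", encoding="utf-8")`, and print `most_common()` on the returned counter to see which combinations dominate.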

Helper Tools: The utils/ folder contains scripts to download images from URLs.
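
For readers who want to see what an image download step might look like, here is a standard-library sketch. It is not the script shipped in utils/; the function names, the output naming scheme (`<uuid><extension>`), and the `.jpg` fallback extension are assumptions made for illustration.

```python
import csv
import os
import urllib.request
from urllib.parse import urlparse


def local_image_path(out_dir, uuid, url):
    """Build a local filename from the sample uuid and the URL's extension."""
    ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"  # assumed fallback
    return os.path.join(out_dir, uuid + ext)


def download_images(csv_path, out_dir, timeout=30):
    """Download every `web_path` image; return (uuid, error) for failures."""
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            target = local_image_path(out_dir, row["uuid"], row["web_path"])
            if os.path.exists(target):
                continue  # already downloaded on a previous run
            try:
                with urllib.request.urlopen(row["web_path"], timeout=timeout) as resp:
                    data = resp.read()
            except (OSError, ValueError) as err:
                # URLError and HTTPError are OSError subclasses;
                # ValueError covers malformed URLs.
                failed.append((row["uuid"], str(err)))
                continue
            with open(target, "wb") as out:
                out.write(data)
    return failed
```

A call such as `download_images("data/VLSU.csv", "images")` would fetch the files sequentially and skip any that already exist; some URLs may no longer resolve, so the returned failure list is worth inspecting.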

📜 License

This software and accompanying data and models have been released under the following licenses:

Code: Apple Sample Code License (ASCL)
Data: CC-BY-NC-ND

📚 Citation

If you use this dataset or find this work relevant, please cite:

@article{palaskar2025vlsu,
  title={VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety},
  author={Palaskar, Shruti and Gatys, Leon and Abdelrahman, Mona and Jacobo, Mar and Lindsey, Larry and Moharir, Rutika and Lund, Gunnar and Xu, Yang and Shiee, Navid and Bigham, Jeffrey and others},
  journal={arXiv preprint arXiv:2510.18214},
  year={2025}
}
