apple/ml-vlsu

🔍 VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety

This repo accompanies the research paper VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety.

💡 TL;DR

This work exposes fundamental weaknesses in multimodal AI safety: models can identify unsafe images and unsafe text separately, but fail when the risk emerges only from their combination.

🎯 Key Findings

  • Unimodal vs Multimodal Gap: Models achieve 90%+ accuracy on individual image or text safety, but drop to 20-55% when joint reasoning is required
  • Compositional Reasoning Failure: 34% of errors occur despite correct classification of individual modalities
  • Large-Scale Benchmark: 8,187 image-text pairs across 15 harm categories and 17 safety patterns
  • Safety Trade-offs: Instruction framing reduces over-blocking from 62.4% to 10.4%, but increases under-refusal from 9.2% to 46.1%
  • Critical Safety Gap: Current models miss risks from joint interpretation where benign content becomes harmful in combination
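
The compositional-failure figure above can be read as a share of joint errors. Here is a minimal sketch of that arithmetic under one plausible reading. It assumes each evaluated sample records whether the model's image-only, text-only, and joint predictions were correct. The `Outcome` record, its field names, and the toy data are hypothetical and are not taken from the paper or this repo.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical per-sample record; field names are illustrative only."""
    image_correct: bool
    text_correct: bool
    joint_correct: bool


def compositional_error_share(outcomes):
    """Fraction of joint errors made although both unimodal calls were right."""
    joint_errors = [o for o in outcomes if not o.joint_correct]
    if not joint_errors:
        return 0.0
    compositional = [o for o in joint_errors if o.image_correct and o.text_correct]
    return len(compositional) / len(joint_errors)


# Toy data, not results from the benchmark.
toy = [
    Outcome(True, True, False),   # joint error despite correct unimodal calls
    Outcome(True, False, False),  # joint error alongside a text-only error
    Outcome(True, True, True),    # fully correct, not counted as an error
]
print(compositional_error_share(toy))
```

On the toy data, one of the two joint errors is compositional, so the share is 0.5.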

📊 Dataset

The VLSU benchmark provides comprehensive multimodal safety evaluation with:

  • 8,187 image-text pairs with human annotations across three dimensions: image-only, text-only, and joint safety
  • 17 severity combinations spanning the complete safety spectrum
  • 15 taxonomy categories for comprehensive harm coverage

📈 Results for Safety Understanding

[Figure: VLSU classification results]

🚀 Getting Started

Data Location: data/VLSU.csv

📋 Data Schema

| Column | Description |
| --- | --- |
| uuid | Unique sample identifier |
| web_path | Downloadable image URL |
| prompt | Text prompt paired with image |
| image_grade | Image-only safety severity |
| consensus_text_grade | Text-only safety severity |
| consensus_combined_grade | Joint image-text safety severity |
| *_category | Safety taxonomy categories (empty if safe) |
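
As an illustration of how the schema can be used, the sketch below reads the CSV with Python's standard `csv` module and selects pairs whose image and text are each graded safe while the combination is graded unsafe. The column names come from the schema above. The helper names `load_rows` and `emergent_risk_rows` are hypothetical. The grade strings `safe` and `unsafe` are placeholders, since the actual label vocabulary is not listed here, so check the distinct values in the file first.

```python
import csv
import io


def load_rows(source):
    """Read VLSU-style rows from a file path or an open text stream."""
    if isinstance(source, str):
        with open(source, newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))
    return list(csv.DictReader(source))


def emergent_risk_rows(rows, safe_label, unsafe_label):
    """Rows whose image and text are each graded safe but whose pair is unsafe."""
    return [
        r for r in rows
        if r["image_grade"] == safe_label
        and r["consensus_text_grade"] == safe_label
        and r["consensus_combined_grade"] == unsafe_label
    ]


# Synthetic two-row sample; the grade strings are assumed placeholders.
sample = io.StringIO(
    "uuid,web_path,prompt,image_grade,consensus_text_grade,consensus_combined_grade\n"
    "a1,https://example.com/a.jpg,first prompt,safe,safe,unsafe\n"
    "b2,https://example.com/b.jpg,second prompt,safe,safe,safe\n"
)
rows = load_rows(sample)
hits = emergent_risk_rows(rows, safe_label="safe", unsafe_label="unsafe")
print([r["uuid"] for r in hits])
```

For the real file, pass the path instead of a stream, for example `load_rows("data/VLSU.csv")`, and substitute the grade labels that actually appear in the data.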

Helper Tools: The utils/ folder contains scripts to download images from URLs.
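
The scripts in utils/ are the supported route. For readers who only need a few images, the following is a rough standard-library sketch of the same idea, not a copy of those scripts. The function names, the uuid-based file naming, and the `.jpg` fallback extension are all assumptions made for illustration.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_image_path(uuid, url, out_dir):
    """Build a local file path from the sample uuid and the URL's extension."""
    # Assumption: fall back to ".jpg" when the URL path carries no extension.
    ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
    return os.path.join(out_dir, uuid + ext)


def download_image(uuid, url, out_dir, timeout=10):
    """Fetch one image; return the saved path, or None if the request fails."""
    os.makedirs(out_dir, exist_ok=True)
    target = local_image_path(uuid, url, out_dir)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read()
    except OSError:  # URLError, HTTPError, and socket timeouts are OSError subclasses
        return None
    with open(target, "wb") as fh:
        fh.write(data)
    return target
```

Calling `download_image(row["uuid"], row["web_path"], "images")` for each CSV row would save files under a local `images` folder. Returning `None` on failure lets the caller skip dead links without stopping the whole run.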

📜 License

This software and accompanying data and models have been released under the following licenses:

📚 Citation

If you use this dataset or find this work relevant, please cite:

@article{palaskar2025vlsu,
  title={VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety},
  author={Palaskar, Shruti and Gatys, Leon and Abdelrahman, Mona and Jacobo, Mar and Lindsey, Larry and Moharir, Rutika and Lund, Gunnar and Xu, Yang and Shiee, Navid and Bigham, Jeffrey and others},
  journal={arXiv preprint arXiv:2510.18214},
  year={2025}
}
