Authors: David Wan, Han Wang, Ziyang Wang, Elias Stengel-Eskin, Hyunji Lee, Mohit Bansal
This repository contains the code and data for the paper "Multimodal Fact-Level Attribution for Verifiable Reasoning".
- Python: 3.12+
- Package Manager: uv (recommended)
- System Dependencies: CUDA, FFmpeg (for video processing)
For most models (excluding Qwen-VL and Qwen-Omni), you can simply install the dependencies from the requirements file:
uv pip install -r requirements.txt
Due to conflicting dependencies, we recommend maintaining separate virtual environments for Qwen-VL and Qwen-Omni.
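For example, you can create and activate a dedicated environment per model family with uv (the environment name below is illustrative):

uv venv .venv-qwen-vl
source .venv-qwen-vl/bin/activate
uv pip install vllm --torch-backend=auto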
For Qwen-VL:
Install vllm via the official package:
uv pip install vllm --torch-backend=auto
For Qwen-Omni:
Install the pinned version of vllm and the vllm-omni fork:
uv pip install vllm==0.15.0 --torch-backend=auto
git clone https://github.com/vllm-project/vllm-omni.git
cd vllm-omni
uv pip install -e .
API Keys
If you are using cloud-based models (e.g., Gemini), set the necessary environment variables:
export GEMINI_API_KEY="your-gemini-api-key"
# Add other keys as needed (e.g., OPENAI_API_KEY)

Dataset Paths
Update src/util.py to point to your local dataset directories:
DATASET_CONFIGS = {
    "videommmu": {
        "video_dir": "/path/to/VideoMMMU/videos/",
        "hf_path": "/path/to/data/VideoMMMU_sample",
    },
    # Add other datasets here
}
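As a sketch, an additional entry follows the same shape as the one above; the dataset name and paths below are placeholders, not values the code is known to expect:

    # Hypothetical additional dataset; keys mirror the videommmu entry
    "my_dataset": {
        "video_dir": "/path/to/MyDataset/videos/",
        "hf_path": "/path/to/data/MyDataset_sample",
    },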
.
├── README.md                         # This file
├── src/                              # Core source code
│   ├── models.py                     # Multimodal model implementations
│   ├── util.py                       # Shared utilities (text processing, data loading)
│   ├── run_baseline.py               # Generate baseline responses
│   ├── run_baseline_with_citation.py # Generate responses with citations
│   ├── run_metric.py                 # Evaluation pipeline
│   ├── run_generation_program.py     # Generation with iterative feedback
│   └── generation_program_util.py    # Utilities for the generation program
│
├── prompts/                          # Prompt templates
│   ├── base.txt                      # Base reasoning prompt
│   ├── base_with_citations.txt       # Reasoning with citation extraction
│   ├── decontextualization.txt       # Remove document context
│   ├── atomic_decomposition.txt      # Break into atomic facts
│   ├── coverage_prompt.txt           # Fact verification
│   ├── entailment_prompt.txt         # Entailment checking
│   └── ...                           # Additional task-specific prompts
│
└── requirements.txt                  # Python dependencies
Datasets and model generations are available via Google Drive: [Link to Drive]
Generate model responses without citations.
python src/run_baseline.py <dataset_name> <model_name>
Example:
python src/run_baseline.py videommmu gemini-2.5-flash

Arguments:
- dataset_name: Must be defined in DATASET_CONFIGS.
- model_name: Model identifier from the supported models list.
Generate responses using the base_with_citations.txt prompt to encourage explicit citations.
python src/run_baseline_with_citation.py <dataset_name> <model_name>
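For example, reusing the dataset and model from the baseline step:

python src/run_baseline_with_citation.py videommmu gemini-2.5-flash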
This pipeline decontextualizes responses, decomposes them into atomic facts, extracts citations, and computes verification scores.
python src/run_metric.py --input-file <path-to-generations.json> --output-dir <output-directory>
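For example (both paths are placeholders; point --input-file at the JSON produced by one of the generation scripts):

python src/run_metric.py --input-file outputs/videommmu_gemini-2.5-flash.json --output-dir results/videommmu/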
Generate responses with iterative refinement and feedback.
python src/run_generation_program.py <dataset_name> <model_name>
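For example, with the same arguments as the baseline script:

python src/run_generation_program.py videommmu gemini-2.5-flash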
You can configure runtime behavior and stability using the following environment variables:
export OPENAI_API_BASE="your-openai-endpoint" # For custom deployments
export VLLM_USE_V1=0 # Fall back to the vLLM V0 engine
export DECORD_EOF_RETRY_MAX=20480 # Video processing stability
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" # Memory management
If you use this framework in your research, please cite:
@misc{wan2026multimodalfactlevelattributionverifiable,
title={Multimodal Fact-Level Attribution for Verifiable Reasoning},
author={David Wan and Han Wang and Ziyang Wang and Elias Stengel-Eskin and Hyunji Lee and Mohit Bansal},
year={2026},
eprint={2602.11509},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2602.11509},
}