🤗MINED

Large Multimodal Models (LMMs) encode rich factual knowledge through cross-modal pre-training, yet their static representations struggle to maintain an accurate understanding of time-sensitive factual knowledge. Existing benchmarks remain constrained by static designs and inadequately evaluate LMMs' ability to understand time-sensitive knowledge. To address this gap, we propose MINED, a comprehensive benchmark that evaluates temporal awareness along 6 key dimensions: cognition, awareness, trustworthiness, understanding, reasoning, and robustness, covering 11 challenging tasks. MINED is constructed from Wikipedia by two professional annotators and contains 2,104 time-sensitive knowledge samples spanning six knowledge types. Evaluating 15 widely used LMMs on MINED shows that Gemini-2.5-Pro achieves the highest average CEM score of 63.07, while most open-source LMMs still lack temporal understanding ability. LMMs perform best on organization knowledge and weakest on sport knowledge. To address these challenges, we investigate the feasibility of updating time-sensitive knowledge in LMMs through knowledge editing methods and observe that LMMs can effectively update knowledge in single-editing scenarios.


You can download the data from the 🤗 Hugging Face Dataset. The expected file structure is:

MINED
|-- inference_data (json/jsonl)
|   |-- Dimension1_time_agnostic.json
|   |-- Dimension1_timestamp.json
|   |-- Dimension1_temporal_interval.json
|   |-- Dimension2_awareness_future.json
|   |-- Dimension2_awareness_past.json
|   |-- Dimension3_future_unanswerable_date.json
|   |-- Dimension3_previous_unanswerable_date.json
|   |-- Dimension4_understanding.json
|   |-- Dimension5_calculation.json
|   |-- Dimension5_ranking.json
|   |-- Dimension6_robustness.json
|-- imgs
|   |-- MINED_Image.zip
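As a quick sanity check after downloading, a minimal loading sketch, assuming the .json files are standard JSON lists (some splits may be jsonl, and record fields are not shown here):

import json

# Load one inference split and report its size (a hypothetical sanity
# check, not part of the repository's code).
with open("MINED/inference_data/Dimension1_time_agnostic.json", encoding="utf-8") as f:
    samples = json.load(f)
print(f"Loaded {len(samples)} samples")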

🎯Main Results


🛠️Requirements and Installation

To set up the environment, please refer to VLMEvalKit: https://github.com/open-compass/VLMEvalKit.git
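A minimal setup sketch, assuming a standard editable install of VLMEvalKit (follow the VLMEvalKit README for the authoritative steps):

git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit
pip install -e .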

💥Inference

python inference.py \
    --meta_save_path ./path/output \
    --model_name {base_model_name} \
    --data_eval_type {data_eval_type} \
    --max_new_token 10 \
    --image_path_prefix ./path/image_data

model_name refers to the model name defined in the VLMEvalKit/vlmeval/config.py file.

data_eval_type options (an example invocation follows this list):
  • time_agnostic: Knowledge understanding independent of time
  • timestamp: Reasoning about facts at a specific time point
  • temporal_interval: Reasoning about facts/states within a time interval
  • awareness_future: Future temporal awareness and prediction consistency
  • awareness_past: Past temporal awareness and retrospective consistency
  • future_unanswerable_date: Unanswerable queries concerning future dates
  • previous_unanswerable_date: Unanswerable queries concerning past dates
  • ranking: Ordering/comparison based on time-sensitive attributes
  • understanding: Understanding complex temporal semantics and inference
  • calculation: Date/time-related arithmetic and derivation
  • robustness: Robustness to temporal perturbations and phrasing variations
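For example, a concrete run on the timestamp split could look as follows (the model name is illustrative; use any key defined in VLMEvalKit/vlmeval/config.py, and adjust the paths to your setup):

python inference.py \
    --meta_save_path ./outputs \
    --model_name GPT4o \
    --data_eval_type timestamp \
    --max_new_token 10 \
    --image_path_prefix ./imgs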

🤖Evaluation

Evaluate MINED

python eval_code/cem_f1.py
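For reference, a minimal sketch of the two metrics, assuming CEM is a contains-exact-match check and F1 is token-level overlap; this illustrates the idea and is not the repository's actual eval_code/cem_f1.py:

def cem(prediction: str, answer: str) -> float:
    # CEM ("contains exact match"): 1.0 if the gold answer string
    # appears anywhere in the model prediction, else 0.0.
    return float(answer.strip().lower() in prediction.strip().lower())

def token_f1(prediction: str, answer: str) -> float:
    # Token-level F1 between the prediction and the gold answer.
    pred, gold = prediction.lower().split(), answer.lower().split()
    if not pred or not gold:
        return 0.0
    common = sum(min(pred.count(t), gold.count(t)) for t in set(gold))
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(gold)
    return 2 * precision * recall / (precision + recall)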

📊Customize inference data and task instructions

You can customize task instructions in the inference.py file to complete the corresponding tasks.


Custom data only needs to follow the same image-text pair format as the files above.
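As a sketch, a custom record could be created like this, assuming simple image/question/answer fields (the field names here are illustrative; match them to the files under inference_data):

import json

# A hypothetical custom sample; the image path is resolved against
# --image_path_prefix at inference time.
record = {
    "image": "imgs/example.jpg",
    "question": "Who held this position in 2021?",
    "answer": "example answer",
}
with open("inference_data/custom.json", "w", encoding="utf-8") as f:
    json.dump([record], f, ensure_ascii=False, indent=2)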

🤝 Acknowledgments

We thank the following open-source projects for making this work possible: VLMEvalKit.

📝 Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝 :)

@article{jiang2025mined,
  title   = {MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models},
  author  = {Jiang, Kailin and Jiang, Ning and Ren, Yuchen and Li, Yuchen and Gao, Yifan and Bi, Jinhe and Ma, Yunpu and Liu, Qingqing and Wang, Xianhao and Jia, Yifan and Jiang, Hongbo and Hu, Yaocong and Li, Bin and Liu, Lei and Du, Yuntao},
  journal = {arXiv preprint arXiv:2510.19457},
  year    = {2025},
  url     = {https://arxiv.org/pdf/2510.19457}
}
