- Table of Contents
- 🤗MINED
- 🎯Main Results
- 🛠️Requirements and Installation
- 💥Inference
- 🤖Evaluation
- 📊Customize inference data and task instructions
- 🤝 Acknowledgments
- 📝 Citation
Large Multimodal Models (LMMs) encode rich factual knowledge via cross-modal pre-training, yet their static representations struggle to maintain an accurate understanding of time-sensitive factual knowledge. Existing benchmarks remain constrained by static designs and inadequately evaluate LMMs' ability to understand time-sensitive knowledge. To address this gap, we propose MINED, a comprehensive benchmark that evaluates temporal awareness along 6 key dimensions: cognition, awareness, trustworthiness, understanding, reasoning, and robustness, covering 11 challenging tasks. MINED is constructed from Wikipedia by two professional annotators and contains 2,104 time-sensitive knowledge samples spanning six knowledge types. Evaluating 15 widely used LMMs on MINED shows that Gemini-2.5-Pro achieves the highest average CEM score of 63.07, while most open-source LMMs still lack the ability to understand time-sensitive knowledge. LMMs perform best on organization knowledge and weakest on sport knowledge. To address these challenges, we investigate the feasibility of updating time-sensitive knowledge in LMMs through knowledge editing methods and observe that LMMs can effectively update knowledge via knowledge editing in single-editing scenarios.
You can download the data from the 🤗 Hugging Face Dataset. The expected file structure is:
MINED
|-- inference_data (json/jsonl)
| |-- Dimension1_time_agnostic.json
| |-- Dimension1_temporal_interval.json
| |-- Dimension1_timestamp.json
| |-- Dimension2_awareness_future.json
| |-- Dimension2_awareness_past.json
| |-- Dimension3_future_unanswerable_date.json
| |-- Dimension3_previous_unanswerable_date.json
| |-- Dimension4_understanding.json
| |-- Dimension5_calculation.json
| |-- Dimension5_ranking.json
| |-- Dimension6_robustness.json
|-- imgs
| |-- MINED_Image.zip
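For programmatic download, a minimal sketch using the huggingface_hub client is shown below. The dataset repo ID is a placeholder; substitute the actual MINED dataset path from the Hugging Face page linked above.

```python
from huggingface_hub import snapshot_download
import os
import zipfile

# NOTE: placeholder repo ID -- replace with the actual MINED dataset path on Hugging Face.
REPO_ID = "your-org/MINED"

# Download the full dataset snapshot (JSON files and the image archive).
local_dir = snapshot_download(repo_id=REPO_ID, repo_type="dataset", local_dir="./MINED")

# Unpack the image archive so that image paths referenced in the JSON files resolve.
with zipfile.ZipFile(os.path.join(local_dir, "imgs", "MINED_Image.zip")) as zf:
    zf.extractall(os.path.join(local_dir, "imgs"))
```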
For requirements and installation, you can refer to VLMEvalKit: https://github.com/open-compass/VLMEvalKit.git
python inference.py \
--meta_save_path ./path/output \
--model_name {base_model_name} \
--data_eval_type {data_eval_type} \
--max_new_token 10 \
--image_path_prefix ./path/image_data

model_name refers to the model name defined in the VLMEvalKit/vlmeval/config.py file.
data_eval_type options:
- time_agnostic: Knowledge understanding independent of time
- timestamp: Reasoning about facts at a specific time point
- temporal_interval: Reasoning about facts/states within a time interval
- awareness_future: Future temporal awareness and prediction consistency
- awareness_past: Past temporal awareness and retrospective consistency
- future_unanswerable_date: Unanswerable queries concerning future dates
- previous_unanswerable_date: Unanswerable queries concerning past dates
- ranking: Ordering/comparison based on time-sensitive attributes
- understanding: Understanding complex temporal semantics and inference
- calculation: Date/time-related arithmetic and derivation
- robustness: Robustness to temporal perturbations and phrasing variations
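To sweep several of the data_eval_type options above in one run, a minimal driver sketch is shown below. It only wraps the documented inference.py flags; the model name and paths are placeholders to adapt to your setup.

```python
import subprocess

# Placeholders: set the model name as defined in VLMEvalKit/vlmeval/config.py and your local paths.
MODEL_NAME = "your_model_name"
EVAL_TYPES = [
    "time_agnostic", "timestamp", "temporal_interval",
    "awareness_future", "awareness_past",
    "future_unanswerable_date", "previous_unanswerable_date",
    "ranking", "understanding", "calculation", "robustness",
]

for eval_type in EVAL_TYPES:
    # Each call mirrors the inference command documented above.
    subprocess.run([
        "python", "inference.py",
        "--meta_save_path", "./path/output",
        "--model_name", MODEL_NAME,
        "--data_eval_type", eval_type,
        "--max_new_token", "10",
        "--image_path_prefix", "./path/image_data",
    ], check=True)
```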
Evaluate MINED
python eval_code/cem_f1.py
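For reference, a minimal sketch of the two metrics the script name suggests, contains-exact-match (CEM) and token-level F1, is given below. The actual cem_f1.py may differ in answer normalization and aggregation, so treat this as an illustration only.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace (common QA-style normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return re.sub(r"\s+", " ", text).strip()

def cem(prediction: str, answer: str) -> float:
    """Contains Exact Match: 1.0 if the normalized gold answer appears in the prediction."""
    return float(normalize(answer) in normalize(prediction))

def f1(prediction: str, answer: str) -> float:
    """Token-level F1 between the prediction and the gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(answer).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: score a single prediction against its gold answer.
print(cem("The president in 2021 was Joe Biden.", "Joe Biden"))  # 1.0
print(f1("The president in 2021 was Joe Biden.", "Joe Biden"))   # ~0.44
```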
You can customize task instructions in the inference.py file to complete the corresponding tasks.
Custom data only needs to provide matched image and text pairs in the same format as the files in inference_data.
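As an illustration only, a custom entry could pair an image path with a question and answer as below. The field names here are hypothetical; mirror the keys used in the provided inference_data files rather than these.

```python
import json

# Hypothetical schema for illustration: copy the actual keys from an existing
# file such as Dimension1_time_agnostic.json before building your own data.
custom_samples = [
    {
        "image": "custom_imgs/eiffel_tower.jpg",  # path relative to --image_path_prefix
        "question": "Who was the president of France when this photo was taken in 2015?",
        "answer": "Francois Hollande",
    }
]

with open("inference_data/custom_task.json", "w", encoding="utf-8") as f:
    json.dump(custom_samples, f, ensure_ascii=False, indent=2)
```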
We thank the following open-source projects for making this work possible:
- VLMEvalKit for the evaluation.
If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝 :)
@article{jiang2025mined,
title = {MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models},
author = {Jiang, Kailin and Jiang, Ning and Ren, Yuchen and Li, Yuchen and Gao, Yifan and Bi, Jinhe and Ma, Yunpu and Liu, Qingqing and Wang, Xianhao and Jia, Yifan and Jiang, Hongbo and Hu, Yaocong and Li, Bin and Liu, Lei and Du, Yuntao},
year = {2025},
url = {https://arxiv.org/pdf/2510.19457}
}


