# EVOKE



## 🤗EVOKE


To evaluate evolving knowledge injection in LMMs, we propose a pipeline that automatically collects evolving knowledge, yielding the EVOlving KnowledgE (EVOKE) benchmark.


You can download the data from the 🤗 Hugging Face Dataset. The expected file structure is:

```
EVOKE
|-- json/jsonl
|   |-- evoke_injection_data.json
|   |-- evoke_evaluation_data.jsonl
|-- imgs
|   |-- injection
|   |   |-- evoke_entity_injection_imgs.zip
|   |   |-- evoke_news_injection_imgs.zip
|   |-- evaluation
|   |   |-- evoke_news_evaluation_imgs.zip
|   |   |-- evoke_entity_evaluation_imgs.zip
```
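After downloading and unzipping, you can sanity-check that your local copy matches the tree above with a small script (the paths are taken directly from this README; the helper itself is illustrative, not part of the repo):

```python
from pathlib import Path

# File layout exactly as listed in the README tree above.
EXPECTED = [
    "json/jsonl/evoke_injection_data.json",
    "json/jsonl/evoke_evaluation_data.jsonl",
    "imgs/injection/evoke_entity_injection_imgs.zip",
    "imgs/injection/evoke_news_injection_imgs.zip",
    "imgs/evaluation/evoke_news_evaluation_imgs.zip",
    "imgs/evaluation/evoke_entity_evaluation_imgs.zip",
]

def missing_files(root="EVOKE"):
    """Return the expected files that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty return value means the layout is complete.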

## 🛠️Requirements and Installation

Please set up the environments by referring to these code repositories:

- https://github.com/haotian-liu/LLaVA
- https://github.com/QwenLM/Qwen-VL
- https://github.com/TIGER-AI-Lab/UniIR

## 🌟Retrieval

For image-only retrieval:

```shell
python retrieval/retrieval_image_only.py
```

For text-only retrieval:

```shell
python retrieval/retrieval_text_only.py
```

For UniIR, step 1:

```shell
python retrieval/UniIR/src/common/mbeir_retriever.py
```

This produces `retrieval/UniIR/retrieval_results/CLIP_SF/Large/Instruct/UniRAG/run_files/mbeir_new_self_union_pool_test_k10_run.txt`. Step 2:

```shell
python retrieval/retrieval_UniIR.py
```
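If you need to post-process the run file yourself, a minimal parser might look like this. It assumes the standard TREC run format (`qid Q0 doc_id rank score tag`); whether EVOKE's `mbeir_*_run.txt` files follow this exactly is an assumption, so adjust the column handling if yours differ:

```python
from collections import defaultdict

def load_run_file(path, top_k=10):
    """Parse a retrieval run file into {query_id: [(doc_id, score), ...]}.

    Assumes one result per line in TREC run format:
        qid Q0 doc_id rank score run_tag
    Results are kept only up to `top_k` by their recorded rank.
    """
    results = defaultdict(list)
    with open(path) as f:
        for line in f:
            qid, _, doc_id, rank, score, _ = line.split()
            if int(rank) <= top_k:
                results[qid].append((doc_id, float(score)))
    return dict(results)
```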

## 💥Training

Please refer to these code repositories for training:

- https://github.com/haotian-liu/LLaVA
- https://github.com/QwenLM/Qwen-VL

## 🤖Inference and Evaluation

Inference with LLaVA:

```shell
python LLaVA/mm_rag_llava_inference.py --test_type text_only --top_k 1
python LLaVA/mm_rag_llava_inference.py --test_type image_only --top_k 1
python LLaVA/mm_rag_llava_inference.py --test_type UniIR --top_k 1
python LLaVA/mm_rag_llava_inference.py --test_type ground_truth --top_k 1
```

Inference with Qwen-VL-Chat:

```shell
python Qwen-VL/eval_mm/evaluate_vqa.py --test_type text_only --few-shot 1
python Qwen-VL/eval_mm/evaluate_vqa.py --test_type image_only --few-shot 1
python Qwen-VL/eval_mm/evaluate_vqa.py --test_type UniIR --few-shot 1
python Qwen-VL/eval_mm/evaluate_vqa.py --test_type ground_truth --few-shot 1
```
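The four retrieval settings above can be looped over with a small wrapper; this is only a convenience sketch (`build_commands` and `run_all` are hypothetical helpers, with the script path and flags copied from the LLaVA commands above):

```python
import subprocess

# The four --test_type settings used in the inference commands above.
SETTINGS = ["text_only", "image_only", "UniIR", "ground_truth"]

def build_commands(top_k=1):
    """Build one LLaVA inference command per retrieval setting."""
    return [
        ["python", "LLaVA/mm_rag_llava_inference.py",
         "--test_type", setting, "--top_k", str(top_k)]
        for setting in SETTINGS
    ]

def run_all(top_k=1):
    """Invoke the repo's inference script once per setting."""
    for cmd in build_commands(top_k):
        subprocess.run(cmd, check=True)
```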


Evaluation, step 1:

```shell
python evaluation/eval_acc_f1.py
```

Step 2:

```shell
python evaluation/all_type_score.py
```
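For reference, exact-match accuracy and token-level F1 — metrics commonly used for VQA-style answer evaluation — can be sketched as follows. This is an illustrative implementation only, not necessarily what `evaluation/eval_acc_f1.py` computes:

```python
from collections import Counter

def exact_match(pred, gold):
    """1.0 if the normalized prediction equals the gold answer, else 0.0."""
    return float(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    """Token-level F1 between a predicted and a gold answer string."""
    pred_toks = pred.lower().split()
    gold_toks = gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)  # multiset overlap
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```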

## About

【ICLR 2026 🔥】This work introduces the MMEVOKE benchmark to reveal challenges in knowledge injection and explores potential solutions.
