APEX: text Adapter, visual Prompt, and adaptive Ensemble for cross(X)-modality

Improving Adaptability and Generalizability of Efficient Transfer Learning for Vision-Language Models

Official implementation of the paper "Improving Adaptability and Generalizability of Efficient Transfer Learning for Vision-Language Models".

Highlights

Abstract: Vision-Language Models (VLMs) like CLIP have demonstrated remarkable applicability across a variety of downstream tasks, including zero-shot image classification. Recently, prompts and adapters have gained significant attention as efficient transfer learning methods for adapting to downstream tasks. However, the roles of vision and text prompts, as well as adapters, in terms of generalization and transfer difficulty have been overlooked, limiting performance on unseen tasks. In this paper, we empirically analyze how VLMs behave when using vision and text prompts, adapters, and combinations of these components, an analysis that has not been carried out before. We observe that using vision prompts for class separability and text adapters for task adaptation is crucial for adaptability and generalizability. Moreover, to improve generalization across every domain, we propose an adaptive ensemble method that combines the general knowledge of VLMs with task-specific knowledge according to transfer difficulty. Across extensive benchmarks, our method consistently outperforms all baselines, particularly on unseen tasks, demonstrating the effectiveness of the proposed approach.

☑️ Supported Methods

Method	Paper	Configs	Training Scripts
APEX	preprint	link	link
MaPLe	CVPR 2023	link	link
CoOp	IJCV 2022	link	link
CoCoOp	CVPR 2022	link	link
Deep Vision Prompting	-	link	link
Deep Language Prompting	-	link	link
Independent V-L Prompting	-	link	link

Results

APEX in comparison with existing methods

The results reported below show the accuracy for both base and novel classes across 11 recognition datasets. These results are averaged over 20 seeds and have been reproduced by us.

Name	Base Acc.	Novel Acc.	HM	Epochs
CLIP	69.34	74.22	71.70	-
CLIP-Adapter	83.23	70.13	75.64	50
CoCoOp	81.11	70.55	75.03	10
MaPLe	82.52	74.24	77.86	5
PromptSRC	84.36	75.37	79.39	20
APEX (ours)	83.99	76.76	80.04	15
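
The HM column is presumably the harmonic mean of base and novel accuracy, as is common in base-to-novel evaluation. The sketch below shows that computation; the helper name `harmonic_mean` is hypothetical and not part of this repository. Note that the averaged rows in the table above need not match this formula exactly, assuming HM is computed per dataset before averaging.

```python
def harmonic_mean(base_acc: float, novel_acc: float) -> float:
    """Harmonic mean of base and novel accuracy (both in percent).

    Hypothetical helper for illustration; not taken from the repository.
    """
    total = base_acc + novel_acc
    if total == 0:
        # Avoid division by zero when both accuracies are 0.
        return 0.0
    return 2 * base_acc * novel_acc / total


# CLIP row from the table above: base 69.34, novel 74.22
clip_hm = harmonic_mean(69.34, 74.22)
print(f"{clip_hm:.2f}")  # prints 71.70
```

The harmonic mean penalizes a large gap between base and novel accuracy, so a method cannot score well by overfitting to base classes alone.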

Installation

For installation and other package requirements, please follow the instructions detailed in INSTALL.md.

Data preparation

Please follow the instructions in DATASETS.md to prepare all datasets.

Training and Evaluation

Please refer to RUN.md for detailed instructions on training, evaluating, and reproducing the results using our pre-trained models.

Citation

If you use our work, please consider citing:

@article{yang2023improving,
  title={Improving Adaptability and Generalizability of Efficient Transfer Learning for Vision-Language Models},
  author={Yang, Yongjin and Ko, Jongwoo and Yun, Se-Young},
  journal={arXiv preprint arXiv:2311.15569},
  year={2023}
}

Acknowledgements

Our code is based on the CoOp and CoCoOp repository and the MaPLe repository. We thank the authors for releasing their code. If you use our model and code, please consider citing these works as well.
