This is the PyTorch implementation of EMCL-Net.
```shell
conda create -n EMCL python=3.9
conda activate EMCL
pip install -r requirements.txt
pip install torch==1.8.1+cu102 torchvision==0.9.1+cu102 -f https://download.pytorch.org/whl/torch_stable.html
```

Download the pretrained CLIP checkpoint into `tvr/models`:

```shell
cd tvr/models
wget https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt
# wget https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt
# wget https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt
```

For MSR-VTT, the official data and video links can be found in link.
For convenience, the splits and captions can be found in the sharing from CLIP4Clip:

```shell
wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msrvtt_data.zip
```

The raw videos can be found in the sharing from Frozen in Time:

```shell
wget https://www.robots.ox.ac.uk/~maxbain/frozen-in-time/data/MSRVTT.zip
```

For MSVD, the official data and video links can be found in link.
For convenience, we share the processed dataset in link:
https://disk.pku.edu.cn:443/link/CC02BD15907BFFF63E5AAE4BF353A202

For LSMDC, the official data and video links can be found in link. Due to license restrictions, we cannot share this dataset.

For ActivityNet Captions, the official data and video links can be found in link. For convenience, we share the processed dataset in link:
https://disk.pku.edu.cn:443/link/83351ABDAEA4A17A5A139B799BB524AC

For DiDeMo, the official data and video links can be found in link. For convenience, we share the processed dataset in link:
https://disk.pku.edu.cn:443/link/BBF9F5990FC4D7FD5EA9777C32901E62

Compress the raw videos before training:

```shell
python preprocess/compress_video.py --input_root [raw_video_path] --output_root [compressed_video_path]
```

This script compresses each video to 3 fps with width 224 (or height 224). Modify the variables for your customization.
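As a sanity check on the compression settings, the sketch below illustrates the resize rule: the short side is scaled to 224 pixels while the long side keeps the aspect ratio. The function name and the even-number rounding (which video codecs typically require) are illustrative assumptions, not the repo's exact code.

```python
def target_size(width, height, short_side=224):
    """Scale the short side to `short_side`, keep aspect ratio on the long side."""
    if width >= height:
        new_h = short_side
        # round the long side to an even number of pixels (codec-friendly)
        new_w = int(round(width * short_side / height / 2) * 2)
    else:
        new_w = short_side
        new_h = int(round(height * short_side / width / 2) * 2)
    return new_w, new_h

print(target_size(1920, 1080))  # landscape: height becomes 224
print(target_size(720, 1280))   # portrait: width becomes 224
```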
Results on MSR-VTT:

| Protocol | T2V R@1 | T2V R@5 | T2V R@10 | Mean R |
|---|---|---|---|---|
| EMCL-Net (2 V100 GPUs) | 47.0 | 72.6 | 83.0 | 13.6 |
| EMCL-Net (8 V100 GPUs) | 48.2 | 74.7 | 83.6 | 13.1 |
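For reference, the table's metrics follow the standard retrieval definitions: R@K is the percentage of queries whose matching video ranks in the top K, and Mean R is the average rank. The sketch below computes them from a toy similarity matrix (the values are made up for illustration).

```python
def retrieval_metrics(sim):
    """Rows are text queries, columns are videos; query i matches video i."""
    ranks = []
    for i, row in enumerate(sim):
        # rank of the correct video in the descending score order
        order = sorted(range(len(row)), key=lambda j: -row[j])
        ranks.append(order.index(i) + 1)
    n = len(ranks)
    return {
        "R@1": 100 * sum(r <= 1 for r in ranks) / n,
        "R@5": 100 * sum(r <= 5 for r in ranks) / n,
        "MeanR": sum(ranks) / n,
    }

sim = [
    [0.9, 0.1, 0.2],  # query 0 ranks its video 1st
    [0.3, 0.2, 0.8],  # query 1 ranks its video 3rd
    [0.2, 0.6, 0.4],  # query 2 ranks its video 2nd
]
print(retrieval_metrics(sim))  # ranks [1, 3, 2] -> MeanR 2.0
```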
We recommend using more GPUs for better performance:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch \
--master_port 2502 \
--nproc_per_node=8 \
main_retrieval.py \
--do_train 1 \
--workers 8 \
--n_display 50 \
--epochs 5 \
--lr 1e-4 \
--coef_lr 1e-3 \
--batch_size 128 \
--batch_size_val 128 \
--anno_path data/MSR-VTT/anns \
--video_path ${DATA_PATH}/MSRVTT_Videos \
--datatype msrvtt \
--max_words 32 \
--max_frames 12 \
--video_framerate 1 \
--output_dir ${OUTPUT_PATH}
```

You can also use 2 V100 GPUs to reproduce the results in the paper:
```shell
CUDA_VISIBLE_DEVICES=0,1 \
python -m torch.distributed.launch \
--master_port 2502 \
--nproc_per_node=2 \
main_retrieval.py \
--do_train 1 \
--workers 8 \
--n_display 50 \
--epochs 5 \
--lr 1e-4 \
--coef_lr 1e-3 \
--batch_size 128 \
--batch_size_val 128 \
--anno_path data/MSR-VTT/anns \
--video_path ${DATA_PATH}/MSRVTT_Videos \
--datatype msrvtt \
--max_words 32 \
--max_frames 12 \
--video_framerate 1 \
--output_dir ${OUTPUT_PATH}
```

Train on LSMDC:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m torch.distributed.launch \
--master_port 2502 \
--nproc_per_node=4 \
main_retrieval.py \
--do_train 1 \
--workers 8 \
--n_display 10 \
--epochs 5 \
--lr 1e-4 \
--coef_lr 1e-3 \
--batch_size 128 \
--batch_size_val 128 \
--anno_path ${Anno_PATH} \
--video_path ${DATA_PATH} \
--datatype lsmdc \
--max_words 32 \
--max_frames 12 \
--video_framerate 1 \
--output_dir ${OUTPUT_PATH}
```

| Protocol | T2V R@1 | T2V R@5 | T2V R@10 | Median R | Mean R | V2T R@1 | V2T R@5 | V2T R@10 | Median R | Mean R |
|---|---|---|---|---|---|---|---|---|---|---|
| EMCL-Net | 42.1 | 71.3 | 81.1 | 2.0 | 17.6 | 54.3 | 81.3 | 88.1 | 1.0 | 5.6 |
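The training scripts cap clips at `--max_frames` frames sampled at `--video_framerate` fps. A plausible sampling rule is sketched below as an illustration (an assumption, not necessarily the repo's exact code): take one frame per period of the target frame rate, then uniformly subsample down to the cap.

```python
def sample_frame_indices(n_total, native_fps=30, target_fps=1, max_frames=12):
    # pick frames at the target fps (every `step`-th frame of the raw video)
    step = max(1, round(native_fps / target_fps))
    idx = list(range(0, n_total, step))
    if len(idx) > max_frames:
        # uniformly subsample the candidates down to max_frames
        stride = len(idx) / max_frames
        idx = [idx[int(i * stride)] for i in range(max_frames)]
    return idx

print(len(sample_frame_indices(900)))  # 30 s video at 30 fps -> capped at 12
print(len(sample_frame_indices(150)))  # 5 s video -> only 5 frames kept
```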
Train on MSVD:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m torch.distributed.launch \
--master_port 2502 \
--nproc_per_node=4 \
main_retrieval.py \
--do_train 1 \
--workers 8 \
--n_display 10 \
--epochs 20 \
--lr 1e-4 \
--coef_lr 1e-3 \
--batch_size 128 \
--batch_size_val 128 \
--anno_path ${Anno_PATH} \
--video_path ${DATA_PATH} \
--datatype msvd \
--max_words 32 \
--max_frames 12 \
--video_framerate 1 \
--output_dir ${OUTPUT_PATH}
```

Train on ActivityNet Captions:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch \
--master_port 2502 \
--nproc_per_node=8 \
main_retrieval.py \
--do_train 1 \
--workers 8 \
--n_display 10 \
--epochs 5 \
--lr 1e-4 \
--coef_lr 1e-3 \
--batch_size 128 \
--batch_size_val 128 \
--anno_path ${Anno_PATH} \
--video_path ${DATA_PATH} \
--datatype activity \
--max_words 64 \
--max_frames 64 \
--video_framerate 1 \
--output_dir ${OUTPUT_PATH}
```

| Protocol | T2V R@1 | T2V R@5 | T2V R@10 | Median R | Mean R | V2T R@1 | V2T R@5 | V2T R@10 | Median R | Mean R |
|---|---|---|---|---|---|---|---|---|---|---|
| EMCL-Net | 46.8 | 74.3 | 83.1 | 2.0 | 12.3 | 45.0 | 73.2 | 82.7 | 2.0 | 9.0 |
Train on DiDeMo:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch \
--master_port 2502 \
--nproc_per_node=8 \
main_retrieval.py \
--do_train 1 \
--workers 8 \
--n_display 10 \
--epochs 5 \
--lr 1e-4 \
--coef_lr 1e-3 \
--batch_size 128 \
--batch_size_val 128 \
--anno_path ${Anno_PATH} \
--video_path ${DATA_PATH} \
--datatype didemo \
--max_words 64 \
--max_frames 64 \
--video_framerate 1 \
--output_dir ${OUTPUT_PATH}
```
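All scripts pass both `--lr` and `--coef_lr`. In CLIP4Clip-style code bases the pretrained CLIP backbone is updated at `lr * coef_lr` while newly added modules use the full `lr`; the sketch below assumes EMCL-Net follows the same convention (the parameter-name prefix `clip.` is illustrative, not the repo's exact naming).

```python
def build_param_groups(named_params, lr=1e-4, coef_lr=1e-3):
    """Split parameters into a gently-tuned CLIP group and a full-lr group."""
    clip_params, new_params = [], []
    for name, p in named_params:
        (clip_params if name.startswith("clip.") else new_params).append(p)
    return [
        {"params": clip_params, "lr": lr * coef_lr},  # small lr for pretrained CLIP
        {"params": new_params, "lr": lr},             # full lr for new modules
    ]

groups = build_param_groups([("clip.visual.proj", "w1"), ("emcl.dict", "w2")])
print([g["lr"] for g in groups])  # CLIP group ~1e-7, new-module group 1e-4
```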