This repo contains the source code for our ECCV 2020 work MINI-Net: Multiple Instance Ranking Network for Video Highlight Detection. Our model is implemented in PyTorch.
- PyTorch 1.4+
- numpy
- tqdm
- moviepy
- cv2 (opencv-python)
Change directory to `src` and run the following command:
CUDA_VISIBLE_DEVICES=1 python trainMIL.py --dataset youtube --domain gymnastics \
--train_path /home/share/Highlight/proDataset/TrainingSet/ \
--test_path /home/share/Highlight/proDataset/DomainSpecific \
--topk_mAP 1 --FNet MILModel10 --AM AttentionModule \
--DS MILDataset --AHLoss AdaptiveHingerLoss \
--short_lower 10 --short_upper 40 --long_lower 60 --long_upper 60000 --bagsize 60

Parameters:
- `CUDA_VISIBLE_DEVICES`: GPU id used for training
- `dataset`: dataset to use; options: `youtube`, `tvsum`, `cosum`
- `domain`: target domain within the chosen dataset, e.g., `gymnastics` for the YouTube dataset
- `train_path`: path to the extracted feature files used for training
- `test_path`: path to the extracted feature files used for testing
- `topk_mAP`: test metric (top-k mAP); 1 or 5 in our paper
- `FNet`: model that predicts a highlight score for each video segment; `MILModel10` in our paper
- `AM`: model that fuses the visual and audio features; `AttentionModule_1` in our paper
- `DS`: dataset class; `MILDataset` in our paper
- `AHLoss`: the hinge loss used in our paper (`AdaptiveHingerLoss`); a plain version is sketched below
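For intuition about the ranking objective behind `AHLoss`, the snippet below sketches a plain pairwise hinge ranking loss between positive-bag and negative-bag scores in PyTorch. The function name `hinge_ranking_loss` and the fixed `margin` are illustrative only; the actual `AdaptiveHingerLoss` in this repo is adaptive and may differ in its details.

```python
import torch

def hinge_ranking_loss(pos_bag_scores: torch.Tensor,
                       neg_bag_scores: torch.Tensor,
                       margin: float = 1.0) -> torch.Tensor:
    """Plain pairwise hinge ranking loss (illustrative sketch only).

    pos_bag_scores: (B,) aggregated highlight scores of positive bags.
    neg_bag_scores: (B,) aggregated highlight scores of negative bags.
    Pushes each positive bag to score at least `margin` above its paired
    negative bag.
    """
    return torch.clamp(margin - pos_bag_scores + neg_bag_scores, min=0.0).mean()

# Toy usage
pos = torch.tensor([0.9, 0.4, 0.7])
neg = torch.tensor([0.2, 0.5, 0.6])
print(hinge_ranking_loss(pos, neg))  # tensor(0.7667)
```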
See `visual-audio fusion/opts.py` for details of the data-selection hyper-parameters (`short_lower`, `short_upper`, `long_lower`, `long_upper`, `bagsize`).
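As a rough, hypothetical illustration of how duration bounds of this kind are typically applied, the snippet below partitions videos into a "short" pool and a "long" pool, from which the MIL pipeline presumably builds bags of `bagsize` segments. `split_by_duration` is not a function from this repo, and seconds are assumed as the unit; the authoritative logic lives in `visual-audio fusion/opts.py` and the `MILDataset` class.

```python
# Hypothetical sketch only: `split_by_duration` is not part of this repo;
# the real selection logic is defined by opts.py and the MILDataset class.
from typing import Dict, List, Tuple

def split_by_duration(durations: Dict[str, float],
                      short_lower: float = 10,
                      short_upper: float = 40,
                      long_lower: float = 60,
                      long_upper: float = 60000) -> Tuple[List[str], List[str]]:
    """Partition video ids into a 'short' pool and a 'long' pool by duration.

    Durations are assumed to be in seconds; the defaults mirror the flags
    shown in the training command above.
    """
    short_pool: List[str] = [vid for vid, dur in durations.items()
                             if short_lower <= dur <= short_upper]
    long_pool: List[str] = [vid for vid, dur in durations.items()
                            if long_lower <= dur <= long_upper]
    return short_pool, long_pool

# Toy usage: each selected video is later segmented and grouped into bags.
durations = {"vid_a": 25.0, "vid_b": 300.0, "vid_c": 55.0}
print(split_by_duration(durations))  # (['vid_a'], ['vid_b'])
```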
The extracted features of the three test datasets are available here.
- YouTube:
| Topic | mAP |
|---|---|
| dog | 0.5816 |
| gymnastics | 0.6165 |
| parkour | 0.7020 |
| skating | 0.7217 |
| skiing | 0.5866 |
| surfing | 0.6514 |
- TVSum:
| Topic | top-5 mAP |
|---|---|
| VT | 0.8062 |
| VU | 0.6832 |
| GA | 0.7821 |
| MS | 0.8183 |
| PK | 0.7807 |
| PR | 0.6584 |
| FM | 0.5780 |
| BK | 0.7502 |
| BT | 0.8019 |
| DS | 0.6551 |
- CoSum:
| Topic | top-5 mAP |
|---|---|
| BJ | 0.8450 |
| BP | 0.9887 |
| ET | 0.9156 |
| ERC | 1.0000 |
| KP | 0.9611 |
| MLB | 0.9353 |
| NFL | 1.0000 |
| NDC | 0.9536 |
| SL | 0.8896 |
| SF | 0.7897 |
If you find our work helpful in your research, please cite our paper via:
BibTeX:
@inproceedings{hong2020mini,
title={MINI-Net: Multiple Instance Ranking Network for Video Highlight Detection},
author={Hong, Fa-Ting and Huang, Xuanteng and Li, Wei-Hong and Zheng, Wei-Shi},
booktitle={European Conference on Computer Vision},
year={2020}
}
More information about our work can be found at https://harlanhong.github.io.
