This repository contains the code for reproducing the Step Localization experiments on the BioVL-QR dataset.
EgoVLPv2 [1] is used for text and video feature extraction, and StepFormer [2] is used to align text with video.
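As a rough illustration of what the alignment stage does (the repository itself follows StepFormer's localization, which is more involved than this), the sketch below matches each step-text embedding to its most similar video frame by cosine similarity; all shapes and variable names here are assumptions for illustration only.

import numpy as np

def localize_steps(step_emb, frame_emb):
    # Illustrative sketch only, not the repository's exact method.
    # step_emb: (num_steps, dim) text embeddings.
    # frame_emb: (num_frames, dim) video embeddings.
    # L2-normalize so dot products equal cosine similarities.
    step_emb = step_emb / np.linalg.norm(step_emb, axis=1, keepdims=True)
    frame_emb = frame_emb / np.linalg.norm(frame_emb, axis=1, keepdims=True)
    sim = step_emb @ frame_emb.T  # (num_steps, num_frames) similarity matrix
    return sim.argmax(axis=1)  # best-matching frame index for each step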
The video, text, and label embeddings of the BioVL-QR dataset, encoded with EgoVLPv2, are stored under the data/BioQR directory, along with the metadata.
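A minimal sketch of loading the precomputed embeddings, assuming they are stored as NumPy .npy files; the file names below are hypothetical, so inspect data/BioQR for the actual layout.

import numpy as np

# Hypothetical file names; check data/BioQR for the real ones.
video_emb = np.load("data/BioQR/video_embeddings.npy")  # assumed shape: (num_frames, dim)
text_emb = np.load("data/BioQR/text_embeddings.npy")    # assumed shape: (num_steps, dim)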
Use Python 3.9.10 and set up the environment with venv:
git clone [email protected]:nishi10mo/BioVLQR_StepLocalization.git
cd BioVLQR_StepLocalization
python -m venv env
source env/bin/activate
pip install -r requirements.txt
The experiment can be reproduced with the following command:
./reproduct_experiment.sh
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on the original work by SamsungLabs (https://github.com/SamsungLabs/StepFormer).
[1] Shraman Pramanick et al. EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
[2] Nikita Dvornik et al. StepFormer: Self-Supervised Step Discovery and Localization in Instructional Videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18952–18961, 2023.