Name	Name	Last commit message	Last commit date
parent directory ..
annotation_generation	annotation_generation
colab	colab
evaluation	evaluation
splits	splits
LICENSE	LICENSE
README.md	README.md
generate_all.sh	generate_all.sh

Tracking Any Point in 3D (TAPVid-3D)

Skanda Koppula, Ignacio Rocco, Yi Yang, Joe Heyward, João Carreira, Andrew Zisserman, Gabriel Brostow, Carl Doersch

Google DeepMind, University College London, University of Oxford

`TAPVid-3D Website` `TAPVid-3D Paper` `Colab to Visualize Samples`

TAPVid-3D is a dataset and benchmark for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D).

The benchmark features 4,000+ real-world videos, along with their metric 3D position point trajectories and camera extrinsics. The dataset is contains three different video sources, and spans a variety of object types, motion patterns, and indoor and outdoor environments. This repository folder contains the code to download and generate these annotations and dataset samples to view.

Note that in order to use the dataset, you must accept the licenses of the constituent original video data sources that you use! In particular, you must adhere to the terms of service outlined in:

To measure performance on the TAP-3D task, we formulated metrics that extend the Jaccard-based metric used in 2D TAP to handle the complexities of ambiguous depth scales across models, occlusions, and multi-track spatio-temporal smoothness. Our implementation of these metrics (3D-AJ, APD, OA) can be found in tapvid3d_metrics.py.

For more details, including the performance of multiple baseline 3D point tracking models on the benchmark, please see the paper: TAPVid-3D:A Benchmark for Tracking Any Point in 3D.

Getting Started: Installing

(If you want to generate the dataset and want to use CUDA for running semantic segmentation, first install a CUDA-enabled PyTorch with pip3 install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu118)

For generating the dataset, install the Tracking Any Point repo with:

pip install "git+https://github.com/google-deepmind/tapnet.git"[tapvid3d_eval,tapvid3d_generation]

If you only want to use the metrics, install with:

pip install "git+https://github.com/google-deepmind/tapnet.git"[tapvid3d_eval]

For a local editable installation, clone the repo and use:

pip install -e .[tapvid3d_eval,tapvid3d_generation]

pip install -e .[tapvid3d_eval].

How to Download and Generate the Dataset

Scripts to download and generate the annotations of each of the datasets can be found in the annotation_generation subdirectory, and can be run as:

python3 -m tapnet.tapvid3d.annotation_generation.generate_adt --help
python3 -m tapnet.tapvid3d.annotation_generation.generate_pstudio --help
python3 -m tapnet.tapvid3d.annotation_generation.generate_drivetrack --help

Because of license restrictions in distribution of the underlying source videos for Aria Digital Twin, you will need to accept their licence terms and download the ADT dataset by first getting the cdn json file from Project Aria Explorer, and downloading the ADT main_vrs, main_groundtruth, segmentation and depth files with:

aria_dataset_downloader --cdn_file /PATH_TO/Aria_Digital_Twin_1720774989.json -o /OUTPUT_PATH -d 0 6 7 8

To run all generation scripts, follow the instructions and run the commands in generate_all.sh. This will generate all the *.npz files, and place them into a new folder, tapvid3d_dataset. To generate only the minival split, replace --split=all in this script with --split=minival.

To test things are working before full generation, you can run ./run_all.sh --debug. This runs generates exactly one *.npz/video annotation per data source.

Data format

Once the benchmark files are fully generated, you will have roughly 4,500 *.npz files, each one with exactly one dataset video+annotation. Each *.npz file, contains:

images_jpeg_bytes: tensor of shape [# of frames, height, width, 3], each frame stored as JPEG bytes that must be decoded
intrinsics: (fx, fy, cx, cy) camera intrinsics of the video
tracks_xyz: tensor of shape (# of frames, # of point tracks, 3), representing the 3D point trajectories and the last dimension is the (x, y, z) point position in meters.
visibility: tensor of shape (# of frames, # of point tracks), representing the visibility of each point along its trajectory
queries_xyt: tensor of shape (# of point tracks, 3), representing the query point used in the benchmark as the initial given point to track. The last dimension is given in (x, y, t), where x,y is the pixel location of the query point and t is the query frame.
extrinsics_w2c: for videos with a moving camera (videos from the Waymo Open Dataset and ADT), we provide camera extrinsics in the form of a world to camera transform, a 4x4 matrix consisting of the rotation matrix and translation matrix. This field is NOT present in the pstudio *.npz files, because in Panoptic Studio, the camera is static.

Getting Started: Evaluating Your Own 3D Tracking Model

To get started using this dataset to benchmark your own 3D tracking model, check out evaluate_model.py. This script reads the ground truth track prediction, and predicted tracks and computes the TAPVid-3D evaluation metrics for those predictions. An example of running this is provided in run_evaluate_model.sh, which computes metrics for our precomputed outputs from SpatialTracker on the minival split, for only the DriveTrack subset. evaluate_model.py takes as input a folder with subdirectories (corresponding to up to 3 data sources [among PStudio/DriveTrack/ADT]). Each subdirectory should contain a list of *.npz files (one per TAPVid-3D video, each containing tracks_XYZ and visibility in the format above). Running this will return all TAP-3D metrics described in the paper (these are implemented in tapvid3d_metrics.py).

Visualizing Samples in Colab

You can view samples of the dataset, using a public Colab demo: .

Citing this Work

If you find this work useful, consider citing the manuscript:

@inproceedings{koppula2024tapvid3d,
   title={TAPVid-3D: A Benchmark for Tracking Any Point in 3D},
   author={Skanda Koppula and Ignacio Rocco and Yi Yang and Joe Heyward and João Carreira and Andrew Zisserman and Gabriel Brostow and Carl Doersch},
   booktitle={NeurIPS},
   year={2024},
}

License and Disclaimer

All software here is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at:

https://www.apache.org/licenses/LICENSE-2.0

All other materials, excepting the source videos and artifacts used to generate annotations, are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode

The Aria Digital Twin, Waymo Open Dataset, and Panoptic Studio datasets all have individual usage terms and licenses that must be adhered to when using any software or materials from here.

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.

This is not an official Google product.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Tracking Any Point in 3D (TAPVid-3D)

`TAPVid-3D Website` `TAPVid-3D Paper` `Colab to Visualize Samples`

Getting Started: Installing

How to Download and Generate the Dataset

Data format

Getting Started: Evaluating Your Own 3D Tracking Model

Visualizing Samples in Colab

Citing this Work

License and Disclaimer

FilesExpand file tree

tapvid3d

Directory actions

More options

Directory actions

More options

Latest commit

History

tapvid3d

Folders and files

parent directory

README.md

Tracking Any Point in 3D (TAPVid-3D)

TAPVid-3D Website TAPVid-3D Paper Colab to Visualize Samples

Getting Started: Installing

How to Download and Generate the Dataset

Data format

Getting Started: Evaluating Your Own 3D Tracking Model

Visualizing Samples in Colab

Citing this Work

License and Disclaimer

`TAPVid-3D Website` `TAPVid-3D Paper` `Colab to Visualize Samples`