Welcome to the Google DeepMind GitHub repository for TRecViT, a novel architecture for efficient video modelling.
TRecViT proposes a new factorisation that relies on gated linear recurrent units (LRUs) for mixing information over time, self-attention for mixing information over space, and MLPs for mixing information over channels. This architecture achieves SOTA results on SSv2 when compared to causal models, and it outperforms or is competitive with non-causal vanilla Transformers on SSv2 and Kinetics400 while having significantly fewer parameters, a smaller memory footprint, and a lower FLOPs count.
The model can be trained in supervised or self-supervised regimes (e.g. masked auto-encoding).
TRecViT Paper: https://arxiv.org/abs/2412.14294
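To make the factorisation above concrete, here is a minimal, self-contained sketch of a single TRecViT-style block in plain NumPy. This is an illustration only, not the repository's implementation: the scalar gates `a` and `b`, the single-head attention, the ReLU MLP, and the omission of normalisation layers are all simplifying assumptions.

```python
# Conceptual sketch of the TRecViT factorisation (simplified, not the real code):
# a gated linear recurrence mixes information over time, self-attention mixes
# over space (tokens within a frame), and an MLP mixes over channels.
import numpy as np

def lru_over_time(x, a, b):
  # x: [T, N, D] video tokens; a, b: [D] gates in (0, 1) (assumed form).
  # h_t = a * h_{t-1} + b * x_t, applied independently per token and channel.
  h = np.zeros_like(x[0])
  out = np.empty_like(x)
  for t in range(x.shape[0]):
    h = a * h + b * x[t]
    out[t] = h
  return out

def self_attention_over_space(x, wq, wk, wv):
  # x: [T, N, D]; single-head attention applied within each frame.
  q, k, v = x @ wq, x @ wk, x @ wv
  logits = q @ np.swapaxes(k, -1, -2) / np.sqrt(x.shape[-1])
  weights = np.exp(logits - logits.max(-1, keepdims=True))
  weights /= weights.sum(-1, keepdims=True)
  return weights @ v

def mlp_over_channels(x, w1, w2):
  # Position-wise MLP mixing channels (ReLU stands in for the usual GELU).
  return np.maximum(x @ w1, 0.0) @ w2

def trecvit_block(x, params):
  # Residual connections around each mixing step, as in standard ViT blocks;
  # layer norms are omitted for brevity.
  x = x + lru_over_time(x, params['a'], params['b'])
  x = x + self_attention_over_space(x, params['wq'], params['wk'], params['wv'])
  x = x + mlp_over_channels(x, params['w1'], params['w2'])
  return x

# Example usage with random parameters (T frames, N tokens per frame, D channels).
T, N, D, H = 4, 8, 16, 32
rng = np.random.default_rng(0)
x = rng.normal(size=(T, N, D))
params = {
    'a': rng.uniform(0.5, 0.99, size=D),
    'b': rng.uniform(0.1, 0.5, size=D),
    'wq': rng.normal(size=(D, D)) / np.sqrt(D),
    'wk': rng.normal(size=(D, D)) / np.sqrt(D),
    'wv': rng.normal(size=(D, D)) / np.sqrt(D),
    'w1': rng.normal(size=(D, H)) / np.sqrt(D),
    'w2': rng.normal(size=(H, D)) / np.sqrt(H),
}
y = trecvit_block(x, params)  # [T, N, D]
```

The key point is that the recurrence runs only along the time axis, attention only within a frame, and the MLP only across channels, which is what keeps the parameter count, memory footprint, and FLOPs low.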
# create environment (requires python>=3.10.0)
python3 -m venv trecvit_env
source trecvit_env/bin/activate
# get repo
git clone https://github.com/google-deepmind/trecvit.git
cd trecvit
pip install .
Test the installation by importing our model:
from trecvit import trecvit_model
model = trecvit_model.get_model()
This module depends on the recurrent_gemma library. It also uses forked files from big_vision.
Please check the quickstart colab to see how to instantiate the model and run inference.
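As a rough sketch only, inference could look like the following. The `init`/`apply` interface and the `[batch, time, height, width, channels]` input layout are assumptions here; the quickstart colab shows the actual API and preprocessing.

```python
import jax
import jax.numpy as jnp

from trecvit import trecvit_model

# Instantiate the model as in the snippet above.
model = trecvit_model.get_model()

# Dummy video clip; the [batch, time, height, width, channels] layout and the
# 16-frame, 224x224 resolution are assumptions -- follow the quickstart colab
# for the real preprocessing.
video = jnp.zeros((1, 16, 224, 224, 3), dtype=jnp.float32)

# Assuming a flax-style module with init/apply; see the colab for actual usage.
variables = model.init(jax.random.PRNGKey(0), video)
outputs = model.apply(variables, video)
```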
We release pretrained checkpoints for TRecViT-B trained with supervised classification on Kinetics400: https://storage.mtls.cloud.google.com/trecvit/model_checkpoints/trecvit_B_k400.npz
and with masked auto-encoding: https://storage.googleapis.com/dm-perception-test/trecvit/trecvit_B_mae.npz
Additional checkpoints could be released in the future if requested.
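As a quick sanity check, you can download a checkpoint and inspect its contents with NumPy. The flat name-to-array `.npz` layout assumed below is an assumption; restoring the weights into the model is covered in the quickstart colab.

```python
import urllib.request

import numpy as np

# Download the masked auto-encoding checkpoint listed above.
ckpt_url = (
    "https://storage.googleapis.com/dm-perception-test/trecvit/"
    "trecvit_B_mae.npz"
)
urllib.request.urlretrieve(ckpt_url, "trecvit_B_mae.npz")

# Print a few stored arrays; the exact parameter naming is not guaranteed.
with np.load("trecvit_B_mae.npz") as ckpt:
  for name in list(ckpt.keys())[:10]:
    print(name, ckpt[name].shape, ckpt[name].dtype)
```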
To cite our work, please use:
@misc{pătrăucean2024trecvitrecurrentvideotransformer,
  title={TRecViT: A Recurrent Video Transformer},
  author={Viorica Pătrăucean and Xu Owen He and Joseph Heyward and Chuhan Zhang and Mehdi S. M. Sajjadi and George-Cristian Muraru and Artem Zholus and Mahdi Karami and Ross Goroshin and Yutian Chen and Simon Osindero and João Carreira and Razvan Pascanu},
  year={2024},
  eprint={2412.14294},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.14294},
}
Copyright 2025 Google LLC
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0
All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode
Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.
This is not an official Google product.
