
AnyUp: Universal Feature Upsampling

Thomas Wimmer1,2, Prune Truong3, Marie-Julie Rakotosaona3, Michael Oechsle3, Federico Tombari3,4, Bernt Schiele1, Jan Eric Lenssen1

1Max Planck Institute for Informatics, 2ETH Zurich, 3Google, 4TU Munich

Website arXiv Colab

AnyUp Teaser

Abstract:

We introduce AnyUp, a method for feature upsampling that can be applied to any vision feature at any resolution, without encoder-specific training. Existing learning-based upsamplers for features like DINO or CLIP need to be re-trained for every feature extractor and thus do not generalize to different feature types at inference time. In this work, we propose an inference-time feature-agnostic upsampling architecture to alleviate this limitation and improve upsampling quality. In our experiments, AnyUp sets a new state of the art for upsampled features, generalizes to different feature types, and preserves feature semantics while being efficient and easy to apply to a wide range of downstream tasks.


🔔 News:

[11/25] We added a more efficient (in both memory and speed) NATTEN-based version of the window attention module used in AnyUp. You can load the new model by specifying use_natten=True when loading the model from torch.hub. Please note that this model variant uses slightly different windows than the original AnyUp model, which was used for all experiments in the paper. If you want to use it in your project, you have to install NATTEN in addition to PyTorch; follow the official install instructions for your CUDA version.

[11/25] We further added multi-backbone training to our codebase, which allows training a single AnyUp model with multiple different feature extractors. This improves generalization to unseen backbones at inference time. To use this feature, please load the pre-trained model with torch.hub.load('wimmerth/anyup', 'anyup_multi_backbone').

[11/25] We added installation of anyup as a package for local development. Please see the installation instructions below.


Use AnyUp to upsample your features!

Upsample features from any model, at any layer, without having to retrain the upsampler. It's as easy as this:

import torch
# high-resolution image (B, 3, H, W)
hr_image    = ...
# low-resolution features (B, C, h, w) 
lr_features = ...
# load the AnyUp upsampler model (here we use the NATTEN-based version trained on multiple backbones)
upsampler   = torch.hub.load('wimmerth/anyup', 'anyup_multi_backbone', use_natten=True)
# upsampled high-resolution features (B, C, H, W)
hr_features = upsampler(hr_image, lr_features)

Notes:

  • The hr_image should be normalized with the ImageNet mean and std, as is usual for most vision encoders.
  • The lr_features can be any features from any encoder, e.g. DINO, CLIP, or ResNet (see the end-to-end sketch below).
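
For reference, here is a minimal end-to-end sketch of this pipeline. It assumes torchvision and PIL are installed, and it uses the publicly available DINOv2 ViT-S/14 from torch.hub purely as an example backbone; the image path and the 448x448 resolution are placeholders.

import torch
from PIL import Image
from torchvision import transforms

# ImageNet normalization, as expected by most vision encoders
preprocess = transforms.Compose([
    transforms.Resize((448, 448)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
hr_image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)  # (1, 3, 448, 448)

# example backbone: DINOv2 ViT-S/14 (any encoder works, this one is only for illustration)
backbone = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').eval()
with torch.no_grad():
    tokens = backbone.forward_features(hr_image)["x_norm_patchtokens"]  # (1, h*w, C)
    h = w = hr_image.shape[-1] // 14                                    # 32 patches per side
    lr_features = tokens.permute(0, 2, 1).reshape(1, -1, h, w)          # (1, C, 32, 32)

# upsample back to the input resolution
upsampler = torch.hub.load('wimmerth/anyup', 'anyup_multi_backbone')
with torch.no_grad():
    hr_features = upsampler(hr_image, lr_features)                      # (1, C, 448, 448)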

The hr_features will have the same spatial resolution as the hr_image by default. If you want a different output resolution, you can specify it with the output_size argument:

# upsampled features with custom output size (B, C, H', W')
hr_features = upsampler(hr_image, lr_features, output_size=(H_prime, W_prime))

If you have limited compute resources and run into OOM issues when upsampling to high resolutions, you can use the q_chunk_size argument to trade off speed for memory:

# upsampled features using chunking to save memory (B, C, H, W)
hr_features = upsampler(hr_image, lr_features, q_chunk_size=10)

If you are interested in the attention maps that AnyUp uses to upsample the features, the forward pass can optionally visualize them (only available if use_natten=False):

# matplotlib must be installed to use this feature
# upsampled features and display attention map visualization (B, C, H, W)
hr_features = upsampler(hr_image, lr_features, vis_attn=True)

To use the model proposed in the original AnyUp paper (without NATTEN and trained on a single backbone, DINOv2 ViT-S), load it with

upsampler = torch.hub.load('wimmerth/anyup', 'anyup')

Installation

Install anyup as a package for local development:

micromamba create -n anyup python=3.12 -y
micromamba activate anyup
pip install uv

# Install the correct PyTorch version for your CUDA setup, e.g. for CUDA 12.8 and PyTorch 2.9.0:
uv pip install torch==2.9.0 torchvision==0.24.0 --index-url https://download.pytorch.org/whl/cu128
# Install the correct NATTEN version for your CUDA / PyTorch setup, e.g. for CUDA 12.8 and PyTorch 2.9.0:
uv pip install natten==0.21.1+torch290cu128 -f https://whl.natten.org
# Install the remaining dependencies and anyup as package (call from the root of the repository):
uv pip install -e .

Install the required dependencies for training without installing anyup as a package:

micromamba create -n anyup python=3.12 -y
micromamba activate anyup
pip install uv

# Install the correct PyTorch version for your CUDA setup, e.g. for CUDA 12.8 and PyTorch 2.9.0:
uv pip install torch==2.9.0 torchvision==0.24.0 --index-url https://download.pytorch.org/whl/cu128
# Install the correct NATTEN version for your CUDA / PyTorch setup, e.g. for CUDA 12.8 and PyTorch 2.9.0:
uv pip install natten==0.21.1+torch290cu128 -f https://whl.natten.org
# Install the remaining dependencies needed for training:
uv pip install einops matplotlib numpy timm plotly tensorboard hydra-core rich scikit-learn
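
To verify the setup, a quick sanity check is to load the model from torch.hub and run a dummy forward pass (the tensor shapes and the channel count of 384 below are arbitrary placeholders):

import torch

# load the original single-backbone AnyUp model
upsampler = torch.hub.load('wimmerth/anyup', 'anyup')
# dummy inputs: a normalized image (B, 3, H, W) and low-resolution features (B, C, h, w)
hr_image    = torch.randn(1, 3, 224, 224)
lr_features = torch.randn(1, 384, 16, 16)
with torch.no_grad():
    hr_features = upsampler(hr_image, lr_features)
print(hr_features.shape)  # expected: torch.Size([1, 384, 224, 224])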

Training your own AnyUp model

If you want to train your own AnyUp model on custom data or with different hyperparameters, you can do so by running the train.py script. We use Hydra for configuration management, so you can easily modify hyperparameters in the corresponding config files or override them on the command line, as shown below.
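
For example, Hydra lets you override configuration values directly on the command line instead of editing the config files. The option names below are illustrative placeholders only; consult the config files in this repository for the actual keys:

# hypothetical overrides, shown only to illustrate Hydra's CLI syntax
python train.py dataset.root=./data/imagenet optimizer.lr=1e-4 trainer.max_steps=50000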

We trained our model on the ImageNet dataset, which you have to download and put into ./data/imagenet before running the training script. We further use an index of the image resolutions in ImageNet, which can be created with the comput_sizes_index.py script. You can also download this file directly from the releases and put it into ./data/cache/train.sizes.tsv.
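
Based on the paths above, the expected data layout is:

data/
├── imagenet/            # ImageNet training images (download separately)
└── cache/
    └── train.sizes.tsv  # image-resolution index (generated or downloaded from the releases)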

Evaluation followed the protocols of JAFAR for semantic segmentation and Probe3D for surface normal and depth estimation. Note that we applied a small fix to the probe training in JAFAR (updating the LR schedule per epoch instead of per iteration). We therefore re-ran all baseline experiments to ensure a fair comparison.

Acknowledgements: We built our implementation on top of the JAFAR repository and thank the authors for open-sourcing their code. Other noteworthy open-source repositories include: LoftUp, FeatUp, and Probe3D.


Citation

If you find our work useful in your research, please cite it as:

@article{wimmer2025anyup,
    title={AnyUp: Universal Feature Upsampling},
    author={Wimmer, Thomas and Truong, Prune and Rakotosaona, Marie-Julie and Oechsle, Michael and Tombari, Federico and Schiele, Bernt and Lenssen, Jan Eric},
    journal={arXiv preprint arXiv:2510.12764},
    year={2025}
}