ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation

Ruijie Zhu, Chuxin Wang, Ziyang Song, Li Liu, Tianzhu Zhang, Yongdong Zhang,
University of Science and Technology of China
arXiv 2024

TL;DR: Fine-tuning CLIP on 4 RTX 3090 GPUs for 8 hours yields a robust metric depth estimation model!


Within a unified framework, our method ScaleDepth achieves accurate metric depth estimation both indoors and outdoors, without preset depth ranges or per-scene fine-tuning. Left: the input RGB image and the corresponding depth prediction. Right: comparison of model parameters and performance. With fewer parameters overall, our model ScaleDepth-NK significantly outperforms the state-of-the-art methods under the same experimental settings.


Without any fine-tuning, our model generalizes to scenes of different scales and accurately estimates depth in both indoor and outdoor environments.


The overall architecture of the proposed ScaleDepth. We design bin queries to predict the relative depth distribution and scale queries to predict the scene scale. During training, we preset text prompts containing 28 scene categories as input to the frozen CLIP text encoder. We then compute the similarity between the updated scale queries and the text embeddings, using the scene category as auxiliary supervision. During inference, only a single image is required to obtain the relative depth and scene scale, from which a metric depth map is synthesized.
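To make the decomposition concrete, here is a minimal numpy sketch of the final synthesis step: a per-pixel relative depth is taken as the expectation over predicted depth bins, then multiplied by the predicted scene scale to produce metric depth. The function name, tensor shapes, and values below are illustrative assumptions, not the repository's actual API.

```python
import numpy as np

def synthesize_metric_depth(bin_probs, bin_centers, scene_scale):
    """Illustrative sketch (not the repo's API):
    bin_probs    -- per-pixel bin probabilities, shape (N_bins, H, W)
    bin_centers  -- normalized bin centers in [0, 1], shape (N_bins,)
    scene_scale  -- predicted scene scale in meters (scalar)
    Returns a metric depth map of shape (H, W)."""
    # Relative depth: expectation of the bin centers under the
    # per-pixel bin distribution.
    relative_depth = np.tensordot(bin_centers, bin_probs, axes=([0], [0]))
    # Metric depth: relative depth rescaled by the scene scale.
    return scene_scale * relative_depth

# Toy example: 4 bins over a 2x2 image.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 2, 2))
probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
centers = np.linspace(0.1, 0.9, 4)
depth = synthesize_metric_depth(probs, centers, scene_scale=10.0)
print(depth.shape)  # (2, 2)
```

Because the scale is factored out, the same bin-based relative depth head can serve scenes of very different depth ranges (e.g. NYU indoors vs. KITTI outdoors).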

Installation

Please refer to get_started.md for installation and dataset_prepare.md for dataset preparation.

You may also need to install these packages:

pip install "mmdet>=3.0.0rc4"
pip install open_clip_torch
pip install future tensorboard
pip install -r requirements/albu.txt

Also download the checkpoint of text embeddings from Google Drive and place it in the projects/ScaleDepth/pretrained_weights folder.

Training and Inference

We provide train.md and inference.md with instructions for training and inference.

Train

# ScaleDepth-N
bash tools/dist_train.sh projects/ScaleDepth/configs/ScaleDepth/scaledepth_clip_NYU_480x480.py 4
# ScaleDepth-K
bash tools/dist_train.sh projects/ScaleDepth/configs/ScaleDepth/scaledepth_clip_KITTI_352x1120.py 4
# ScaleDepth-NK
bash tools/dist_train.sh projects/ScaleDepth/configs/ScaleDepth/scaledepth_clip_NYU_KITTI_352x512.py 4

Test

# ScaleDepth-N
python tools/test.py projects/ScaleDepth/configs/ScaleDepth/scaledepth_clip_NYU_480x480.py work_dirs/scaledepth_clip_NYU_480x480/iter_40000.pth
# ScaleDepth-K
python tools/test.py projects/ScaleDepth/configs/ScaleDepth/scaledepth_clip_KITTI_352x1120.py work_dirs/scaledepth_clip_KITTI_352x1120/iter_40000.pth
# ScaleDepth-NK
python tools/test.py projects/ScaleDepth/configs/ScaleDepth/scaledepth_clip_NYU_KITTI_352x512.py work_dirs/scaledepth_clip_NYU_KITTI_352x512/iter_40000.pth

Official weights

| Method | Backbone | Train Iters | Results | Config | Checkpoint | GPUs |
|---|---|---|---|---|---|---|
| ScaleDepth-NK | CLIP (ConvNeXt-Large) | 40000 | log | config | iter_40000.pth | 4 RTX 3090 |

Bibtex

If you find our work useful and use the codebase or models in your research, please cite it as follows.

@ARTICLE{zhu2024scale,
  title={ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation}, 
  author={Zhu, Ruijie and Wang, Chuxin and Song, Ziyang and Liu, Li and Zhang, Tianzhu and Zhang, Yongdong},
  journal={arXiv preprint arXiv:2407.08187},
  year={2024}
}

Acknowledgement

We thank Jianfeng He and Jiacheng Deng for their thoughtful and valuable suggestions. We also thank the authors of BinsFormer and ZoeDepth for releasing their code.