Shelf-Supervised Mesh Prediction in the Wild

Yufei Ye    Shubham Tulsiani    Abhinav Gupta   

Carnegie Mellon University    Facebook AI Research   

Paper | Video | Code | Bibtex

in CVPR 2021


We aim to infer 3D shape and pose from a single image, and propose a learning-based approach that can train from unstructured image collections, using only segmentation outputs from off-the-shelf recognition systems as supervisory signal (i.e. 'shelf-supervised'). We first infer a volumetric representation in a canonical frame, along with the camera pose for the input image. We enforce that this representation is geometrically consistent with the input appearance and silhouette, and that synthesized novel views are indistinguishable from the image collection. The coarse volumetric prediction is then converted to a mesh-based representation, which is further refined in the predicted camera frame given the input image. These two steps allow both shape-pose factorization from unannotated images and per-instance shape reconstruction in finer detail. We report performance on both synthetic and real-world datasets; experiments show that our approach captures category-level 3D shape from image collections more accurately than alternatives, and that it can be further refined by our instance-level specialization.
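To make the consistency objective concrete, below is a minimal PyTorch sketch of a silhouette reprojection loss in the spirit of the coarse stage: the predicted canonical-frame occupancy volume is resampled into the predicted camera frame and max-composited along depth into a soft silhouette, which is compared against the off-the-shelf mask. This is an illustrative approximation rather than the paper's exact differentiable renderer; `occ`, `rot`, and `mask` are assumed inputs.

```python
import torch
import torch.nn.functional as F

def soft_silhouette(occ, rot):
    """Render a soft silhouette from a canonical-frame occupancy volume.

    occ: (B, 1, D, H, W) occupancy in [0, 1]; rot: (B, 3, 3) camera rotation.
    """
    B = occ.shape[0]
    # Build an affine grid for the rotation (no translation) and resample
    # the canonical volume into the predicted camera frame.
    theta = torch.cat([rot, torch.zeros(B, 3, 1, device=occ.device)], dim=-1)
    grid = F.affine_grid(theta, list(occ.shape), align_corners=False)
    cam_vol = F.grid_sample(occ, grid, align_corners=False)
    # Max-composite along depth: a pixel is "on" if any voxel on its ray is.
    return cam_vol.amax(dim=2).squeeze(1)  # (B, H, W)

def silhouette_loss(occ, rot, mask):
    """Binary cross-entropy between rendered and detected silhouettes."""
    sil = soft_silhouette(occ, rot).clamp(1e-6, 1 - 1e-6)
    return F.binary_cross_entropy(sil, mask)
```

The full objective additionally scores synthesized novel views with an adversarial discriminator, so that renderings from other viewpoints also look like samples from the image collection.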


5-Minute Narrated Video


Method Overview

We first predict a canonical-frame volumetric representation and a camera pose to capture the coarse category-level 3D structure. We then convert this coarse volume to a memory-efficient mesh representation, which is specialized to instance-level details.
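As a rough illustration of the volume-to-mesh step, the sketch below extracts a triangle mesh from the coarse occupancy grid with marching cubes (using scikit-image as a stand-in; the function name and normalization are assumptions, not the released code):

```python
import numpy as np
import torch
from skimage import measure

def volume_to_mesh(occ, threshold=0.5):
    """occ: (D, H, W) occupancy grid in [0, 1] -> (vertices, faces) tensors."""
    verts, faces, _, _ = measure.marching_cubes(
        occ.detach().cpu().numpy(), level=threshold)
    # Map voxel coordinates to [-1, 1] so the mesh lives in the canonical frame.
    verts = 2.0 * verts / (np.array(occ.shape) - 1.0) - 1.0
    return (torch.as_tensor(verts, dtype=torch.float32),
            torch.as_tensor(faces.astype(np.int64)))
```

The extracted mesh can then be specialized per instance, e.g. by optimizing its vertices and texture against the input image and silhouette with a differentiable mesh renderer.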


Paper

arXiv, 2021.

Citation

Yufei Ye, Shubham Tulsiani, and Abhinav Gupta.
"Shelf-Supervised Mesh Prediction in the Wild", 2021. [Bibtex]

Code

PyTorch re-implementation



Qualitative Results

OpenImages | Curated Collections | Synthetic Dataset

(click images for full resolution)


OpenImages 50 Categories

First, here is how we get the training set for one category (roughly)...

(Figure: training-set construction for one category.)
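The approach only needs such shelf-supervised masks; as a hedged stand-in for the actual recognition system, the sketch below uses torchvision's COCO-pretrained Mask R-CNN to extract per-instance silhouettes from retrieved images (paths and thresholds are illustrative):

```python
import torch
from PIL import Image
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# Off-the-shelf instance segmentation model (stand-in for the actual system).
model = maskrcnn_resnet50_fpn(pretrained=True).eval()

@torch.no_grad()
def extract_silhouettes(image_path, score_thresh=0.9):
    """Return binary (N, H, W) masks for confident detections in one image."""
    img = to_tensor(Image.open(image_path).convert("RGB"))
    out = model([img])[0]
    keep = out["scores"] > score_thresh
    return out["masks"][keep, 0] > 0.5  # soft masks -> binary silhouettes
```

In practice, detections would also be filtered by the queried category label and cropped around each mask before training.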

With the resulting image collections above, we just train a category-specific model and test!

(Image grids: input image and reconstructions for each category.)

Categories: Guitar, Rose, High-heels, Flower, Handbag, Goat, Coffee-cup, Eagle, Giraffe, Sun-hat, Starfish, Cocktail, Fedora, Motorcycle, Strawberry, Christmas-tree, Hat, Laptop, Cattle, Orange, Swan, Candle, Roller-skates, Skateboard, Boot, Mushroom, Cowboy-hat, Chicken, Mug, Surfboard, Waste-container, Sofa-bed, Goldfish, Saxophone, Canoe, Bagel, Horse, Skyscraper, Bicycle-wheel, Airplane, Vase, Tap, Owl, Microwave-oven, Pig, Pillow, Backpack, Toilet, Balloon, Flowerpot, Truck, Teddy-bear, Beer, Spoon, Bird.

Further, the pretrained category-specific models can be combined and applied directly to COCO images!

(Image grid: example reconstructions on COCO.)

See more results on the curated (CUB, Quadrupeds, Chairs-in-the-wild) and synthetic (aeroplane, car, chairs) datasets.



Acknowledgements

The authors would like to thank Nilesh Kulkarni for providing segmentation masks of Quadrupeds. We would also like to thank Chen-Hsuan Lin, Chaoyang Wang, Nathaniel Chodosh, and Jason Zhang for fruitful discussions and detailed feedback on the manuscript. The Carnegie Mellon effort was supported by DARPA MCS, DARPA SAIL-ON, ONR MURI, and ONR YIP. This webpage template was borrowed from some GAN folks.