✨[CVPR 2025 Highlight] FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation
Official repository of "FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation" (CVPR 2025, Highlight).
[Project Page] [Paper] [Hugging Face]
Authors: Kefan Chen* · Chaerin Min* · Linguang Zhang · Shreyas Hampali · Cem Keskin · Srinath Sridhar
Download FoundHand-10M. The dataset contains processed images and labels from DexYCB, ARCTIC, ReInterHand, InterHand2.6M, Ego4D, EpicKitchensVisor, AssemblyHands, HOI4D, RHD, RenderIH, DART, HAGRID, and WLASL.
```
FoundHand10M/
├── Arctic/
│   └── processed/
│       ├── test/
│       └── train/
│           ├── s01-box_grab_01/
│           │   ├── 00010-6.jpg
│           │   ├── 00010-6.npz
│           │   ├── 00011-6.jpg
│           │   ├── 00011-6.npz
│           │   └── ...
│           └── ...
├── AssemblyHands/
│   └── processed_seq/
│       ├── val/
│       └── train/
│           ├── nusar-2021_action_both_9012-c07c_9012_user_id_2021-02-01_164345/
│           │   ├── 000520-C10095_rgb.jpg
│           │   ├── 000520-C10095_rgb.npz
│           │   ├── 000520-C10115_rgb.jpg
│           │   ├── 000520-C10115_rgb.npz
│           │   └── ...
│           └── ...
└── ...
```
Each data sample follows the naming convention `<frame_id>-<camera_id>.jpg` (image) and `<frame_id>-<camera_id>.npz` (label), where `frame_id` and `camera_id` carry over from the annotations of the original dataset. For single-view datasets, `camera_id` is the same for every sample, usually `0`. Each label file `*.npz` contains two fields:

- `hand_mask`: (512, 512) binary mask for hand segmentation.
- `kpts`: (42, 2) 2D hand keypoints following the OpenPose convention, where `[:21]` are the right hand and `[21:]` are the left.
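For orientation, here is a minimal loading sketch for one sample. The file stem below is hypothetical (taken from the directory tree above), and it assumes `numpy` and `opencv-python` are installed (both appear in the setup instructions below):

```python
import numpy as np
import cv2

# Hypothetical sample path, following the directory layout shown above.
stem = "FoundHand10M/Arctic/processed/train/s01-box_grab_01/00010-6"

image = cv2.imread(stem + ".jpg")      # BGR image (None if the path is wrong)
label = np.load(stem + ".npz")

hand_mask = label["hand_mask"]         # (512, 512) binary hand segmentation
kpts = label["kpts"]                   # (42, 2) 2D keypoints, OpenPose order
right, left = kpts[:21], kpts[21:]     # right hand first, then left

# Quick visual check: draw all keypoints onto the image.
for x, y in kpts:
    cv2.circle(image, (int(x), int(y)), 3, (0, 255, 0), -1)
cv2.imwrite("sample_overlay.jpg", image)
```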
- Create a virtual environment and install the necessary dependencies:

```bash
conda create -n foundhand python=3.9
conda activate foundhand
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
pip install lightning==2.3.0
pip install timm==1.0.7 tqdm opencv-python scikit-image matplotlib tensorboard
```
- Clone the repository and install it in editable mode:

```bash
git clone git@github.com:arthurchen0518/FoundHand.git
cd FoundHand
pip install -e .
```
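Optionally, a quick sanity check that the CUDA 12.1 wheels resolved correctly (this assumes a machine with an NVIDIA driver compatible with CUDA 12.1):

```python
import torch

print(torch.__version__)           # expect 2.3.0+cu121
print(torch.cuda.is_available())   # expect True on a compatible GPU machine
```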
We encourage users to try our Hugging Face demo for a more accessible UI. We also provide Jupyter notebook demos:

```
./demos/FixHand.ipynb # Fix malformed AI-generated hands.
./demos/Image2Image.ipynb # Gesture transfer and domain transfer.
./demos/Image2Video.ipynb # Video generation given the first frame and hand motion sequence.
./demos/NVS.ipynb # Novel view synthesis.
```

TODO:

- Release model weights and code.
- Release demo notebooks.
- Release FoundHand-10M data.
- Release inference code.
- Release training code.
Part of this work was done during Kefan (Arthur) Chen's internship at Meta Reality Labs. This work was additionally supported by NSF CAREER grant #2143576, NASA grant #80NSSC23M0075, and an Amazon Cloud Credits Award.
This codebase borrows from DiT.
This dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.
To view a copy of this license, visit https://creativecommons.org/licenses/by-nc/4.0/.
```bibtex
@InProceedings{Chen_2025_CVPR,
    author    = {Chen, Kefan and Min, Chaerin and Zhang, Linguang and Hampali, Shreyas and Keskin, Cem and Sridhar, Srinath},
    title     = {FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {17448-17460}
}
```
![Teaser Figure](https://github.com/arthurchen0518/FoundHand/raw/main/assets/teaser.png)
