This repository is an official implementation for:
PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model [CVPR 2025]
Authors: Mingju Gao*, Yike Pan*, Huan-ang Gao*, Zongzheng Zhang, Wenyi Li, Hao Dong, Hao Tang, Li Yi, Hao Zhao
As interest grows in world models that predict future states from current observations and actions, accurately modeling part-level dynamics has become increasingly relevant for various applications. Existing approaches, such as Puppet-Master, rely on fine-tuning large-scale pre-trained video diffusion models, which are impractical for real-world use due to the limitations of 2D video representation and slow processing times. To overcome these challenges, we present PartRM, a novel 4D reconstruction framework that simultaneously models appearance, geometry, and part-level motion from multi-view images of a static object. PartRM builds upon large 3D Gaussian reconstruction models, leveraging their extensive knowledge of appearance and geometry in static objects. To address data scarcity in 4D, we introduce the PartDrag-4D dataset, providing multi-view observations of part-level dynamics across over 20,000 states. We enhance the model’s understanding of interaction conditions with a multi-scale drag embedding module that captures dynamics at varying granularities. To prevent catastrophic forgetting during fine-tuning, we implement a two-stage training process that focuses sequentially on motion and appearance learning. Experimental results show that PartRM establishes a new state-of-the-art in part-level motion learning and can be applied in manipulation tasks in robotics. Project page: https://partrm.c7w.tech/
Use conda to create a new virtual environment. We use torch==2.1.0+cu121.

```bash
conda env create -f environment.yaml
conda activate partrm
```

Also install the Gaussian splatting renderer:
```bash
# a modified gaussian splatting (+ depth, alpha rendering)
git clone --recursive https://github.com/ashawkey/diff-gaussian-rasterization
pip install ./diff-gaussian-rasterization
```

You can download the PartDrag-4D dataset from here. Unzip pardrag_4d/partdrag_rendered.zip to PartDrag4D/data/render_PartDrag4D and processed_data_partdrag4d.zip to ./PartDrag4D/data/processed_data_partdrag4d.
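If you prefer to script the extraction, a minimal Python sketch could look like the following (it assumes the two archives were downloaded to the repository root with the names above; adjust the paths to your setup):

```python
# Minimal extraction sketch; archive names and target directories follow the
# instructions above, but the download location is an assumption.
import zipfile

archives = {
    "pardrag_4d/partdrag_rendered.zip": "PartDrag4D/data/render_PartDrag4D",
    "processed_data_partdrag4d.zip": "PartDrag4D/data/processed_data_partdrag4d",
}
for archive, target in archives.items():
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)  # note: the zip may already contain a top-level folder
```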
Below is how to render the PartDrag-4D dataset from scratch.
You first need to obtain the PartNet-Mobility dataset and put it in the PartDrag4D/data directory of this repo. Then:

```bash
cd PartDrag4D
```

For mesh preprocessing and animating:
```bash
cd preprocess
python process_data_textured_uv.py
python animated_data.py
```

For rendering, first download Blender and extract it into the ../rendering/blender directory:
```bash
cd ../rendering/blender
wget https://download.blender.org/release/Blender3.5/blender-3.5.0-linux-x64.tar.xz
tar -xf blender-3.5.0-linux-x64.tar.xz
```

Then generate the rendering filelist and render the generated meshes using Blender:
```bash
cd ..
python gen_filelist.py
bash render.sh
```

You can modify num_gpus and CUDA_VISIBLE_DEVICES in the bash script to adjust the degree of parallelism.
For surface drag extraction:

```bash
cd ..
python z_buffer_al.py
```

The animated meshes and extracted surface drags are stored in ./PartDrag4D/data/processed_data_partdrag4d. The rendering results are stored in ./PartDrag4D/data/render_PartDrag4D.
We split the PartDrag-4D dataset into training and evaluation sets. You can refer to ./filelist/train_filelist_partdrag4d.txt and ./filelist/val_filelist_partdrag4d.txt for details.
You can get the Zero123++ and SAM checkpoints from here. Then put them into preprocess/zero123_ckpt and preprocess/sam_ckpt, respectively.
To generate multi-view images for evaluation data:
```bash
cd ../preprocess
python gen_mv_partdrag4d.py --src_filelist /path/to/src/rendering/filelist --output_dir /path/to/save/dir  # For PartDrag-4D
python gen_mv_objaverse_hq.py --src_filelist /path/to/src/rendering/filelist --output_dir /path/to/save/dir  # For Objaverse-Animation-HQ
```

The src_filelist is the path to the rendering filelist. You can refer to this filelist for PartDrag4D and this filelist for Objaverse-Animation-HQ as examples.
To generate RGBA-format images as input for PartRM:

```bash
python gen_rgba.py --filelist /path/to/zero123/filelist --dataset [dataset_name]
```

You can refer to this filelist for PartDrag4D and this filelist for Objaverse-Animation-HQ as examples.
To generate propagated drags for the PartDrag-4D dataset (you can download our preprocessed propagated drags from here):

```bash
python gen_propagate_drags.py --val_filelist /path/to/src/rendering/filelist --sample_num [number of propagated drags] --save_dir /path/to/save/drags
```

The val_filelist is the same as the src_filelist used for multi-view image generation on PartDrag-4D above.
We provide training scripts for the PartDrag-4D and Objaverse-Animation-HQ datasets. You can select the dataset for training in train.py and eval.py (partdrag4d or objaverse_hq). Then run:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --config_file acc_configs/gpu4.yaml train.py big --workspace [your workspace]
```

You should specify `train_filelist`, `val_filelist`, `zero123_val_filelist`, `propagated_drags_base` and `mesh_base` in core/options.py and core/options_pm.py (see the sketch below the list):
- For `train_filelist`, you can refer to `filelist/train_filelist.txt` and `filelist/train_objavser_hq.txt`.
- For `val_filelist`, you can refer to `filelist/val_filelist.txt` and `filelist/eval_objaverse_hq.txt`.
- For `zero123_val_filelist`, you can refer to `filelist/zero123_val_filelist.txt` and `filelist/zero123_val_filelist_objavser_hq.txt`.
For the two-stage training proposed in the paper, you should first set `stage1` to True in core/options.py and core/options_pm.py. After the motion-learning training, set `stage1` to False to conduct the appearance-learning training.
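For reference, the fields mentioned above might look roughly like this once filled in (a hedged sketch only; the actual attribute layout of core/options.py may differ, and the base paths are placeholders):

```python
# Hypothetical illustration of the values to set in core/options.py and
# core/options_pm.py -- treat it as a checklist, not the repo's real layout.
train_filelist = "filelist/train_filelist.txt"              # training split
val_filelist = "filelist/val_filelist.txt"                  # evaluation split
zero123_val_filelist = "filelist/zero123_val_filelist.txt"  # Zero123++ multi-view filelist
propagated_drags_base = "/path/to/save/drags"               # directory of propagated drags
mesh_base = "./PartDrag4D/data/processed_data_partdrag4d"   # presumably the animated-mesh directory produced above

# Two-stage training: True for the motion-learning stage,
# False for the appearance-learning stage and for evaluation.
stage1 = True
```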
For evaluation, first run:

```bash
CUDA_VISIBLE_DEVICES=0 accelerate launch --config_file acc_configs/gpu1.yaml eval.py big --workspace [your workspace]
```

Note that you should set `stage1` to False in core/options.py and core/options_pm.py.
Then generate your eval filelist, with each line in the form:

```
gt_image_path,pred_image_path,source_image_path
```
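A hedged Python sketch for writing such a filelist is shown below; the three directories and the matching-by-filename assumption are hypothetical, so adapt them to wherever your ground-truth, predicted, and source images live:

```python
# Hypothetical helper: pair ground-truth, predicted, and source images by
# filename and emit one comma-separated triple per line.
import os

gt_dir = "/path/to/gt_images"       # placeholder directories
pred_dir = "/path/to/pred_images"
src_dir = "/path/to/source_images"

with open("eval_filelist.txt", "w") as f:
    for name in sorted(os.listdir(gt_dir)):
        line = ",".join([
            os.path.join(gt_dir, name),
            os.path.join(pred_dir, name),
            os.path.join(src_dir, name),
        ])
        f.write(line + "\n")
```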
Then specify VAL_FILELIST (the path to the generated eval filelist) in compute_metrics.py and run:
```bash
python compute_metrics.py
```

This reports the PSNR, LPIPS and SSIM metrics.
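If you want to sanity-check the numbers outside of compute_metrics.py, PSNR, SSIM, and LPIPS for a single image pair can be computed with common libraries roughly as follows (a sketch assuming scikit-image, lpips, torch, and imageio are installed; this is not the repo's implementation):

```python
# Minimal sketch of computing PSNR / SSIM (scikit-image) and LPIPS (lpips
# package) for one ground-truth / prediction pair.
import imageio.v3 as iio
import lpips
import numpy as np
import torch
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Load images, drop any alpha channel, and scale to [0, 1].
gt = iio.imread("gt.png")[..., :3].astype(np.float32) / 255.0
pred = iio.imread("pred.png")[..., :3].astype(np.float32) / 255.0

psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)

# LPIPS expects NCHW tensors scaled to [-1, 1].
loss_fn = lpips.LPIPS(net="vgg")
to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0) * 2 - 1
lpips_val = loss_fn(to_tensor(gt), to_tensor(pred)).item()

print(f"PSNR: {psnr:.2f}  SSIM: {ssim:.4f}  LPIPS: {lpips_val:.4f}")
```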
We build our work on LGM, Zero123++ and 3D Gaussian Splatting.
