RigMo is the first generative framework that discovers both rig structure and motion dynamics directly from raw mesh sequences without ground-truth rigs, skeletons, or per-sequence optimization.
By factorizing deformation into explicit Gaussian bones and structure-aware motion, RigMo turns arbitrary deforming meshes into fully animatable assets: feed-forward, interpretable, and scalable across categories.
Despite significant progress in 4D generation, rig and motion—the core structural and dynamic components of animation—are typically modeled as separate problems. Existing pipelines rely on ground-truth skeletons and skinning weights for motion generation and treat auto-rigging as an independent process.
RigMo is a unified generative framework that jointly learns rig and motion directly from raw mesh sequences, without any human-provided rig annotations. It encodes per-vertex deformations into compact latent spaces and decodes explicit Gaussian bones together with time-varying bone transformations, turning the input into a fully animatable mesh.
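To make the factorization concrete, below is a minimal NumPy sketch of one plausible reading of Gaussian-bone skinning: each bone is a 3D Gaussian whose density at a rest-pose vertex yields a soft skinning weight, and per-frame rigid bone transforms are blended via linear blend skinning. All function names, the softmax-style weight normalization, and the toy data are illustrative assumptions, not RigMo's published formulation.

```python
import numpy as np

def gaussian_bone_weights(verts, centers, covs):
    """Soft skinning weights from Gaussian bones (illustrative assumption).

    verts:   (V, 3) rest-pose vertex positions
    centers: (B, 3) Gaussian bone centers
    covs:    (B, 3, 3) Gaussian bone covariances
    Returns: (V, B) weights, normalized over bones for each vertex.
    """
    diffs = verts[:, None, :] - centers[None, :, :]          # (V, B, 3)
    inv_covs = np.linalg.inv(covs)                           # (B, 3, 3)
    # Squared Mahalanobis distance from each vertex to each bone's Gaussian.
    maha = np.einsum('vbi,bij,vbj->vb', diffs, inv_covs, diffs)
    logits = -0.5 * maha
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def linear_blend_skinning(verts, weights, rotations, translations):
    """Deform vertices with per-bone rigid transforms for one time step.

    rotations:    (B, 3, 3) per-bone rotations
    translations: (B, 3)    per-bone translations
    """
    posed = np.einsum('bij,vj->vbi', rotations, verts) + translations[None]
    return np.einsum('vb,vbi->vi', weights, posed)           # (V, 3)

# Toy usage: two Gaussian bones, one rotated about z, deform 100 points.
rng = np.random.default_rng(0)
verts = rng.normal(size=(100, 3))
centers = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
covs = np.stack([np.eye(3) * 0.5, np.eye(3) * 0.5])
weights = gaussian_bone_weights(verts, centers, covs)

theta = np.pi / 8
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
rotations = np.stack([np.eye(3), rot_z])
translations = np.array([[0.0, 0.0, 0.0], [0.0, 0.2, 0.0]])
deformed = linear_blend_skinning(verts, weights, rotations, translations)
print(deformed.shape)  # (100, 3)
```

In this reading, the "time-varying transformations" correspond to a sequence of per-bone (rotation, translation) pairs applied frame by frame, while the Gaussian parameters stay fixed and define the rig.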
Experiments on DeformingThings4D, Objaverse-XL, and TrueBones demonstrate that RigMo learns smooth, interpretable, and physically plausible rigs while achieving superior reconstruction quality and generalization.
@article{zhang2026rigmo,
  title   = {RigMo: Unifying Rig and Motion Learning for Generative Animation},
  author  = {Zhang, Hao and Luo, Jiahao and Wan, Bohui and Zhao, Yizhou and
             Li, Zongrui and Vasilkovsky, Michael and Wang, Chaoyang and
             Wang, Jian and Ahuja, Narendra and Zhou, Bing},
  journal = {arXiv preprint arXiv:2601.06378},
  year    = {2026}
}