Human3R: Everyone Everywhere All at Once

Chen, Yue; Chen, Xingyu; Xue, Yuxuan; Chen, Anpei; Xiu, Yuliang; Pons-Moll, Gerard

arXiv:2510.06219 (cs)

[Submitted on 7 Oct 2025 (v1), last revised 3 Mar 2026 (this version, v2)]

Abstract: We present Human3R, a unified, feed-forward framework for online 4D human-scene reconstruction in the world frame from casually captured monocular videos. Unlike previous approaches that rely on multi-stage pipelines, iterative contact-aware refinement between humans and scenes, and heavy dependencies (e.g., human detection, depth estimation, and SLAM pre-processing), Human3R jointly recovers global multi-person SMPL-X bodies ("everyone"), a dense 3D scene ("everywhere"), and camera trajectories in a single forward pass ("all-at-once"). Our method builds upon the 4D online reconstruction model CUT3R and uses parameter-efficient visual prompt tuning, aiming to preserve CUT3R's rich spatiotemporal priors while enabling direct readout of multiple SMPL-X bodies. Human3R is a unified model that eliminates heavy dependencies and iterative refinement. After being trained on the relatively small-scale synthetic dataset BEDLAM for just one day on one GPU, it achieves superior performance with remarkable efficiency: it reconstructs multiple humans in a one-shot manner, along with 3D scenes, in one stage, in real time (15 FPS) with a low memory footprint (8 GB). Extensive experiments demonstrate that Human3R delivers state-of-the-art or competitive performance across tasks, including global human motion estimation, local human mesh recovery, video depth estimation, and camera pose estimation, with a single unified model. We hope that Human3R will serve as a simple yet strong baseline that can be easily adapted for downstream applications. Code, models and 4D interactive demos are available at this https URL.
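
To put the 15 FPS figure from the abstract in concrete terms, the following is a minimal arithmetic sketch of the per-frame time budget that an online (streaming) method must meet at a given frame rate. The function names `frame_budget_ms` and `keeps_up` are hypothetical helpers introduced here for illustration, and the 50 ms and 80 ms latencies are made-up example values, not measurements from the paper.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def keeps_up(per_frame_ms: float, fps: float) -> bool:
    """True if a (hypothetical) per-frame latency fits within the budget for `fps`."""
    return per_frame_ms <= frame_budget_ms(fps)


# 15 FPS is the throughput stated in the abstract.
budget = frame_budget_ms(15)
print(f"{budget:.1f} ms per frame")  # prints "66.7 ms per frame"

# Illustrative latencies only (assumptions, not reported results).
print(keeps_up(50.0, 15))  # True: 50 ms fits within the ~66.7 ms budget
print(keeps_up(80.0, 15))  # False: 80 ms exceeds the budget
```

In other words, sustaining 15 FPS means that all per-frame work (human, scene, and camera estimation together) has to complete in roughly 66.7 ms on average.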

Comments:	Page: this https URL Code: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2510.06219 [cs.CV]
	(or arXiv:2510.06219v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2510.06219
Journal reference:	International Conference on Learning Representations (ICLR), 2026

Submission history

From: Yue Chen
[v1] Tue, 7 Oct 2025 17:59:52 UTC (19,998 KB)
[v2] Tue, 3 Mar 2026 12:42:36 UTC (32,796 KB)
