Yicheng Feng<sup>1,3</sup>, Wanpeng Zhang<sup>1,3</sup>, Ye Wang<sup>2,3</sup>, Hao Luo<sup>1,3</sup>, Haoqi Yuan<sup>1,3</sup>,
Sipeng Zheng<sup>3</sup>, Zongqing Lu<sup>1,3†</sup>

<sup>1</sup>Peking University
<sup>2</sup>Renmin University of China
<sup>3</sup>BeingBeyond
VIPA-VLA learns 2D-to-3D visual-physical grounding from human videos through spatial-aware VLA pretraining, enabling robot policies with stronger spatial understanding and generalization.
- [2025-12-15]: We release VIPA-VLA! Check out our paper here. Code is coming soon! 🔥🔥🔥
If you find our work useful, please consider citing us and giving our repository a star! 🌟🌟🌟
    @article{feng2025vipa,
        title={Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos},
        author={Feng, Yicheng and Zhang, Wanpeng and Wang, Ye and Luo, Hao and Yuan, Haoqi and Zheng, Sipeng and Lu, Zongqing},
        journal={arXiv preprint arXiv:2512.13080},
        year={2025}
    }