Yicheng Feng<sup>1,3</sup>, Wanpeng Zhang<sup>1,3</sup>, Ye Wang<sup>2,3</sup>, Hao Luo<sup>1,3</sup>, Haoqi Yuan<sup>1,3</sup>,
Sipeng Zheng<sup>3</sup>, Zongqing Lu<sup>1,3†</sup>

<sup>1</sup>Peking University
<sup>2</sup>Renmin University of China
<sup>3</sup>BeingBeyond
VIPA-VLA learns 2D-to-3D visual-physical grounding from human videos through spatial-aware VLA pretraining, enabling robot policies with stronger spatial understanding and generalization.
- [2025-12-15]: We release VIPA-VLA! Check out our paper here. Code is coming soon! 🔥🔥🔥
If you find our work useful, please consider citing us and giving our repository a star! 🌟🌟🌟
    @article{feng2025vipa,
        title={Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos},
        author={Feng, Yicheng and Zhang, Wanpeng and Wang, Ye and Luo, Hao and Yuan, Haoqi and Zheng, Sipeng and Lu, Zongqing},
        journal={arXiv preprint arXiv:2512.13080},
        year={2025}
    }