Being-H0.5 is a foundational VLA model that scales human-centric learning with a unified action space to enable robust cross-embodiment robot control.
Demo video: being-h05.mp4
(For our previous Being-H0 version, please visit the being-h0 branch.)
- [2026-01-24]: We've updated the training, inference, and data configurations, along with complete post-training scripts for Being-H0.5. Post-training data for the PND Adam-U robot is now open-sourced; download it via our Hugging Face Dataset Collections.
- [2026-01-20]: We release Being-H0.5! Check our Paper for technical details and the Hugging Face Model Collections for pretrained and post-trained models.
- [2025-08-02]: We release the Being-H0 codebase and pretrained models! Check our Hugging Face Model Collections for more details.
- [2025-07-21]: We release Being-H0! Check our paper here.
Download models from Hugging Face Model Collections:
| Model Type | Model Name | Parameters | Description |
|---|---|---|---|
| VLA Pretrained | Being-H05-2B | 2B | Base vision-language-action model (preview) |
| VLA Specialist | Being-H05-2B_libero | 2B | Post-trained on LIBERO benchmark |
| VLA Specialist | Being-H05-2B_robocasa | 2B | Post-trained on RoboCasa kitchen tasks |
| VLA Generalist | Being-H05-2B_libero_robocasa | 2B | Post-trained on both LIBERO and RoboCasa |
Note: the vision input resolution is 224px by default.
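Checkpoints can be fetched programmatically with `huggingface_hub`. A minimal sketch, assuming the repository id follows the pattern `BeingBeyond/Being-H05-2B` (confirm the exact id on the Model Collections page):

```python
from huggingface_hub import snapshot_download

# Assumed repo id for illustration; take the exact id from the
# Hugging Face Model Collections page.
local_path = snapshot_download(
    repo_id="BeingBeyond/Being-H05-2B",
    local_dir="checkpoints/Being-H05-2B",
)
print(f"Checkpoint downloaded to {local_path}")
```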
```bash
git clone https://github.com/BeingBeyond/Being-H.git
cd Being-H

conda create -n beingh python=3.10
conda activate beingh

pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```

```python
from BeingH.inference.beingh_policy import BeingHPolicy

# Load a pre-trained policy
policy = BeingHPolicy(
    model_path="<path-to-checkpoint>",   # Path to Being-H checkpoint
    data_config_name="<config-name>",    # e.g., "libero_nonorm", "robocasa_human"
    dataset_name="<dataset-name>",       # For loading normalization stats
    embodiment_tag="<robot-tag>",        # Robot identifier
    instruction_template="<prompt>",     # Task instruction template
)

# Run inference
actions = policy.get_action(observations)
```

See docs/inference.md for the complete API reference.
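The exact observation format is defined by the data configuration; the snippet below is only a rough sketch with hypothetical keys and shapes (see docs/inference.md for the actual schema):

```python
import numpy as np

# Hypothetical observation dict -- the real keys, camera names, and state
# dimensions come from the chosen data configuration, not from this sketch.
observations = {
    "image": np.zeros((224, 224, 3), dtype=np.uint8),  # RGB camera frame
    "state": np.zeros(8, dtype=np.float32),            # proprioceptive state
    "instruction": "pick up the red cup",               # language command
}

actions = policy.get_action(observations)  # predicted action(s) for this step
```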
```bash
# Single-embodiment training (e.g., LIBERO)
bash scripts/train_libero_example.sh

# Cross-embodiment training (multiple robots)
bash scripts/train_cross_emb_example.sh
```

Important for cross-embodiment training: enable `--save_merged_metadata True` to save hierarchical metadata for inference. See docs/training.md for details.
Being-H currently provides example configurations for LIBERO and RoboCasa benchmarks. We will gradually release more pre-built configurations for additional robot platforms.
To add your own robot, refer to our example configurations and the Unified Action Space slot layout, then follow the guide in Data Configuration.
Don't see your robot? Open an issue with your robot specs and a data sample - we're happy to help add support.
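As a rough illustration of the kind of information a data configuration has to capture (the field names below are hypothetical and do not match the actual config schema; follow the Data Configuration guide for the real format):

```python
# Hypothetical description of a custom robot, for illustration only.
# The actual schema is defined in the Data Configuration guide.
my_robot_config = {
    "embodiment_tag": "my_arm_v1",          # robot identifier used at inference time
    "cameras": ["wrist_cam", "front_cam"],  # image streams present in the dataset
    "state_dim": 8,                         # proprioceptive state size
    "action_dim": 8,                        # native action size
    # How native action dimensions map into the unified action space;
    # slot indices here are made up -- use the Unified Action Space slot layout.
    "unified_slots": {
        "eef_position": [0, 1, 2],
        "eef_rotation": [3, 4, 5],
        "gripper": [6],
    },
}
```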
Being-H uses a 200-dimensional unified action space that maps different robots to a shared semantic representation. This is what enables cross-embodiment transfer.
The key insight: Similar robot components (e.g., end-effector position) always map to the same dimensions, regardless of the robot type. This allows knowledge to transfer between robots.
Most users don't need to understand the details - just use one of the pre-built configurations. Advanced users who want to add custom robots should see the complete documentation:
Unified Action Space Guide - Complete slot layout and configuration examples
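Conceptually, each robot's native action vector is scattered into its assigned slots of the 200-dimensional unified vector, so semantically matching dimensions line up across embodiments. A minimal sketch, with made-up slot indices (the real layout is in the Unified Action Space Guide):

```python
import numpy as np

UNIFIED_DIM = 200  # size of the unified action space

def to_unified(native_action: np.ndarray, slot_indices: list[int]) -> np.ndarray:
    """Scatter a robot's native action into its slots of the unified vector."""
    unified = np.zeros(UNIFIED_DIM, dtype=np.float32)
    unified[slot_indices] = native_action
    return unified

# Illustrative only: a 7-dim arm action (6-DoF end-effector + gripper) placed
# into the end-effector/gripper slots it shares with other arms.
franka_action = np.random.randn(7).astype(np.float32)
unified = to_unified(franka_action, slot_indices=list(range(7)))
```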
For cross-embodiment models, Being-H saves metadata during training that is essential for inference. This metadata contains normalization statistics for each task/embodiment.
When running inference on a cross-embodiment model, specify which metadata variant to use:
```python
policy = BeingHPolicy(
    model_path="<path-to-checkpoint>",
    dataset_name="uni_posttrain",             # Cross-embodiment dataset
    metadata_variant="<task-or-embodiment>",  # Select normalization stats
    stats_selection_mode="task",              # "task", "embodiment", or "auto"
    # ... other parameters
)
```

See docs/inference.md for details.
| Document | Description |
|---|---|
| Unified Action Space | How cross-embodiment transfer works |
| Data Configuration | Adding custom robots and datasets |
| Training | Training parameters and scripts |
| Inference | BeingHPolicy API reference |
| Evaluation | LIBERO and RoboCasa benchmarks |
The following features are planned for future implementation:
- Out-of-the-box real robot pretrained checkpoints
- Complete pretraining scripts and documentation
- Complete post-training scripts for all benchmarks
- Detailed training and data documentation
- Benchmark evaluation scripts for all supported tasks
We encourage researchers and practitioners to leverage Being-H as a foundation for their own experiments and applications. Whether you're adapting Being-H to new robotic platforms, exploring novel manipulation tasks, or extending the model to new domains, our modular codebase is designed to support your innovations. We welcome contributions of all kinds - from bug fixes and documentation improvements to new features and model architectures. By building on Being-H together, we can advance the field of vision-language-action modeling and enable robots to perform more complex and diverse manipulation tasks. Join us in making robotic manipulation more capable, robust, and accessible to all.
Being-H builds on the following excellent open-source projects:
- InternVL: Vision-Language model backbone
- Bagel: Training framework
- Qwen: Language model and MoE expert
- LIBERO: Benchmark for lifelong robot learning
- RoboCasa: Large-scale simulation benchmark for everyday tasks
We thank the authors for their contributions to the robotics and machine learning communities.
Copyright (c) 2026 BeingBeyond Ltd. and/or its affiliates.
SPDX-License-Identifier: Apache-2.0
If you find our work useful, please consider citing us and giving a star to our repository!
Being-H0.5
```bibtex
@article{beingbeyond2026beingh05,
  title={Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization},
  author={Luo, Hao and Wang, Ye and Zhang, Wanpeng and Zheng, Sipeng and Xi, Ziheng and Xu, Chaoyi and Xu, Haiweng and Yuan, Haoqi and Zhang, Chi and Wang, Yiqing and Feng, Yicheng and Lu, Zongqing},
  journal={arXiv preprint arXiv:2601.12993},
  year={2026}
}
```

Being-H0
```bibtex
@article{beingbeyond2025beingh0,
  title={Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos},
  author={Luo, Hao and Feng, Yicheng and Zhang, Wanpeng and Zheng, Sipeng and Wang, Ye and Yuan, Haoqi and Liu, Jiazheng and Xu, Chaoyi and Jin, Qin and Lu, Zongqing},
  journal={arXiv preprint arXiv:2507.15597},
  year={2025}
}
```