Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization


Blog | Paper | Models | License

Being-H0.5 is a foundational VLA model that scales human-centric learning with unified action space to enable robust cross-embodiment robot control.

being-h05.mp4

(For our previous Being-H0 version, please visit the being-h0 branch.)

News

  • [2026-01-24]: We've updated the training, inference, and data configurations along with complete post-training scripts for Being-H0.5. Additionally, post-training data for the PND Adam-U robot is now open-sourced. Download it via our Hugging Face Dataset Collections.
  • [2026-01-20]: We release Being-H0.5! Check our Paper for technical details and Hugging Face Model Collections for pretrained and post-trained models. 🔥🔥🔥
  • [2025-08-02]: We release the Being-H0 codebase and pretrained models! Check our Hugging Face Model Collections for more details. 🔥🔥🔥
  • [2025-07-21]: We publish Being-H0! Check our paper here. 🌟🌟🌟

Model Checkpoints

Download models from Hugging Face Model Collections:

| Model Type | Model Name | Parameters | Description |
| --- | --- | --- | --- |
| VLA Pretrained | Being-H05-2B | 2B | Base vision-language-action model (preview) |
| VLA Specialist | Being-H05-2B_libero | 2B | Post-trained on LIBERO benchmark |
| VLA Specialist | Being-H05-2B_robocasa | 2B | Post-trained on RoboCasa kitchen tasks |
| VLA Generalist | Being-H05-2B_libero_robocasa | 2B | Post-trained on both LIBERO and RoboCasa |

Note: the vision input resolution is 224×224 pixels by default.

Quick Start

Installation

git clone https://github.com/BeingBeyond/Being-H.git
cd Being-H
conda create -n beingh python=3.10
conda activate beingh
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

Inference

from BeingH.inference.beingh_policy import BeingHPolicy

# Load a pre-trained policy
policy = BeingHPolicy(
    model_path="<path-to-checkpoint>",      # Path to Being-H checkpoint
    data_config_name="<config-name>",       # e.g., "libero_nonorm", "robocasa_human"
    dataset_name="<dataset-name>",          # For loading normalization stats
    embodiment_tag="<robot-tag>",           # Robot identifier
    instruction_template="<prompt>",        # Task instruction template
)

# Run inference
actions = policy.get_action(observations)

See docs/inference.md for the complete API reference.
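As a rough illustration, the policy typically runs inside a closed control loop. The sketch below is an assumption-heavy example, not the repository's evaluation code: the environment object, its reset()/step() methods, and the observation keys are hypothetical, and the exact observation format expected by BeingHPolicy is defined by your data configuration (see docs/inference.md).

# Minimal control-loop sketch (hypothetical environment interface; the real
# observation keys and action format come from your data configuration).
from BeingH.inference.beingh_policy import BeingHPolicy

def run_episode(env, policy: BeingHPolicy, max_steps: int = 300) -> bool:
    """Roll out one episode, querying the policy for actions at each step."""
    obs = env.reset()                      # e.g., dict with camera images + robot state
    for _ in range(max_steps):
        actions = policy.get_action(obs)   # policy output in the robot's action space
        for action in actions:             # assuming a chunk of actions is returned
            obs, done = env.step(action)   # hypothetical env API
            if done:
                return True
    return False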

Training

# Single-embodiment training (e.g., LIBERO)
bash scripts/train_libero_example.sh

# Cross-embodiment training (multiple robots)
bash scripts/train_cross_emb_example.sh

Important for cross-embodiment training: Enable --save_merged_metadata True to save hierarchical metadata for inference. See docs/training.md for details.

Supported Robots

Being-H currently provides example configurations for LIBERO and RoboCasa benchmarks. We will gradually release more pre-built configurations for additional robot platforms.

To add your own robot, refer to our example configurations and the Unified Action Space slot layout, then follow the guide in Data Configuration.

Don't see your robot? Open an issue with your robot specs and a data sample - we're happy to help add support.

How It Works: Unified Action Space

Being-H uses a 200-dimensional unified action space that maps different robots to a shared semantic representation. This is what enables cross-embodiment transfer.

The key insight: Similar robot components (e.g., end-effector position) always map to the same dimensions, regardless of the robot type. This allows knowledge to transfer between robots.
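To make the idea concrete, here is a minimal sketch of how two different robots could be scattered into the same 200-dimensional vector, with shared components such as end-effector position always occupying the same slots. The slot names and index ranges below are illustrative placeholders, not the repository's actual layout; see the Unified Action Space Guide for the real indices.

import numpy as np

UNIFIED_DIM = 200

# Hypothetical slot layout: each semantic component owns a fixed index range,
# regardless of which robot produced the action. Real indices differ; see the
# Unified Action Space Guide.
SLOTS = {
    "eef_position": slice(0, 3),      # x, y, z of the end effector
    "eef_rotation": slice(3, 9),      # e.g., a 6D rotation representation
    "gripper": slice(9, 10),          # gripper open/close
    "joint_positions": slice(10, 40), # padded per-joint values
}

def to_unified(robot_action: dict[str, np.ndarray]) -> np.ndarray:
    """Scatter a robot-specific action dict into the shared 200-dim vector."""
    unified = np.zeros(UNIFIED_DIM, dtype=np.float32)
    for name, values in robot_action.items():
        slot = unified[SLOTS[name]]
        slot[: len(values)] = values  # unused tail of a slot stays zero
    return unified

# Two robots with different degrees of freedom still share the end-effector slots:
arm = to_unified({"eef_position": np.array([0.4, 0.0, 0.3]), "gripper": np.array([1.0])})
hand = to_unified({"eef_position": np.array([0.2, 0.1, 0.5]), "joint_positions": np.zeros(22)})

Because the model only ever sees the unified vector, whatever it learns about a slot on one robot can carry over to the same slot on another.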

For most users, you don't need to understand the details - just use one of the pre-built configurations. For advanced users who want to add custom robots, see the complete documentation:

Unified Action Space Guide - Complete slot layout and configuration examples

Cross-Embodiment Metadata

For cross-embodiment models, Being-H saves metadata during training that is essential for inference. This metadata contains normalization statistics for each task/embodiment.

When running inference on a cross-embodiment model, specify which metadata variant to use:

policy = BeingHPolicy(
    model_path="<path-to-checkpoint>",
    dataset_name="uni_posttrain",              # Cross-embodiment dataset
    metadata_variant="<task-or-embodiment>",   # Select normalization stats
    stats_selection_mode="task",               # "task", "embodiment", or "auto"
    # ... other parameters
)

See docs/inference.md for details.

Documentation

| Document | Description |
| --- | --- |
| Unified Action Space | How cross-embodiment transfer works |
| Data Configuration | Adding custom robots and datasets |
| Training | Training parameters and scripts |
| Inference | BeingHPolicy API reference |
| Evaluation | LIBERO and RoboCasa benchmarks |

TODO

The following features are planned for future implementation:

  • Out-of-the-box real robot pretrained checkpoints
  • Complete pretraining scripts and documentation
  • Complete post-training scripts for all benchmarks
  • Detailed training and data documentation
  • Benchmark evaluation scripts for all supported tasks

Contributing and Building on Being-H

We encourage researchers and practitioners to leverage Being-H as a foundation for their own experiments and applications. Whether you're adapting Being-H to new robotic platforms, exploring novel manipulation tasks, or extending the model to new domains, our modular codebase is designed to support your innovations. We welcome contributions of all kinds - from bug fixes and documentation improvements to new features and model architectures. By building on Being-H together, we can advance the field of vision-language-action modeling and enable robots to perform more complex and diverse manipulation tasks. Join us in making robotic manipulation more capable, robust, and accessible to all.

Acknowledgments

Being-H builds on the following excellent open-source projects:

  • InternVL: Vision-Language model backbone
  • Bagel: Training framework
  • Qwen: Language model and MoE expert
  • LIBERO: Benchmark for lifelong robot learning
  • RoboCasa: Large-scale simulation benchmark for everyday tasks

We thank the authors for their contributions to the robotics and machine learning communities.

License

Copyright (c) 2026 BeingBeyond Ltd. and/or its affiliates.

SPDX-License-Identifier: Apache-2.0

Citation

If you find our work useful, please consider citing us and giving a star to our repository! 🌟🌟🌟

Being-H0.5

@article{beingbeyond2026beingh05,
  title={Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization}, 
  author={Luo, Hao and Wang, Ye and Zhang, Wanpeng and Zheng, Sipeng and Xi, Ziheng and Xu, Chaoyi and Xu, Haiweng and Yuan, Haoqi and Zhang, Chi and Wang, Yiqing and Feng, Yicheng and Lu, Zongqing},
  journal={arXiv preprint arXiv:2601.12993},
  year={2026}
}

Being-H0

@article{beingbeyond2025beingh0,
  title={Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos},
  author={Luo, Hao and Feng, Yicheng and Zhang, Wanpeng and Zheng, Sipeng and Wang, Ye and Yuan, Haoqi and Liu, Jiazheng and Xu, Chaoyi and Jin, Qin and Lu, Zongqing},
  journal={arXiv preprint arXiv:2507.15597},
  year={2025}
}
