GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation
Teli Ma1*, Jia Zheng1*, Zifan Wang1, Ziyao Gao1, Jiaming Zhou1, Junwei Liang1,2,#
*equal contributions, #corresponding author
1HKUST(GZ), 2HKUST
[🌐 Project Page] | [📄 GLOVER++ Paper] | [📄 GLOVER Paper] | [🤗 Huggingface Data] | [📺 Video] | [🤗 Pretrained Weights]
- GLOVER++ distills actionable affordance knowledge from rich human videos and demonstrates effective transfer of this knowledge, as an explicit representation, to a variety of manipulation tasks.
- We contribute HOVA-500K, a large-scale affordance-annotated dataset that provides the scale and diversity needed to learn generalizable affordance representations.
- We present GLOVER++, a global-to-local affordance training policy built on HOVA-500K that yields fine-grained affordance representations and generalizable affordance reasoning. GLOVER++ achieves state-of-the-art performance on the HOVA-500K evaluation benchmark.
- Extensive applications, including zero-shot manipulation, multi-task imitation learning, and long-horizon and bimanual manipulation, demonstrate the strong potential of HOVA-500K and GLOVER++.
- We introduce HOVA-500K, a large-scale affordance-annotated dataset constructed from existing human videos and images. HOVA-500K comprises 500,000 meticulously annotated images spanning 1,726 object categories and 675 action categories, forming a comprehensive taxonomy of human-object interactions.
- Download the HOVA-500K dataset, then use the following commands to merge each dataset's splits into a single .tar.gz file:
cat HANDAL/part_* > HANDAL.tar.gz
cat Ego4D/part_* > Ego4D.tar.gz
cat epic-100/part_* > epic-100.tar.gz
- Uncompress these .tar.gz files and organize them as follows (a hedged extraction sketch follows the note below):
├── HOVA-500K
│   ├── 3doi
│   │   ├── GT_gaussian
│   │   └── images
│   ├── Ego4D
│   │   ├── GT_gaussian
│   │   └── frames
│   ├── HANDAL
│   │   ├── annotations
│   │   │   ├── GT_gaussian_train
│   │   │   └── GT_gaussian_test
│   │   └── images
│   └── epic-100
Note: the "annotations" files should be put in the same directory as the training code.
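A hedged sketch of the uncompress step, assuming the archives were merged as above and everything should land under a HOVA-500K/ directory (whether each archive already contains its top-level folder depends on the release, so adjust the target paths accordingly):

```bash
# Sketch only: extract each merged archive into HOVA-500K/ (paths are illustrative).
mkdir -p HOVA-500K
tar -xzf HANDAL.tar.gz -C HOVA-500K/
tar -xzf Ego4D.tar.gz -C HOVA-500K/
tar -xzf epic-100.tar.gz -C HOVA-500K/
```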
- Clone the repository:
git clone https://github.com/TeleeMa/GLOVER.git
cd GLOVER
- Install dependencies (we use Python 3.9):
pip install -r requirements.txt
- Download pre-trained models (see the sketch below):
- LISA Plus 7B model
- CLIP ViT-L/14 model
- SAM ViT-H model
- Place them in the specified directories and configure the model paths in the training script.
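A hedged sketch of fetching the public checkpoints listed above (the CLIP and SAM sources below are the standard public releases; the LISA Plus 7B location depends on where you obtain it, so its path is a placeholder):

```bash
# CLIP ViT-L/14 from the standard OpenAI release on Hugging Face.
huggingface-cli download openai/clip-vit-large-patch14 --local-dir ./pretrained/clip-vit-large-patch14

# SAM ViT-H checkpoint from the official Segment Anything release.
wget -P ./pretrained https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# LISA Plus 7B: place your downloaded copy under ./pretrained/LISA_Plus_7b (placeholder path).
```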
Basic training command:
bash train_glover.sh
Or, for advanced training with GLOVER++:
bash train_glover_plus.sh
NOTE: Key training parameters must be set individually (see the sketch below):
- --version: /path/to/LISA_Plus_7b
- --vision-tower: /path/to/clip-vit-large-patch14
- --sam_vit_path: /path/to/sam_vit_h_4b8939.pth (only for GLOVER++)
- --dataset_dir: /path/to/HOVA-500K datasets
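For illustration only, and assuming train_glover_plus.sh forwards extra command-line flags to the underlying trainer (otherwise edit the corresponding variables inside the script), the paths might be supplied like this:

```bash
# Assumption: the wrapper script passes these flags through; all paths are placeholders.
bash train_glover_plus.sh \
  --version /path/to/LISA_Plus_7b \
  --vision-tower /path/to/clip-vit-large-patch14 \
  --sam_vit_path /path/to/sam_vit_h_4b8939.pth \
  --dataset_dir /path/to/HOVA-500K
```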
When training is finished, run the following to obtain the full model weights:
cd ./runs/glover(++)/ckpt_model && python zero_to_fp32.py . ../pytorch_model.bin
Then merge the LoRA weights in pytorch_model.bin and save the resulting model, in Hugging Face format, to your desired path:
bash merge_weights.sh
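As an optional sanity check, and assuming the merged directory contains a standard Hugging Face config (a project-specific assumption), you can try loading its configuration:

```bash
# Optional check: the merged checkpoint directory (placeholder path) should parse as a Hugging Face model config.
python -c "from transformers import AutoConfig; print(AutoConfig.from_pretrained('/path/to/merged_model', trust_remote_code=True))"
```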
Run evaluation on the HOVA-500K benchmark:
bash eval.sh
NOTE: Key evaluation parameters must be set individually (see the sketch below):
- --dataset_dir: /path/to/HOVA-500K datasets
- --version: /path/to/GLOVER(++) model
- --model_arch: Choose from 'glover' or 'glover++'
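For illustration, assuming eval.sh forwards these flags (otherwise set them inside the script; paths are placeholders):

```bash
# Assumption: the wrapper script passes these flags through; paths are placeholders.
bash eval.sh \
  --dataset_dir /path/to/HOVA-500K \
  --version /path/to/GLOVER_plus_model \
  --model_arch glover++
```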
Run inference on your own images:
bash infer.sh
NOTE: Key inference parameters must be set individually (see the sketch below):
- --version: Path to the GLOVER(++) model
- --model_arch: Choose from 'glover' or 'glover++'
- --image_path: Path to the input image
- --objects: Target objects (e.g., 'bottle,cup')
- --actions: Target actions (e.g., 'pick up,raise')
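For illustration, assuming infer.sh forwards these flags (otherwise set them inside the script; the model and image paths are placeholders, while the object and action strings follow the examples above):

```bash
# Assumption: the wrapper script passes these flags through; paths are placeholders.
bash infer.sh \
  --version /path/to/GLOVER_plus_model \
  --model_arch glover++ \
  --image_path /path/to/your_image.jpg \
  --objects 'bottle,cup' \
  --actions 'pick up,raise'
```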
If you find this project useful in your research, please consider citing:
@article{ma2025glover++,
  title={GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation},
  author={Ma, Teli and Zheng, Jia and Wang, Zifan and Gao, Ziyao and Zhou, Jiaming and Liang, Junwei},
  journal={arXiv preprint arXiv:2505.11865},
  year={2025}
}

