Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation

Joy is dopamine’s handiwork—whether in humans or in robotics.

🗞️ News

2026-01-08: 🤗 We released the Robo-Dopamine-GRM-3B model and inference code.
2025-12-30: ✨ Code, dataset, and weights are coming soon! Stay tuned for updates.
2025-12-30: 🔥 We released the Robo-Dopamine project page.

🎯 TODO

Release the Robo-Dopamine-GRM-3B model and inference code (done).
Release the Dopamine-Bench benchmark and evaluation code (about 1 week).
Release the Robo-Dopamine-GRM-8B model (about 2 weeks).
Release the Robo-Dopamine-GRM-8B-Pro model (about 2 weeks).
Release the full GRM dataset and GRM training code (about 1 month).
Release the data generation pipeline and fine-tuning code (possibly 1 month or more).
Release the Dopamine-RL training code for simulated and real-world settings (possibly 2 months or more).

🤖 Overview

Robo-Dopamine is composed of two core components:

(a) Dopamine-Reward Modeling Method. At its heart is the General Reward Model (GRM), a vision-language model that is prompted with a task description and conditioned on multi-view images of the initial, goal, "BEFORE," and "AFTER" states; from these it predicts a relative progress (or regress) hop from the BEFORE state to the AFTER state. To obtain a stable and accurate signal, we employ Multi-Perspective Progress Fusion, which combines incremental, forward-anchored, and backward-anchored predictions into a single fused reward.

(b) Dopamine-RL Training Framework. Dopamine-RL first adapts the pre-trained GRM to a novel task using a single demonstration (One-Shot GRM Adaptation). It then applies a theoretically sound Policy-Invariant Reward Shaping method to convert the GRM's dense output into a reward signal that accelerates learning without altering the optimal policy. This approach is compatible with a wide range of RL algorithms.
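The two ideas above, fusing several progress estimates and shaping rewards without changing the optimal policy, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the Robo-Dopamine implementation: `fuse_progress` uses a plain weighted average (the actual fusion rule is not described in this README), and `shaped_rewards` implements the classic potential-based shaping of Ng et al. (1999), r' = r + γΦ(s') − Φ(s), using the fused progress as the potential Φ. Both function names are hypothetical.

```python
from typing import Sequence


def fuse_progress(incremental: float, forward: float, backward: float,
                  weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Fuse three progress estimates (each assumed to lie in [0, 1]).

    ASSUMPTION: a fixed weighted average; the real Multi-Perspective
    Progress Fusion rule may differ.
    """
    estimates = (incremental, forward, backward)
    if len(weights) != len(estimates):
        raise ValueError("need exactly one weight per estimate")
    total = sum(weights)
    fused = sum(w * p for w, p in zip(weights, estimates)) / total
    # Clamp so the result is always a valid progress value.
    return min(1.0, max(0.0, fused))


def shaped_rewards(env_rewards: Sequence[float],
                   potentials: Sequence[float],
                   gamma: float = 0.99) -> list:
    """Potential-based shaping: r'_t = r_t + gamma * Phi(s_{t+1}) - Phi(s_t).

    `potentials` holds Phi for every visited state, so it must have
    exactly one more entry than `env_rewards`.
    """
    if len(potentials) != len(env_rewards) + 1:
        raise ValueError("need len(potentials) == len(env_rewards) + 1")
    return [r + gamma * potentials[t + 1] - potentials[t]
            for t, r in enumerate(env_rewards)]
```

For example, `shaped_rewards([0.0, 0.0, 1.0], [0.0, 0.4, 0.7, 1.0], gamma=1.0)` turns a sparse success reward into dense per-step feedback. Over an episode the shaping terms telescope (with γ = 1 they sum to Φ(s_T) − Φ(s_0)), so the agent gains nothing by cycling through states; Ng et al. show that this form preserves the optimal policy.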

🤗 Model Zoo

Model	Checkpoint	Description
GRM-3B	🤗 tanhuajie2001/Robo-Dopamine-GRM-3B	Fully trained GRM based on RoboBrain-2.0-3B
GRM-8B	🤗 Coming soon ...	Fully trained GRM based on RoboBrain-2.0-8B
GRM-8B-Pro	🤗 Coming soon ...	Fully trained GRM based on RoboBrain-2.5-8B

🛠️ Setup

# clone repo.
git clone https://github.com/FlagOpen/Robo-Dopamine.git
cd Robo-Dopamine

# build conda env.
conda create -n robo-dopamine python=3.10
conda activate robo-dopamine
pip install -r requirements.txt
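After installing, a quick check that every package listed in requirements.txt is visible to the active interpreter can save a confusing import error later. The helper below is hypothetical (not shipped with the repository) and only checks distribution names with the standard-library `importlib.metadata`; version pins, extras, environment markers, pip options, and URL requirements are ignored, so treat a reported name as a prompt to double-check rather than proof of a broken install.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def missing_requirements(path: str = "requirements.txt") -> list:
    """Return distribution names from `path` that are not installed.

    Only the leading distribution name of each line is checked.
    """
    missing = []
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        # Skip blank lines, pip options (-r, -e, --index-url ...) and URLs.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)  # e.g. "torch" from "torch>=2.0"
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

Run it from the repository root inside the robo-dopamine environment, e.g. `print(missing_requirements())`; an empty list means every listed name resolved to an installed distribution.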

💡 Simple Inference

1. Example for GRM Incremental-Mode

import os
from examples.inference import GRMInference

# Load the GRM-3B checkpoint by its Hugging Face repo id.
model = GRMInference("tanhuajie2001/Robo-Dopamine-GRM-3B")

TASK_INSTRUCTION = "organize the table"
BASE_DEMO_PATH = "./examples/demo_table"  # folder holding the three camera videos
GOAL_IMAGE_PATH = "./examples/demo_table/goal_image.png"
OUTPUT_ROOT = "./results"

output_dir = model.run_pipeline(
    cam_high_path  = os.path.join(BASE_DEMO_PATH, "cam_high.mp4"),
    cam_left_path  = os.path.join(BASE_DEMO_PATH, "cam_left_wrist.mp4"),
    cam_right_path = os.path.join(BASE_DEMO_PATH, "cam_right_wrist.mp4"),
    out_root       = OUTPUT_ROOT,
    task           = TASK_INSTRUCTION,
    frame_interval = 30,
    batch_size     = 1,
    goal_image     = GOAL_IMAGE_PATH,
    eval_mode      = "incremental",
    visualize      = True
)

print(f"Episode ({BASE_DEMO_PATH}) processed with Incremental-Mode. Output at: {output_dir}")

With visualize=True, the reward visualization is rendered to reward_vis.mp4.
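Incremental-Mode scores consecutive pairs of sampled frames, so its per-step outputs have to be chained into a progress curve. The sketch below shows one way to do that under two assumptions that are ours, not the repository's: that `frame_interval` keeps every 30th frame (plus the last frame), and that a hop h in [-1, 1] is a fraction of the remaining distance to the goal when positive and a fraction of the progress made so far when negative. `sampled_indices` and `accumulate_hops` are hypothetical helpers.

```python
def sampled_indices(n_frames: int, frame_interval: int = 30) -> list:
    """Frame indices kept when sampling every `frame_interval` frames.

    ASSUMPTION: the last frame is always included so the episode end is scored.
    """
    if n_frames <= 0 or frame_interval <= 0:
        raise ValueError("n_frames and frame_interval must be positive")
    idx = list(range(0, n_frames, frame_interval))
    if idx[-1] != n_frames - 1:
        idx.append(n_frames - 1)
    return idx


def accumulate_hops(hops, start: float = 0.0) -> list:
    """Chain relative hops between consecutive sampled frames into progress.

    ASSUMED hop semantics: a positive hop h covers fraction h of the
    remaining distance (1 - p); a negative hop removes fraction |h| of
    the progress p made so far. Returns one value per sampled frame.
    """
    progress = [start]
    p = start
    for h in hops:
        h = max(-1.0, min(1.0, h))  # keep hops in the assumed range
        p = p + h * (1.0 - p) if h >= 0 else p + h * p
        progress.append(p)
    return progress
```

Under this convention progress stays within [0, 1] for any sequence of hops, but an error in one hop carries into every later value of the chain, which is one plausible reason to fuse incremental results with the anchored modes.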

2. Example for GRM Forward-Mode

import os
from examples.inference import GRMInference

model = GRMInference("tanhuajie2001/Robo-Dopamine-GRM-3B")

TASK_INSTRUCTION = "organize the table"
BASE_DEMO_PATH = "./examples/demo_table"
GOAL_IMAGE_PATH = "./examples/demo_table/goal_image.png"
OUTPUT_ROOT = "./results"

output_dir = model.run_pipeline(
    cam_high_path  = os.path.join(BASE_DEMO_PATH, "cam_high.mp4"),
    cam_left_path  = os.path.join(BASE_DEMO_PATH, "cam_left_wrist.mp4"),
    cam_right_path = os.path.join(BASE_DEMO_PATH, "cam_right_wrist.mp4"),
    out_root       = OUTPUT_ROOT,
    task           = TASK_INSTRUCTION,
    frame_interval = 30,
    batch_size     = 1,
    goal_image     = GOAL_IMAGE_PATH,
    eval_mode      = "forward",
    visualize      = True
)

print(f"Episode ({BASE_DEMO_PATH}) processed with Forward-Mode. Output at: {output_dir}")

With visualize=True, the reward visualization is rendered to reward_vis.mp4.
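The three modes plausibly differ mainly in which frames fill the BEFORE and AFTER slots of each GRM query. The mapping below is a hypothetical reading of "incremental", "forward-anchored", and "backward-anchored" from the Overview, not code taken from examples/inference.py: incremental compares neighboring sampled frames, forward anchors every query at the first frame, and backward anchors every query at the goal image. `query_pairs` and `GOAL` are illustrative names.

```python
GOAL = "goal"  # hypothetical placeholder standing in for the goal image


def query_pairs(indices, mode: str) -> list:
    """Hypothetical (BEFORE, AFTER) pairs each evaluation mode might query.

    `indices` are sampled frame indices in temporal order.
    """
    if len(indices) < 2:
        raise ValueError("need at least two sampled frames")
    if mode == "incremental":
        # Each frame is compared with the one sampled just before it.
        return list(zip(indices[:-1], indices[1:]))
    if mode == "forward":
        # Every later frame is compared with the initial frame.
        return [(indices[0], i) for i in indices[1:]]
    if mode == "backward":
        # Every frame is compared with the goal state.
        return [(i, GOAL) for i in indices]
    raise ValueError(f"unknown mode: {mode!r}")
```

Under this reading, each forward or backward query is scored independently, so an error on one frame does not propagate to later frames the way it can in an incremental chain.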

3. Example for GRM Backward-Mode

import os
from examples.inference import GRMInference

model = GRMInference("tanhuajie2001/Robo-Dopamine-GRM-3B")

TASK_INSTRUCTION = "organize the table"
BASE_DEMO_PATH = "./examples/demo_table"
GOAL_IMAGE_PATH = "./examples/demo_table/goal_image.png"
OUTPUT_ROOT = "./results"

output_dir = model.run_pipeline(
    cam_high_path  = os.path.join(BASE_DEMO_PATH, "cam_high.mp4"),
    cam_left_path  = os.path.join(BASE_DEMO_PATH, "cam_left_wrist.mp4"),
    cam_right_path = os.path.join(BASE_DEMO_PATH, "cam_right_wrist.mp4"),
    out_root       = OUTPUT_ROOT,
    task           = TASK_INSTRUCTION,
    frame_interval = 30,
    batch_size     = 1,
    goal_image     = GOAL_IMAGE_PATH,
    eval_mode      = "backward",
    visualize      = True
)

print(f"Episode ({BASE_DEMO_PATH}) processed with Backward-Mode. Output at: {output_dir}")

With visualize=True, the reward visualization is rendered to reward_vis.mp4.
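The three examples differ only in `eval_mode`, so a small loop can run all of them with one loaded model. This is a convenience sketch, not part of the repository: `run_all_modes` and `main` are hypothetical names, and the loop passes exactly the `run_pipeline` keyword arguments used in the examples. Whether the three runs write to separate folders under `out_root` is not documented here, so inspect the returned paths.

```python
import os

MODES = ("incremental", "forward", "backward")


def run_all_modes(model, base_demo_path, goal_image, task,
                  out_root="./results", frame_interval=30,
                  batch_size=1, visualize=True):
    """Call model.run_pipeline once per evaluation mode.

    Returns a dict mapping each mode name to the output directory
    that run_pipeline returned for it.
    """
    outputs = {}
    for mode in MODES:
        outputs[mode] = model.run_pipeline(
            cam_high_path=os.path.join(base_demo_path, "cam_high.mp4"),
            cam_left_path=os.path.join(base_demo_path, "cam_left_wrist.mp4"),
            cam_right_path=os.path.join(base_demo_path, "cam_right_wrist.mp4"),
            out_root=out_root,
            task=task,
            frame_interval=frame_interval,
            batch_size=batch_size,
            goal_image=goal_image,
            eval_mode=mode,
            visualize=visualize,
        )
    return outputs


def main():
    # Imported lazily so this file can be loaded without the repo on sys.path.
    from examples.inference import GRMInference

    model = GRMInference("tanhuajie2001/Robo-Dopamine-GRM-3B")
    outputs = run_all_modes(
        model,
        base_demo_path="./examples/demo_table",
        goal_image="./examples/demo_table/goal_image.png",
        task="organize the table",
    )
    for mode, output_dir in outputs.items():
        print(f"{mode}: {output_dir}")
```

Call `main()` from the repository root after completing the setup steps; loading the model once avoids paying the checkpoint load cost three times.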

🤖 Pre-Training

Coming soon ...

⚡ Fine-Tuning

Coming soon ...

🔍 Evaluation

Coming soon ...

📑 Citation

If you find our work helpful, feel free to cite it:

@article{tan2025robo,
  title={Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation},
  author={Tan, Huajie and Chen, Sixiang and Xu, Yijie and Wang, Zixiao and Ji, Yuheng and Chi, Cheng and Lyu, Yaoxu and Zhao, Zhongxia and Chen, Xiansheng and Co, Peterson and others},
  journal={arXiv preprint arXiv:2512.23703},
  year={2025}
}
