Cross-embodiment robot learning is bottlenecked by fragmented infrastructure,
not just limited data.
Despite recent progress in training Vision-Language-Action models (VLAs), deploying them across
different robots remains a major engineering challenge. Most robot code is highly specific to an
exact hardware setup. Reproducing results on a different platform usually means rewriting the entire
control stack. VLAs cannot be deployed out-of-the-box on new embodiments, and existing cross-embodiment
datasets are aggregations of disjointed collection efforts across fragmented infrastructure.
RIO (Robot I/O) is an open-source Python framework that provides flexible, lightweight components
for robot control, teleoperation, data formatting, sensor configuration, and policy deployment across
diverse hardware platforms and morphologies. Users can freely choose robots, sensors, teleoperation
interfaces, middlewares, data formats, and policies at every layer of the stack — and switch
between them with minimal reconfiguration. We validate RIO on VLA deployment workflows across three
morphologies (single-arm, bimanual, humanoid) and four hardware platforms, showcasing fine-tuned
rollouts of π0.5 and GR00T N1.5 on household tasks.
RIO
RIO is built on a node-middleware architecture. Nodes for teleoperation interfaces, sensors,
robots, and policies are implemented from the same template, requiring minimal boilerplate. Each Node
dynamically inherits from a given Middleware that handles message passing. Factory functions produce
matched server-client Node pairs, supporting three execution patterns:
publish-only (pub()), request-only (req()), and
combined (pubreq()). Published data flows through ring buffers that continuously
stream state at a fixed frequency, while requests flow through request queues that enable
asynchronous command communication from multiple clients at arbitrary rates.
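The two data paths described above can be illustrated with a minimal, standard-library sketch. It is not RIO's implementation: `RingBuffer`, `ToyServer`, and `tick` are hypothetical names chosen for illustration, and the real `pub()`, `req()`, and `pubreq()` factories are not reproduced here. The sketch only shows the idea that published state overwrites old samples at a fixed rate, while commands from several clients are queued and drained whenever they arrive.

```python
import queue
import threading
from collections import deque


class RingBuffer:
    """Fixed-capacity buffer: new samples overwrite the oldest ones."""

    def __init__(self, capacity: int):
        self._items = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def put(self, item) -> None:
        with self._lock:
            self._items.append(item)  # deque drops the oldest item when full

    def latest(self):
        """Return the newest sample without blocking, or None if empty."""
        with self._lock:
            return self._items[-1] if self._items else None

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)


class ToyServer:
    """Hypothetical server node: streams state and drains queued commands."""

    def __init__(self, capacity: int = 4):
        self.state = RingBuffer(capacity)
        self.requests = queue.Queue()  # any number of clients may put() here
        self.applied = []

    def tick(self, t: int) -> None:
        # Apply every command that arrived since the previous tick.
        while True:
            try:
                self.applied.append(self.requests.get_nowait())
            except queue.Empty:
                break
        # Publish the current state; readers only ever see recent samples.
        self.state.put({"t": t, "n_applied": len(self.applied)})


server = ToyServer(capacity=4)
server.requests.put("open_gripper")  # command from client A
server.requests.put("move_home")     # command from client B
for t in range(10):                  # stand-in for a fixed-frequency loop
    server.tick(t)

print(server.state.latest())  # {'t': 9, 'n_applied': 2}
print(len(server.state))      # 4: only the most recent samples are kept
```

Keeping the two paths separate means a slow or bursty client cannot stall the state stream, and a reader that falls behind simply skips to the newest sample.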
Because Nodes are middleware-agnostic, they can be paired with different backends depending on
deployment requirements. Shared memory enables zero-copy data exchange for high-throughput
local communication. Zenoh or ZeroRpc handle serialization and transport over TCP/IPC
for distributed multi-machine deployments. Thread middleware simplifies debugging by running
everything within a single process.
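One way to picture "dynamically inheriting" from a middleware is to build the node class at runtime from two bases. The sketch below uses Python's built-in `type()` for this; whether RIO uses the same mechanism is an assumption, and `ThreadMiddleware`, `LoggingMiddleware`, `CameraLogic`, and `make_node` are hypothetical names that do not come from the RIO codebase.

```python
class ThreadMiddleware:
    """Hypothetical in-process backend: messages live in a plain dict."""

    def __init__(self):
        self._topics = {}

    def send(self, topic, msg):
        self._topics[topic] = msg

    def recv(self, topic):
        return self._topics.get(topic)


class LoggingMiddleware(ThreadMiddleware):
    """Second hypothetical backend that also records which topics were used."""

    def __init__(self):
        super().__init__()
        self.log = []

    def send(self, topic, msg):
        self.log.append(topic)
        super().send(topic, msg)


class CameraLogic:
    """Node logic written once, with no reference to a concrete backend."""

    def publish_frame(self, frame_id):
        # send() is supplied by whichever middleware the node is paired with.
        self.send("camera/frame", {"frame_id": frame_id})


def make_node(logic_cls, middleware_cls):
    """Create a class that inherits from both the node logic and the backend."""
    name = f"{logic_cls.__name__}On{middleware_cls.__name__}"
    return type(name, (logic_cls, middleware_cls), {})


# The same node logic runs unchanged on either backend.
nodes = [make_node(CameraLogic, mw)() for mw in (ThreadMiddleware, LoggingMiddleware)]
for node in nodes:
    node.publish_frame(7)
    print(type(node).__name__, node.recv("camera/frame"))
```

Because the node logic only calls the generic `send` and `recv` methods, swapping the backend is a one-argument change at construction time.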
RIO is designed around five tenets:
Flexible — agnostic to components, no locked-in choices;
Reusable — lightweight building blocks that combine and modify quickly;
Accessible — Python-based, single config file, quick to install;
Performant — real-time control with asynchronous policy inference;
Consistent — scalable, reproducible data collection.
Supported Hardware
RIO provides flexibility across robot hardware, teleoperation interfaces, cameras, and middlewares.
These can be combined in any configuration depending on your requirements.
RIO's API is designed for simplicity. A complete teleoperation loop fits in a few lines:
from rio import time
from rio.envs.factory import make_env
from rio.middleware import ServerManager

# Factory function to create servers, clients, and environment
servers, clients, env = make_env(cfg)

# Start servers with the desired middleware
with ServerManager(cfg.mw, list(servers.values())):
    # Start clients
    with env, clients["teleop"]() as teleop:
        while True:
            # Query client APIs, all non-blocking
            cmd = teleop.poll()
            action = env.build_action(cmd)
            obs = env.step(action)
            time.precise_wait()
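The loop above ends each iteration with a wait call that keeps the control rate steady. How `rio.time.precise_wait` works internally is not described here, so the following is only a generic sketch of a common technique for fixed-rate loops in Python: sleep for most of the interval, then spin for the final stretch, and schedule against absolute deadlines. The names `precise_wait_until` and `run_fixed_rate` and the 2 ms spin margin are assumptions made for illustration.

```python
import time


def precise_wait_until(deadline: float, spin_margin: float = 0.002) -> None:
    """Sleep for most of the remaining time, then busy-wait until the deadline."""
    remaining = deadline - time.perf_counter()
    if remaining > spin_margin:
        time.sleep(remaining - spin_margin)  # coarse, cheap wait
    while time.perf_counter() < deadline:    # fine, CPU-bound wait
        pass


def run_fixed_rate(step_fn, hz: float, n_steps: int) -> list[float]:
    """Call step_fn at roughly `hz` and return the time stamp after each wait."""
    period = 1.0 / hz
    next_deadline = time.perf_counter() + period
    stamps = []
    for _ in range(n_steps):
        step_fn()
        precise_wait_until(next_deadline)
        stamps.append(time.perf_counter())
        next_deadline += period  # absolute deadlines avoid cumulative drift
    return stamps


start = time.perf_counter()
stamps = run_fixed_rate(lambda: None, hz=100.0, n_steps=20)
print(f"20 steps at 100 Hz took {stamps[-1] - start:.3f} s")
```

Advancing the deadline by a fixed period, rather than waiting a fixed duration after each step, keeps a slow iteration from shifting every later one.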
Robot Stations
A composable dataclass configuration specifies the hardware topology for each station.
The same application logic operates over arbitrary station configurations without modification.
Example robot station configurations combining different hardware, sensors, and teleoperation interfaces.
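A composable dataclass configuration of this kind could look like the sketch below. The field names, the `StationCfg` and `CameraCfg` classes, and the teleoperation labels (`"spacemouse"`, `"leader_arm"`) are hypothetical and are not taken from RIO's configuration schema. The point is only that one function can operate over different station descriptions without modification.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CameraCfg:
    name: str
    width: int = 640
    height: int = 480


@dataclass(frozen=True)
class StationCfg:
    robot: str
    teleop: str
    middleware: str = "thread"
    cameras: tuple[CameraCfg, ...] = ()


def describe(cfg: StationCfg) -> str:
    """Application logic that depends only on the config's shape."""
    cams = ", ".join(c.name for c in cfg.cameras) or "no cameras"
    return f"{cfg.robot} via {cfg.teleop} over {cfg.middleware} [{cams}]"


xarm = StationCfg(
    robot="xarm7",
    teleop="spacemouse",
    cameras=(CameraCfg("wrist"), CameraCfg("front")),
)
# Derive a second station by overriding a few fields of the first.
so100 = replace(xarm, robot="so100", teleop="leader_arm", middleware="zenoh")

print(describe(xarm))   # xarm7 via spacemouse over thread [wrist, front]
print(describe(so100))  # so100 via leader_arm over zenoh [wrist, front]
```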
Observation Schema
Each robot morphology defines a dedicated observation structure extending a common base schema,
ensuring standardized data representation across platforms regardless of the underlying hardware.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Camera:
    rgb: np.ndarray | None = None
    depth: np.ndarray | None = None
    meta: dict = field(default_factory=dict)


@dataclass
class Observation:
    proprio: np.ndarray  # Defaults to policy action space
    cameras: dict[str, Camera] = field(default_factory=dict)


@dataclass
class Step:
    timestep: int | None
    observation: Observation
    instruction: str | None
    action: np.ndarray | None
    meta: dict | None = field(default_factory=dict)
Results
We deploy state-of-the-art VLAs (π0.5, GR00T N1.5) across 3 morphologies
and 4 hardware platforms, achieving ≥60% success on all tasks with just 50 teleoperated demonstrations.
Rollout videos, grouped by policy type:
VLAs (π0.5 and GR00T): xArm7 (Place Can), SO-100 (Fold Cloth), SO-100 (Scrub Bowl), Humanoid (GR00T)
Diffusion Policy: xArm7 (Throw Ball), xArm7 (Flip Tortilla)
RL Navigation (PPO): Unitree G1 (Walk), Booster T1 (Walk)
Policy Deployment
| Robot | Policy | Task | Success Rate | Task Time (s) | Demo Time (s) | GPU Util (%) |
|---|---|---|---|---|---|---|
| xArm7 | BC π0.5 | Fold Shirt | 92.5% | 41.96 ± 14.58 | 41.57 ± 9.25 | 56.7 ± 1.7 |
| xArm7 | BC π0.5 | Place Can | 95.0% | 16.08 ± 3.41 | 14.46 ± 2.00 | 54.6 ± 3.1 |
| SO-100 | BC π0.5 | Fold Cloth | 60.0% | 27.50 ± 5.51 | 22.43 ± 3.30 | 46.3 ± 10.0 |
| SO-100 | BC π0.5 | Scrub Bowl | 64.0% | 40.33 ± 13.68 | 27.66 ± 5.22 | 52.0 ± 4.8 |
| Unitree G1 | BC GR00T N1.5 | Pick Box | 95.0% | 9.07 ± 6.10 | 10.38 ± 4.04 | 61.7 ± 4.7 |
| Unitree G1 | RL PPO | Navigate | 100% | 31.27 ± 6.56 | n/a | 5.1 ± 0.1 |
| Booster T1 | RL PPO | Navigate | 100% | 29.73 ± 4.49 | n/a | 5.3 ± 0.2 |
System Profiling
During π0.5 inference, RIO reaches 130.3 ms end-to-end observation-to-action latency
versus 581.2 ms for LeRobot, a 4.46× reduction. Middleware latency stays below one
millisecond with Zenoh and shared memory.
RIO vs. LeRobot Latency
We profile end-to-end observation-to-action latency during π0.5 rollouts with an
SO-100 in the loop, using three Intel RealSense cameras (two D415s and one D405) at
640×480 resolution. Under identical hardware, RIO reaches 130.3 ms versus
581.2 ms for LeRobot — a 4.46× reduction. The gain stems from
RIO's streamlined architecture: whereas LeRobot threads observations before transmitting
them over the network to an asynchronous policy server, RIO leverages the middleware
directly for asynchronous inference, cutting both observation fetching and framework
overhead. Lower pipeline latency translates to a higher effective control frequency,
which is critical for dynamic, contact-rich tasks such as ball throwing and tortilla flipping.
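The reported ratio and its effect on control rate can be checked with a few lines of arithmetic. The sketch assumes a simplified synchronous loop in which each new action waits for one full observation-to-action pass; that assumption is made here for illustration only, since RIO runs inference asynchronously and its actual control rate is not bounded this way.

```python
rio_ms = 130.3      # end-to-end latency reported for RIO
lerobot_ms = 581.2  # end-to-end latency reported for LeRobot

reduction = lerobot_ms / rio_ms
saved_ms = lerobot_ms - rio_ms


def max_rate_hz(latency_ms: float) -> float:
    """Upper bound on fresh actions per second if each waits one full pass."""
    return 1000.0 / latency_ms


print(f"reduction: {reduction:.2f}x")                    # reduction: 4.46x
print(f"saved per pass: {saved_ms:.1f} ms")              # saved per pass: 450.9 ms
print(f"RIO bound: {max_rate_hz(rio_ms):.2f} Hz")        # RIO bound: 7.67 Hz
print(f"LeRobot bound: {max_rate_hz(lerobot_ms):.2f} Hz")  # LeRobot bound: 1.72 Hz
```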
Middleware Round-Trip Latency
| Middleware | Latency (ms) |
|---|---|
| Zenoh | 0.43 ± 0.13 |
| Shared Memory | 0.54 ± 0.62 |
| Thread | 0.99 ± 0.30 |
| ZeroRpc | 1.05 ± 0.17 |
| Portal | 1.97 ± 0.34 |
Half of median round-trip time (1st/99th percentiles trimmed, per Open Messaging Benchmark),
over 1,000 passes with a 2048-byte payload.
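The statistic described in this caption can be sketched as follows. The exact trimming convention (dropping samples below the 1st and above the 99th percentile by rank) is an assumption, the function names are hypothetical, and the `echo` callable merely stands in for a real middleware round trip; the Open Messaging Benchmark procedure itself is not reproduced.

```python
import statistics
import time


def measure_rtts_ms(echo, payload: bytes, passes: int) -> list[float]:
    """Time `passes` round trips of `payload` through `echo`, in milliseconds."""
    rtts = []
    for _ in range(passes):
        start = time.perf_counter()
        echo(payload)
        rtts.append((time.perf_counter() - start) * 1000.0)
    return rtts


def one_way_latency_ms(rtts_ms, lower_pct: float = 1.0, upper_pct: float = 99.0) -> float:
    """Half of the median round-trip time after trimming both tails by rank."""
    ordered = sorted(rtts_ms)
    n = len(ordered)
    lo = int(n * lower_pct / 100.0)
    hi = int(n * upper_pct / 100.0)
    trimmed = ordered[lo:hi]
    if not trimmed:
        raise ValueError("not enough samples to trim")
    return statistics.median(trimmed) / 2.0


# Synthetic check: outliers in both tails do not move the result.
samples = [0.1] * 10 + [1.0] * 980 + [50.0] * 10
print(one_way_latency_ms(samples))  # 0.5

# Same procedure on a trivial in-process echo with a 2048-byte payload.
rtts = measure_rtts_ms(lambda b: b, bytes(2048), passes=1000)
print(f"{one_way_latency_ms(rtts):.6f} ms")
```

Using the trimmed median rather than the mean keeps occasional scheduler stalls from dominating the figure.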
Node Profiling During Policy Deployment
Timeline of π0.5 rollout on xArm7 with three RealSense cameras
(two D415s, one D405) at 640×480.
The main loop remains non-blocking; asynchronous inference (~85.8 ms forward pass)
allows continuous control.
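The pattern of a non-blocking main loop alongside asynchronous inference can be sketched with a background thread. This is a generic illustration rather than RIO's mechanism, which routes inference through its middleware: `AsyncPolicy`, `submit`, and `latest_action` are hypothetical names, and the 50 ms sleep is a stand-in for a forward pass.

```python
import threading
import time


class AsyncPolicy:
    """Runs infer_fn in a worker thread; the caller never waits for it."""

    def __init__(self, infer_fn):
        self._infer_fn = infer_fn
        self._lock = threading.Lock()
        self._latest_obs = None
        self._latest_action = None
        self._new_obs = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._worker, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._new_obs.set()  # wake the worker so it can exit
        self._thread.join()

    def submit(self, obs) -> None:
        """Hand over the newest observation; older pending ones are replaced."""
        with self._lock:
            self._latest_obs = obs
        self._new_obs.set()

    def latest_action(self):
        """Return the most recent action (possibly stale), or None if none yet."""
        with self._lock:
            return self._latest_action

    def _worker(self) -> None:
        while not self._stop.is_set():
            self._new_obs.wait()
            self._new_obs.clear()
            if self._stop.is_set():
                break
            with self._lock:
                obs = self._latest_obs
            action = self._infer_fn(obs)  # slow call happens outside the lock
            with self._lock:
                self._latest_action = action


def slow_policy(obs):
    time.sleep(0.05)  # stand-in for a model forward pass
    return obs * 2


policy = AsyncPolicy(slow_policy)
policy.start()
loop_times = []
for step in range(20):
    t0 = time.perf_counter()
    policy.submit(step)
    action = policy.latest_action()  # returns immediately
    loop_times.append(time.perf_counter() - t0)
    time.sleep(0.01)                 # rest of the control period
policy.stop()

print(f"slowest loop body: {max(loop_times) * 1000:.3f} ms")
print(f"last action: {policy.latest_action()}")
```

The control loop here runs several times per inference call and reuses the latest available action in between, which is the trade-off any asynchronous scheme of this kind makes.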
Get Started with RIO
RIO is designed to be quick to install and easy to use.
Check out the repository to get started.
1Carnegie Mellon University, 2Delft University of Technology, 3Bosch Center for AI, 4Lavoro AI
*Equal contribution †Corresponding author
BibTeX
@misc{ortega-kral_rio_2026,
author = {Pablo Ortega-Kral and Eliot Xing and Arthur Bucker and Vernon Luk and Junseo Kim and Owen Kwon and Angchen Xie and Nikhil Sobanbabu and Yifu Yuan and Megan Lee and Deepam Ameria and Bhaswanth Ayapilla and Jaycie Bussell and Guanya Shi and Jonathan Francis and Jean Oh},
title = {RIO: Flexible Real-Time Robot I/O for Cross-Embodiment Robot Learning},
year = {2026},
eprint = {arXiv:2605.11564},
}