RIO: Flexible Real-time Robot I/O for Cross-Embodiment Robot Learning

Cross-embodiment robot learning is bottlenecked by fragmented infrastructure, not just limited data.

Despite recent progress in training Vision-Language-Action models (VLAs), deploying them across different robots remains a major engineering challenge. Most robot code is tied to one exact hardware setup, so reproducing results on a different platform usually means rewriting the entire control stack. VLAs cannot be deployed out of the box on new embodiments, and existing cross-embodiment datasets are aggregations of disjoint collection efforts built on fragmented infrastructure.

RIO (Robot I/O) is an open-source Python framework that provides flexible, lightweight components for robot control, teleoperation, data formatting, sensor configuration, and policy deployment across diverse hardware platforms and morphologies. Users can freely choose robots, sensors, teleoperation interfaces, middlewares, data formats, and policies at every layer of the stack — and switch between them with minimal reconfiguration. We validate RIO on VLA deployment workflows across three morphologies (single-arm, bimanual, humanoid) and four hardware platforms, showcasing fine-tuned rollouts of π_0.5 and GR00T N1.5 on household tasks.

RIO

RIO is built on a node-middleware architecture. Nodes for teleoperation interfaces, sensors, robots, and policies are implemented from the same template, requiring minimal boilerplate. Each Node dynamically inherits from a given Middleware that handles message passing. Factory functions produce matched server-client Node pairs, supporting three execution patterns: publish-only (pub()), request-only (req()), and combined (pubreq()). Published data flows through ring buffers that continuously stream state at a fixed frequency, while requests flow through request queues that enable asynchronous command communication from multiple clients at arbitrary rates.
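
The split between the publish path and the request path can be illustrated with a toy example. This is a minimal sketch under our own assumptions, not RIO's implementation: `RingBuffer` and `ToyRobotServer` are hypothetical names, and a plain `queue.Queue` stands in for the request queue.

```python
import queue
import threading
from collections import deque


class RingBuffer:
    """Fixed-capacity buffer: new samples overwrite the oldest ones."""

    def __init__(self, capacity: int) -> None:
        self._items = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def put(self, item) -> None:
        with self._lock:
            self._items.append(item)

    def latest(self):
        """Return the newest sample without blocking, or None if empty."""
        with self._lock:
            return self._items[-1] if self._items else None


class ToyRobotServer:
    """Hypothetical server node: publishes state, consumes commands."""

    def __init__(self) -> None:
        self.state = RingBuffer(capacity=8)  # publish path
        self.requests = queue.Queue()        # request path
        self.position = 0.0

    def spin_once(self) -> None:
        # Drain every pending command, whichever client sent it.
        while True:
            try:
                delta = self.requests.get_nowait()
            except queue.Empty:
                break
            self.position += delta
        # Publish the resulting state; old samples fall off the ring.
        self.state.put(self.position)


server = ToyRobotServer()
server.requests.put(0.5)   # command from client A
server.requests.put(0.25)  # command from client B
server.spin_once()
latest_position = server.state.latest()
print(latest_position)  # 0.75
```

Readers of the ring buffer only ever see the newest state, so a slow consumer never stalls the publisher, while commands in the queue are applied in arrival order regardless of how often each client sends them.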

Because Nodes are middleware-agnostic, they can be paired with different backends depending on deployment requirements. Shared memory enables zero-copy data exchange for high-throughput local communication. Zenoh or ZeroRpc handle serialization and transport over TCP/IPC for distributed multi-machine deployments. Thread middleware simplifies debugging by running everything within a single process.
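
One way to picture "dynamically inheriting" from a middleware is to build the concrete class at run time from a node class and a backend class. The sketch below is an assumption about the general pattern, not RIO's code: `bind`, `GripperNode`, `ThreadMiddleware`, and `LoggingMiddleware` are hypothetical names, and both backends are in-process toys.

```python
class ThreadMiddleware:
    """Toy in-process transport: messages live in a local dict."""

    def __init__(self) -> None:
        self._topics = {}

    def send(self, topic: str, payload) -> None:
        self._topics[topic] = payload

    def recv(self, topic: str):
        return self._topics.get(topic)


class LoggingMiddleware(ThreadMiddleware):
    """Second toy backend that also records every topic it sends."""

    def __init__(self) -> None:
        super().__init__()
        self.log = []

    def send(self, topic: str, payload) -> None:
        self.log.append(topic)
        super().send(topic, payload)


class GripperNode:
    """Node logic written once, with no reference to a specific backend."""

    def command(self, width: float) -> None:
        self.send("gripper/width", width)

    def width(self):
        return self.recv("gripper/width")


def bind(node_cls, middleware_cls):
    """Create a class that inherits from both the node and the middleware."""
    name = f"{node_cls.__name__}On{middleware_cls.__name__}"
    return type(name, (node_cls, middleware_cls), {})


# The same node logic paired with two different backends.
node = bind(GripperNode, ThreadMiddleware)()
node.command(0.04)

logged = bind(GripperNode, LoggingMiddleware)()
logged.command(0.08)
```

The node code never changes; only the second base class does, which is what makes swapping backends a configuration choice rather than a rewrite.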

RIO is designed around five tenets: Flexible — agnostic to components, no locked-in choices; Reusable — lightweight building blocks that combine and modify quickly; Accessible — Python-based, single config file, quick to install; Performant — real-time control with asynchronous policy inference; Consistent — scalable, reproducible data collection.

Supported Hardware

RIO provides flexibility across robot hardware, teleoperation interfaces, cameras, and middlewares. These can be combined in any configuration depending on your requirements.

Humanoid Robots	Unitree G1, Booster T1
Robot Arms	UFactory (xArm5/6/7, 850, Lite6), UR (UR5e, UR7e), Franka (FR3, Panda), Kinova (Gen3), SO-100/SO-101
Robot Grippers	UFactory, Franka, Robotiq (2F-85/2F-140), DH-Robotics (AG-105-145)
Teleop Interfaces	Spacemouse, Gamepad, Keyboard, VR (Apple Vision Pro, Meta Quest), Leader-Follower (GELLO), Phone
Cameras	RealSense, ZED, UVC (Webcams, USB), iPhone (Record3D)
Middlewares	Shared Memory, Thread, Portal, Zenoh, ZeroRpc

Framework Comparison

Framework	Humanoids	Bimanual	Single Arm	Grippers	Teleop	Cameras	Middleware(s)	Data Format(s)	Policies
Ark	✔	✔	✔	✘	✔	✔	✘ : LCM	✘ : Pickle	✔
LeRobot	✔	✔	✔	✘	✔	✔	✘ : Threads/gRPC	✘ : LeRobotDataset	✔
ManiUniCon	✘	✘	✔	✘	✔	✔	✘ : Shm	✘ : Zarr	✔
PAPRLE	✔	✔	✔	✔	✔	✔	✘ : ROS	✘ : Pickle	n/a
PyRobot	✘	✘	✔	✔	✔	✔	✘ : ROS	✘ : Pickle	n/a
RCS	✘	✘	✔	✔	✔	✔	✘ : RPC	✘ : Parquet	✔
RoBits	✘	✔	✔	✔	✔	✔	✘ : ZMQ	✘ : NPZ/JSON	n/a
UMI, DP	✘	✔	✔	✘	✔	✔	✘ : Shm	✘ : Zarr	✘ : DP
RIO (ours)	✔	✔	✔	✔	✔	✔	✔ : any	✔ : any	✔

A Minimal Main Loop

RIO's API is designed for simplicity. A complete teleoperation loop fits in a few lines:

from rio import time
from rio.envs.factory import make_env
from rio.middleware import ServerManager

# cfg is the station/experiment configuration, assumed to be loaded beforehand
# Factory function to create servers, clients, and environment
servers, clients, env = make_env(cfg)

# Start servers with the desired middleware
with ServerManager(cfg.mw, list(servers.values())):
    # Start clients
    with env, clients["teleop"]() as teleop:

        while True:
            # Query client APIs, all non-blocking
            cmd = teleop.poll()
            action = env.build_action(cmd)
            obs = env.step(action)
            time.precise_wait()
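
The loop above ends each iteration with a rate-keeping wait. How RIO implements `precise_wait` is not shown here; a common approach, sketched below under that caveat, is to sleep for most of the remaining time and busy-wait for the final stretch. The names `precise_wait_until` and `run_at_fixed_rate`, and the 2 ms spin margin, are our own assumptions.

```python
import time


def precise_wait_until(deadline: float, spin_margin: float = 0.002) -> None:
    """Sleep coarsely, then busy-wait for the last `spin_margin` seconds."""
    remaining = deadline - time.monotonic()
    if remaining > spin_margin:
        time.sleep(remaining - spin_margin)
    while time.monotonic() < deadline:
        pass  # spin: trades CPU for tighter timing than sleep() alone


def run_at_fixed_rate(step, hz: float, n_steps: int) -> float:
    """Call `step` n_steps times at `hz`; return the total elapsed seconds."""
    period = 1.0 / hz
    start = time.monotonic()
    for i in range(n_steps):
        step()
        # Absolute deadlines keep timing errors from accumulating.
        precise_wait_until(start + (i + 1) * period)
    return time.monotonic() - start


elapsed = run_at_fixed_rate(lambda: None, hz=100.0, n_steps=20)
```

Scheduling against absolute deadlines, rather than sleeping one period after each step, means a slow iteration is absorbed by a shorter wait on the next one.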

Robot Stations

A composable dataclass configuration specifies the hardware topology for each station. The same application logic operates over arbitrary station configurations without modification.
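
As an illustration of what a composable dataclass configuration could look like, the sketch below defines a small station description and derives a second station from the first. All class names, field names, and string labels (`StationCfg`, `RobotCfg`, `CameraCfg`, `"shm"`, and so on) are hypothetical and are not RIO's actual configuration schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class RobotCfg:
    kind: str                  # label only; no driver is loaded here
    gripper: str | None = None


@dataclass(frozen=True)
class CameraCfg:
    kind: str
    width: int = 640
    height: int = 480


@dataclass(frozen=True)
class StationCfg:
    robots: dict[str, RobotCfg]
    cameras: dict[str, CameraCfg] = field(default_factory=dict)
    teleop: str = "spacemouse"
    middleware: str = "shm"


def describe(station: StationCfg) -> str:
    """Application logic that depends only on the station's structure."""
    arms = ", ".join(f"{name}={r.kind}" for name, r in station.robots.items())
    return (
        f"{len(station.robots)} robot(s) [{arms}], "
        f"{len(station.cameras)} camera(s), "
        f"teleop={station.teleop}, mw={station.middleware}"
    )


single = StationCfg(
    robots={"arm": RobotCfg("xarm7", gripper="ufactory")},
    cameras={"wrist": CameraCfg("realsense")},
)
# Derive a bimanual station by overriding only the fields that differ.
bimanual = replace(
    single,
    robots={"left": RobotCfg("so100"), "right": RobotCfg("so100")},
    teleop="gello",
)
```

Here `describe` runs unchanged on both stations, which mirrors the claim that application logic does not need to know the topology in advance.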

Example robot station configurations combining different hardware, sensors, and teleoperation interfaces.

Observation Schema

Each robot morphology defines a dedicated observation structure extending a common base schema, ensuring standardized data representation across platforms regardless of the underlying hardware.

from dataclasses import dataclass, field

import numpy as np

@dataclass
class Camera:
    rgb: np.ndarray | None = None
    depth: np.ndarray | None = None
    meta: dict = field(default_factory=dict)

@dataclass
class Observation:
    proprio: np.ndarray  # Defaults to policy action space
    cameras: dict[str, Camera] = field(default_factory=dict)

@dataclass
class Step:
    timestep: int | None
    observation: Observation
    instruction: str | None
    action: np.ndarray | None
    meta: dict | None = field(default_factory=dict)
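
To show how a morphology-specific observation might extend the base schema, the sketch below repeats the classes above so that it runs on its own and adds one subclass. `BimanualObservation`, its gripper fields, the 14-dimensional proprioception layout, and the camera name are illustrative assumptions rather than RIO's definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

import numpy as np


@dataclass
class Camera:
    rgb: np.ndarray | None = None
    depth: np.ndarray | None = None
    meta: dict = field(default_factory=dict)


@dataclass
class Observation:
    proprio: np.ndarray
    cameras: dict[str, Camera] = field(default_factory=dict)


@dataclass
class BimanualObservation(Observation):
    """Hypothetical morphology-specific extension of the base schema."""
    left_gripper: float = 0.0
    right_gripper: float = 0.0


@dataclass
class Step:
    timestep: int | None
    observation: Observation
    instruction: str | None
    action: np.ndarray | None
    meta: dict | None = field(default_factory=dict)


obs = BimanualObservation(
    proprio=np.zeros(14),  # assumed layout: 7 joint values per arm
    cameras={"wrist_left": Camera(rgb=np.zeros((480, 640, 3), dtype=np.uint8))},
    left_gripper=1.0,
)
step = Step(timestep=0, observation=obs, instruction="fold the cloth", action=None)
```

Because the subclass is still an `Observation`, code that consumes `Step` objects (logging, dataset writers, policies) can handle it without special cases.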

Results

We deploy state-of-the-art VLAs (π_0.5, GR00T N1.5) across 3 morphologies and 4 hardware platforms, achieving ≥60% success on all tasks with just 50 teleoperated demonstrations.

VLAs (π_0.5 & GR00T N1.5)

xArm7: Place Can

SO-100: Fold Cloth

SO-100: Scrub Bowl

Humanoid: GR00T N1.5

Diffusion Policy

xArm7: Throw Ball

xArm7: Flip Tortilla

RL Navigation (PPO)

Unitree G1: Walk

Booster T1: Walk

Policy Deployment

Robot	Policy	Task	Success Rate	Task Time (s)	Demo Time (s)	GPU Util (%)
xArm7	BC π_0.5	Fold Shirt	92.5%	41.96 ± 14.58	41.57 ± 9.25	56.7 ± 1.7
xArm7	BC π_0.5	Place Can	95.0%	16.08 ± 3.41	14.46 ± 2.00	54.6 ± 3.1
SO-100	BC π_0.5	Fold Cloth	60.0%	27.50 ± 5.51	22.43 ± 3.30	46.3 ± 10.0
SO-100	BC π_0.5	Scrub Bowl	64.0%	40.33 ± 13.68	27.66 ± 5.22	52.0 ± 4.8
Unitree G1	BC GR00T N1.5	Pick Box	95.0%	9.07 ± 6.10	10.38 ± 4.04	61.7 ± 4.7
Unitree G1	RL PPO	Navigate	100%	31.27 ± 6.56	n/a	5.1 ± 0.1
Booster T1	RL PPO	Navigate	100%	29.73 ± 4.49	n/a	5.3 ± 0.2

System Profiling

RIO reaches 130.3 ms end-to-end observation-to-action latency versus 581.2 ms for LeRobot — 4.46× lower for π_0.5 inference — with sub-millisecond middleware round-trip times using Zenoh and shared memory.

RIO vs. LeRobot Latency

Observation-Action latency: RIO 130.3 ms vs LeRobot 581.2 ms

We profile end-to-end observation-to-action latency during π_0.5 rollouts with an SO-100 in the loop, using three Intel RealSense cameras (two D415s and one D405) at 640×480 resolution. Under identical hardware, RIO reaches 130.3 ms versus 581.2 ms for LeRobot — a 4.46× reduction. The gain stems from RIO's streamlined architecture: whereas LeRobot threads observations before transmitting them over the network to an asynchronous policy server, RIO leverages the middleware directly for asynchronous inference, cutting both observation fetching and framework overhead. Lower pipeline latency translates to a higher effective control frequency, which is critical for dynamic, contact-rich tasks such as ball throwing and tortilla flipping.
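
The reported ratio, and a rough bound on how often each pipeline can produce a fresh decision, follow directly from the two latencies. The frequency bound assumes one full observation-to-action pass per decision with no overlap between passes; action chunking or pipelining would change the effective rate.

```latex
\[
\frac{581.2\ \text{ms}}{130.3\ \text{ms}} \approx 4.46,
\qquad
f_{\text{RIO}} \le \frac{1}{0.1303\ \text{s}} \approx 7.7\ \text{Hz},
\qquad
f_{\text{LeRobot}} \le \frac{1}{0.5812\ \text{s}} \approx 1.7\ \text{Hz}.
\]
```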

Middleware Round-Trip Latency

Middleware	Latency (ms)
Zenoh	0.43 ± 0.13
Shared Memory	0.54 ± 0.62
Thread	0.99 ± 0.30
ZeroRpc	1.05 ± 0.17
Portal	1.97 ± 0.34

Latency is reported as half of the median round-trip time (samples below the 1st and above the 99th percentile trimmed, following the Open Messaging Benchmark), measured over 1,000 passes with a 2048-byte payload.
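
The statistic in the table can be reproduced in a few lines. The sketch below is our reading of the procedure, not the benchmark script: it approximates percentile trimming by dropping 1% of the samples from each tail, and the `echo` callable is a stand-in for a real middleware round trip.

```python
import statistics
import time


def one_way_latency_ms(round_trips_ms: list[float], trim: float = 0.01) -> float:
    """Half of the median round-trip time after trimming both tails."""
    ordered = sorted(round_trips_ms)
    k = int(len(ordered) * trim)  # samples to drop from each tail
    trimmed = ordered[k:len(ordered) - k] if k else ordered
    return statistics.median(trimmed) / 2.0


def measure_round_trips_ms(echo, payload: bytes, passes: int) -> list[float]:
    """Time `passes` calls of echo(payload), in milliseconds."""
    samples = []
    for _ in range(passes):
        start = time.perf_counter()
        reply = echo(payload)
        samples.append((time.perf_counter() - start) * 1e3)
        if reply != payload:
            raise RuntimeError("echo returned a different payload")
    return samples


payload = bytes(2048)  # 2048-byte payload, as in the table
samples = measure_round_trips_ms(lambda p: p, payload, passes=1000)
latency = one_way_latency_ms(samples)
```

With the identity function as the echo, `latency` only measures Python call overhead; substituting a real client request gives the per-middleware number.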

Node Profiling During Policy Deployment

Timeline of π_0.5 rollout on xArm7 with three RealSense cameras (two D415s, one D405) at 640×480. The main loop remains non-blocking; asynchronous inference (~85.8 ms forward pass) allows continuous control.
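
The idea of keeping the main loop non-blocking while inference runs elsewhere can be sketched with a background thread. This is an illustrative pattern under our own assumptions, not RIO's middleware-based mechanism: `AsyncPolicy` and `slow_policy` are hypothetical, and the 50 ms sleep is only a placeholder for a model forward pass.

```python
import threading
import time


class AsyncPolicy:
    """Runs a slow policy in a background thread; exposes the newest action."""

    def __init__(self, policy_fn) -> None:
        self._policy_fn = policy_fn
        self._lock = threading.Lock()
        self._obs = None
        self._action = None
        self._new_obs = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def submit(self, obs) -> None:
        """Non-blocking: overwrite the pending observation."""
        with self._lock:
            self._obs = obs
        self._new_obs.set()

    def latest_action(self):
        """Non-blocking: return the most recent action, or None."""
        with self._lock:
            return self._action

    def _worker(self) -> None:
        while not self._stop.is_set():
            if not self._new_obs.wait(timeout=0.05):
                continue
            self._new_obs.clear()
            with self._lock:
                obs = self._obs
            action = self._policy_fn(obs)  # slow call, made outside the lock
            with self._lock:
                self._action = action

    def close(self) -> None:
        self._stop.set()
        self._thread.join()


def slow_policy(obs: int) -> int:
    time.sleep(0.05)  # placeholder for a model forward pass
    return obs * 2


policy = AsyncPolicy(slow_policy)
loop_times = []
actions = []
for tick in range(30):
    t0 = time.perf_counter()
    policy.submit(tick)
    actions.append(policy.latest_action())  # may be stale, or None at first
    loop_times.append(time.perf_counter() - t0)
    time.sleep(0.01)  # roughly a 100 Hz control tick
policy.close()
```

Each loop iteration only touches a lock-protected variable, so its duration is independent of the forward pass; the cost is that the action in use may lag the newest observation by up to one inference time.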

Get Started with RIO

RIO is designed to be quick to install and easy to use. Check out the repository on GitHub to get started.

Team

Pablo Ortega-Kral*¹, Eliot Xing*¹, Arthur Fender Coelho Bucker¹, Vernon Luk¹, Jason Kim², Owen Kwon¹, Angchen Xie¹, Nikhil Sobanbabu¹, Yifu Yuan¹, Megan Lee¹, Deepam Ameria¹, Bhaswanth Ayapilla¹, Jaycie Bussell³, Guanya Shi¹, Jonathan Francis¹,³, Jean Oh†¹,⁴

¹Carnegie Mellon University, ²Delft University of Technology, ³Bosch Center for AI, ⁴Lavoro AI

*Equal contribution. †Corresponding author.

BibTeX

@misc{ortega-kral_rio_2026,
  author = {Pablo Ortega-Kral and Eliot Xing and Arthur Bucker and Vernon Luk and Junseo Kim and Owen Kwon and Angchen Xie and Nikhil Sobanbabu and Yifu Yuan and Megan Lee and Deepam Ameria and Bhaswanth Ayapilla and Jaycie Bussell and Guanya Shi and Jonathan Francis and Jean Oh},
  title  = {RIO: Flexible Real-Time Robot I/O for Cross-Embodiment Robot Learning},
  year   = {2026},
  eprint = {2605.11564},
  archivePrefix = {arXiv},
}
