
Introduction


The RoboWorld Challenge aims to advance embodied world modeling for robotics and autonomous driving, enabling intelligent agents to understand dynamic environments, anticipate future changes, and make safe, effective decisions in complex real-world settings.

Five distinct tracks are designed to push the boundaries of predictive and actionable world modeling. Each track challenges participants to address real-world embodied intelligence demands, including instruction following, social compliance, physical plausibility, behavioral validity, and generalization across diverse tasks, environments, and agent embodiments.

The competition provides participants with benchmark resources, datasets, simulators, baselines, and evaluation protocols, supporting the development of novel methods for perception, reasoning, planning, and control. By emphasizing transparency, comparability, and real-world relevance, we seek to foster embodied systems that can model, evaluate, and safely interact with open, dynamic, and human-centered environments.


Challenge Tracks

There are five tracks in the RoboWorld Challenge, with emphasis on the following topics:

     - Track #1: Driving with Language.
     - Track #2: X-Embodied Foundation.
     - Track #3: Social Navigation.
     - Track #4: WorldLens Driving World Model Evaluation.
     - Track #5: Vision-Language Navigation.

For additional implementation details, kindly refer to our DriveBench, MiMo-Embodied, Falcon, WorldLens, and HA-VLN 2.0 projects.




Venue



The RoboWorld Challenge is affiliated with the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2026).

IROS is one of the largest and most impactful robotics research conferences in the world. IROS 2026 will be held from September 27th to October 1st, 2026, in Pittsburgh, PA, USA.

The IROS competitions provide a unique venue for state-of-the-art technical demonstrations from research labs throughout academia and industry. For additional details, kindly refer to the IROS 2026 website.



Contact


For questions and inquiries regarding the challenge, please e-mail roboworld2026@outlook.com.


Timeline


  • Team Up

    Register your team by filling in this Google Form. 

  • Release of Training and Evaluation Data

    Download the data from the competition toolkit. 

  • Competition Servers Online @ CodaBench

  • Phase One Deadline

    Shortlisted teams are invited to participate in the next phase. 

  • Phase Two Deadline

    Don't forget to include the code link in your submissions. 

  • Award Decision Announcement

    Announced in association with the IROS 2026 conference. 

Awards


1st Place

Cash $5,000 + Certificate

  • This award will be given to five awardees, one per track; each receives $1,000.

2nd Place

Cash $3,000 + Certificate

  • This award will be given to five awardees, one per track; each receives $600.

3rd Place

Cash $2,000 + Certificate

  • This award will be given to five awardees, one per track; each receives $400.

Innovative Award

Certificate

  • This award will be selected by the program committee and given to ten awardees, two per track.


Toolkit


The competition toolkit provides the benchmark datasets, baselines, simulators, and evaluation protocols used across all five tracks.



Competition Tracks


Track #1: Driving with Language


This track challenges participants to develop vision-language models that enhance the robustness of autonomous driving systems under real-world conditions, including sensor corruptions and environmental noise.

Participants are expected to design multimodal language models that fuse driving perception, prediction, and planning with natural language understanding, enabling the vehicle to make accurate, human-like decisions under diverse real-world sensing conditions.
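
As a minimal sketch of the kind of sensor corruption such a system should withstand, the Python snippet below injects Gaussian noise into a camera frame. The corruption family and the severity scale are illustrative assumptions, not the official benchmark definitions:

    import numpy as np

    def gaussian_noise(image: np.ndarray, severity: int = 3) -> np.ndarray:
        """Add zero-mean Gaussian noise to an RGB image with values in [0, 255]."""
        sigma = [8, 16, 24, 32, 48][severity - 1]  # illustrative severity scale
        noisy = image.astype(np.float32) + np.random.normal(0.0, sigma, image.shape)
        return np.clip(noisy, 0, 255).astype(np.uint8)

    # Corrupt a dummy front-camera frame before feeding it to a model under test.
    frame = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
    corrupted = gaussian_noise(frame, severity=4)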

Kindly refer to this page for more technical details on this track.

Track Organizers




Track #2: X-Embodied Foundation


This track challenges participants to develop cross-embodiment foundation models that transfer perception, reasoning, and planning capabilities across diverse agent embodiments and tasks.

Participants will design models that are trained on heterogeneous robot platforms and generalize to new tasks, environments, and embodiments, addressing challenges such as mismatched sensor configurations, action spaces, and movement dynamics.
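
As a toy illustration of the cross-embodiment problem, the sketch below normalizes heterogeneous action spaces into a shared range so that a single policy head could serve several platforms. The embodiment specifications are hypothetical placeholders, not the track's actual configuration:

    import numpy as np

    # Hypothetical per-embodiment action bounds (e.g., steering/throttle vs. body velocities).
    EMBODIMENTS = {
        "vehicle":   {"low": np.array([-1.0, -0.5]), "high": np.array([1.0, 0.5])},
        "quadruped": {"low": np.array([-0.5, -0.5, -1.0]), "high": np.array([0.5, 0.5, 1.0])},
    }

    def to_normalized(action: np.ndarray, embodiment: str) -> np.ndarray:
        """Map a raw action into [-1, 1] so one policy head can drive all platforms."""
        spec = EMBODIMENTS[embodiment]
        return 2.0 * (action - spec["low"]) / (spec["high"] - spec["low"]) - 1.0

    def from_normalized(action: np.ndarray, embodiment: str) -> np.ndarray:
        """Map a normalized action back into the embodiment's native range."""
        spec = EMBODIMENTS[embodiment]
        return spec["low"] + (action + 1.0) / 2.0 * (spec["high"] - spec["low"])

    raw = np.array([0.6, -0.2])
    assert np.allclose(from_normalized(to_normalized(raw, "vehicle"), "vehicle"), raw)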

Kindly refer to this page for more technical details on this track.

Track Organizers




Track #3: Social Navigation

Image

This track challenges participants to develop RGBD-based perception and navigation systems that enable autonomous agents to interact safely, efficiently, and socially in dynamic human environments.

Participants will design algorithms that interpret human behaviors and contextual cues. Submissions should generate navigation strategies balancing efficiency and social compliance, while addressing challenges like real-time adaptability, occlusion handling, and ethical decision-making.
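
One simple way to make the efficiency-versus-compliance trade-off concrete is to score candidate waypoints with a cost that mixes progress toward the goal with a personal-space penalty. The weights and the 1.2 m personal-space radius below are illustrative assumptions, not the track's scoring function:

    import numpy as np

    def social_cost(waypoint, goal, humans, personal_space=1.2, w_social=2.0):
        """Distance-to-goal plus a penalty for intruding on any human's personal space."""
        cost = np.linalg.norm(goal - waypoint)
        for h in humans:
            d = np.linalg.norm(h - waypoint)
            if d < personal_space:  # inside someone's personal space
                cost += w_social * (personal_space - d) / personal_space
        return cost

    goal = np.array([5.0, 0.0])
    humans = [np.array([2.0, 0.3]), np.array([3.5, -1.0])]
    candidates = [np.array([2.0, 1.5]), np.array([2.0, 0.0])]
    best = min(candidates, key=lambda w: social_cost(w, goal, humans))  # picks the detour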

Kindly refer to this page for more technical details on this track.

Track Organizers




Track #4: WorldLens Driving World Model Evaluation


This track focuses on the evaluation of driving world models, examining whether generated future driving scenes are physically plausible, behaviorally valid, and faithful to their conditioning inputs.

Participants are tasked with developing evaluation methods that score world-model rollouts along these dimensions, enabling transparent and comparable assessment of generated scenes across models and driving scenarios.
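
As a concrete placeholder for this kind of scoring, the sketch below computes PSNR between a predicted frame and a ground-truth frame, one of the simplest fidelity measures for generated video; the actual WorldLens protocol may rely on different and additional metrics:

    import numpy as np

    def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 255.0) -> float:
        """Peak signal-to-noise ratio between two uint8 frames (higher is better)."""
        mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
        if mse == 0:
            return float("inf")  # identical frames
        return 10.0 * np.log10(max_val ** 2 / mse)

    pred = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)  # dummy rollout frame
    gt = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)    # dummy ground truth
    print(f"PSNR: {psnr(pred, gt):.2f} dB")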

Kindly refer to this page for more technical details on this track.

Track Organizers




Track #5: Vision-Language Navigation


This track focuses on the development of vision-language navigation agents that follow natural language instructions through realistic, human-populated environments.

Participants are expected to design agents that ground instructions in egocentric observations and plan trajectories toward the described goals, while remaining robust to ambiguous instructions and dynamically moving humans.
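
For intuition, the sketch below computes two conventional VLN-style metrics, navigation error and success rate; the 3.0 m success threshold follows common practice in the VLN literature and is assumed here rather than taken from the official protocol:

    import numpy as np

    def navigation_error(stop_pos, goal_pos) -> float:
        """Euclidean distance from the agent's stop position to the goal."""
        return float(np.linalg.norm(np.asarray(stop_pos) - np.asarray(goal_pos)))

    def success_rate(episodes, threshold: float = 3.0) -> float:
        """Fraction of (stop, goal) pairs within `threshold` meters of the goal."""
        hits = [navigation_error(stop, goal) <= threshold for stop, goal in episodes]
        return sum(hits) / len(hits)

    episodes = [((1.0, 2.0), (1.5, 2.0)), ((0.0, 0.0), (4.0, 3.0))]  # dummy rollouts
    print(f"SR: {success_rate(episodes):.2f}")  # 0.50: second episode stops 5.0 m away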

Kindly refer to this page for more technical details on this track.

Track Organizers





Evaluation Servers


Track #1

Driving with Language

  • Facilitating driving perception, prediction, and planning robustness with rich language understanding

Track #2

X-Embodied Foundation

  • Testing the generalization of embodied foundation models across diverse tasks, environments, and agent embodiments

Track #3

Social Navigation

  • Testing the accuracy and resilience of navigation algorithms in dynamic and unpredictable real-world environments

Track #4

WorldLens Driving World Model Evaluation

  • Assessing the physical plausibility and behavioral validity of generated driving scenes for comprehensive world model evaluation

Track #5

Vision-Language Navigation

  • Evaluating instruction-following navigation agents in dynamic, human-centered environments


FAQs


Please refer to Frequently Asked Questions for more detailed rules and conditions of this competition.




Organizing Team


Challenge Organizers


  • Lingdong Kong, NUS Computing
  • Shaoyuan Xie, UC Irvine
  • Xiaoshuai Hao, Xiaomi EV
  • Zeying Gong, HKUST(GZ)
  • Yangyi Zhong, HKUST(GZ)
  • Ao Liang, NUS Computing
  • Yifei Dong, U. of Washington
  • Linfeng Zhang, Xiaomi EV
  • Yingbo Tang, Xiaomi EV
  • Lei Zhou, Xiaomi EV
  • Rong Li, HKUST(GZ)
  • Tianshuai Hu, HKUST


Program Committee


  • Wei Tsang Ooi, NUS Computing
  • Benoit R. Cottereau, CNRS & IPAL
  • Lai Xing Ng, A*STAR, I2R
  • Long Chen, Xiaomi EV
  • Yuexin Ma, ShanghaiTech
  • Junwei Liang, HKUST(GZ)
  • Zhi-Qi Cheng, U. of Washington
  • Hangjun Ye, Xiaomi EV
  • Ziwei Liu, NTU, S-Lab


Industry Mentors


  • Wei Yin, Horizon Robotics
  • Wenhao Ding, NVIDIA




Associated Project


This project is affiliated with DesCartes, a CNRS@CREATE program on Intelligent Modeling for Decision-Making in Critical Urban Systems.



Terms & Conditions


This competition is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data provided that you agree to the following:

1. The data in this competition is provided “AS IS”, without express or implied warranty. Although every effort has been made to ensure accuracy, we do not accept any responsibility for errors or omissions.
2. You may not use the data in this competition or any derivative work for commercial purposes, such as licensing or selling the data, or using the data to procure commercial gain.
3. You must include a reference to the RoboWorld Challenge 2026 (including the benchmark data and the specially generated data for academic challenges) in any work that makes use of the benchmark. For research papers, please cite our preferred publications as listed on our webpage.

To ensure a fair comparison among all participants, we require:

1. All participants must follow the exact same data configuration when training and evaluating their algorithms. Please do not use any public or private datasets other than those specified for model training.
2. The theme of this competition is to probe the out-of-distribution robustness of autonomous driving perception models. Therefore, any use of the corruption and sensor failure types designed in this benchmark is strictly prohibited, including any atomic operation that composes any one of the mentioned corruptions.
3. To ensure the above two rules are followed, each participant is requested to submit code with reproducible results before the final results are announced; the code is for examination purposes only, and we will manually verify the training and evaluation of each participant's model.