
Introduction


The RoboWorld Challenge aims to advance embodied world modeling for robotics and autonomous driving, enabling intelligent agents to understand dynamic environments, anticipate future changes, and make safe, effective decisions in complex real-world settings.

Five distinct tracks are designed to push the boundaries of predictive and actionable world modeling. Each track challenges participants to address real-world embodied intelligence demands, including instruction following, social compliance, physical plausibility, behavioral validity, and generalization across diverse tasks, environments, and agent embodiments.

The competition provides participants with benchmark resources, datasets, simulators, baselines, and evaluation protocols, supporting the development of novel methods for perception, reasoning, planning, and control. By emphasizing transparency, comparability, and real-world relevance, we seek to foster embodied systems that can model, evaluate, and safely interact with open, dynamic, and human-centered environments.


Challenge Tracks

There are five tracks in the RoboWorld Challenge, with emphasis on the following topics:

     - Track #1: Driving with Language.
     - Track #2: X-Embodied Foundation.
     - Track #3: Social Navigation.
     - Track #4: WorldLens Driving World Model Evaluation.
     - Track #5: Vision-Language Navigation.

For additional implementation details, kindly refer to our DriveBench, MiMo-Embodied, Falcon, WorldLens, and HA-VLN 2.0 projects.




Venue



The RoboWorld Challenge is affiliated with the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2026).

IROS is one of the largest and most impactful robotics research conferences in the world. IROS 2026 will be held from September 27th to October 1st, 2026, in Pittsburgh, PA, USA.

The IROS competitions provide a unique venue for state-of-the-art technical demonstrations from research labs throughout academia and industry. For additional details, kindly refer to the IROS 2026 website.



Contact


For questions and inquiries regarding the challenge, please e-mail roboworld2026@outlook.com.


Timeline


  • Team Up

    Register your team by filling in this Google Form. 

  • Release of Training and Evaluation Data

    Download the data from the competition toolkit. 

  • Competition Servers Online @ CodaBench

  • Phase One Deadline

    Shortlisted teams are invited to participate in the next phase. 

  • Phase Two Deadline

    Don't forget to include the code link in your submissions. 

  • Award Decision Announcement

    Announced in association with the IROS 2026 conference. 

Awards


1st Place

Cash $5,000 + Certificate

  • This award will be given to five awardees, one per track; each receives $1,000.

2nd Place

Cash $3,000 + Certificate

  • This award will be given to five awardees, one per track; each receives $600.

3rd Place

Cash $2,000 + Certificate

  • This award will be given to five awardees, one per track; each receives $400.

Innovative Award

Certificate

  • This award will be selected by the program committee and given to ten awardees, two per track.


Toolkit


The competition toolkit provides the benchmark datasets, baselines, simulators, and evaluation protocols used across all five tracks.



Competition Tracks


Track #1: Driving with Language


This track challenges participants to develop vision-language models that enhance the robustness of autonomous driving systems under real-world conditions, including sensor corruptions and environmental noise.

Participants are expected to design multimodal language models that fuse driving perception, prediction, and planning with natural language understanding, enabling the vehicle to make accurate, human-like decisions under diverse real-world sensing conditions.
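
As a minimal sketch of the kind of sensor corruption such a system should withstand, the Python snippet below injects Gaussian noise into a camera frame. The corruption family and the severity scale are illustrative assumptions, not the official benchmark definitions:

    import numpy as np

    def gaussian_noise(image: np.ndarray, severity: int = 3) -> np.ndarray:
        """Add zero-mean Gaussian noise to an RGB image with values in [0, 255]."""
        sigma = [8, 16, 24, 32, 48][severity - 1]  # illustrative severity scale
        noisy = image.astype(np.float32) + np.random.normal(0.0, sigma, image.shape)
        return np.clip(noisy, 0, 255).astype(np.uint8)

    # Corrupt a dummy front-camera frame before feeding it to a model under test.
    frame = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
    corrupted = gaussian_noise(frame, severity=4)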

Kindly refer to this page for more technical details on this track.

Track Organizers




Track #2: X-Embodied Foundation


This track challenges participants to develop cross-embodiment foundation models that transfer perception, reasoning, and planning capabilities across diverse agent embodiments and tasks.

Participants will design models that are trained on heterogeneous robot platforms and generalize to new tasks, environments, and embodiments, addressing challenges such as mismatched sensor configurations, action spaces, and movement dynamics.
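
As a toy illustration of the cross-embodiment problem, the sketch below normalizes heterogeneous action spaces into a shared range so that a single policy head could serve several platforms. The embodiment specifications are hypothetical placeholders, not the track's actual configuration:

    import numpy as np

    # Hypothetical per-embodiment action bounds (e.g., steering/throttle vs. body velocities).
    EMBODIMENTS = {
        "vehicle":   {"low": np.array([-1.0, -0.5]), "high": np.array([1.0, 0.5])},
        "quadruped": {"low": np.array([-0.5, -0.5, -1.0]), "high": np.array([0.5, 0.5, 1.0])},
    }

    def to_normalized(action: np.ndarray, embodiment: str) -> np.ndarray:
        """Map a raw action into [-1, 1] so one policy head can drive all platforms."""
        spec = EMBODIMENTS[embodiment]
        return 2.0 * (action - spec["low"]) / (spec["high"] - spec["low"]) - 1.0

    def from_normalized(action: np.ndarray, embodiment: str) -> np.ndarray:
        """Map a normalized action back into the embodiment's native range."""
        spec = EMBODIMENTS[embodiment]
        return spec["low"] + (action + 1.0) / 2.0 * (spec["high"] - spec["low"])

    raw = np.array([0.6, -0.2])
    assert np.allclose(from_normalized(to_normalized(raw, "vehicle"), "vehicle"), raw)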

Kindly refer to this page for more technical details on this track.

Track Organizers




Track #3: Social Navigation

Image

This track challenges participants to develop RGBD-based perception and navigation systems that enable autonomous agents to interact safely, efficiently, and socially in dynamic human environments.

Participants will design algorithms that interpret human behaviors and contextual cues. Submissions should generate navigation strategies balancing efficiency and social compliance, while addressing challenges like real-time adaptability, occlusion handling, and ethical decision-making.
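
One simple way to make the efficiency-versus-compliance trade-off concrete is to score candidate waypoints with a cost that mixes progress toward the goal with a personal-space penalty. The weights and the 1.2 m personal-space radius below are illustrative assumptions, not the track's scoring function:

    import numpy as np

    def social_cost(waypoint, goal, humans, personal_space=1.2, w_social=2.0):
        """Distance-to-goal plus a penalty for intruding on any human's personal space."""
        cost = np.linalg.norm(goal - waypoint)
        for h in humans:
            d = np.linalg.norm(h - waypoint)
            if d < personal_space:  # inside someone's personal space
                cost += w_social * (personal_space - d) / personal_space
        return cost

    goal = np.array([5.0, 0.0])
    humans = [np.array([2.0, 0.3]), np.array([3.5, -1.0])]
    candidates = [np.array([2.0, 1.5]), np.array([2.0, 0.0])]
    best = min(candidates, key=lambda w: social_cost(w, goal, humans))  # picks the detour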

Kindly refer to this page for more technical details on this track.

Track Organizers




Track #4: WorldLens Driving World Model Evaluation


This track focuses on the evaluation of driving world models, examining whether generated future driving scenes are physically plausible, behaviorally valid, and faithful to their conditioning inputs.

Participants are tasked with developing evaluation methods that score world-model rollouts along these dimensions, enabling transparent and comparable assessment of generated scenes across models and driving scenarios.
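
As a concrete placeholder for this kind of scoring, the sketch below computes PSNR between a predicted frame and a ground-truth frame, one of the simplest fidelity measures for generated video; the actual WorldLens protocol may rely on different and additional metrics:

    import numpy as np

    def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 255.0) -> float:
        """Peak signal-to-noise ratio between two uint8 frames (higher is better)."""
        mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
        if mse == 0:
            return float("inf")  # identical frames
        return 10.0 * np.log10(max_val ** 2 / mse)

    pred = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)  # dummy rollout frame
    gt = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)    # dummy ground truth
    print(f"PSNR: {psnr(pred, gt):.2f} dB")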

Kindly refer to this page for more technical details on this track.

Track Organizers




Track #5: Vision-Language Navigation


This track focuses on the development of vision-language navigation agents that follow natural language instructions through realistic, human-populated environments.

Participants are expected to design agents that ground instructions in egocentric observations and plan trajectories toward the described goals, while remaining robust to ambiguous instructions and dynamically moving humans.
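
For intuition, the sketch below computes two conventional VLN-style metrics, navigation error and success rate; the 3.0 m success threshold follows common practice in the VLN literature and is assumed here rather than taken from the official protocol:

    import numpy as np

    def navigation_error(stop_pos, goal_pos) -> float:
        """Euclidean distance from the agent's stop position to the goal."""
        return float(np.linalg.norm(np.asarray(stop_pos) - np.asarray(goal_pos)))

    def success_rate(episodes, threshold: float = 3.0) -> float:
        """Fraction of (stop, goal) pairs within `threshold` meters of the goal."""
        hits = [navigation_error(stop, goal) <= threshold for stop, goal in episodes]
        return sum(hits) / len(hits)

    episodes = [((1.0, 2.0), (1.5, 2.0)), ((0.0, 0.0), (4.0, 3.0))]  # dummy rollouts
    print(f"SR: {success_rate(episodes):.2f}")  # 0.50: second episode stops 5.0 m away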

Kindly refer to this page for more technical details on this track.

Track Organizers





Evaluation Servers


Track #1

Driving with Language

  • Facilitating driving perception, prediction, and planning robustness with rich language understanding

Track #2

X-Embodied Foundation

  • Testing the generalization of embodied foundation models across diverse tasks, environments, and agent embodiments

Track #3

Social Navigation

  • Testing the accuracy and resilience of navigation algorithms in dynamic and unpredictable real-world environments

Track #4

WorldLens Driving World Model Evaluation

  • Assessing the physical plausibility and behavioral validity of generated driving scenes for comprehensive world model evaluation

Track #5

Vision-Language Navigation

  • Evaluating instruction-following navigation agents in dynamic, human-centered environments


FAQs


Please refer to Frequently Asked Questions for more detailed rules and conditions of this competition.




Organizing Team


Challenge Organizers


  • Lingdong Kong, NUS Computing
  • Shaoyuan Xie, UC Irvine
  • Xiaoshuai Hao, Xiaomi EV
  • Zeying Gong, HKUST(GZ)
  • Yangyi Zhong, HKUST(GZ)
  • Ao Liang, NUS Computing
  • Yifei Dong, U. of Washington
  • Linfeng Zhang, Xiaomi EV
  • Yingbo Tang, Xiaomi EV
  • Lei Zhou, Xiaomi EV
  • Rong Li, HKUST(GZ)
  • Tianshuai Hu, HKUST


Program Committee


  • Wei Tsang Ooi, NUS Computing
  • Benoit R. Cottereau, CNRS & IPAL
  • Lai Xing Ng, A*STAR, I2R
  • Long Chen, Xiaomi EV
  • Yuexin Ma, ShanghaiTech
  • Junwei Liang, HKUST(GZ)
  • Zhi-Qi Cheng, U. of Washington
  • Hangjun Ye, Xiaomi EV
  • Ziwei Liu, NTU, S-Lab


Industry Mentors


  • Wei Yin, Horizon Robotics
  • Wenhao Ding, NVIDIA




Associated Project


This project is affiliated with DesCartes, a CNRS@CREATE program on Intelligent Modeling for Decision-Making in Critical Urban Systems.



Terms & Conditions


This competition is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data provided that you agree to the following:

1. The data in this competition is provided “AS IS”, without express or implied warranty. Although every effort has been made to ensure accuracy, we do not accept any responsibility for errors or omissions.
2. You may not use the data in this competition or any derivative work for commercial purposes, such as licensing or selling the data, or using the data to procure commercial gain.
3. You must include a reference to the RoboWorld Challenge 2026 (including the benchmark data and the specially generated data for academic challenges) in any work that makes use of the benchmark. For research papers, please cite our preferred publications as listed on our webpage.

To ensure a fair comparison among all participants, we require:

1. All participants must follow the exact same data configuration when training and evaluating their algorithms. Please do not use any public or private datasets other than those specified for model training.
2. The theme of this competition is to probe the out-of-distribution robustness of autonomous driving perception models. Therefore, any use of the corruption and sensor failure types designed in this benchmark is strictly prohibited, including any atomic operation that composes any one of the mentioned corruptions.
3. To ensure the above two rules are followed, each participant is requested to submit code with reproducible results before the final results are announced; the code is for examination purposes only, and we will manually verify the training and evaluation of each participant's model.