
The RoboWorld Challenge aims to advance embodied world modeling for robotics and autonomous driving, enabling intelligent agents to understand dynamic environments, anticipate future changes, and make safe, effective decisions in complex real-world settings.
Five distinct tracks are designed to push the boundaries of predictive and actionable world modeling. Each track challenges participants to address real-world embodied intelligence demands including instruction following, social compliance, physical plausibility, behavioral validity, and generalization across diverse tasks, environments, and agent embodiments.
The competition provides participants with benchmark resources, datasets, simulators, baselines, and evaluation protocols, supporting the development of novel methods for perception, reasoning, planning, and control. By emphasizing transparency, comparability, and real-world relevance, we seek to foster embodied systems that can model, evaluate, and safely interact with open, dynamic, and human-centered environments.

Challenge Tracks
There are five tracks in the RoboWorld Challenge, with emphasis on the following topics:
- Track #1: Driving with Language.
- Track #2: X-Embodied Foundation.
- Track #3: Social Navigation.
- Track #4: WorldLens Driving World Model Evaluation.
- Track #5: Vision-Language Navigation.
For additional implementation details, kindly refer to our DriveBench, MiMo-Embodied, Falcon, WorldLens, and HA-VLN 2.0 projects.
E-mail: roboworld2026@outlook.com.
The RoboWorld Challenge is affiliated with the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2026).
IROS is the IEEE Robotics and Automation Society's flagship conference. IROS 2026 will be held from September 27th to October 1st, 2026, in Pittsburgh, PA, USA.
The IROS competitions provide a unique venue for state-of-the-art technical demonstrations from research labs throughout academia and industry. For additional details, kindly refer to the IROS 2026 website.
Register your team by filling in this Google Form.
Download the data from the competition toolkit.
Shortlisted teams are invited to participate in the next phase.
Don't forget to include the code link in your submissions.
Held in conjunction with the formal program of the IROS 2026 conference.

This track challenges participants to develop vision-language models that enhance the robustness of autonomous driving systems under real-world conditions, including sensor corruptions and environmental noise.
Participants are expected to design multimodal language models that fuse driving perception, prediction, and planning with natural language understanding, enabling the vehicle to make accurate, human-like decisions across diverse real-world sensing conditions.
Kindly refer to this page for more technical details on this track.
Track Organizers

This track challenges participants to develop RGBD-based perception and navigation systems that enable autonomous agents to interact safely, efficiently, and socially in dynamic human environments.
Participants will design algorithms that interpret human behaviors and contextual cues. Submissions should generate navigation strategies balancing efficiency and social compliance, while addressing challenges like real-time adaptability, occlusion handling, and ethical decision-making.
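As a rough illustration of what "balancing efficiency and social compliance" can mean, the sketch below scores candidate paths by their length plus a weighted penalty for entering a person's comfort zone. This is not the track's evaluation protocol: the function names, the `comfort_radius` and `social_weight` parameters, and the static human positions are all assumptions made for this example.

```python
import math

def path_length(path):
    """Total Euclidean length of a path given as a list of (x, y) waypoints."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))

def social_penalty(path, humans, comfort_radius=1.0):
    """Sum of how far each waypoint intrudes into any human's comfort radius.

    Hypothetical penalty: humans are treated as static points, which is a
    simplification of the dynamic setting described above.
    """
    penalty = 0.0
    for point in path:
        for human in humans:
            gap = math.dist(point, human)
            if gap < comfort_radius:
                penalty += comfort_radius - gap
    return penalty

def path_cost(path, humans, social_weight=5.0, comfort_radius=1.0):
    """Efficiency term (length) plus a weighted social-compliance term."""
    return path_length(path) + social_weight * social_penalty(
        path, humans, comfort_radius
    )

# Toy scene: one person standing between start and goal.
humans = [(1.0, 0.0)]
direct = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.0)]  # short, but brushes past the person
detour = [(0.0, 0.0), (1.0, 1.5), (2.0, 0.0)]  # longer, keeps its distance

best = min([direct, detour], key=lambda p: path_cost(p, humans))
```

With these illustrative weights the detour is preferred; lowering `social_weight` toward zero makes the shorter, more intrusive path win, which is the trade-off a submission has to manage.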
Kindly refer to this page for more technical details on this track.
Track Organizers

This track targets models for natural language-guided cross-view image retrieval, specifically for scenarios where the input data is captured from drastically different viewpoints, such as aerial (drone or satellite) and ground-level images.
Participants are tasked with designing models that can effectively retrieve corresponding images from large-scale cross-view image databases based on natural language descriptions, even in the presence of common corruptions such as blur, occlusion, or sensor noise.
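The retrieval step can be pictured as ranking gallery images by their similarity to the text query in a shared embedding space. The sketch below is a minimal illustration under that assumption: the embeddings are hand-written toy vectors, and the image ids and the `retrieve_top_k` helper are hypothetical. In practice the vectors would come from a trained text encoder and image encoder, which the track does not prescribe here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_embedding, gallery, k=3):
    """Return the ids of the k gallery images most similar to the text query.

    `gallery` maps an image id to its embedding vector.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), image_id)
        for image_id, embedding in gallery.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:k]]

# Toy gallery mixing aerial and ground-level views (made-up embeddings).
gallery = {
    "aerial_001": [0.9, 0.1, 0.0],
    "ground_017": [0.1, 0.8, 0.1],
    "aerial_042": [0.7, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for an encoded text description

top = retrieve_top_k(query, gallery, k=2)
```

Robustness to corruptions would then show up as how stable this ranking stays when the gallery images are blurred, occluded, or noisy before encoding.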
Kindly refer to this page for more technical details on this track.
Track Organizers

This track focuses on the development of robust 3D object detectors that can seamlessly adapt across different robot platforms, including vehicles, drones, and quadrupeds.
Participants are expected to develop new adaptation algorithms that transfer 3D perception, specifically object detection, across three robot platforms with different sensor configurations and movement dynamics. Models are expected to be trained on vehicle data and to perform well on the drone and quadruped platforms.
Kindly refer to this page for more technical details on this track.
Track Organizers
Please refer to the Frequently Asked Questions for more detailed rules and conditions of this competition.
This project is affiliated with DesCartes, a CNRS@CREATE program on Intelligent Modeling for Decision-Making in Critical Urban Systems.

This competition is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data given that you agree:
1. That the data in this competition comes “AS IS”, without express or implied warranty. Although every effort has been made to ensure accuracy, we do not accept any responsibility for errors or omissions.
2. That you may not use the data in this competition or any derivative work for commercial purposes, such as licensing or selling the data or using the data to procure a commercial gain.
3. That you include a reference to RoboWorld Challenge 2026 (including the benchmark data and the specially generated data for academic challenges) in any work that makes use of the benchmark. For research papers, please cite our preferred publications as listed on our webpage.
To ensure a fair comparison among all participants, we require:
1. All participants must follow the exact same data configuration when training and evaluating their algorithms. Please do not use any public or private datasets other than those specified for model training.
2. The theme of this competition is to probe the out-of-distribution robustness of autonomous driving perception models. Therefore, any use of the corruption and sensor failure types designed in this benchmark is strictly prohibited, including any atomic operation that forms part of any of the mentioned corruptions.
3. To ensure the above two rules are followed, each participant is requested to submit the code with reproducible results before the final result is announced; the code is for examination purposes only and we will manually verify the training and evaluation of each participant's model.