Yilun Chen
Research Scientist, Embodied AI Center at Shanghai AI Laboratory
I work on robotic foundation models, 3D vision, and embodied AI, with a recent focus on advancing robotic foundation models.
Before joining Shanghai AI Laboratory, I completed my Ph.D. in the Department of Computer Science and Engineering at The Chinese University of Hong Kong, advised by Prof. Jiaya Jia.
Open Positions
We are actively seeking motivated research fellows and interns with experience or strong interest in robotics and 3D vision. If you are interested in robotic foundation models, especially topics such as world models, 3D perception, embodied manipulation, and generalist robot learning, send me an email with your background, interests, and representative work.
Recent Highlights
-
Mar 2026
StarVLA was named a top-10 open-source project in ModelScope's EAI-100 list.
-
Feb 2026
Our team RoboCola placed 2nd out of 62 in the RoCo Challenge at AAAI 2026. See the challenge leaderboard.
-
Feb 2026
Four papers were accepted to ICRA 2026.
-
Jan 2026
Four VLA papers were accepted to ICLR 2026, including ST4VLA, a spatial-training follow-up of InternVLA-M1.
-
Nov 2025
CronusVLA was accepted by AAAI 2026 as an oral presentation.
-
Oct 2025
Co-organized the Workshop and Challenge on Multimodal Robot Learning in Physical Worlds.
-
Sep 2025
Released InternVLA-M1, a spatially guided VLA framework for generalist robots.
-
Mar 2025
GenManip and RoboGround were accepted by CVPR 2025.
-
Oct 2024
PointLLM was selected as an ECCV 2024 Best Paper Candidate.
-
Sep 2024
Three papers were accepted by NeurIPS 2024 and one paper was accepted by CoRL 2024.
-
Jul 2024
One paper was accepted by ECCV 2024.
-
Aug 2023
Code for FocalFormer3D was released.
-
Jul 2023
FocalFormer3D was accepted by ICCV 2023.
-
Mar 2023
FocalFormer3D ranked 1st on the nuScenes LiDAR 3D detection and 3D tracking leaderboards.
-
Sep 2022
One paper was accepted by NeurIPS 2022.
-
Aug 2022
DSGN++ was accepted by T-PAMI 2022 and code was released.
-
Mar 2022
Two papers were accepted by CVPR 2022.
-
Apr 2020
Code for DSGN was released.
-
Mar 2020
DSGN was accepted by CVPR 2020.
-
Jun 2019
Fast Point R-CNN was accepted by ICCV 2019.
-
Feb 2018
CPN was accepted by CVPR 2018.
-
Oct 2017
Won 1st Place in the COCO 2017 Keypoint Challenge.
Selected Publications
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
- Ranked 1st on the ScanRefer localization benchmark in September 2024.
- Ranked 1st on the Scan2Cap benchmark in September 2024.
FocalFormer3D: Focusing on Hard Instance for 3D Object Detection
- Ranked 1st on the nuScenes LiDAR 3D detection leaderboard in March 2023.
- Ranked 1st on the nuScenes LiDAR 3D tracking leaderboard in March 2023.
DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors
- Ranked 1st among camera-based approaches on the KITTI 3D detection leaderboard in November 2021.
- Its multi-modal variant VoCo ranked 1st on the KITTI 3D detection leaderboard for Car in May 2022.
DSGN: Deep Stereo Geometry Network for 3D Object Detection
- Ranked 1st among camera-based approaches on the KITTI 3D detection leaderboard in November 2019.
Cascaded Pyramid Network for Multi-Person Pose Estimation
- Champion of the MS-COCO 2017 Keypoint Detection Challenge.
- Ranked 1st on the COCO keypoint detection leaderboard in October 2017.
Experience
-
Shanghai AI Laboratory
Research Scientist, Mar. 2023 - Present
-
NVIDIA Research
Research Intern, Jun. 2022 - Feb. 2023
Mentors: Zhiding Yu, Jose M. Alvarez
-
SmartMore Inc.
Research Intern, Mar. 2020 - Jun. 2022
Mentor: Shu Liu
-
Tencent Youtu Lab
Research Intern, Mar. 2018 - Jan. 2020
Mentor: Shu Liu
-
Megvii Face++
Research Intern, Nov. 2016 - Nov. 2017
Mentor: Gang Yu
Education
-
The Chinese University of Hong Kong
Ph.D., Computer Science and Engineering, 2018 - 2022
-
Beihang University
Bachelor, Computer Science and Engineering, 2013 - 2017
Service
-
Conference Reviewer
CVPR, ECCV, ICCV, ICLR, NeurIPS, ICML, CoRL, IROS, ICRA
-
Journal Reviewer
T-PAMI, IJCV, RA-L
-
Teaching
CSCI3310, CSCI3180, CSCI1120, ENGG1100