Yuxi Xiao
肖宇曦
/ˈjuː-ʃee shyao/
I am a third-year Ph.D. student at the State Key Laboratory of CAD&CG, Zhejiang University, advised by Prof. Xiaowei Zhou. Currently, I am also a research intern at ByteDance Seed.
Pronounced like “yoo-shee shyao.”
My recent research focuses on three directions:
Developing 3D/4D foundation models for reconstruction and spatial perception (see my SpatialTracker series).
Defining and improving spatial abilities for multimodal large language models with 3D foundation models and principles of cognitive science (see my SpatialTree series).
Creating spatial AI agents capable of perceiving, manipulating, and learning from the real physical world.
I am open to collaborations. Please feel free to contact me if you are interested in my research.
Email / Google Scholar / Twitter / GitHub
Blog
Notes from projects, prototyping diaries, and thoughts on spatial AI. I drop new entries whenever a
project teaches me something worth sharing.
2025 · Technical Report
SpatialTree: How Spatial Abilities Branch Out in MLLMs
Exploring how spatial abilities emerge and branch out in multimodal large language models through a taxonomy, a benchmark, and a transfer analysis.
MLLMs
Spatial AI
Foundation Models
Explore the project →
ICCV 2025 · Research
SpatialTrackerV2: 3D Point Tracking Made Easy
Making 3D point tracking practical with foundation priors, handling long videos efficiently,
and lessons learned from building a production-ready tracking system.
3D Tracking
Foundation Models
Computer Vision
Explore the project →
Mar 2025 · Engineering Notes
An Efficient Large-scale Training Codebase for MLLMs
Drafting a scalable training + serving stack: streaming dataloaders, 4D parallelism, Triton
fused kernels, structured logging, and vLLM-style inference.
Training Infra
Triton
vLLM
Read this blog →
Research
(* indicates equal contribution)
SpatialTree: How Spatial Abilities Branch Out in MLLMs
Yuxi Xiao*, Longfei Li*, Shen Yan, Xinhang Liu, Sida Peng, Yunchao Wei, Xiaowei Zhou, Bingyi Kang†
Technical Report
project page
SpatialTrackerV2: 3D Point Tracking Made Easy
Yuxi Xiao, Jianyuan Wang, Nan Xue, Nikita Karaev, Iurii Makarov, Bingyi Kang, Xing Zhu, Hujun Bao, Yujun Shen, Xiaowei Zhou†
ICCV, 2025
project page / arXiv / code / Online Demo
SpatialTracker: Tracking Any 2D Pixels in 3D Space
Yuxi Xiao*, Qianqian Wang*, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen†, Xiaowei Zhou†
CVPR, 2024 (selected as a highlight paper)
project page / arXiv / code
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Hao Ouyang*, Qiuyu Wang*, Yuxi Xiao*, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen†, Yujun Shen†
CVPR, 2024 (selected as a highlight paper)
project page / arXiv / code
Volumetric Wireframe Parsing from Neural Attraction Fields
Nan Xue, Bin Tan, Yuxi Xiao, Liang Dong, Gui-Song Xia, Tianfu Wu
CVPR, 2024
project page / arXiv / code
Level-S²fM: Structure from Motion on Neural Level Set of Implicit Surfaces
Yuxi Xiao, Nan Xue, Tianfu Wu, Gui-Song Xia
CVPR, 2023
project page / arXiv / code
DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view Structure from Motion
Yuxi Xiao, Li Li, Xiaodi Li, Jian Yao
IROS, 2022
project page / arXiv / code
Zhejiang University
Sep. 2023 - Present
Ph.D. student
ByteDance Seed
Mar. 2025 - Present
Top Seed Research Intern
Ant Group
Feb. 2023 - Mar. 2025
Research Intern
Wuhan University
Sep. 2019 - Jun. 2023
Bachelor's degree