I am a third-year Ph.D. student in the Electrical and Computer Engineering department at the University of Washington, Seattle. Currently, I am a member of the NeuroAI Lab, advised by Prof. Eli Shlizerman.
My current research focuses on multi-modal spatial reasoning in 3D scenes, along with related applications in multi-modal LLMs, XR devices, and robotics. I am grateful to be supported by the Google PhD Fellowship.
Before that, I received my Bachelor’s degree in Computer Science and Technology from the Experimental Program for Exemplary Engineers at Huazhong University of Science and Technology (HUST), China, in 2020. I have also been fortunate to work with Prof. Chang Wen Chen and Prof. Junsong Yuan; on human detection and segmentation at Bytedance AI Lab; on human-object interaction (HOI) with Prof. Si Liu; on multiple-object tracking (MOT) with Prof. Jenq-Neng Hwang in the IPL lab; on 3D photo-realistic digital human rendering with Prof. Shuicheng Yan and Prof. Jiashi Feng at Sea AI Lab; and on multi-modal 3D research at Meta Reality Labs.
I'm open to research collaboration. Please email me (lasiafly [at] uw.edu) if you are interested in exploring multi-modal spatial reasoning in 3D together!
[Sep. 2025] One first-author paper on spatial audio-visual LLMs was accepted to NeurIPS 2025 as an Oral (acceptance rate < 0.4%)!
[Jun. 2025] Joined Meta Reality Labs again in Redmond, working on reasoning with multi-modal LLMs for human manipulation from egocentric videos!
[Mar. 2025] One first-author paper on spatial audio-visual reconstruction was accepted to CVPR 2025 as a Highlight (acceptance rate < 2.9%)!
[Sep. 2024] One first-author paper on spatial audio-visual reconstruction was accepted to NeurIPS 2024!
[Jun. 2024] Joined Meta Reality Labs in Pittsburgh (now the XRCIA Social AI Research group) as a research scientist intern, working with Dr. Israel D. Gebru and Dr. Alexander Richard.
[Mar. 2021] One first-author paper on human-object interaction was accepted to CVPR 2021!
[Jul. 2020] Joined SenseTime Research as a research intern, working on human-object interaction.
[Jun. 2020] My thesis on Language-guided Video Retrieval was awarded the Outstanding Undergraduate Graduation Thesis of Huazhong University of Science and Technology!
[Sep. 2019] Joined Bytedance AI Lab as a Computer Vision Algorithm Intern.
We present EgoMAN: a large-scale egocentric dataset for stage-aware 3D hand trajectory prediction, alongside a reasoning-to-motion framework that links vision-language reasoning with motion generation via a trajectory-token interface, enabling accurate trajectory prediction with strong generalization and efficiency.
We introduce SAVVY-Bench, the first benchmark for 3D spatial reasoning in dynamic audio-visual scenes. We also propose SAVVY, a novel training-free pipeline integrating egocentric spatial tracks and dynamic global maps, which significantly enhances AV-LLMs' audio-visual spatial awareness in such environments.
We introduce SoundVista: a neural network pipeline that generates the ambient sound of an arbitrary scene at novel viewpoints, without requiring any constraints or prior knowledge of sound source details. Moreover, our method efficiently adapts to diverse room layouts, reference microphone configurations, and unseen environments.
AV-Cloud: Spatial Audio Rendering Through Audio-Visual Cloud Splatting
Mingfei Chen, Eli Shlizerman
Neural Information Processing Systems (NeurIPS), 2024
paper / bibtex / webpage / poster / video / code
AV-Cloud is an audio rendering framework synchronized with the visual perspective. Given video collections, it constructs Audio-Visual Anchors as a scene representation and transforms a monaural reference sound into spatial audio.
We develop a geometry-guided, generalizable, and efficient Neural Radiance Field (NeRF) pipeline for high-fidelity free-viewpoint human body synthesis under sparse camera views.
We reformulate HOI detection as an adaptive set prediction problem. With this novel formulation, we propose an Adaptive Set-based one-stage framework (AS-Net) with parallel instance and interaction branches.
Graduate Teaching
[Spring 2022 TA] UW EE 596: Introduction to Deep Learning Applications and Theory (website).
[Lead TA] UW EE 497/498/596/598: Engineering Entrepreneurial Capstone (website).