I hold a Master of Science degree in Computer Science from the University of Illinois at Urbana-Champaign, where I worked with Yunze Man and Prof. Liangyan Gui.
I am broadly interested in computer vision, machine learning, and robotics, with a focus on
3D vision and multi-modal learning in open-world settings.
My prior work spans affordance generalization, 3D visual grounding and detection, and object point cloud completion.
My research goal is to develop computer vision techniques for real-world embodied agents and autonomous systems, improving their ability to perceive and reason in uncertain environments.
We propose to study Auto-Vocabulary 3D Object Detection (AV3DOD), in which class names are generated automatically for the detected objects without any user input.
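As a rough illustration of this setting only (not the method in the paper), the sketch below shows the interface difference from fixed-vocabulary detection: a class-agnostic detector proposes boxes, and a name is generated per detection rather than picked from a user-supplied label set. `detect_boxes` and `name_object` are hypothetical placeholders for a 3D detector and a captioning/vision-language component.

```python
# Illustrative sketch of the auto-vocabulary detection interface; all names are assumptions.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Detection3D:
    box: List[float]     # (x, y, z, dx, dy, dz, heading)
    class_name: str      # generated automatically, no user-provided vocabulary

def auto_vocabulary_detect(point_cloud,
                           detect_boxes: Callable,
                           name_object: Callable) -> List[Detection3D]:
    detections = []
    for box in detect_boxes(point_cloud):        # class-agnostic box proposals
        name = name_object(point_cloud, box)     # e.g., a VLM names the detected object
        detections.append(Detection3D(box=box, class_name=name))
    return detections
```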
Our proposed model, D-LISA, has a novel vision module that allows a dynamic number of box proposals and extracts features from dynamically selected viewpoints per scene. We further propose a spatially aware fusion module with explicit language conditioning.
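To make the fusion idea concrete, here is a minimal sketch (not the released D-LISA code) of a spatially aware fusion block that conditions box-proposal features on a language embedding via cross-attention. The module name, dimensions, and the use of box centers as the spatial cue are illustrative assumptions.

```python
# Minimal, hypothetical sketch of spatially aware language-conditioned fusion.
import torch
import torch.nn as nn

class SpatialLanguageFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.spatial_proj = nn.Linear(3, d_model)   # encode box centers as spatial cues
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, box_feats, box_centers, lang_feats):
        # box_feats:   (B, N, d) proposal features
        # box_centers: (B, N, 3) proposal box centers
        # lang_feats:  (B, T, d) token embeddings of the referring expression
        queries = box_feats + self.spatial_proj(box_centers)
        fused, _ = self.cross_attn(queries, lang_feats, lang_feats)
        return self.norm(box_feats + fused)         # residual fusion

# Usage: fuse 64 proposals with a 20-token description
fusion = SpatialLanguageFusion()
out = fusion(torch.randn(2, 64, 256), torch.randn(2, 64, 3), torch.randn(2, 20, 256))
print(out.shape)  # torch.Size([2, 64, 256])
```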
We propose a hyperspherical module that can be inserted into any existing encoder-decoder architecture and consistently improves point cloud completion results in both single-task and multi-task learning.
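A minimal sketch of the underlying idea, assuming the module constrains the encoder's latent feature to the unit hypersphere before decoding; the class and parameter names are illustrative, not the paper's implementation.

```python
# Hypothetical sketch: a plug-in hyperspherical bottleneck for an encoder-decoder network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HypersphericalModule(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, latent):
        # latent: (B, dim) global feature from any point cloud encoder
        z = self.proj(latent)
        return F.normalize(z, p=2, dim=-1)   # project features onto the unit hypersphere

# Usage between a (placeholder) encoder and decoder:
latent = torch.randn(4, 1024)                 # stand-in for encoder output
sphere_feat = HypersphericalModule()(latent)
print(sphere_feat.norm(dim=-1))               # all ones: features lie on the hypersphere
```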