Prior to this, I obtained my Bachelor's degree in Computer Science and Engineering from Sichuan University in 2020. I then received my Master's degree in Computer Science and Engineering from Sun Yat-Sen University in 2023, where I was advised by Prof. Wei-Shi Zheng.
I'm interested in computer vision and robotics. Currently, I mainly focus on generalizable robotic manipulation and embodied world models.
I am looking for a research internship. If you have suitable opportunities, please feel free to contact me.
❅ 03/2021: One paper was accepted to CVPR 2021. [GHRM]
Publications
My research aims to understand complex human activities and imitate them with real robots. Currently, I focus on leveraging foundations from the video and robotics domains to develop robotic generalists for real-world scenarios.
A zero-shot long-horizon manipulation framework that mimics human long-range activities via demonstrations and achieves robust execution by generating visual futures.
The first weakly supervised framework for developing efficient end-to-end action recognition models on long videos, which gives rise to a new weakly supervised pipeline for downstream long-video tasks.
A large-scale action video description dataset, ActionHub, which is the first and largest dataset to provide millions of video descriptions covering thousands of human actions.
The first benchmark, named XOV-Action, for the cross-domain open-vocabulary action recognition task, along with a simple yet effective method that addresses scene bias in this task.