I am a final-year CS Ph.D. student at Brown University, working with Prof. Chen Sun. Previously, I worked as a research intern at Google and Meta. I obtained my bachelor's degree in software engineering from Tsinghua University in 2021.
My research focuses on building physically grounded, reasoning-capable vision-language models and exploring how to integrate them effectively into the physical world. Feel free to contact me about collaborations or just to chat.
I'm actively looking for full-time industry opportunities in 2026.
Education
09/2021 - Present Ph.D., Department of Computer Science, Brown University
08/2016 - 06/2021 B.S., School of Software, Tsinghua University (Outstanding Undergraduate)
Selected Publications
Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding
[paper]
[website]
Ziyang Wang, Honglu Zhou, Shijie Wang, Junnan Li, Caiming Xiong, Silvio Savarese, Mohit Bansal, Michael S. Ryoo, and Juan Carlos Niebles CVPR Findings 2026
MotiF: Making Text Count in Image Animation with Motion Focal Loss
[paper]
[website]
[benchmark]
Shijie Wang, Samaneh Azadi, Rohit Girdhar, Sai Saketh Rambhatla, Chen Sun, and Xi Yin CVPR 2025
How Can Objects Help Video-Language Understanding?
[paper]
Zitian Tang, Shijie Wang, Junho Cho, Jaewook Yoo, and Chen Sun ICCV 2025
Learning Visual Grounding from Generative Vision and Language Model
[paper]
Shijie Wang, Dahun Kim, Ali Taalimi, Chen Sun, and Weicheng Kuo WACV 2025
Vamos: Versatile Action Models for Video Understanding
[paper]
[website]
[code]
Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, and Chen Sun ECCV 2024
Do Pre-trained Vision-Language Models Encode Object States?
[paper]
Kaleb Newman, Shijie Wang, Yuan Zang, David Heffren, and Chen Sun ECCV 2024 Workshop EVAL-FoMo
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
[paper]
[website]
[code]
Qi Zhao*, Shijie Wang*, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, and Chen Sun ICLR 2024
Object-centric Video Representation for Long-term Action Anticipation
[paper]
[code]
Ce Zhang*, Changcheng Fu*, Shijie Wang, Nakul Agarwal, Kwonjoon Lee, Chiho Choi, and Chen Sun WACV 2024
Goal-Conditioned Predictive Coding as an Implicit Planner for Offline Reinforcement Learning
[paper]
[website]
[code]
Zilai Zeng, Ce Zhang, Shijie Wang, and Chen Sun NeurIPS 2023
Pose Recognition with Cascade Transformers
[paper]
[code]
Ke Li*, Shijie Wang*, Xiang Zhang*, Yifan Xu, Weijian Xu, and Zhuowen Tu CVPR 2021