Bowen Xue (薛博文)

I am a third-year undergraduate at USTC, currently a research intern with Prof. Jiajun Wu at Stanford University. Previously, I interned at ByteDance, MIT HAN Lab, and Tencent. I explore visual generation and am actively seeking Fall 2027 PhD opportunities.

🎯 Research Interests

Visual Generation: Image Generation, Video Generation, World Models, Efficient Visual Generation.

🔍 Goal

Making visual generation controllable as well as impressive. I study how to translate human intent into visual content faithfully, efficiently, and at scale.

📰 News

Jun 2026
🎓 Started as a research intern in Prof. Jiajun Wu's group at Stanford University.
May 2026
🎉 FourTune was accepted by ICML 2026! More details coming soon.
Feb 2026
🎉 Stand-In was accepted by CVPR 2026! See you in Denver!
Sep 2025
💼 Joined ByteDance as a research intern.
Apr 2025
🎓 Started a new research journey as an intern at MIT HAN Lab!
Nov 2024
💼 Worked as a research intern at Tencent.

📝 Publications

Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation

Bowen Xue*, Zheng-Peng Duan*, Qixin Yan, Wenjing Wang, Hao Liu, Chun-Le Guo, Chongyi Li, Chen Li, and Jing LYU

CVPR 2026

Stand-In trains just 1% of the original model’s parameters with 2,000 video–prompt pairs, yet achieves high-quality identity-preserving video generation.

Project Page | Paper | Code | Models

✨ Behind the Work

This was my first complete research project. Ten days after its release on GitHub, Stand-In had gained over 500 stars, and my work had been accepted by the community. During that period, I refreshed the star count almost every day whenever I had free time, actively responded to issues, and improved the project as planned. Later, when the tide receded, almost everything returned to normal, except that I kept working hard: maintaining the project and preparing for the next, more solid one.