Xiuye's Home

XIUYE GU

Hi, I'm a researcher at Google DeepMind. Before that, I was a Google AI Resident, where I was lucky to be advised by Yin Cui and Tsung-Yi Lin and worked on open-vocabulary visual recognition. Previously, I obtained my M.S. in Computer Science at Stanford University, and my B.E. in Computer Science at Zhejiang University. I spent half a year (9/2018-3/2019) and summer 2016 working happily with Prof. Yong Jae Lee at UC Davis. In summer 2018, I interned at TuSimple and worked on 3D point cloud scene flow estimation with Dr. Panqu Wang. At Zhejiang University, I was advised by Prof. Deng Cai.

Currently, I'm interested in video generation and open-vocabulary visual recognition.


Publications

Kaiwen Zha, Lijun Yu, Alireza Fathi, David Ross, Cordelia Schmid, Dina Katabi, Xiuye Gu.
Language-Guided Image Tokenization for Generation (TexTok).
CVPR 2025.


[paper]

Dan Kondratyuk*, Lijun Yu*, Xiuye Gu*, José Lezama*, Jonathan Huang, Rachel Hornung, Hartwig Adam, Hassan Akbari, Yair Alon, Vighnesh Birodkar, Yong Cheng, Ming-Chang Chiu, Josh Dillon, Irfan Essa, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, David Ross, Grant Schindler, Mikhail Sirotenko, Kihyuk Sohn, Krishna Somandepalli, Huisheng Wang, Jimmy Yan, Ming-Hsuan Yang, Xuan Yang, Bryan Seybold, Lu Jiang.
VideoPoet: A large language model for zero-shot video generation.
ICML 2024, Best Paper Award, Patent.


[paper]   [website with demos]   [blog]

Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid.
Pixel-Aligned Language Model.
CVPR 2024.


[paper]   [code]

Shuyang Sun, Runjia Li, Philip Torr, Xiuye Gu*, Siyang Li*.
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor.
CVPR 2024.


[paper]   [code]   [website]

Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama.
Photorealistic video generation with diffusion models (W.A.L.T).
ECCV 2024.


[paper]   [website]   [video demo]   [more samples]

Lijun Yu, José Lezama, Nitesh B Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Agrim Gupta, Xiuye Gu, Alexander G Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A Ross, Lu Jiang.
Language Model Beats Diffusion--Tokenizer is Key to Visual Generation.
ICLR 2024.


[paper]   [website]

Xiuye Gu, Yin Cui, Jonathan Huang, Abdullah Rashwan, Xuan Yang, Xingyi Zhou, Golnaz Ghiasi, Weicheng Kuo, Huizhong Chen, Liang-Chieh Chen, David A Ross.
DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model.
NeurIPS 2023.


[paper]   [Objects365 instance segmentation dataset]   [poster]

Xuan Yang, Liangzhe Yuan, Kimberly Wilber, Astuti Sharma, Xiuye Gu, Siyuan Qiao, Stephanie Debats, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Liang-Chieh Chen.
PolyMaX: General Dense Prediction with Mask Transformer.
WACV 2024.


[paper]   [supp]

James Urquhart Allingham, Jie Ren, Michael W Dusenberry, Jeremiah Zhe Liu, Xiuye Gu, Yin Cui, Dustin Tran, Balaji Lakshminarayanan.
A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models.
ICML 2023, Patent.


[paper]

Weicheng Kuo, Yin Cui, Xiuye Gu, AJ Piergiovanni, Anelia Angelova.
F-VLM: Open-vocabulary object detection upon frozen vision and language models.
ICLR 2023, Patent.


[paper]   [code]   [website]   [blog]

Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin.
Scaling open-vocabulary image segmentation with image-level labels (OpenSeg).
ECCV 2022.


[paper]   [code]   [colab demo]   [poster]

Xiuye Gu, Tsung-Yi Lin, Weicheng Kuo, Yin Cui.
Open-vocabulary object detection via vision and language knowledge distillation (ViLD).
ICLR 2022.


[paper]   [code]   [colab demo]

Xiuye Gu, Weixin Luo, Michael S. Ryoo, Yong Jae Lee.
Password-conditioned Anonymization and Deanonymization with Face Identity Transformers.
ECCV 2020.


[paper]   [code]   [demo video]
[1-min presentation]   [10-min presentation]

Xiuye Gu, Yijie Wang, Chongruo Wu, Panqu Wang, Yong Jae Lee.
HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-scale Point Clouds.
CVPR 2019.


[paper]   [supp]   [code]   [poster]   [video]

Maheen Rashid, Xiuye Gu, Yong Jae Lee.
Interspecies Knowledge Transfer for Facial Keypoint Detection.
CVPR 2017.


[paper]   [code]   [annotation tool]

Community Services


Projects

Explore Deep Graph Generation


Course project for CS224W: Machine Learning with Graphs.
Explored deep graph generation from two directions:
1) using CNN GANs to model the whole adjacency matrix directly after sorting the nodes;
2) building upon the recent Graph Recurrent Attention Networks (GRANs) by proposing a graph completeness judger network and improving their attention mechanism.

[poster]   [report]   [code]

Simulating and Rendering Explosion


Course project for CS348B: Image Synthesis Techniques.
Extended PBRT to support emissive volumes and the OpenVDB input format.
Self-studied and implemented blackbody radiation, closed-form and delta tracking.
Simulated an explosion and flying rubble using Blender.

[report]   [video]   [code]

License Plate Detection and Character Segmentation


A robust iterative license plate character segmentation algorithm, and a license plate detection system with robust skew and slant correction to improve character segmentation.

[character segmentation report]   [detection report]
[detection survey]   [segmentation survey]   [recognition survey]
[code]   (reports and surveys are in Chinese)

Deep Stereo Matching


Course project for CS231A: Computer Vision, From 3D Reconstruction to Recognition.

[code]   [report]   [supp]
