Rui Qian

Rui Qian (钱瑞)

I am a staff research scientist on the Apple AI/ML foundation model team. Before joining Apple, I was a research scientist at Google.

I received my Ph.D. in Computer Science from Cornell University and Cornell Tech, advised by Prof. Serge Belongie. Prior to Cornell, I received my B.S. in Computer Science, summa cum laude, from Peking University.

I'm interested in contributing to industrial-scale generative models, such as Manzano, Imagen 3, and Veo. I am honored to have received the 2023 Google Research PA Tech Impact Award.

Email: [email protected]
[Github] [Google Scholar] [Linkedin]

Industrial Research

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
[Paper]

Apple Foundation Model: Rui Qian, First Author

(Launched@Google I/O 2024)
Imagen 3: Google's highest quality text-to-image model
[AI Blog]

Google GenMedia Team: Rui Qian, core contributor

(Launched@Google I/O 2024)
Veo: Google's most capable video generation model to date, with the capability of generating minutes-long 1080p-resolution videos
[AI Blog]

Google GenMedia Team: Rui Qian, core contributor

Selected Publications

(ICML 2024)
VideoPrism: A foundational visual encoder for video understanding
[AI Blog][Paper]

Google Research: Rui Qian, modeling contributor

(ECCV 2022)
Exploring Fine-grained Audiovisual Categorization
[Paper][Code][Dataset]

Rui Qian, Kimberly Wilber, Hartwig Adam, Oisin Mac Aodha, Serge Belongie, Grant Van Horn

(BMVC 2022)
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
[Paper]

Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui

(CVPR 2022)
Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision
[Paper]

Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

(CVPR 2021)
Spatiotemporal Contrastive Video Representation Learning
[Paper][Code]

Rui Qian*, Tianjian Meng*, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui

(CVPR 2021, Oral)
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
[Paper][Code]

Golnaz Ghiasi*, Yin Cui*, Aravind Srinivas*, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, Barret Zoph

(NeurIPS 2021)
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
[Paper][Code]

Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

(CVPR 2020)
End-to-end Pseudo-LiDAR for Image-Based 3D Object Detection
[Paper][Code]

Rui Qian*, Divyansh Garg*, Yan Wang*, Yurong You*, Serge Belongie, Bharath Hariharan, Mark Campbell, Kilian Weinberger, Wei-Lun Chao

(AAAI 2019, Spotlight)
Weakly Supervised Scene Parsing with Point-Based Distance Metric Learning
[Paper]

Rui Qian, Yunchao Wei, Honghui Shi, Jiachen Li, Jiaying Liu, Thomas Huang

(CVPR 2018, Spotlight)
Attentive Generative Adversarial Network for Raindrop Removal from A Single Image
[Paper][Code][Dataset]

Rui Qian, Robby T. Tan, Wenhan Yang, Jiajun Su, Jiaying Liu

Experiences

Google Research

Research Intern
May 2022 - Aug 2022
Hosts: Dr. Yin Cui, Dr. Boqing Gong, Dr. Tsung-Yi Lin, Prof. Ming-Hsuan Yang

Bytedance AI Research

Research Intern
Mar 2019 - Jul 2019
Hosts: Dr. Ding Liu, Dr. Xiaohui Shen

Microsoft Research

Research Intern
Sept 2018 - Mar 2019
Host: Dr. Stephen Lin

Misc

I really love my workspace at Cornell Tech, which has a 180-degree view of Manhattan (day and night).

Here is the view from the House at Cornell Tech (summer and winter).
