Me.

Rui Qian (钱瑞)

I am a staff research scientist on the Apple AI/ML foundation model team. Before joining Apple, I was a research scientist at Google.

I received my Ph.D. in Computer Science from Cornell University and Cornell Tech, advised by Prof. Serge Belongie. Prior to Cornell, I received my B.S. in Computer Science, summa cum laude, from Peking University.

I'm interested in contributing to industrial-scale generative models such as Manzano, Imagen 3, and Veo. I am honored to have received the 2023 Google Research PA Tech Impact Award.

Industrial Research


Image

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
[Paper]

Apple Foundation Model: Rui Qian, First Author

Image
(Launched@Google I/O 2024)
Imagen 3: Google's highest quality text-to-image model
[AI Blog]

Google GenMedia Team: Rui Qian, core contributor

Image
(Launched@Google I/O 2024)
Veo: Google's most capable video generation model to date, with the capability of generating minutes-long 1080p videos
[AI Blog]

Google GenMedia Team: Rui Qian, core contributor

Selected Publications


Image
(ICML 2024)
VideoPrism: A foundational visual encoder for video understanding
[AI Blog][Paper]

Google Research: Rui Qian, modeling contributor

Image
(ECCV 2022)
Exploring Fine-grained Audiovisual Categorization
[Paper][Code][Dataset]

Rui Qian, Kimberly Wilber, Hartwig Adam, Oisin Mac Aodha, Serge Belongie, Grant Van Horn

Image
(BMVC 2022)
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
[Paper]

Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui

Image
(CVPR 2022)
Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision
[Paper]

Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

Image
(CVPR 2021)
Spatiotemporal Contrastive Video Representation Learning
[Paper][Code]

Rui Qian*, Tianjian Meng*, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui

Image
(CVPR 2021, Oral)
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
[Paper][Code]

Golnaz Ghiasi*, Yin Cui*, Aravind Srinivas*, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, Barret Zoph

Image
(NeurIPS 2021)
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
[Paper][Code]

Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

Image
(CVPR 2020)
End-to-end Pseudo-LiDAR for Image-Based 3D Object Detection
[Paper][Code]

Rui Qian*, Divyansh Garg*, Yan Wang*, Yurong You*, Serge Belongie, Bharath Hariharan, Mark Campbell, Kilian Weinberger, Wei-Lun Chao

Image
(AAAI 2019, Spotlight)
Weakly Supervised Scene Parsing with Point-Based Distance Metric Learning
[Paper]

Rui Qian, Yunchao Wei, Honghui Shi, Jiachen Li, Jiaying Liu, Thomas Huang

Image
(CVPR 2018, Spotlight)
Attentive Generative Adversarial Network for Raindrop Removal from A Single Image
[Paper][Code][Dataset]

Rui Qian, Robby T. Tan, Wenhan Yang, Jiajun Su, Jiaying Liu

Experiences


Image
Google Research

Research Intern
May 2022 - Aug 2022
Hosts: Dr. Yin Cui, Dr. Boqing Gong, Dr. Tsung-Yi Lin, Prof. Ming-Hsuan Yang

Image
ByteDance AI Research

Research Intern
Mar 2019 - Jul 2019
Hosts: Dr. Ding Liu, Dr. Xiaohui Shen

Image
Microsoft Research

Research Intern
Sept 2018 - Mar 2019
Host: Dr. Stephen Lin


Misc


I really love my workspace at Cornell Tech, which has a 180-degree view of Manhattan (day and night).
Image
Image

Here is the view from the House at Cornell Tech (summer and winter).

Image
Image