Haoyu Ma
Member of Technical Staff @Microsoft AI

Email: haoyum3 [at] uci [dot] edu


About Me

I am a Member of Technical Staff at Microsoft AI. Previously, I worked as a Senior Research Scientist at Meta Superintelligence Labs. I earned my PhD from the Donald Bren School of Information and Computer Sciences at UC Irvine in 2024, under the supervision of Prof. Xiaohui Xie. I received my B.Eng. from Southeast University (SEU) in Nanjing, China, in 2019.

My research interests are deep learning and computer vision.

News

  • 12/2025: I joined Microsoft AI as a Member of Technical Staff.
  • 09/2025: MoCha is accepted by NeurIPS 2025 (Spotlight).
  • 02/2025: Two papers are accepted by CVPR 2025.
  • 10/2024: We launched Movie Gen. Please see our technical report for more details!
  • 07/2024: We launched Imagine Yourself. Try it on Instagram, Messenger, and meta.ai.
  • 03/2024: I joined GenAI at Meta as a Research Scientist.
  • 03/2024: I earned my PhD degree from the University of California, Irvine.
  • 02/2024: Two papers are accepted by CVPR 2024.
  • 11/2023: One paper is accepted by CPAL 2024.
  • 10/2023: Three papers are accepted by WACV 2024.
  • 11/2022: One paper is accepted by AAAI 2023.
  • 07/2022: Two papers are accepted by ECCV 2022.
  • 05/2022: One paper is accepted by ICML 2022.
  • 03/2022: One paper is accepted by ICLR-PAIR2Struct 2022.
  • 03/2022: One paper is accepted by CVPR 2022.
  • 10/2021: One paper is accepted by BMVC 2021.
  • 05/2021: One paper is accepted by KDD 2021.
  • 01/2021: One paper is accepted by ICLR 2021 (Spotlight).
  • 08/2020: One paper is accepted by BMVC 2020 (Oral).
  • 11/2019: Two papers are accepted by WACV 2020.

Selected Publications [Full List]

Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
X. Ma, P. Sun, H. Ma, H. Tang, CY. Ma, J. Wang, K. Li, X. Dai, et al.
Preprint, 2025.
[ Paper ] [ Project Page ]
MoCha: Towards Movie-Grade Talking Character Synthesis
C. Wei, B. Sun, H. Ma, J. Hou, F. Xu, Z. He, X. Dai, L. Zhang, K. Li, T. Hou, A. Sinha, P. Vajda, W. Chen
Advances in Neural Information Processing Systems (NeurIPS), 2025. (Spotlight)
[ Paper ] [ Project Page ]
Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts
F. Liang, H. Ma, Z. He, T. Hou, J. Hou, K. Li, X. Dai, F. Xu, S. Azadi, A. Sinha, P. Zhang, P. Vajda, D. Marculescu
Computer Vision and Pattern Recognition Conference (CVPR), 2025.
[ Paper ] [ Project Page ]
HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models
M. Huang, FJ. Chu, B. Tekin, K. Liang, H. Ma, W. Wang, X. Chen, P. Gleize, H. Xue, S. Lyu, K. Kitani, M. Feiszli, H. Tang
Computer Vision and Pattern Recognition Conference (CVPR), 2025.
[ Paper ]
Movie Gen: A Cast of Media Foundation Models
Core Contributor
Preprint, 2024.
[ Paper ] [ Project Page ]
Imagine yourself: Tuning-Free Personalized Image Generation
Z. He, B. Sun, F. Xu, H. Ma, A. Ramchandani, V. Cheung, S. Shah, A. Kalia, N. Zhang, P. Zhang, R. Sumbaly, P. Vajda, A. Sinha
Preprint, 2024.
[ Paper ]
Before PhD graduation
MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers
H. Ma, S. Mahdizadehaghdam, B. Wu, Z. Fan, Y. Gu, Z. Zhao, L. Shapira and X. Xie
Computer Vision and Pattern Recognition Conference (CVPR), 2024.
[ Paper ] [ Project Page ]
HRBP: Hardware-friendly Regrouping towards Block-based Pruning for Sparse CNN Training
H. Ma, C. Zhang, L. Xiang, X. Ma, G. Yuan, W. Zhang, S. Liu, T. Chen, D. Tao, Y. Wang, Z. Wang, and X. Xie
Conference on Parsimony and Learning (CPAL), 2024.
[ Paper ] [ Code ] [ Slides ]
CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer
H. Ma, T. Zhang, S. Sun, X. Yan, K. Han, and X. Xie
IEEE Winter Conference on Applications of Computer Vision (WACV), 2024.
[ Paper ] [ Code ] [ Poster ] [ Slides ]
Light Field Diffusion for Single-View Novel View Synthesis
Y. Xiong, H. Ma, S. Sun, K. Han, H. Tang, and X. Xie
arXiv Preprint, 2023.
[ Paper ] [ Project Page ]
Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
Z. Kong*, H. Ma*, G. Yuan*, M. Sun, Y. Xie, P. Dong, X. Meng, X. Shen, H. Tang, M. Qin, T. Chen, X. Ma, X. Xie, Z. Wang, and Y. Wang
Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI), 2023.
[ Paper ] [ Code ]
PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation
H. Ma, Z. Wang, Y. Chen, D. Kong, L. Chen, X. Liu, X. Yan, H. Tang, and X. Xie
European Conference on Computer Vision (ECCV), 2022.
[ Paper ] [ Code ] [ Poster ]
Training Your Sparse Neural Network Better with Any Mask
A. Jaiswal, H. Ma, T. Chen, Y. Ding, and Z. Wang
International Conference on Machine Learning (ICML), 2022.
[ Paper ] [ Code ]
EI-CLIP: Entity-aware Interventional Contrastive Learning for E-commerce Cross-modal Retrieval
H. Ma, H. Zhao, Z. Lin, A. Kale, Z. Wang, T. Yu, J. Gu, S. Choudhary, and X. Xie
Computer Vision and Pattern Recognition Conference (CVPR), 2022.
[ Paper ] [ Poster ]
Sparse Logits Suffice to Fail Knowledge Distillation
H. Ma, Y. Huang, H. Tang, C. You, D. Kong, and X. Xie
International Conference on Learning Representations (ICLR) Workshop on PAIR^2Struct, 2022.
[ Paper ] [ Code ] [ Poster ]
TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation
H. Ma, L. Chen, D. Kong, Z. Wang, X. Liu, H. Tang, X. Yan, Y. Xie, S. Lin, and X. Xie
British Machine Vision Virtual Conference (BMVC), 2021.
[ Paper ] [ Code ] [ Presentation ] [ Slides ]
PD-Net: Quantitative Motor Function Evaluation for Parkinson's Disease via Automated Hand Gesture Analysis
Y. Chen, H. Ma, J. Wang, J. Wu, X. Wu, and X. Xie
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2021.
[ Paper ]
Undistillable: Making A Nasty Teacher That CANNOT Teach Students
H. Ma, T. Chen, T. Hu, C. You, X. Xie, and Z. Wang
International Conference on Learning Representations (ICLR), 2021. (Spotlight)
[ Paper ] [ Code ] [ Presentation ] [ Slides ] [ Poster ]
SIA-GCN: A Spatial Information Aware Graph Neural Network with 2D Convolutions for Hand Pose Estimation
D. Kong, H. Ma and X. Xie
British Machine Vision Virtual Conference (BMVC), 2020. (Oral)
[ Paper ]
Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation
Y. Chen*, H. Ma*, D. Kong, X. Yan, J. Wu, W. Fan, and X. Xie
IEEE Winter Conference on Applications of Computer Vision (WACV), 2020.
[ Paper ] [ Code ] [ Presentation ] [ Slides ] [ Poster ]
LSTM Multi-modal UNet for Brain Tumor Segmentation
F. Xu, H. Ma, J. Sun, R. Wu, X. Liu, and Y. Kong
IEEE Conference on Image, Vision and Computing (ICIVC), 2019.
[ Paper ] [ Slides ] [ Code ]

Academic Service

    Conference Reviewer

    CVPR2025, ICCV2025

    CVPR2024, ICML2024, ECCV2024, ICLR2024, CPAL2024, WACV2024

    CVPR2023, ICML2023, ICCV2023, NeurIPS2023

    CVPR2022, ICML2022, ECCV2022, NeurIPS2022, WACV2022, BMVC2022

    BMVC2021

TA/Readers:

COMPSCI 271/171, Introduction to Artificial Intelligence (2020-2024)

COMPSCI 274C, Neural Networks and Deep Learning (2021, 2023)

COMPSCI 178, Machine Learning & Data Mining (2020)

Contests

Awards

  • WACV 2024 Doctoral Consortium
  • Dean's Awards from the Donald Bren School of Information and Computer Sciences, 2019
  • Merit Student, 2016/2017
  • National Scholarship, 2016