Zhenghao Zhao

Bio


I am currently a Ph.D. candidate in Computer Science at the University of Illinois Chicago (UIC), advised by Prof. Yan Yan.

My research focuses on data-centric machine learning, with interests in dataset distillation, synthetic data generation, and data selection for LLMs. My work has been published in top-tier conferences such as CVPR, ICCV, ECCV, and NeurIPS, covering topics such as dataset distillation for image and multimodal datasets, synthetic dataset generation, and algorithms for data selection.

I interned at Argonne National Laboratory in 2023 and 2025. In 2023, I studied the performance of distributed training frameworks such as PyTorch DDP, Horovod, and DeepSpeed. In 2025, I returned to Argonne to work on LLM training on high-performance computing (HPC) platforms, where I conducted a comparative study of DeepSpeed, TorchTitan, and FSDP for scalable optimization.

Before my Ph.D., I received my M.S. in Computer Science from the Illinois Institute of Technology (IIT) and my B.S. in Computer Science and Engineering from Nanjing University of Posts and Telecommunications (NJUPT).

Email / CV / Google Scholar / LinkedIn / GitHub

News

2026-02

Our paper, "Consistent Instance Field for Dynamic Scene Understanding" accepted to CVPR 2026!

2025-12

First-author paper, "Distill Video Datasets into Images" is now available on arXiv.

2025-09

First-author paper, "Efficient Multimodal Dataset Distillation via Generative Models" accepted to NeurIPS 2025!

2025-06

Our paper, "CaO₂: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation" accepted to ICCV 2025!

2025-06

Serving as a Local Chair at ICMR 2025!

2025-06

Invited to give a Lightning Talk at MMLS 2025!

2025-05

Began my internship at Argonne National Laboratory!

2025-02

First-author paper, "Distilling Long-tailed Datasets" accepted to CVPR 2025!

2024-10

Our paper, "SSDL: Sensor-to-Skeleton Diffusion Model with Lipschitz Regularization for Human Activity Recognition" accepted to MMM 2025!

2024-07

First-author paper, "Dataset Quantization with Active Learning based Adaptive Sampling" accepted to ECCV 2024!

2024-06

First-author paper, "Audio-Visual Navigation with Anti-Backtracking" accepted to ICPR 2024!

2024-04

First-author paper, "Monocular Expressive 3D Human Reconstruction of Multiple People" accepted to ICMR 2024 as an oral!

2024-02

First-author paper, "Gated Multi-Scale Attention Transformer For Few-Shot Medical Image Segmentation" accepted to ISBI 2024!

2023-12

First-author paper, "Supplementing Missing Visions via Dialog for Scene Graph Generations" accepted to ICASSP 2024!

2023-12

First-author paper, "Machine learning enabled multiplex detection of periodontal pathogens by surface-enhanced Raman spectroscopy" accepted to the International Journal of Biological Macromolecules!

Work Experience

Amazon · Applied Scientist Intern

May 2026 - Aug. 2026, Santa Cruz, California, United States

Argonne National Laboratory · Research Intern

Research on LLM training on HPC platforms.

May 2025 - Aug. 2025, Lemont, Illinois, United States

Argonne National Laboratory · Research Intern

Research on distributed training frameworks.

May 2023 - Aug. 2023, Lemont, Illinois, United States

Publications

arXiv 2025

Distill Video Datasets into Images

Zhenghao Zhao, Haoxuan Wang, Kai Wang, Yuzhang Shang, Yuan Hong, Yan Yan

NeurIPS 2025

Efficient Multimodal Dataset Distillation via Generative Models

Zhenghao Zhao, Haoxuan Wang, Junyi Wu, Yuzhang Shang, Gaowen Liu, Yan Yan

Neural Information Processing Systems (NeurIPS), 2025

CVPR 2025

Distilling Long-tailed Datasets

Zhenghao Zhao*, Haoxuan Wang*, Yuzhang Shang, Kai Wang, Yan Yan

Computer Vision and Pattern Recognition (CVPR), 2025

ECCV 2024

Dataset Quantization with Active Learning based Adaptive Sampling

Zhenghao Zhao, Yuzhang Shang, Junyi Wu, Yan Yan

European Conference on Computer Vision (ECCV), 2024

ICMR 2024

Monocular Expressive 3D Human Reconstruction of Multiple People

Zhenghao Zhao, Hao Tang, Joy Wan, Yan Yan

International Conference on Multimedia Retrieval (ICMR), 2024

ICPR 2024

Audio-Visual Navigation with Anti-Backtracking

Zhenghao Zhao, Hao Tang, Yan Yan

International Conference on Pattern Recognition (ICPR), 2024

ICASSP 2024

Supplementing Missing Visions via Dialog for Scene Graph Generations

Zhenghao Zhao*, Ye Zhu*, Xiaoguang Zhu, Yuzhang Shang, Yan Yan

International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

ISBI 2024

Gated Multi-Scale Attention Transformer For Few-Shot Medical Image Segmentation

Zhenghao Zhao*, Hao Ding*, Dawen Cai, Yan Yan

IEEE International Symposium on Biomedical Imaging (ISBI), 2024

IJBM 2023

Machine learning enabled multiplex detection of periodontal pathogens by surface-enhanced Raman spectroscopy

Rathnayake AC Rathnayake*, Zhenghao Zhao*, Nathan McLaughlin, Wei Li, Yan Yan, Liaohai L Chen, Qian Xie, Christine D Wu, Mathew T Mathew, Rong R Wang

International Journal of Biological Macromolecules, 2024