DiMR: Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models (NeurIPS 2024)
[project page] | [paper] | [arxiv]
We propose DiMR, a new diffusion backbone that achieves state-of-the-art image generation. For example, on the ImageNet 256×256 benchmark, DiMR, with only 505M parameters, surpasses all existing image generation models regardless of their size, without any bells and whistles.
In addition, with the proposed Multi-Resolution Network, DiMR alleviates distortions and enhances visual fidelity without increasing computational costs.
The code has been tested with PyTorch 2.1.2 and CUDA 12.1.
An example of the installation commands is provided below:
```bash
git clone https://github.com/qihao067/DiMR.git
cd DiMR

## environment setup
pip3 install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip3 install accelerate==0.12.0 absl-py ml_collections einops wandb ftfy==6.1.1 transformers==4.23.1
pip3 install -U --pre triton
pip3 install timm
pip3 install tensorboard
```
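After installation, you can quickly verify the environment with a short Python check (a minimal sketch; the version string simply mirrors the pins above):

```python
# Sanity check for the pinned PyTorch / CUDA install (minimal sketch).
import torch

print(torch.__version__)  # expected: 2.1.2+cu121
assert torch.cuda.is_available(), 'CUDA 12.1 runtime is not visible to PyTorch'
print(torch.cuda.get_device_name(0))  # name of the first visible GPU
```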
| Model | Image Resolution | Training Epochs | #Params. | GFLOPs | FID-50K | Download |
|---|---|---|---|---|---|---|
| DiMR-XL/2R | 256x256 | 800 | 505M | 160 | 1.70 | [Link] |
| DiMR-G/2R | 256x256 | 800 | 1.06B | 331 | 1.63 | [Link] |
Please note that these models are trained only on the limited academic dataset ImageNet and are intended for research purposes only.
The data preparation protocol strictly follows U-ViT. Many thanks to the authors for their outstanding efforts.
Step 1: Download the autoencoder from Stable Diffusion:
Download the stable-diffusion directory from this link (which contains image autoencoders converted from Stable Diffusion). Place the downloaded directory in this codebase as assets/stable-diffusion. The autoencoders are used in latent diffusion models.
Step 2: Prepare ImageNet:
Download the original ImageNet dataset and extract its features using scripts/extract_imagenet_feature.py. Make sure you also update the path of the extracted features in the config file (configs/DiMR-G-2R_imagenet256.py or configs/DiMR-XL-2R_imagenet256.py).
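For reference, the extraction step boils down to encoding each image into a Stable Diffusion latent (using the autoencoder from Step 1) and saving it to disk. Below is a minimal sketch of that loop; it is illustrative only, assuming the codebase keeps U-ViT's `libs/autoencoder` interface (`get_model`, `encode`) and using placeholder paths. `scripts/extract_imagenet_feature.py` is the authoritative version.

```python
# Minimal sketch of latent-feature extraction (illustrative only; see
# scripts/extract_imagenet_feature.py for the actual implementation).
import numpy as np
import torch
from torchvision import datasets, transforms

from libs.autoencoder import get_model  # U-ViT-style autoencoder loader (assumed)

device = 'cuda'
# The autoencoder downloaded in Step 1 (checkpoint filename assumed).
autoencoder = get_model('assets/stable-diffusion/autoencoder_kl.pth').to(device)

# 256x256 center crops scaled to [-1, 1], as the Stable Diffusion VAE expects.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize(0.5, 0.5),
])
dataset = datasets.ImageFolder('/path/to/imagenet/train', transform=transform)

with torch.no_grad():
    for idx, (img, label) in enumerate(dataset):
        z = autoencoder.encode(img[None].to(device))  # (1, 4, 32, 32) latent
        np.save(f'/path/to/imagenet256_features/{idx}.npy', z[0].cpu().numpy())
        # (the real script also stores the class label for each sample)
```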
Step 3: Prepare reference statistics for FID:
Download the fid_stats directory from this link (which contains reference statistics for FID). Place the downloaded directory in this codebase as assets/fid_stats. These reference statistics are used to monitor FID during the training process, in addition to evaluation.
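For context, FID compares the mean and covariance of Inception features from generated samples against these reference statistics via the Fréchet distance. Below is a minimal sketch, assuming the `.npz` stores `mu` and `sigma` arrays (the key names and filename are assumptions; check the files in `assets/fid_stats`):

```python
# Minimal sketch of how the reference statistics enter the FID computation.
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * (S1 S2)^{1/2})."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2).real  # drop numerical imaginary parts
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

ref = np.load('assets/fid_stats/fid_stats_imagenet256.npz')  # filename assumed
# mu_gen / sigma_gen come from Inception features of your 50K generated samples:
# fid = frechet_distance(mu_gen, sigma_gen, ref['mu'], ref['sigma'])
```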
We provide a training script for training class-conditional DiMR models on ImageNet 256×256 from scratch. It can easily be modified to support other resolutions and datasets. To reproduce our results, launch training with 16 GPUs on 2 nodes using the following command (on a multi-node setup, each node must additionally be given its rank and the main node's address, e.g., via accelerate's `--machine_rank` and `--main_process_ip` flags):
```bash
accelerate launch --multi_gpu --num_processes 16 --num_machines 2 --mixed_precision fp16 train.py \
    --config=configs/DiMR-XL-2R_imagenet256.py
```
Following previous methods, we use ADM's TensorFlow evaluation suite to compute FID, Inception Score, and other metrics. To do so, you first need to sample 50K images from our pre-trained DiMR model using N GPUs:
```bash
rm -rf saved_images* class_lab.txt
accelerate launch --multi_gpu --num_processes N --mixed_precision fp16 eval.py \
    --config=configs/DiMR-XL-2R_imagenet256.py \
    --nnet_path='path/to/the/checkpoint' \
    --IMGsave_path=saved_images
```
The generated images will be saved in saved_images, and the class labels will be saved in class_lab.txt.
Then, run the following script to convert the generated images into a .npz file:
```bash
python3 img2npz.py
```
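For reference, ADM's evaluator consumes a single `.npz` holding the samples as a `uint8` array of shape `[N, 256, 256, 3]`, plus class labels for conditional models. Below is a minimal sketch of the conversion, where the output filename and the `class_lab.txt` parsing are assumptions; `img2npz.py` in this repository is the authoritative version.

```python
# Minimal sketch of the image -> .npz conversion (illustrative; see img2npz.py).
import glob

import numpy as np
from PIL import Image

paths = sorted(glob.glob('saved_images/*.png'))
arr = np.stack([np.asarray(Image.open(p).convert('RGB')) for p in paths])  # uint8

labels = np.loadtxt('class_lab.txt', dtype=np.int64)  # one label per image (assumed)

# Positional arrays are stored as arr_0 / arr_1, matching ADM's reference batches.
np.savez('samples_50000x256x256x3.npz', arr, labels)
```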
After that, please follow ADM's TensorFlow evaluation suite to compute all metrics.
In addition, following U-ViT, eval.py also reports an FID score computed by a PyTorch implementation when sampling the 50K images. However, this score is only used to help monitor training. For a fair comparison with DiT, we report the results computed by ADM's TensorFlow evaluation suite.
This project is created for research purposes only.
This codebase is built upon the following repository:
- [U-ViT]
We greatly appreciate their outstanding work.
The code in this repository is released under the Apache License, Version 2.0.
If you use our work in your research, please cite it with the following BibTeX entry:
```bibtex
@article{liu2024alleviating,
  title={Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models},
  author={Liu, Qihao and Zeng, Zhanpeng and He, Ju and Yu, Qihang and Shen, Xiaohui and Chen, Liang-Chieh},
  journal={arXiv preprint arXiv:2406.09416},
  year={2024}
}
```