DiMR: Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models (NeurIPS 2024)
[project page] | [paper] | [arxiv]
We propose DiMR, a new diffusion backbone that achieves state-of-the-art image generation. For example, on the ImageNet 256×256 benchmark, DiMR, with only 505M parameters, surpasses all existing image generation models regardless of their size, without any bells and whistles.
In addition, with the proposed Multi-Resolution Network, DiMR alleviates distortions and enhances visual fidelity without increasing computational costs.
The code has been tested with PyTorch 2.1.2 and CUDA 12.1.
An example of the installation commands is provided below:
```bash
git clone https://github.com/qihao067/DiMR.git
cd DiMR

## environment setup
pip3 install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip3 install accelerate==0.12.0 absl-py ml_collections einops wandb ftfy==6.1.1 transformers==4.23.1
pip3 install -U --pre triton
pip3 install timm
pip3 install tensorboard
```
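After installation, you can quickly verify the environment with a short Python check (a minimal sketch; the version string simply mirrors the pins above):

```python
# Sanity check for the pinned PyTorch / CUDA install (minimal sketch).
import torch

print(torch.__version__)  # expected: 2.1.2+cu121
assert torch.cuda.is_available(), 'CUDA 12.1 runtime is not visible to PyTorch'
print(torch.cuda.get_device_name(0))  # name of the first visible GPU
```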
| Model | Image Resolution | Training Epochs | #Params. | GFLOPs | FID-50K | Download |
|---|---|---|---|---|---|---|
| DiMR-XL/2R | 256x256 | 800 | 505M | 160 | 1.70 | [Link] |
| DiMR-G/2R | 256x256 | 800 | 1.06B | 331 | 1.63 | [Link] |
Please note that these models are trained only on the limited academic dataset ImageNet and are intended for research purposes only.
The data preparation protocol strictly follows U-ViT. Many thanks to the authors for their outstanding efforts.
Step 1: Download the autoencoder from Stable Diffusion:
Download the stable-diffusion directory from this link (which contains image autoencoders converted from Stable Diffusion). Place the downloaded directory in this codebase as assets/stable-diffusion. The autoencoders are used in latent diffusion models.
Step 2: Prepare ImageNet:
Download the original ImageNet dataset and extract its features using scripts/extract_imagenet_feature.py. Make sure you also update the path of the extracted features in the config file (configs/DiMR-G-2R_imagenet256.py or configs/DiMR-XL-2R_imagenet256.py).
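For reference, the extraction step boils down to encoding each image into a Stable Diffusion latent (using the autoencoder from Step 1) and saving it to disk. Below is a minimal sketch of that loop; it is illustrative only, assuming the codebase keeps U-ViT's `libs/autoencoder` interface (`get_model`, `encode`) and using placeholder paths. `scripts/extract_imagenet_feature.py` is the authoritative version.

```python
# Minimal sketch of latent-feature extraction (illustrative only; see
# scripts/extract_imagenet_feature.py for the actual implementation).
import numpy as np
import torch
from torchvision import datasets, transforms

from libs.autoencoder import get_model  # U-ViT-style autoencoder loader (assumed)

device = 'cuda'
# The autoencoder downloaded in Step 1 (checkpoint filename assumed).
autoencoder = get_model('assets/stable-diffusion/autoencoder_kl.pth').to(device)

# 256x256 center crops scaled to [-1, 1], as the Stable Diffusion VAE expects.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize(0.5, 0.5),
])
dataset = datasets.ImageFolder('/path/to/imagenet/train', transform=transform)

with torch.no_grad():
    for idx, (img, label) in enumerate(dataset):
        z = autoencoder.encode(img[None].to(device))  # (1, 4, 32, 32) latent
        np.save(f'/path/to/imagenet256_features/{idx}.npy', z[0].cpu().numpy())
        # (the real script also stores the class label for each sample)
```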
Step 3: Prepare reference statistics for FID:
Download the fid_stats directory from this link (which contains reference statistics for FID). Place the downloaded directory in this codebase as assets/fid_stats. These reference statistics are used to monitor FID during the training process, in addition to evaluation.
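For context, FID compares the mean and covariance of Inception features from generated samples against these reference statistics via the Fréchet distance. Below is a minimal sketch, assuming the `.npz` stores `mu` and `sigma` arrays (the key names and filename are assumptions; check the files in `assets/fid_stats`):

```python
# Minimal sketch of how the reference statistics enter the FID computation.
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * (S1 S2)^{1/2})."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2).real  # drop numerical imaginary parts
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

ref = np.load('assets/fid_stats/fid_stats_imagenet256.npz')  # filename assumed
# mu_gen / sigma_gen come from Inception features of your 50K generated samples:
# fid = frechet_distance(mu_gen, sigma_gen, ref['mu'], ref['sigma'])
```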
We provide a training script for training class-conditional DiMR models on ImageNet 256×256 from scratch. It can easily be modified to support other resolutions and datasets. To reproduce our results, launch training with 16 GPUs on 2 nodes using the following command (on a multi-node setup, each node must additionally be given its rank and the main node's address, e.g., via accelerate's `--machine_rank` and `--main_process_ip` flags):
```bash
accelerate launch --multi_gpu --num_processes 16 --num_machines 2 --mixed_precision fp16 train.py \
    --config=configs/DiMR-XL-2R_imagenet256.py
```
Following previous methods, we use ADM's TensorFlow evaluation suite to compute FID, Inception Score, and other metrics. To do so, you first need to sample 50K images from our pre-trained DiMR model using N GPUs:
```bash
rm -rf saved_images* class_lab.txt
accelerate launch --multi_gpu --num_processes N --mixed_precision fp16 eval.py \
    --config=configs/DiMR-XL-2R_imagenet256.py \
    --nnet_path='path/to/the/checkpoint' \
    --IMGsave_path=saved_images
```
The generated images will be saved in saved_images, and the class labels will be saved in class_lab.txt.
Then, run the following script to convert the generated images into a .npz file:
```bash
python3 img2npz.py
```
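For reference, ADM's evaluator consumes a single `.npz` holding the samples as a `uint8` array of shape `[N, 256, 256, 3]`, plus class labels for conditional models. Below is a minimal sketch of the conversion, where the output filename and the `class_lab.txt` parsing are assumptions; `img2npz.py` in this repository is the authoritative version.

```python
# Minimal sketch of the image -> .npz conversion (illustrative; see img2npz.py).
import glob

import numpy as np
from PIL import Image

paths = sorted(glob.glob('saved_images/*.png'))
arr = np.stack([np.asarray(Image.open(p).convert('RGB')) for p in paths])  # uint8

labels = np.loadtxt('class_lab.txt', dtype=np.int64)  # one label per image (assumed)

# Positional arrays are stored as arr_0 / arr_1, matching ADM's reference batches.
np.savez('samples_50000x256x256x3.npz', arr, labels)
```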
After that, please follow ADM's TensorFlow evaluation suite to compute all metrics.
In addition, following U-ViT, eval.py also reports an FID score computed by a PyTorch implementation when sampling the 50K images. However, this score is only used to help monitor training. For a fair comparison with DiT, we report the results computed by ADM's TensorFlow evaluation suite.
This project is created for research purposes only.
This codebase is built upon the following repository:
- [U-ViT]
We greatly appreciate their outstanding work.
The code in this repository is released under the Apache License, Version 2.0.
If you use our work in your research, please cite it with the following BibTeX entry:
```bibtex
@article{liu2024alleviating,
  title={Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models},
  author={Liu, Qihao and Zeng, Zhanpeng and He, Ju and Yu, Qihang and Shen, Xiaohui and Chen, Liang-Chieh},
  journal={arXiv preprint arXiv:2406.09416},
  year={2024}
}
```