Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye, Shuhang Gu
If this work is helpful to you, please help star this repo. Thanks!
- [2024.07.08] Paper preprint released!
- [2024.12.02] Codebase and model checkpoints are now available.
- [2025.01.16] Training code for the KITTI dataset has been released.
- [2025.06.26] Our paper has been accepted to ICCV 2025!
Clone this repo with submodules:

git clone https://github.com/LabShuHangGU/PerLDiff.git

The code is tested with PyTorch 1.12.0 and CUDA 11.3 on V100 servers. To set up the Python environment, follow the steps below:
conda create -n perldiff python=3.8 -y
conda activate perldiff
pip install albumentations==0.4.3 opencv-python pudb==2019.2 imageio==2.9.0 imageio-ffmpeg==0.4.2
pip install pytorch-lightning==1.4.2 omegaconf==2.1.1 test-tube>=0.7.5 streamlit>=0.73.1 einops==0.3.0 torch-fidelity==0.3.0 timm
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install protobuf torchmetrics==0.6.0 transformers==4.19.2 kornia==0.5.8 ftfy regex tqdm
# Install CLIP from the bundled CLIP/ directory
cd ./CLIP
pip install .
cd ../
# (alternatively: pip install git+https://github.com/openai/CLIP.git)
pip install nuscenes-devkit tensorboardX efficientnet_pytorch==0.7.0 scikit-image==0.18.0 ipdb gradio
# use "-i https://mirrors.aliyun.com/pypi/simple/" for pip install will be fasterWe prepare the nuScenes dataset similarly to the instructions in BEVFormer. Specifically, follow these steps:
We prepare the nuScenes dataset similarly to the instructions in BEVFormer. Specifically, follow these steps:

- Download the nuScenes dataset from the official website and place it in the ./DATA/ directory. You should have the following directory structure:
DATA/nuscenes
├── maps
├── samples
├── v1.0-test
└── v1.0-trainval
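Optionally, you can confirm the layout by loading the dataset with nuscenes-devkit. This is an illustrative check, assuming the v1.0-trainval split shown above:

```python
# Optional check that nuscenes-devkit can index the dataset (illustrative).
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version="v1.0-trainval", dataroot="DATA/nuscenes", verbose=True)
print("number of samples:", len(nusc.sample))
```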
There are two options to prepare the samples_road_map:

Option 1: Use the provided script (time-consuming, not recommended)

- Run the following Python script to download and prepare the road map:
python scripts/get_nusc_road_map.py
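For reference only, the sketch below shows how map layers can be rasterized from the nuScenes map expansion with nuscenes-devkit. It is an illustrative example under assumed layer names, patch size, and output resolution, and is not the repository's get_nusc_road_map.py (which may, for instance, project the maps into each camera view):

```python
# Illustrative sketch: rasterize nuScenes map layers around a fixed location.
# NOT the repository's script; layers, patch box, and canvas size are assumptions.
from nuscenes.map_expansion.map_api import NuScenesMap

nusc_map = NuScenesMap(dataroot="DATA/nuscenes", map_name="singapore-onenorth")

patch_box = (300.0, 1700.0, 100.0, 100.0)        # (x_center, y_center, height, width) in meters
layer_names = ["drivable_area", "ped_crossing"]  # map layers to rasterize
canvas_size = (256, 256)                         # output mask resolution

masks = nusc_map.get_map_mask(patch_box, patch_angle=0.0,
                              layer_names=layer_names, canvas_size=canvas_size)
print(masks.shape)  # (len(layer_names), 256, 256) binary masks
```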
Option 2: Download from Hugging Face (recommended)

- Alternatively, you can download the samples_road_map from Hugging Face here. After downloading the samples_road_map.tar.gz file, extract it using the following command:

tar -xzf samples_road_map.tar.gz
Finally, you should have these files:
DATA/nuscenes
├── maps
├── samples
├── samples_road_map
├── v1.0-test
└── v1.0-trainval

Before training, download the provided pretrained checkpoints from Hugging Face. Finally, you should have these checkpoints:
PerLDiff/
openai
DATA/
├── nuscenes
├── convnext_tiny_1k_224_ema.pth
└── sd-v1-4.ckpt
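If you prefer scripted downloads, huggingface_hub can fetch the files. The snippet below is only an illustration: sd-v1-4.ckpt is also available from the gated CompVis/stable-diffusion-v-1-4-original repository (accept its license and run huggingface-cli login first), and <perldiff-hf-repo> is a placeholder for the checkpoint repository linked above.

```python
# Illustrative download via huggingface_hub; <perldiff-hf-repo> is a placeholder.
from huggingface_hub import hf_hub_download

# Stable Diffusion v1.4 base weights (gated repo: accept the license and log in first).
sd_ckpt = hf_hub_download(
    repo_id="CompVis/stable-diffusion-v-1-4-original",
    filename="sd-v1-4.ckpt",
    local_dir="DATA",
)

# ConvNeXt backbone weights from the repository linked above (placeholder repo id).
convnext_ckpt = hf_hub_download(
    repo_id="<perldiff-hf-repo>",  # replace with the actual Hugging Face repo
    filename="convnext_tiny_1k_224_ema.pth",
    local_dir="DATA",
)
print(sd_ckpt, convnext_ckpt)
```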
A training script for reference is provided in bash_run_train.sh:

export TOKENIZERS_PARALLELISM=false
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
OMP_NUM_THREADS=16 torchrun \
--nproc_per_node=8 main.py \
--training \
--yaml_file=configs/nusc_text.yaml \
--batch_size=2 \
--name=nusc_train_256x384_perldiff_bs2x8 \
--guidance_scale_c=5 \
--step=50 \
--official_ckpt_name=sd-v1-4.ckpt \
--total_iters=60000 \
--save_every_iters=6000

Before testing, download the provided PerLDiff checkpoint from Hugging Face. You should have these checkpoints:
PerLDiff/
openai
DATA/
├── nuscenes
├── convnext_tiny_1k_224_ema.pth
├── perldiff_256x384_lambda_5_bs2x8_model_checkpoint_00060000.pth
└── sd-v1-4.ckpt

A testing script for reference is provided in bash_run_test.sh:
export TOKENIZERS_PARALLELISM=false
CUDA_VISIBLE_DEVICES="0,1" OMP_NUM_THREADS=16 torchrun \
--nproc_per_node=2 main.py \
--validation \
--yaml_file=configs/nusc_text.yaml \
--batch_size=2 \
--name=nusc_test_256x384_perldiff_bs2x8 \
--guidance_scale_c=5 \
--step=50 \
--official_ckpt_name=sd-v1-4.ckpt \
--total_iters=60000 \
--save_every_iters=6000 \
--val_ckpt_name=DATA/perldiff_256x384_lambda_5_bs2x8_model_checkpoint_00060000.pth

If you want to use the Hugging Face Gradio demo, you can run the script:
bash bash_run_gradio.sh

Before testing FID, you should generate the validation dataset using bash_run_gen.sh:
export TOKENIZERS_PARALLELISM=false
CUDA_VISIBLE_DEVICES="0,1" OMP_NUM_THREADS=16 torchrun \
--nproc_per_node=2 main.py \
--generation \
--yaml_file=configs/nusc_text_with_path.yaml \
--batch_size=4 \
--name=nusc_test_256x384_perldiff_bs2x8 \
--guidance_scale_c=5 \
--step=50 \
--official_ckpt_name=sd-v1-4.ckpt \
--total_iters=60000 \
--save_every_iters=6000 \
--val_ckpt_name=DATA/perldiff_256x384_lambda_5_bs2x8_model_checkpoint_00060000.pth \
--gen_path=val_ddim50w5_256x384_perldiff_bs2x8

We provide two methods for measuring FID:
Option 1: Using clean_fid
- The FID calculated by this method tends to be higher. First, you need to process the nuScenes real validation dataset and save it as 256x384 images:
python scripts/get_nusc_real_img.py
Then, calculate the FID:
pip install clean-fid
python FID/cleanfid_test_fid.py val_ddim50w5_256x384_perldiff_bs2x8/samples samples_real_256x384/samples
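Under the hood, clean-fid exposes a simple API. The snippet below is a minimal illustration of computing FID between the two image folders above; it is not a copy of FID/cleanfid_test_fid.py:

```python
# Minimal clean-fid usage between two image folders (illustrative).
from cleanfid import fid

score = fid.compute_fid(
    "val_ddim50w5_256x384_perldiff_bs2x8/samples",  # generated images
    "samples_real_256x384/samples",                 # real images at the same resolution
)
print(f"FID: {score:.2f}")
```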
Option 2: Using the method provided by MagicDrive
- This method requires modifications to the MagicDrive code:
  - Copy the generated data val_ddim50w5_256x384_perldiff_bs2x8/ to MagicDrive/data/nuscenes
  - Copy FID/configs_256x384 to the working directory MagicDrive/configs_256x384
  - Copy FID/fid_score_384.py to MagicDrive/tools/fid_score_384.py
- Then, run FID/fid_test.sh
@inproceedings{zhang2025perldiff,
title={PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Model},
author={Zhang, Jinhua and Sheng, Hualian and Cai, Sijia and Deng, Bing and Liang, Qiao and Li, Wen and Fu, Ying and Ye, Jieping and Gu, Shuhang},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={26306--26315},
year={2025}
}
https://github.com/gligen/GLIGEN/
https://github.com/fundamentalvision/BEVFormer
https://github.com/cure-lab/MagicDrive/
https://github.com/mit-han-lab/bevfusion
https://github.com/bradyz/cross_view_transformers
If you have any questions, feel free to contact me through email ([email protected]).

