Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye, Shuhang Gu
If this work is helpful to you, please help star this repo. Thanks!
- [2024.07.08] Paper preprint released!
- [2024.12.02] Codebase and model checkpoints are now available.
- [2025.01.16] Training code for the KITTI dataset has been released.
- [2025.06.26] Our paper has been accepted to ICCV 2025!
Clone this repo with submodules:

git clone https://github.com/LabShuHangGU/PerLDiff.git

The code is tested with PyTorch 1.12.0 and CUDA 11.3 on V100 servers. To set up the Python environment, follow the steps below:
conda create -n perldiff python=3.8 -y
conda activate perldiff
pip install albumentations==0.4.3 opencv-python pudb==2019.2 imageio==2.9.0 imageio-ffmpeg==0.4.2
pip install pytorch-lightning==1.4.2 omegaconf==2.1.1 test-tube>=0.7.5 streamlit>=0.73.1 einops==0.3.0 torch-fidelity==0.3.0 timm
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install protobuf torchmetrics==0.6.0 transformers==4.19.2 kornia==0.5.8 ftfy regex tqdm
# Install CLIP from the bundled CLIP/ directory
cd ./CLIP
pip install .
cd ../
# (alternatively: pip install git+https://github.com/openai/CLIP.git)
pip install nuscenes-devkit tensorboardX efficientnet_pytorch==0.7.0 scikit-image==0.18.0 ipdb gradio
# use "-i https://mirrors.aliyun.com/pypi/simple/" for pip install will be fasterWe prepare the nuScenes dataset similarly to the instructions in BEVFormer. Specifically, follow these steps:
We prepare the nuScenes dataset similarly to the instructions in BEVFormer. Specifically, follow these steps:

- Download the nuScenes dataset from the official website and place it in the ./DATA/ directory. You should have the following directory structure:
DATA/nuscenes
├── maps
├── samples
├── v1.0-test
└── v1.0-trainval
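Optionally, you can confirm the layout by loading the dataset with nuscenes-devkit. This is an illustrative check, assuming the v1.0-trainval split shown above:

```python
# Optional check that nuscenes-devkit can index the dataset (illustrative).
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version="v1.0-trainval", dataroot="DATA/nuscenes", verbose=True)
print("number of samples:", len(nusc.sample))
```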
There are two options to prepare the samples_road_map:

Option 1: Use the provided script (time-consuming, not recommended)

- Run the following Python script to download and prepare the road map:
python scripts/get_nusc_road_map.py
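For reference only, the sketch below shows how map layers can be rasterized from the nuScenes map expansion with nuscenes-devkit. It is an illustrative example under assumed layer names, patch size, and output resolution, and is not the repository's get_nusc_road_map.py (which may, for instance, project the maps into each camera view):

```python
# Illustrative sketch: rasterize nuScenes map layers around a fixed location.
# NOT the repository's script; layers, patch box, and canvas size are assumptions.
from nuscenes.map_expansion.map_api import NuScenesMap

nusc_map = NuScenesMap(dataroot="DATA/nuscenes", map_name="singapore-onenorth")

patch_box = (300.0, 1700.0, 100.0, 100.0)        # (x_center, y_center, height, width) in meters
layer_names = ["drivable_area", "ped_crossing"]  # map layers to rasterize
canvas_size = (256, 256)                         # output mask resolution

masks = nusc_map.get_map_mask(patch_box, patch_angle=0.0,
                              layer_names=layer_names, canvas_size=canvas_size)
print(masks.shape)  # (len(layer_names), 256, 256) binary masks
```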
Option 2: Download from Hugging Face (recommended)

- Alternatively, you can download the samples_road_map from Hugging Face here. After downloading the samples_road_map.tar.gz file, extract it using the following command:

tar -xzf samples_road_map.tar.gz
Finally, you should have these files:
DATA/nuscenes
├── maps
├── samples
├── samples_road_map
├── v1.0-test
└── v1.0-trainval

Before training, download the provided pretrained checkpoints from Hugging Face. Finally, you should have these checkpoints:
PerLDiff/
openai
DATA/
├── nuscenes
├── convnext_tiny_1k_224_ema.pth
└── sd-v1-4.ckpt
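If you prefer scripted downloads, huggingface_hub can fetch the files. The snippet below is only an illustration: sd-v1-4.ckpt is also available from the gated CompVis/stable-diffusion-v-1-4-original repository (accept its license and run huggingface-cli login first), and <perldiff-hf-repo> is a placeholder for the checkpoint repository linked above.

```python
# Illustrative download via huggingface_hub; <perldiff-hf-repo> is a placeholder.
from huggingface_hub import hf_hub_download

# Stable Diffusion v1.4 base weights (gated repo: accept the license and log in first).
sd_ckpt = hf_hub_download(
    repo_id="CompVis/stable-diffusion-v-1-4-original",
    filename="sd-v1-4.ckpt",
    local_dir="DATA",
)

# ConvNeXt backbone weights from the repository linked above (placeholder repo id).
convnext_ckpt = hf_hub_download(
    repo_id="<perldiff-hf-repo>",  # replace with the actual Hugging Face repo
    filename="convnext_tiny_1k_224_ema.pth",
    local_dir="DATA",
)
print(sd_ckpt, convnext_ckpt)
```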
A training script for reference is provided in bash_run_train.sh:

export TOKENIZERS_PARALLELISM=false
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
OMP_NUM_THREADS=16 torchrun \
--nproc_per_node=8 main.py \
--training \
--yaml_file=configs/nusc_text.yaml \
--batch_size=2 \
--name=nusc_train_256x384_perldiff_bs2x8 \
--guidance_scale_c=5 \
--step=50 \
--official_ckpt_name=sd-v1-4.ckpt \
--total_iters=60000 \
--save_every_iters=6000

Before testing, download the provided PerLDiff checkpoint from Hugging Face. You should have these checkpoints:
PerLDiff/
openai
DATA/
├── nuscenes
├── convnext_tiny_1k_224_ema.pth
├── perldiff_256x384_lambda_5_bs2x8_model_checkpoint_00060000.pth
└── sd-v1-4.ckpt

A testing script for reference is provided in bash_run_test.sh:
export TOKENIZERS_PARALLELISM=false
CUDA_VISIBLE_DEVICES="0,1" OMP_NUM_THREADS=16 torchrun \
--nproc_per_node=2 main.py \
--validation \
--yaml_file=configs/nusc_text.yaml \
--batch_size=2 \
--name=nusc_test_256x384_perldiff_bs2x8 \
--guidance_scale_c=5 \
--step=50 \
--official_ckpt_name=sd-v1-4.ckpt \
--total_iters=60000 \
--save_every_iters=6000 \
--val_ckpt_name=DATA/perldiff_256x384_lambda_5_bs2x8_model_checkpoint_00060000.pth

If you want to use the Hugging Face Gradio demo, you can run the script:
bash bash_run_gradio.sh

Before testing FID, you should generate the validation dataset using bash_run_gen.sh:
export TOKENIZERS_PARALLELISM=false
CUDA_VISIBLE_DEVICES="0,1" OMP_NUM_THREADS=16 torchrun \
--nproc_per_node=2 main.py \
--generation \
--yaml_file=configs/nusc_text_with_path.yaml \
--batch_size=4 \
--name=nusc_test_256x384_perldiff_bs2x8 \
--guidance_scale_c=5 \
--step=50 \
--official_ckpt_name=sd-v1-4.ckpt \
--total_iters=60000 \
--save_every_iters=6000 \
--val_ckpt_name=DATA/perldiff_256x384_lambda_5_bs2x8_model_checkpoint_00060000.pth \
--gen_path=val_ddim50w5_256x384_perldiff_bs2x8

We provide two methods for measuring FID:
Option 1: Using clean_fid
- The FID calculated by this method tends to be higher. First, you need to process the nuScenes real validation dataset and save it as 256x384 images:
python scripts/get_nusc_real_img.py
Then, calculate the FID:
pip install clean-fid
python FID/cleanfid_test_fid.py val_ddim50w5_256x384_perldiff_bs2x8/samples samples_real_256x384/samples
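Under the hood, clean-fid exposes a simple API. The snippet below is a minimal illustration of computing FID between the two image folders above; it is not a copy of FID/cleanfid_test_fid.py:

```python
# Minimal clean-fid usage between two image folders (illustrative).
from cleanfid import fid

score = fid.compute_fid(
    "val_ddim50w5_256x384_perldiff_bs2x8/samples",  # generated images
    "samples_real_256x384/samples",                 # real images at the same resolution
)
print(f"FID: {score:.2f}")
```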
Option 2: Using the method provided by MagicDrive
- This method requires modifications to the MagicDrive code:
  - Copy the generated data val_ddim50w5_256x384_perldiff_bs2x8/ to MagicDrive/data/nuscenes
  - Copy FID/configs_256x384 to the working directory MagicDrive/configs_256x384
  - Copy FID/fid_score_384.py to MagicDrive/tools/fid_score_384.py
- Then, run FID/fid_test.sh
@inproceedings{zhang2025perldiff,
title={PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Model},
author={Zhang, Jinhua and Sheng, Hualian and Cai, Sijia and Deng, Bing and Liang, Qiao and Li, Wen and Fu, Ying and Ye, Jieping and Gu, Shuhang},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={26306--26315},
year={2025}
}
https://github.com/gligen/GLIGEN/
https://github.com/fundamentalvision/BEVFormer
https://github.com/cure-lab/MagicDrive/
https://github.com/mit-han-lab/bevfusion
https://github.com/bradyz/cross_view_transformers
If you have any questions, feel free to contact me through email ([email protected]).

