OmniPSD

OmniPSD: Layered PSD Generation with Diffusion Transformer

Yiren Song¹, Cheng Liu¹, Haofan Wang², Mike Zheng Shou¹

¹Show Lab, National University of Singapore ²Lovart AI

Overview

OmniPSD is a unified diffusion-transformer framework for bidirectional conversion between raster images and editable PSD files with full transparency support. It addresses two tasks:

Text-to-PSD: generates a layered PSD design (background, content layers, text layers) from a text description in a single forward pass, using a 2×2 spatial grid to capture inter-layer relationships.
Image-to-PSD: decomposes a flat poster image into editable layers via an iterative extract-erase pipeline driven by Flux-Kontext.
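
The iterative extract-erase idea behind Image-to-PSD can be sketched as a plain loop. This is a minimal illustration under stated assumptions, not the repository's code: `extract` and `erase` are hypothetical callables standing in for the foreground-extraction and background-inpainting models, the toy "image" is just a set of element names, and the stopping rule (stop when nothing is extracted, or after `max_layers` rounds) is also an assumption.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

Image = TypeVar("Image")


def extract_erase(
    poster: Image,
    extract: Callable[[Image], Optional[Image]],
    erase: Callable[[Image, Image], Image],
    max_layers: int = 8,
) -> Tuple[List[Image], Image]:
    """Peel layers off a flat image one at a time.

    `extract` returns the next foreground layer (or None when nothing is left);
    `erase` returns the image with that layer removed and the hole filled in.
    Both are hypothetical stand-ins for the expert models.
    """
    layers: List[Image] = []
    current = poster
    for _ in range(max_layers):
        layer = extract(current)
        if layer is None:
            break
        layers.append(layer)
        current = erase(current, layer)
    # Layers are returned in extraction order, plus whatever remains.
    return layers, current


# Toy stand-ins: an "image" is a frozenset of element names.
def toy_extract(img):
    foreground = sorted(e for e in img if e != "background")
    return frozenset({foreground[0]}) if foreground else None


def toy_erase(img, layer):
    return img - layer


layers, background = extract_erase(
    frozenset({"background", "logo", "title"}), toy_extract, toy_erase
)
print([sorted(layer) for layer in layers], sorted(background))
```

With real models, each extracted layer would be an RGBA image and the final `current` would be the clean background.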

Key components:

RGBA-VAE — a transparency-preserving VAE that encodes/decodes RGBA images.
Flux-Dev LoRA — fine-tuned for Text-to-PSD generation (4-panel grid layout).
Flux-Kontext LoRA ×4 — separate expert models for foreground/background extraction of content and text layers.
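
Because every layer carries an alpha channel, a layered design can be flattened back to a raster with standard "over" alpha compositing, which is why transparency has to survive the VAE round trip. The sketch below shows that operation for a single pixel with straight (non-premultiplied) RGBA values in [0, 1]. It is a generic illustration, not code from the repository; the function names `over` and `flatten` are made up for this example.

```python
from typing import Sequence, Tuple

RGBA = Tuple[float, float, float, float]


def over(fg: RGBA, bg: RGBA) -> RGBA:
    """Composite one straight-alpha pixel `fg` over `bg` (Porter-Duff "over")."""
    fa, ba = fg[3], bg[3]
    out_a = fa + ba * (1.0 - fa)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)  # fully transparent result
    rgb = tuple(
        (f * fa + b * ba * (1.0 - fa)) / out_a for f, b in zip(fg[:3], bg[:3])
    )
    return (rgb[0], rgb[1], rgb[2], out_a)


def flatten(layers_back_to_front: Sequence[RGBA]) -> RGBA:
    """Flatten a stack of pixels listed from the background to the topmost layer."""
    result: RGBA = (0.0, 0.0, 0.0, 0.0)
    for layer in layers_back_to_front:
        result = over(layer, result)
    return result


background = (0.0, 0.0, 1.0, 1.0)  # opaque blue
content = (1.0, 0.0, 0.0, 0.5)     # half-transparent red
print(flatten([background, content]))  # (0.5, 0.0, 0.5, 1.0)
```

If the alpha channel were lost during encoding, `content` would become opaque and the background would no longer show through.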

Dataset

A subset of our layered poster dataset is available on Hugging Face:

lc03lc/OmniPSD_Layered_Poster

Setup

git clone https://github.com/showlab/OmniPSD.git
cd OmniPSD
pip install -r requirements.txt

Base models required:

FLUX.1-dev — for Text-to-PSD training and inference
FLUX.1-Kontext-dev — for Image-to-PSD training
RGBA-VAE weights (FLUX.1-dev-alpha) — replace /PATH/TO/RGBA_VAE/ in scripts

Training

All scripts are run from the OmniPSD root directory. Edit the PATH placeholders in each script before running.
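
Before launching a run, it can help to confirm that no placeholder path was left unedited. The helper below is a small convenience sketch, not part of the repository: it assumes the placeholders all contain the literal string `/PATH/TO/` (as in `/PATH/TO/RGBA_VAE/`) and that the scripts are `*.sh` files; the function name is hypothetical.

```python
from pathlib import Path
from typing import Dict, List

PLACEHOLDER = "/PATH/TO/"  # assumed marker shared by all placeholder paths


def find_unedited_placeholders(script_dir: str, pattern: str = "*.sh") -> Dict[str, List[int]]:
    """Map each matching file name to the 1-based line numbers that still contain PLACEHOLDER."""
    hits: Dict[str, List[int]] = {}
    for path in sorted(Path(script_dir).glob(pattern)):
        lines = path.read_text(encoding="utf-8").splitlines()
        numbers = [i for i, line in enumerate(lines, start=1) if PLACEHOLDER in line]
        if numbers:
            hits[path.name] = numbers
    return hits
```

Calling `find_unedited_placeholders("scripts")` from the repository root returns an empty dict once every placeholder has been replaced.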

Text-to-PSD

bash scripts/train_psd_flux.sh

Uses Flux-Dev with the RGBA-VAE. Training data is a 4-panel grid of [full poster | content layer | background | text-removed poster].
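
To make the 4-panel layout concrete, here is a minimal sketch of tiling four equally sized panels into one 2×2 grid and cutting the grid back into panels. It uses nested lists of RGBA tuples so it runs without any image library; a real pipeline would operate on image arrays instead. The row-major placement (full poster top-left, content layer top-right, background bottom-left, text-removed poster bottom-right) and the function names are assumptions for illustration, not taken from the training code.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int, int]  # RGBA
Panel = List[List[Pixel]]          # rows of pixels


def tile_2x2(panels: Sequence[Panel]) -> Panel:
    """Tile four equally sized panels, in row-major order, into one 2x2 grid."""
    if len(panels) != 4:
        raise ValueError("expected exactly four panels")
    h, w = len(panels[0]), len(panels[0][0])
    if any(len(p) != h or any(len(row) != w for row in p) for p in panels):
        raise ValueError("all panels must share the same size")
    top = [a + b for a, b in zip(panels[0], panels[1])]
    bottom = [c + d for c, d in zip(panels[2], panels[3])]
    return top + bottom


def split_2x2(grid: Panel) -> List[Panel]:
    """Inverse of tile_2x2: cut a grid back into its four panels."""
    h, w = len(grid) // 2, len(grid[0]) // 2
    return [
        [row[x0:x0 + w] for row in grid[y0:y0 + h]]
        for y0 in (0, h)
        for x0 in (0, w)
    ]


def solid(color: Pixel, size: int = 2) -> Panel:
    """Build a size x size panel filled with one color (toy data)."""
    return [[color] * size for _ in range(size)]


# Assumed order: full poster, content layer, background, text-removed poster.
panels = [solid((i, i, i, 255)) for i in range(4)]
grid = tile_2x2(panels)
assert split_2x2(grid) == panels
```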

Image-to-PSD — Content Layers

# Foreground extraction
bash scripts/train_psd_content_front.sh

# Background inpainting
bash scripts/train_psd_content_back.sh

Image-to-PSD — Text Layers

# Foreground extraction
bash scripts/train_psd_text_front.sh

# Background inpainting
bash scripts/train_psd_text_back.sh

All Kontext-based scripts take (image, control) pairs where control is the input poster used as conditioning.
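
One simple way to assemble such pairs is to match files by name. The sketch below assumes a hypothetical layout in which target images and input posters live in two separate directories and share file stems; the actual dataset format expected by the training scripts may differ, so treat the directory arguments and the function name as placeholders.

```python
from pathlib import Path
from typing import List, Tuple


def build_pairs(image_dir: str, control_dir: str, suffix: str = ".png") -> List[Tuple[Path, Path]]:
    """Pair each target image with the control poster that shares its file stem."""
    controls = {p.stem: p for p in Path(control_dir).glob(f"*{suffix}")}
    pairs: List[Tuple[Path, Path]] = []
    for image in sorted(Path(image_dir).glob(f"*{suffix}")):
        control = controls.get(image.stem)
        if control is not None:  # skip targets that have no matching poster
            pairs.append((image, control))
    return pairs
```

Targets without a matching poster are skipped silently here; a stricter version could raise an error instead.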

Inference

For Text-to-PSD inference, edit inference/infer_psd_flux.py to set INPUT_TXT_DIR, OUTPUT_ROOT, LORA_PATH, and the RGBA-VAE path, then run:

cd /PATH/TO/OmniPSD
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=$(pwd) python inference/infer_psd_flux.py

Each .txt file in INPUT_TXT_DIR is treated as one prompt. For each prompt, the script runs NUM_PASSES times with incrementing seeds and saves the results under OUTPUT_ROOT/<stem>/, where <stem> is the prompt file's name without the .txt extension.
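
The prompt, seed, and output-directory bookkeeping described above can be sketched as follows. This is not the code in inference/infer_psd_flux.py: `plan_runs` and `base_seed` are hypothetical names, and the sketch assumes seeds increase by 1 per pass and that surrounding whitespace in a prompt file is stripped.

```python
from pathlib import Path
from typing import Iterator, Tuple


def plan_runs(
    input_txt_dir: str, output_root: str, num_passes: int, base_seed: int = 0
) -> Iterator[Tuple[str, int, Path]]:
    """Yield (prompt, seed, output_dir) for every prompt file and every pass."""
    for txt in sorted(Path(input_txt_dir).glob("*.txt")):
        prompt = txt.read_text(encoding="utf-8").strip()
        out_dir = Path(output_root) / txt.stem  # OUTPUT_ROOT/<stem>/
        for i in range(num_passes):
            yield prompt, base_seed + i, out_dir
```

A prompt file named `poster_a.txt` with `num_passes=2` and `base_seed=42` would yield two runs, with seeds 42 and 43, both writing to `<output_root>/poster_a/`.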

Citation

If you find OmniPSD useful, please cite:

@article{Liu2025OmniPSD,
  title         = {OmniPSD: Layered PSD Generation with Diffusion Transformer},
  author        = {Liu, Cheng and Song, Yiren and Wang, Haofan and Shou, Mike Zheng},
  journal       = {arXiv preprint arXiv:2512.09247},
  year          = {2025},
  archivePrefix = {arXiv},
  eprint        = {2512.09247},
  primaryClass  = {cs.CV},
  doi           = {10.48550/arXiv.2512.09247},
  url           = {https://arxiv.org/abs/2512.09247}
}
