Fudan University
ICML 2026
✉ Corresponding Author
- [2026/05/21] Release the OcclusionFormer paper on arXiv.
- [2026/05/18] Release inference code, model weights and SA-Z dataset.
- [2026/05/18] Release OcclusionFormer open-source package in this repository.
- [2026/04/30] OcclusionFormer is accepted to ICML 2026.
OcclusionFormer addresses a core challenge in layout-to-image generation: when multiple bounding boxes overlap, standard methods often produce entangled textures and incorrect front/back ordering.
As described in the paper, OcclusionFormer introduces explicit Z-order modeling for layout-grounded generation by:
- decoupling instance generation,
- arranging occlusion order with a volume-rendering-inspired transmittance mechanism,
- and enforcing spatial precision with a queried alignment objective.
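
The transmittance idea in the second bullet can be illustrated with the standard front-to-back compositing rule from volume rendering. This is a minimal NumPy sketch of that generic rule, not the paper's implementation: the function name `composite_front_to_back`, the per-instance opacity maps, and the feature shapes are all assumptions made for illustration.

```python
import numpy as np


def composite_front_to_back(features, alphas):
    """Blend per-instance features ordered front (index 0) to back.

    features: (N, H, W, C) array, one feature map per instance (hypothetical).
    alphas:   (N, H, W) array of opacities in [0, 1], e.g. rasterized boxes.
    Returns the blended (H, W, C) map and the per-instance weights (N, H, W).
    """
    features = np.asarray(features, dtype=np.float64)
    alphas = np.asarray(alphas, dtype=np.float64)

    # Transmittance T_i = prod_{j < i} (1 - alpha_j): the fraction of
    # instance i that is still visible behind everything in front of it.
    transmittance = np.ones_like(alphas)
    transmittance[1:] = np.cumprod(1.0 - alphas[:-1], axis=0)

    # Each instance contributes alpha_i * T_i at every pixel.
    weights = alphas * transmittance
    blended = (weights[..., None] * features).sum(axis=0)
    return blended, weights
```

With fully opaque boxes, the front instance receives weight 1 wherever two boxes overlap and the instance behind it receives weight 0, so the overlapping region is attributed to a single instance instead of a mixture.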
The paper also introduces SA-Z, a large-scale dataset with explicit occlusion order and amodal supervision for occlusion-aware layout generation.
- SA-Z Dataset Curation: Enriches layout annotations with instance captions, explicit occlusion order, and amodal signals.

- Occlusion-Aware DiT Framework: Models Z-order dependencies explicitly rather than mixing overlapping instances implicitly.
- Instance Decoupling + Volumetric Composition: Improves robustness on dense overlap scenes by composing instances with transmittance-based ordering.
- Queried Alignment Mechanism: Improves spatial faithfulness and local semantic consistency.

- Environment setup
cd OcclusionFormer
conda create -n OcclusionFormer python=3.11 -y
conda activate OcclusionFormer
- Install requirements
pip install --upgrade -r requirements.txt
- Download checkpoint
Download https://huggingface.co/FudanCVL/OcclusionFormer/main/occlusionformer to ./ckpt
- Run Streamlit demo (Recommended)
streamlit run demo_occlusionformer.py
- Run CLI inference
python inference_occlusionformer.py \
--model_path /path/to/FLUX.1-dev \
--ckpt_path /path/to/occlusionformer_checkpoint_dir \
--layout_json ./examples/livingroom.json \
--output_dir ./outputs_occlusionformer \
--enable_layout \
--overwrite
Batch inference with a directory of JSON layouts:
python inference_occlusionformer.py \
--model_path /path/to/FLUX.1-dev \
--ckpt_path /path/to/occlusionformer_checkpoint_dir \
--layout_dir ./examples \
--output_dir ./outputs_occlusionformer \
--enable_layout \
--overwrite
- Organize and update the Amodal annotation on Hugging Face.
This folder provides a standalone inference/demo package:
- demo_occlusionformer.py: Streamlit demo UI
- inference_occlusionformer.py: CLI inference
- src/occlusionformer/: OcclusionFormer core modules
- src/utils.py, src/transformer_utils.py: required utility modules
- examples/: example layout JSON files
- requirements.txt: runtime dependencies
- The demo and CLI follow the current project preprocessing logic and compose prompts using global prompt + instance captions.
- Layout control is enabled via --enable_layout (or disabled with --disable_layout).
- Outputs include generated images and layout overlays for visualization.
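
For scripted runs, the CLI flags documented in this README can be assembled programmatically. The helper below is a small sketch, not part of the released package: `build_command` is a hypothetical name, and choosing between --layout_json and --layout_dir by file suffix is an assumption of this example. Only flags that appear above are used.

```python
import sys
from pathlib import Path


def build_command(model_path, ckpt_path, layout, output_dir,
                  enable_layout=True, overwrite=True):
    """Build the argument list for inference_occlusionformer.py.

    `layout` is treated as a single layout file when it ends in .json
    (--layout_json) and as a directory of layouts otherwise (--layout_dir).
    """
    layout = Path(layout)
    layout_flag = "--layout_json" if layout.suffix == ".json" else "--layout_dir"

    cmd = [
        sys.executable, "inference_occlusionformer.py",
        "--model_path", str(model_path),
        "--ckpt_path", str(ckpt_path),
        layout_flag, str(layout),
        "--output_dir", str(output_dir),
        # The two flags are alternatives, so exactly one is passed.
        "--enable_layout" if enable_layout else "--disable_layout",
    ]
    if overwrite:
        cmd.append("--overwrite")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Calling the helper once with `enable_layout=True` and once with `enable_layout=False` on the same layout gives a quick side-by-side check of what layout control contributes.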
This work is built on many amazing research works and open-source projects. We thank the authors for sharing!
@inproceedings{li2026occlusionformer,
title={OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation},
author={Li, Ziye and Ding, Henghui},
booktitle={ICML},
year={2026}
}