Rui Zhao4 Ming-Ming Cheng1,5✉ Qibin Hou1 Chen Change Loy2
DIRECT enables pose-controllable object insertion with explicit geometric guidance from a reconstructed 3D proxy.
For more visual results, please check out our project page.
- [2026.06] Released inference code, interactive demo, and model weights.
- [2026.05] DIRECT was accepted to ICML 2026! The repository and project page are now available.
- Release inference code and interactive demo.
- Release dataset.
- Release training and preprocessing code.
The environment is tested with Python 3.10.18, PyTorch 2.4.0, and CUDA 11.8.
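To compare an existing environment against the tested versions above, a small check along these lines may help. This is a minimal sketch, not part of the DIRECT codebase: the helper names `version_mismatches` and `installed_versions` are hypothetical, and a mismatch only means the combination is untested, not that it will fail.

```python
import sys

# Versions stated in this README as the tested environment.
TESTED = {"python": "3.10.18", "torch": "2.4.0", "cuda": "11.8"}


def version_mismatches(python_version, torch_version, cuda_version):
    """Return human-readable differences from the tested versions."""
    found = {
        "python": python_version,
        # PyTorch wheels append a local build tag such as "+cu118"; drop it.
        "torch": (torch_version or "").split("+")[0],
        "cuda": cuda_version or "",
    }
    return [
        f"{name}: found {found[name] or 'nothing'}, tested with {TESTED[name]}"
        for name in TESTED
        if found[name] != TESTED[name]
    ]


def installed_versions():
    """Read versions from the active interpreter; torch may be absent."""
    python_version = ".".join(str(part) for part in sys.version_info[:3])
    try:
        import torch  # imported lazily so the check runs before installation
    except ImportError:
        return python_version, None, None
    # torch.version.cuda is None on CPU-only builds.
    return python_version, torch.__version__, torch.version.cuda
```

Calling `version_mismatches(*installed_versions())` inside the activated environment returns an empty list when all three versions match.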
git clone https://github.com/Gong1130/DIRECT.git
cd DIRECT
conda create -n direct python=3.10.18 -y
conda activate direct
Install PyTorch for CUDA 11.8:
pip install torch==2.4.0+cu118 torchvision==0.19.0+cu118 --index-url https://download.pytorch.org/whl/cu118
Install the remaining dependencies:
pip install --no-build-isolation -r requirements.txt
Some dependencies are compiled CUDA extensions. If the build cannot find CUDA, set CUDA_HOME to your local CUDA 11.8 toolkit path before installing the requirements.
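If it is unclear what to set CUDA_HOME to, the sketch below shows one way to look for a candidate toolkit root. It is an illustration under assumptions, not part of DIRECT: the helper `find_cuda_home` is hypothetical, and the fallback assumes the usual `<root>/bin/nvcc` layout. It does not verify that the toolkit it finds is version 11.8.

```python
import os
import shutil


def find_cuda_home(env=None, which=shutil.which):
    """Return a candidate CUDA toolkit root, or None if none is found."""
    env = os.environ if env is None else env
    # An explicitly set CUDA_HOME takes priority.
    if env.get("CUDA_HOME"):
        return env["CUDA_HOME"]
    # Otherwise guess from nvcc on PATH: <root>/bin/nvcc -> <root>.
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(os.path.realpath(nvcc)))
    return None
```

If `find_cuda_home()` returns a path, exporting it as CUDA_HOME before running the requirements install is the step the note above describes. If it returns `None`, the toolkit is probably not installed or not on PATH.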
Run the demo with:
python demo/demo.py --gradio_port 7860 --viser_port 8081
On the first run, the demo will automatically download DIRECT, FLUX.1-Fill-dev, TRELLIS-image-large, SigLIP2, and RMBG-2.0 from Hugging Face. FLUX.1-Fill-dev and RMBG-2.0 are gated models, so please accept their licenses and authenticate with
huggingface-cli login
or by setting your HF_TOKEN before running the demo.
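Since a missing token only surfaces once the gated downloads begin, it can be worth checking for credentials first. The sketch below is hedged: `has_hf_credentials` is a hypothetical helper, and it assumes that `huggingface-cli login` stores the token in a `token` file under `HF_HOME` (defaulting to `~/.cache/huggingface`). Setups that relocate the cache or the token file are not covered.

```python
import os
from pathlib import Path


def has_hf_credentials(env=None, home=None):
    """Return True if a Hugging Face token appears to be configured."""
    env = os.environ if env is None else env
    # Case 1: the token is supplied through the HF_TOKEN variable.
    if env.get("HF_TOKEN"):
        return True
    # Case 2 (assumed layout): a token saved by `huggingface-cli login`
    # at $HF_HOME/token, where HF_HOME defaults to ~/.cache/huggingface.
    home = Path(home) if home is not None else Path.home()
    hf_home = Path(env.get("HF_HOME", home / ".cache" / "huggingface"))
    token_file = hf_home / "token"
    return token_file.is_file() and token_file.read_text().strip() != ""
```

This only confirms that a token exists. It does not confirm that the licenses for FLUX.1-Fill-dev and RMBG-2.0 have been accepted on the account.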
Open the Gradio interface at http://localhost:7860. The Viser 3D viewer runs on http://localhost:8081 and is embedded inside the Gradio page.
After launching the demo, an interactive interface will appear in the browser.
If you run the demo on a remote server, forward both ports:
ssh -L 7860:localhost:7860 -L 8081:localhost:8081 <user>@<server>
After port forwarding, open http://localhost:7860 in your local browser to use the full demo.
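If the page loads but the embedded 3D viewer stays blank, one of the two ports may not be reachable. The following sketch, using only the Python standard library, checks whether both demo ports accept TCP connections on the local machine. The helper names `port_is_open` and `demo_port_status` are hypothetical, and the default ports are the ones used in the commands above.

```python
import socket


def port_is_open(port, host="localhost", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


def demo_port_status(ports=(7860, 8081)):
    """Map each demo port (Gradio, Viser) to whether it is reachable."""
    return {port: port_is_open(port) for port in ports}
```

Running `demo_port_status()` on the local machine while the SSH tunnel is active should report both ports as reachable. A `False` entry for 8081 suggests the Viser port was not forwarded.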
If you find DIRECT useful for your research, please consider citing our paper:
@inproceedings{gong2026direct,
title = {Direct 3D-Aware Object Insertion via Decomposed Visual Proxies},
author = {Jingbo Gong and Yikai Wang and Yushi Lan and Yuhao Wan and Ziheng Ouyang and Rui Zhao and Ming-Ming Cheng and Qibin Hou and Chen Change Loy},
booktitle = {ICML},
year = {2026}
}
This codebase builds on TRELLIS, FLUX, EasyControl, and the Hugging Face Diffusers ecosystem.
If you have any questions, please feel free to contact us at jingbogong@mail.nankai.edu.cn. We are also actively improving DIRECT, and we welcome any failure cases or feedback encountered during use!



