DIRECT Logo

Direct 3D-Aware Object Insertion via Decomposed Visual Proxies

🔥 ICML 2026

¹VCIP, NKU  ²S-Lab, NTU  ³ZGCA  ⁴SenseTime Research  ⁵NKIARI, Shenzhen Futian

DIRECT enables pose-controllable object insertion with explicit geometric guidance from a reconstructed 3D proxy.

DIRECT teaser

For more visual results, please check out our project page.


📬 News

  • [2026.06] Release inference code, interactive demo, and model weights.
  • [2026.05] DIRECT was accepted to ICML 2026! The repository and project page are now available.

📅 TODO

  • Release inference code and interactive demo.
  • Release dataset.
  • Release training and preprocessing code.

🔍 Overview

DIRECT model overview

🔧 Installation

The environment is tested with Python 3.10.18, PyTorch 2.4.0, and CUDA 11.8.

git clone https://github.com/Gong1130/DIRECT.git
cd DIRECT

conda create -n direct python=3.10.18 -y
conda activate direct

Install PyTorch for CUDA 11.8:

pip install torch==2.4.0+cu118 torchvision==0.19.0+cu118 --index-url https://download.pytorch.org/whl/cu118

Install the remaining dependencies:

pip install --no-build-isolation -r requirements.txt

Some dependencies are compiled CUDA extensions. If the build cannot find CUDA, set CUDA_HOME to your local CUDA 11.8 toolkit path before installing the requirements.
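
As a minimal sketch of that step: the snippet below exports CUDA_HOME and checks that nvcc is visible before you build the extensions. The path /usr/local/cuda-11.8 is only an assumed default install location, not something this repository specifies, so adjust it to your system.

```shell
# Assumed toolkit location; change this to wherever CUDA 11.8 lives on your machine.
export CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-11.8}"
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Report whether the CUDA compiler is visible before building the extensions.
if [ -x "$CUDA_HOME/bin/nvcc" ]; then
  "$CUDA_HOME/bin/nvcc" --version | tail -n 1
else
  echo "nvcc not found under $CUDA_HOME" >&2
fi
```

With these variables exported in the same shell, rerun the requirements install command above.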

🪄 Interactive Demo

Run the demo with:

python demo/demo.py --gradio_port 7860 --viser_port 8081

On the first run, the demo will automatically download DIRECT, FLUX.1-Fill-dev, TRELLIS-image-large, SigLIP2, and RMBG-2.0 from Hugging Face. FLUX.1-Fill-dev and RMBG-2.0 are gated models, so please accept their licenses and authenticate with huggingface-cli login or by setting your HF_TOKEN before running the demo.
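
A quick pre-flight check can catch missing credentials before the downloads start. The sketch below is not part of the repository: check_hf_auth is a hypothetical helper. It assumes that huggingface-cli login stores the token in a file named token under HF_HOME (by default ~/.cache/huggingface), so treat a "missing" result as a hint rather than proof.

```shell
# Hypothetical helper: report where a Hugging Face token would come from.
check_hf_auth() {
  if [ -n "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is set"
  elif [ -f "${HF_HOME:-$HOME/.cache/huggingface}/token" ]; then
    # Assumed location of the token saved by `huggingface-cli login`.
    echo "token file found"
  else
    echo "no token found: run 'huggingface-cli login' or export HF_TOKEN" >&2
    return 1
  fi
}

check_hf_auth || true
```

A valid token is still not enough for the gated models: the licenses for FLUX.1-Fill-dev and RMBG-2.0 must also be accepted on their Hugging Face model pages.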

Open the Gradio interface at http://localhost:7860. The Viser 3D viewer runs at http://localhost:8081 and is embedded in the Gradio page. Once the demo has launched, the interface looks like this:

DIRECT interactive demo

If you run the demo on a remote server, forward both ports:

ssh -L 7860:localhost:7860 -L 8081:localhost:8081 <user>@<server>

After port forwarding, open http://localhost:7860 in your local browser to use the full demo.
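
If you launch the demo with different --gradio_port or --viser_port values, the forwarded ports must match them. The sketch below uses a hypothetical helper, forward_cmd, that prints the matching ssh command. It adds the standard ssh -N flag, which keeps the tunnel open without starting a remote shell. The user@server value is a placeholder.

```shell
# Hypothetical helper: print the ssh command that forwards both demo ports.
# Usage: forward_cmd <gradio_port> <viser_port> <user@server>
forward_cmd() {
  printf 'ssh -N -L %s:localhost:%s -L %s:localhost:%s %s\n' \
    "$1" "$1" "$2" "$2" "$3"
}

# Default ports from the demo command above; replace user@server with your own.
forward_cmd 7860 8081 "user@server"
```

Run the printed command in a separate local terminal and leave it open while you use the demo.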

📝 BibTeX

If you find DIRECT useful for your research, please consider citing our paper:

@inproceedings{gong2026direct,
  title     = {Direct 3D-Aware Object Insertion via Decomposed Visual Proxies},
  author    = {Jingbo Gong and Yikai Wang and Yushi Lan and Yuhao Wan and Ziheng Ouyang and Rui Zhao and Ming-Ming Cheng and Qibin Hou and Chen Change Loy},
  booktitle = {ICML},
  year      = {2026}
}

👏 Acknowledgements

This codebase builds on TRELLIS, FLUX, EasyControl, and the Hugging Face Diffusers ecosystem.

✉️ Contact

If you have any questions, please feel free to contact us at jingbogong@mail.nankai.edu.cn. We are actively improving DIRECT and welcome reports of failure cases, as well as any other feedback from your use of it!
