Official repository for ID-Crafter, a framework for compositional multi-subject video generation from a text prompt and multiple reference images.
We showcase compositional video generation results with multiple subjects, preserving identity consistency and enabling complex interactions.
Teaser video: IDCrafter_website_teaser_compressed.mp4
Visit our project page at https://angericky.github.io/ID-Composer/ for more results!
- 2025-11-01: ID-Crafter was released on arXiv.
- 2025-12-15: The paper was updated to arXiv v4.
- 2026: The arXiv journal reference lists CVPR 2026.
- 2026-03-18: The public repository scaffold was expanded with project documentation, citation metadata, prompt examples, and release notes.
Our framework builds upon a compositional generation paradigm that integrates:
- Identity Encoding from multiple reference images
- Hierarchical Attention Mechanism for identity preservation
- VLM-Guided Semantic Alignment for interaction reasoning
- Online RL Optimization for improved temporal consistency and realism
This design enables scalable and controllable multi-subject video synthesis.
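To make the identity-preservation idea concrete, here is a minimal NumPy sketch of a two-level (hierarchical) attention step: each subject's reference embeddings are attended separately, and the per-subject outputs are then fused back into the video tokens. All function names, shapes, and the residual-update layout are illustrative assumptions, not the released ID-Crafter implementation.

```python
# Illustrative sketch only: names, shapes, and the two-level layout are
# assumptions; this is not the released ID-Crafter code.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    # q: (T, d); k, v: (N, d) -> (T, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def hierarchical_identity_attention(video_tokens, subject_embeds):
    """Level 1: attend to each subject's reference tokens separately
    (identity preservation). Level 2: fuse the per-subject outputs via
    attention over subject summaries (multi-subject composition)."""
    per_subject = [cross_attention(video_tokens, e, e) for e in subject_embeds]
    # One summary vector per subject serves as the fusion key.
    summaries = np.stack([p.mean(axis=0) for p in per_subject])    # (S, d)
    weights = softmax(video_tokens @ summaries.T, axis=-1)         # (T, S)
    stacked = np.stack(per_subject, axis=1)                        # (T, S, d)
    fused = np.einsum("ts,tsd->td", weights, stacked)              # (T, d)
    return video_tokens + fused  # residual update of the video tokens

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 32))                        # 16 video tokens, dim 32
subjects = [rng.normal(size=(4, 32)) for _ in range(3)]   # 3 reference subjects
out = hierarchical_identity_attention(tokens, subjects)
print(out.shape)  # (16, 32)
```

The separation matters: attending per subject first keeps each identity's reference tokens from competing in a single softmax, which is one plausible way to reduce identity blending across subjects.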
We evaluate ID-Crafter on the open-domain subject-to-video benchmark, comparing with both proprietary and open-source models.
| Method | Total Score ↑ | Motion ↑ | FaceSim ↑ | Natural ↑ |
|---|---|---|---|---|
| VACE-14B | 52.87 | 15.02 | 55.09 | 72.78 |
| Phantom-14B | 52.32 | 33.42 | 51.48 | 68.66 |
| SkyReels-A2-P14B | 49.61 | 25.60 | 45.95 | 67.22 |
| ID-Crafter (1.3B + RL) | 55.16 | 36.50 | 66.10 | 69.15 |
| ID-Crafter (14B) | 57.05 | 40.34 | 60.71 | 73.23 |
ID-Crafter achieves state-of-the-art performance among open-source methods, raising the total score by more than 4 points over the strongest baseline while substantially improving identity preservation (FaceSim). Online RL further boosts motion alignment and visual quality, and the 14B model delivers the best overall balance across metrics.
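The headline margins can be checked directly against the table above (baseline rows and metric names are from the table; nothing else is assumed):

```python
# Quick arithmetic check of the margins quoted from the benchmark table.
best_prior_total = 52.87    # VACE-14B, best prior Total Score
ours_14b_total = 57.05      # ID-Crafter (14B)
total_gain = round(ours_14b_total - best_prior_total, 2)
print(total_gain)           # 4.18 points

best_prior_facesim = 55.09  # VACE-14B, best prior FaceSim
ours_rl_facesim = 66.10     # ID-Crafter (1.3B + RL)
facesim_gain = round(ours_rl_facesim - best_prior_facesim, 2)
print(facesim_gain)         # 11.01 points
```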
Released:
- Paper and project page
- Repository metadata and documentation scaffold
- Machine-readable citation file
- Prompt examples extracted from public demos

Planned:
- Inference code
- Training code
- Evaluation scripts
- Benchmark/data release instructions
- Checkpoint download instructions
.
├── assets/
│ ├── README.md
│ └── teaser.svg
├── configs/
│ ├── README.md
│ ├── eval.yaml
│ ├── inference.yaml
│ └── train.yaml
├── docs/
│ └── release_status.md
├── examples/
│ └── prompts.md
├── scripts/
│ ├── README.md
│ ├── evaluate.sh
│ ├── run_inference.sh
│ └── train.sh
├── .gitignore
├── CITATION.cff
├── CONTRIBUTING.md
├── LICENSE
├── README.md
└── requirements.txt
- Paper: https://arxiv.org/abs/2511.00511
- Project page: https://angericky.github.io/ID-Composer/
- Repository: https://github.com/paulpanwang/ID-Crafter
We collected a few lightweight prompt summaries from the public project page into examples/prompts.md. They can be used for demos, regression tests, or README examples once the generation code is released.
Contribution guidance is available in CONTRIBUTING.md. The current public branch is still scaffold-first, so documentation and repository-structure improvements are the safest contributions right now.
If you find ID-Crafter useful in your research, please cite:
@misc{pan2025idcraftervlmgroundedonlinerl,
  title={ID-Crafter: VLM-Grounded Online RL for Compositional Multi-Subject Video Generation},
  author={Panwang Pan and Jingjing Zhao and Yuchen Lin and Chenguo Lin and Chenxin Li and Hengyu Liu and Tingting Shen and Yadong MU},
  year={2025},
  eprint={2511.00511},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.00511}
}

This repository is distributed under the included ID-Composer Non-Commercial License v1.0.
