Mingkai Jia<sup>1,2</sup>, Mingxiao Li<sup>2</sup>, Zhijian Shu<sup>2,3</sup>, Anlin Zheng<sup>4</sup>, Liaoyuan Fan<sup>2</sup>, Jiaxin Guo<sup>5</sup>, Tianxing Shi<sup>3</sup>, Dongyue Lu<sup>2</sup>, Zeming Li<sup>1</sup>, Xiaoyang Guo<sup>2</sup>, Xiaojuan Qi<sup>4</sup>, Xiao-Xiao Long<sup>3</sup>, Qian Zhang<sup>2</sup>, Ping Tan<sup>1*</sup>, Wei Yin<sup>2*§</sup>

<sup>1</sup>HKUST, <sup>2</sup>Horizon Robotics, <sup>3</sup>NJU, <sup>4</sup>HKU, <sup>5</sup>CUHK

\* Corresponding Author, § Project Leader

## News

- **[April 2026]** Released inference code.
- **[April 2026]** Released models & stats.
- **[Nov 2025]** Released paper.
## TODO

- Training code.
- Models & evaluation code.
- Hugging Face models & stats.
## Installation

```bash
git clone https://github.com/MKJia/DINO-Tok.git
cd DINO-Tok
conda create -n dinotok python=3.10
conda activate dinotok
pip3 install -r requirements.txt
```

## Pretrained Models & Data

Download the pretrained models & stats from our Hugging Face release to `/path/to/your/ckpt`.
By default we use the ImageNet-1k dataset. Alternatively, you can try our UHDBench dataset on Hugging Face and download it to `/path/to/your/dataset`.
Remember to update the checkpoint and dataset paths in the evaluation scripts, as sketched below.
## Evaluation

```bash
# AE tokenizer: reconstruction
bash scripts/test_aetok.bash
# AE tokenizer: class-to-image generation
bash scripts/test_aegen.bash
# VQ tokenizer: reconstruction
bash scripts/test_vqtok.bash
# VQ tokenizer: class-to-image generation
bash scripts/test_vqgen.bash
```

## Results

- 🔥 Qualitative reconstruction images.
- 🔥 Qualitative class-to-image generation on ImageNet.
- 🔥 Evaluation of dino-tok-ae on the 256×256 ImageNet benchmark.
- 🔥 Evaluation of dino-tok-vq on the 256×256 ImageNet benchmark.
## Citation

If the paper and code of DINO-Tok help your research, we kindly ask you to cite our paper ❤️. If you appreciate our work and find this repository useful, giving it a star ⭐️ is a wonderful way to support us. Thank you very much.
```bibtex
@article{jia2025dinotok,
  title={DINO-Tok: Adapting DINO for Visual Tokenizers},
  author={Jia, Mingkai and Li, Mingxiao and Fan, Liaoyuan and Shi, Tianxing and Guo, Jiaxin and Li, Zeming and Guo, Xiaoyang and Long, Xiao-Xiao and Zhang, Qian and Tan, Ping and others},
  journal={arXiv preprint arXiv:2511.20565},
  year={2025}
}
```

## License

This repository is released under the MIT License. For further questions about licensing, please contact Mingkai Jia (mjiaab@connect.ust.hk) and Wei Yin (yvanwy@outlook.com).



