Proposed VoCo-LLaMA, an attention-distilled video token compression method enabling video LLMs to train on and run inference over million-token (1+ hour) videos within a 4k-context LLM.
Proposed ATP-LLaVA, an efficient MLLM that performs adaptive instance-wise and decoder-layer-wise token pruning with negligible performance degradation.
Proposed LAVT, a Transformer-based universal framework for referring image and video segmentation (RIS and RVOS) that performs language-aware visual encoding in place of cross-modal fusion after feature extraction.
Selected Honors and Awards
Nanhu Elite Scholarship of Tsinghua University, 2025. (Tsinghua University Comprehensive Excellence Scholarship, University-Level First Prize)
Zhaoyi Scholarship of Tsinghua University, 2024. (Tsinghua University Comprehensive Excellence Scholarship, University-Level First Prize)
First Prize Scholarship of Tongji University, 2023. (Tongji University Comprehensive Excellence Scholarship, University-Level First Prize)
Second Prize Scholarship of Tongji University, 2021, 2022. (Tongji University Comprehensive Excellence Scholarship, University-Level Second Prize)
Industrial Experience
ByteDance Seed Application, Beijing, China. December 2024 - April 2025.