Sheng Jin's Homepage

About Me

Sheng Jin is currently a research scientist at SenseTime Research. His research focuses on teaching agents and robots to see and understand human behaviors such as body poses, actions, and human-machine interactions.

In 2024, he received his PhD degree from the Department of Computer Science at the University of Hong Kong, advised by Prof. Ping Luo and co-supervised by Prof. Wenping Wang and Prof. Xiaoou Tang. In 2020, he received his master's degree from the Department of Automation at Tsinghua University, advised by Prof. Changshui Zhang. In 2017, he received his B.Eng. degree with highest honors (Outstanding Graduate Scholarships) from Tsinghua University.

News

[2025-04] Three papers accepted to CVPR'2025 (1 Highlight and 2 Posters).

[2024-12] Two papers accepted to AAAI'2025.

[2024-09] One paper accepted to NeurIPS'2024.

[2024-07] Four papers accepted to ECCV'2024.

[2024-01] Two papers accepted to ICLR'2024 (1 Spotlight and 1 Poster).

[2023-12] One paper accepted to AAAI'2024.

[2023-03] One paper accepted to CVPR'2023.

[2022-08] One paper accepted to TPAMI'2022.

[2022-07] Three papers accepted to ECCV'2022 (1 Oral and 2 Posters).

[2022-04] One paper accepted to CVPR'2022 (Oral).

[2022-01] One paper accepted to ICLR'2022.

Education

	The University of Hong Kong, PhD in Computer Science (HKPFS awardee), 2020~2024
	Tsinghua University, MS in Control Science and Engineering, 2017~2020
	Tsinghua University, BSc in Automation (ranking 1/145), 2013~2017

Honors and Awards

YS and Christabel Lung Postgraduate Scholarship, 2020-2021.

HKU Presidential PhD Scholarship (HKU-PS), 2020-2024.

Hong Kong PhD Fellowship Scheme (HKPFS), 2020-2024.

Outstanding Graduate of Beijing City, 2017.

Outstanding Graduate of Tsinghua University (top 1% in Tsinghua), 2017.

The Baosteel Excellent Student Scholarship, 2016.

Zheng Weimin Scholarship (2nd class) for Comprehensive Excellence, 2016.

Tsinghua-JJWorld (Beijing) Network Technology Fellowships, Tsinghua University, 2015.

Tsinghua-Evergrande Fellowships for Academic Excellence, Tsinghua University, 2014.

Selected Publications

* means equal contributions.

	F-LMM: Grounding Frozen Large Multimodal Models Size Wu, Sheng Jin, Wenwei Zhang, Lumin Xu, Wentao Liu, Wei Li, Chen Change Loy Conference on Computer Vision and Pattern Recognition (CVPR), 2025. [Paper] [Code]
	NADER: Neural Architecture Design via Multi-Agent Collaboration Zekang Yang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu Conference on Computer Vision and Pattern Recognition (CVPR), 2025. [Paper]
	KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension Jie Yang, Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Ruimao Zhang Conference on Neural Information Processing Systems (NeurIPS), 2024. [Paper]
	AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks Zekang Yang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu AAAI Conference on Artificial Intelligence (AAAI), 2025. [Paper] [Code]
	UniFS: Universal Few-shot Instance Perception with Point Representations Sheng Jin, Ruijie Yao, Lumin Xu, Wentao Liu, Chen Qian, Ji Wu, Ping Luo European Conference on Computer Vision (ECCV), 2024. [Paper] [Code & Data]
	You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception Sheng Jin, Shuhuai Li, Tong Li, Wentao Liu, Chen Qian, Ping Luo European Conference on Computer Vision (ECCV), 2024. [Paper] [Code & Data]
	Pose for Everything: Towards Category-Agnostic Pose Estimation Lumin Xu*, Sheng Jin*, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang European Conference on Computer Vision (ECCV), 2022, Oral. [Paper] [Code & Data] [Blog(商汤学术)] [Talk（OpenMMLab社区）]
	Whole-Body Human Pose Estimation in the Wild Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, and Ping Luo European Conference on Computer Vision (ECCV), 2020. [Paper] [Dataset]
	Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, and Ping Luo European Conference on Computer Vision (ECCV), 2020. [Paper] [Blog(知乎)]
	Multi-person Articulated Tracking with Spatial and Temporal Embeddings Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian Conference on Computer Vision and Pattern Recognition (CVPR), 2019. [Paper] [Demo]

Other papers

	Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling Haopeng Sun, Yingwei Zhang, Lumin Xu, Sheng Jin, Ping Luo, Chen Qian, Wentao Liu, Yiqiang Chen Conference on Computer Vision and Pattern Recognition (CVPR), 2025, Highlight. [Paper]
	Ultra-High Resolution Segmentation via Boundary-Enhanced Patch-Merging Transformer Haopeng Sun, Yingwei Zhang, Lumin Xu, Sheng Jin, Yiqiang Chen AAAI Conference on Artificial Intelligence (AAAI), 2025. [Paper]
	When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset Yi Zhang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu European Conference on Computer Vision (ECCV), 2024. [Paper] [Code & Data]
	GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition Ruijie Yao, Sheng Jin, Lumin Xu, Wang Zeng, Wentao Liu, Chen Qian, Ping Luo, Ji Wu European Conference on Computer Vision (ECCV), 2024. [Paper] [Code]
	TCFormer: Visual Recognition via Token Clustering Transformer Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024. [Paper] [Code]
	CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy International Conference on Learning Representations (ICLR), 2024, Spotlight. [Paper] [Code] [Blog(商汤学术)]
	PROGRAM: PROtotype GRAph Model based Pseudo-Label Learning for Test-Time Adaptation Haopeng Sun, Lumin Xu, Sheng Jin, Ping Luo, Chen Qian, Wentao Liu International Conference on Learning Representations (ICLR), 2024.
	CLIM: Contrastive Language-Image Mosaic for Region Representation Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Wentao Liu, Chen Change Loy AAAI Conference on Artificial Intelligence (AAAI), 2024. [Paper] [Code] [Blog(商汤学术)]
	Aligning Bag of Regions for Open-Vocabulary Object Detection Size Wu, Wenwei Zhang, Sheng Jin, Wentao Liu, Chen Change Loy Conference on Computer Vision and Pattern Recognition (CVPR), 2023. [Paper] [Code] [Project] [Blog(商汤学术)]
	ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild Lumin Xu, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022. [Paper] [Data] [Talk（OpenMMLab社区）]
	PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation Wentao Jiang, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Si Liu European Conference on Computer Vision (ECCV), 2022. [Paper] [Code] [Blog(商汤学术)] [Talk（OpenMMLab社区）]
	3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal Hao Meng*, Sheng Jin*, Wentao Liu, Chen Qian, Mengxiang Lin, Wanli Ouyang, Ping Luo European Conference on Computer Vision (ECCV), 2022. [Paper] [Code & Data] [Project] [Blog(商汤学术)] [Talk（OpenMMLab社区）]
	Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer Wang Zeng, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang Conference on Computer Vision and Pattern Recognition (CVPR), 2022, Oral. [Paper] [Code] [Blog(商汤学术)] [News(机器之心)] [Talk（OpenMMLab社区）]
	Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization Can Wang, Sheng Jin, Yingda Guan, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang International Conference on Learning Representations (ICLR), 2022. [Paper]
	Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images Size Wu, Sheng Jin, Wentao Liu, Lei Bai, Chen Qian, Dong Liu, Wanli Ouyang IEEE International Conference on Computer Vision (ICCV), 2021. [Paper] [Code] [Blog(商汤学术)] [Demo]
	ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search Lumin Xu, Yingda Guan, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang Conference on Computer Vision and Pattern Recognition (CVPR), 2021. [Paper] [Code] [Talk（OpenMMLab社区）]
	When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks Jiahang Wang, Sheng Jin, Wentao Liu, Weizhong Liu, Chen Qian, Ping Luo Conference on Computer Vision and Pattern Recognition (CVPR), 2021. [Paper] [Code]
	When Counterpoint Meets Chinese Folk Melodies Nan Jiang, Sheng Jin, Zhiyao Duan, Changshui Zhang Conference on Neural Information Processing Systems (NeurIPS), 2020. [Paper] [Supplementary] [Poster] [Code] [Project Page]
	TRB: A Novel Triplet Representation for Understanding 2D Human Body Haodong Duan, Kwan-Yee Lin, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang IEEE International Conference on Computer Vision (ICCV), 2019, Oral. [Paper] [Dataset]
	Robust Few-Shot Learning for User-Provided Data Jiang Lu, Sheng Jin, Jian Liang, and Changshui Zhang IEEE Transactions on Neural Networks and Learning Systems (TNNLS). [Paper]
	RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning Nan Jiang, Sheng Jin, Zhiyao Duan, Changshui Zhang AAAI Conference on Artificial Intelligence (AAAI), 2020, Oral. [Paper] [Demo]
	Hierarchical Automatic Curriculum Learning: Converting a Sparse Reward Navigation Task into Dense Reward Nan Jiang, Sheng Jin, Changshui Zhang Neurocomputing, 2019. [Paper]
	Connectionist Temporal Classification with Maximum Entropy Regularization Hu Liu, Sheng Jin, Changshui Zhang Conference on Neural Information Processing Systems (NeurIPS), 2018, Spotlight. [Paper] [Poster] [Code]
	Towards Multi-Person Pose Tracking: Bottom-up and Top-down Methods Sheng Jin, Xujie Ma, Zhipeng Han, Yue Wu, Wei Yang, Wentao Liu, Chen Qian, Wanli Ouyang International Conference on Computer Vision (ICCV) PoseTrack Workshop, 2017. [Paper] [Leaderboard](BUTDS and BUTD2) [Demo]

Projects

	MMPose Toolbox MMPose is an open-source toolbox for pose estimation based on PyTorch, which is a part of the OpenMMLab project. [Project]
	ACM MM'2020 HiEve Challenge Our team (SimpleTrack) won 3rd place in Track-3 "Crowd Pose Tracking in Complex Events" of the ACM MM'2020 HiEve Challenge. [Leaderboard] [Technical Report]
	CVPR'2018 Look Into Person (LIP) Challenge Our team (MJDG) won 2nd place in Track-4 "Multi-Human Pose Estimation Challenge" of the CVPR'2018 LIP Challenge. [Leaderboard] [Oral Presentation]
	ICCV'2017 PoseTrack Challenge Our team (BUTDS | BUTD2) won 2nd place in both Track-1 "Single-Frame Person Pose Estimation" and Track-3 "Multi-Person Pose Tracking" of the ICCV'2017 PoseTrack Challenge. [Leaderboard] [Technical Report] [Oral Presentation] [Demo]

Patents

Key point detection method, device, electronic equipment and storage medium

Sheng Jin, Wentao Liu, Chen Qian

Chinese Invention Patent.

Publication Number: CN111898642A. Publication Date: 2020-11-06.

Key point detection method, device, electronic equipment and storage medium

Sheng Jin, Wentao Liu, Chen Qian

Chinese Invention Patent.

Publication Number: CN111783882A. Publication Date: 2020-10-16.

Image processing method and device, detection device and storage medium

Tong Li, Sheng Jin, Wentao Liu, Chen Qian

Chinese Invention Patent.

Publication Number: CN111539992A. Publication Date: 2020-08-14.

Key point detection method, device, electronic equipment and storage medium

Sheng Jin, Wentao Liu, Chen Qian

Chinese Invention Patent.

Publication Number: CN111444928A. Publication Date: 2020-07-24.

Image processing method and device, detection device and storage medium

Sheng Jin, Wentao Liu, Chen Qian

Chinese Invention Patent.

Publication Number: CN109948526A. Publication Date: 2019-06-28.

Image processing method and device, detection device and storage medium

Sheng Jin, Wentao Liu, Chen Qian

Chinese Invention Patent.

Publication Number: CN109934183A. Publication Date: 2019-06-25.

Deep learning model training method and device, training equipment and storage medium

Sheng Jin, Wentao Liu, Chen Qian

Chinese Invention Patent.

Publication Number: CN109919245A. Publication Date: 2019-06-21.

Teaching

TA, Deep Learning (COMP7606), HKU, [autumn, 2021]

TA, From Human Vision to Machine Vision (CCST9049), HKU, [spring, 2020]

TA, Introduction to Artificial Intelligence (40250182-0), THU, [spring, 2019]

Activities

Conference Reviewer/PC Member

NeurIPS'19-24, AAAI'19-24, ICML'20-25, CVPR'20-25, ICCV'21-25, ECCV'22-24, ICLR'21-25, WACV'21-24

Journal Reviewer

Transactions on Pattern Analysis and Machine Intelligence (TPAMI), IEEE Transactions on Artificial Intelligence (TAI), Transactions on Image Processing (TIP), International Journal of Computer Vision (IJCV), IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Visualization and Computer Graphics (TVCG)

Website Chair

IEEE SPS SIDAS 2017 Forum

Contacts

js20 [at] connect.hku.hk | jinsheng13 [at] foxmail.com