About Me

Sheng Jin is a research scientist at SenseTime Research. His research focuses on teaching agents and robots to see and understand human behavior, including body poses, actions, and human-machine interactions.

In 2024, he received his PhD from the Department of Computer Science at the University of Hong Kong, advised by Prof. Ping Luo and co-supervised by Prof. Wenping Wang and Prof. Xiaoou Tang. In 2020, he received his master's degree from the Department of Automation at Tsinghua University, advised by Prof. Changshui Zhang. In 2017, he received his B.Eng. degree with highest honors (Outstanding Graduate Scholarships) from Tsinghua University.


News


  • [2025-04] Three papers accepted to CVPR'2025 (1 Highlight and 2 Posters).

  • [2024-12] Two papers accepted to AAAI'2025.

  • [2024-09] One paper accepted to NeurIPS'2024.

  • [2024-07] Four papers accepted to ECCV'2024.

  • [2024-01] Two papers accepted to ICLR'2024 (1 Spotlight and 1 Poster).

  • [2023-12] One paper accepted to AAAI'2024.

  • [2023-03] One paper accepted to CVPR'2023.

  • [2022-08] One paper accepted to TPAMI'2022.

  • [2022-07] Three papers accepted to ECCV'2022 (1 Oral and 2 Posters).

  • [2022-04] One paper accepted to CVPR'2022 (Oral).

  • [2022-01] One paper accepted to ICLR'2022.

Education

The University of Hong Kong
PhD in Computer Science (HKPFS awardee), 2020-2024

Tsinghua University
MS in Control Science and Engineering, 2017-2020

Tsinghua University
BSc in Automation (ranking 1/145), 2013-2017


Honors and Awards

  • YS and Christabel Lung Postgraduate Scholarship, 2020-2021.

  • HKU Presidential PhD Scholarship (HKU-PS), 2020-2024.

  • Hong Kong PhD Fellowships (HKPF), 2020-2024.

  • Outstanding Graduate of Beijing City, 2017.

  • Outstanding Graduate of Tsinghua University (top 1% in Tsinghua), 2017.

  • The Baosteel Excellent Student Scholarship, 2016.

  • Zheng Weimin Scholarship (2nd class) for Comprehensive Excellence, 2016.

  • Tsinghua-JJWorld (Beijing) Network Technology Fellowships, Tsinghua University, 2015.

  • Tsinghua-Evergrande Fellowships for Academic Excellence, Tsinghua University, 2014.


Selected Publications

* denotes equal contribution.

F-LMM: Grounding Frozen Large Multimodal Models

Size Wu, Sheng Jin, Wenwei Zhang, Lumin Xu, Wentao Liu, Wei Li, Chen Change Loy

Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

[Paper] [Code]

NADER: Neural Architecture Design via Multi-Agent Collaboration

Zekang Yang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu

Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

[Paper]

KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension

Jie Yang, Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Ruimao Zhang

Conference on Neural Information Processing Systems (NeurIPS), 2024.

[Paper]

AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks

Zekang Yang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu

AAAI Conference on Artificial Intelligence (AAAI), 2025.

[Paper] [Code]

UniFS: Universal Few-shot Instance Perception with Point Representations

Sheng Jin*, Ruijie Yao*, Lumin Xu, Wentao Liu, Chen Qian, Ji Wu, Ping Luo

European Conference on Computer Vision (ECCV), 2024.

[Paper] [Code & Data]

You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception

Sheng Jin*, Shuhuai Li*, Tong Li, Wentao Liu, Chen Qian, Ping Luo

European Conference on Computer Vision (ECCV), 2024.

[Paper] [Code & Data]

Pose for Everything: Towards Category-Agnostic Pose Estimation

Lumin Xu*, Sheng Jin*, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang

European Conference on Computer Vision (ECCV), 2022, Oral.

[Paper] [Code & Data] [Blog (SenseTime Academic)] [Talk (OpenMMLab Community)]

Whole-Body Human Pose Estimation in the Wild

Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, and Ping Luo

European Conference on Computer Vision (ECCV), 2020.

[Paper] [Dataset]

Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, and Ping Luo

European Conference on Computer Vision (ECCV), 2020.

[Paper] [Blog (Zhihu)]

Multi-person Articulated Tracking with Spatial and Temporal Embeddings

Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian

Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[Paper] [Demo]


Other papers

Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling

Haopeng Sun, Yingwei Zhang, Lumin Xu, Sheng Jin, Ping Luo, Chen Qian, Wentao Liu, Yiqiang Chen

Conference on Computer Vision and Pattern Recognition (CVPR), 2025, Highlight.

[Paper]

Ultra-High Resolution Segmentation via Boundary-Enhanced Patch-Merging Transformer

Haopeng Sun, Yingwei Zhang, Lumin Xu, Sheng Jin, Yiqiang Chen

AAAI Conference on Artificial Intelligence (AAAI), 2025.

[Paper]

When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset

Yi Zhang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu

European Conference on Computer Vision (ECCV), 2024.

[Paper] [Code & Data]

GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition

Ruijie Yao, Sheng Jin, Lumin Xu, Wang Zeng, Wentao Liu, Chen Qian, Ping Luo, Ji Wu

European Conference on Computer Vision (ECCV), 2024.

[Paper] [Code]

TCFormer: Visual Recognition via Token Clustering Transformer

Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024.

[Paper] [Code]

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy

International Conference on Learning Representations (ICLR), 2024, Spotlight.

[Paper] [Code] [Blog (SenseTime Academic)]

PROGRAM: PROtotype GRAph Model based Pseudo-Label Learning for Test-Time Adaptation

Haopeng Sun, Lumin Xu, Sheng Jin, Ping Luo, Chen Qian, Wentao Liu

International Conference on Learning Representations (ICLR), 2024.

CLIM: Contrastive Language-Image Mosaic for Region Representation

Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Wentao Liu, Chen Change Loy

AAAI Conference on Artificial Intelligence (AAAI), 2024.

[Paper] [Code] [Blog (SenseTime Academic)]

Aligning Bag of Regions for Open-Vocabulary Object Detection

Size Wu, Wenwei Zhang, Sheng Jin, Wentao Liu, Chen Change Loy

Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

[Paper] [Code] [Project] [Blog (SenseTime Academic)]

ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild

Lumin Xu, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.

[Paper] [Data] [Talk (OpenMMLab Community)]

PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation

Wentao Jiang, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Si Liu

European Conference on Computer Vision (ECCV), 2022.

[Paper] [Code] [Blog (SenseTime Academic)] [Talk (OpenMMLab Community)]

3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal

Hao Meng*, Sheng Jin*, Wentao Liu, Chen Qian, Mengxiang Lin, Wanli Ouyang, Ping Luo

European Conference on Computer Vision (ECCV), 2022.

[Paper] [Code & Data] [Project] [Blog (SenseTime Academic)] [Talk (OpenMMLab Community)]

Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer

Wang Zeng, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang

Conference on Computer Vision and Pattern Recognition (CVPR), 2022, Oral.

[Paper] [Code] [Blog (SenseTime Academic)] [News (Synced)] [Talk (OpenMMLab Community)]

Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization

Can Wang, Sheng Jin, Yingda Guan, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang

International Conference on Learning Representations (ICLR), 2022.

[Paper]

Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images

Size Wu, Sheng Jin, Wentao Liu, Lei Bai, Chen Qian, Dong Liu, Wanli Ouyang

IEEE International Conference on Computer Vision (ICCV), 2021.

[Paper] [Code] [Blog (SenseTime Academic)] [Demo]

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

Lumin Xu, Yingda Guan, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang

Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

[Paper] [Code] [Talk (OpenMMLab Community)]

When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks

Jiahang Wang, Sheng Jin, Wentao Liu, Weizhong Liu, Chen Qian, Ping Luo

Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

[Paper] [Code]

When Counterpoint Meets Chinese Folk Melodies

Nan Jiang, Sheng Jin, Zhiyao Duan, Changshui Zhang

Conference on Neural Information Processing Systems (NeurIPS), 2020.

[Paper] [Supplementary] [Poster] [Code] [Project Page]

TRB: A Novel Triplet Representation for Understanding 2D Human Body

Haodong Duan, Kwan-Yee Lin, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang

IEEE International Conference on Computer Vision (ICCV), 2019, Oral.

[Paper] [Dataset]

Robust Few-Shot Learning for User-Provided Data

Jiang Lu, Sheng Jin, Jian Liang, and Changshui Zhang

IEEE Transactions on Neural Networks and Learning Systems (TNNLS).

[Paper]

RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

Nan Jiang, Sheng Jin, Zhiyao Duan, Changshui Zhang

AAAI Conference on Artificial Intelligence (AAAI), 2020, Oral.

[Paper] [Demo]

Hierarchical Automatic Curriculum Learning: Converting a Sparse Reward Navigation Task into Dense Reward

Nan Jiang, Sheng Jin, Changshui Zhang

Neurocomputing, 2019.

[Paper]

Connectionist Temporal Classification with Maximum Entropy Regularization

Hu Liu, Sheng Jin, Changshui Zhang

Conference on Neural Information Processing Systems (NeurIPS), 2018, Spotlight.

[Paper] [Poster] [Code]

Towards Multi-Person Pose Tracking: Bottom-up and Top-down Methods

Sheng Jin, Xujie Ma, Zhipeng Han, Yue Wu, Wei Yang, Wentao Liu, Chen Qian, Wanli Ouyang

International Conference on Computer Vision (ICCV) PoseTrack Workshop, 2017.

[Paper] [Leaderboard (BUTDS and BUTD2)] [Demo]


Projects

MMPose Toolbox

MMPose is an open-source toolbox for pose estimation based on PyTorch, which is a part of the OpenMMLab project.

[Project]

ACM MM'2020 HiEve Challenge

Our team (SimpleTrack) won 3rd place in Track-3 "Crowd Pose Tracking in Complex Events" of the ACM MM'2020 HiEve Challenge.

[Leaderboard] [Technical Report]

CVPR'2018 Look Into Person (LIP) Challenge

Our team (MJDG) won 2nd place in Track-4 "Multi-Human Pose Estimation Challenge" of the CVPR'2018 LIP Challenge.

[Leaderboard] [Oral Presentation]

ICCV'2017 PoseTrack Challenge

Our team (BUTDS | BUTD2) won 2nd place in both Track-1 "Single-Frame Person Pose Estimation" and Track-3 "Multi-Person Pose Tracking" of the ICCV'2017 PoseTrack Challenge.

[Leaderboard] [Technical Report]  [Oral Presentation]  [Demo]


Patents

Key point detection method, device, electronic equipment and storage medium

Sheng Jin, Wentao Liu, Chen Qian

Chinese Invention Patent.

Publication Number: CN111898642A. Publication Date: 2020-11-06.

Key point detection method, device, electronic equipment and storage medium

Sheng Jin, Wentao Liu, Chen Qian

Chinese Invention Patent.

Publication Number: CN111783882A. Publication Date: 2020-10-16.

Image processing method and device, detection device and storage medium

Tong Li, Sheng Jin, Wentao Liu, Chen Qian

Chinese Invention Patent.

Publication Number: CN111539992A. Publication Date: 2020-08-14.

Key point detection method, device, electronic equipment and storage medium

Sheng Jin, Wentao Liu, Chen Qian

Chinese Invention Patent.

Publication Number: CN111444928A. Publication Date: 2020-07-24.

Image processing method and device, detection device and storage medium

Sheng Jin, Wentao Liu, Chen Qian

Chinese Invention Patent.

Publication Number: CN109948526A. Publication Date: 2019-06-28.

Image processing method and device, detection device and storage medium

Sheng Jin, Wentao Liu, Chen Qian

Chinese Invention Patent.

Publication Number: CN109934183A. Publication Date: 2019-06-25.

Deep learning model training method and device, training equipment and storage medium

Sheng Jin, Wentao Liu, Chen Qian

Chinese Invention Patent.

Publication Number: CN109919245A. Publication Date: 2019-06-21.


Teaching


  • TA, Deep Learning (COMP7606), HKU, [autumn, 2021]
  • TA, From Human Vision to Machine Vision (CCST9049), HKU, [spring, 2020]
  • TA, Introduction to Artificial Intelligence (40250182-0), THU, [spring, 2019]


Activities


  • Conference Reviewer/PC Member
    • NeurIPS'19-24, AAAI'19-24, ICML'20-25, CVPR'20-25, ICCV'21-25, ECCV'22-24, ICLR'21-25, WACV'21-24
  • Journal Reviewer
    • Transactions on Pattern Analysis and Machine Intelligence (TPAMI), IEEE Transactions on Artificial Intelligence (TAI), Transactions on Image Processing (TIP), International Journal of Computer Vision (IJCV), IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Visualization and Computer Graphics (TVCG)
  • Website Chairs


Contacts

    js20 [at] connect.hku.hk | jinsheng13 [at] foxmail.com