About Me
I am currently a Researcher at Kuaishou Technology, focusing on cutting-edge research in Computer Vision and Natural Language Processing. My current research interests include Multimodal Large Language Models (MLLMs), Formal Theorem Proving, and AI Agents.
I received my Ph.D. degree from the Multimedia and Human Understanding Group (MHUG) at the Department of Information Engineering and Computer Science, University of Trento, Italy, in 2022. I was supervised by Prof. Nicu Sebe and Dr. Bruno Lepri, with my thesis defense committee including Vittorio Murino, Zhengyou Zhang, and Elisa Ricci.
Before my doctoral studies, I earned my B.Eng. degree in Photogrammetry and Remote Sensing (2015) and M.Eng. degree in Pattern Recognition and Intelligent System (2018) from Wuhan University, China.
We are actively recruiting interns for long-term positions. If you are interested in research opportunities, please feel free to send your resume to my email!
Research Experience
Research focus: MLLMs, Formal Theorem Proving and AI Agents
Research focus: Image Generation and Enhancement (GANs and Diffusion Models)
Mentors: Dr. Linchao Bao and Dr. Wei Bi.
Research focus: GANs, Image Domain Translation
Mentors: Prof. Nicu Sebe and Dr. Bruno Lepri.
Research focus: Deep learning, GANs, Cross-modal Representations, Image Domain Translation
Mentors: Dr. Wei Bi and Dr. Xiaojiang Liu.
Research focus: Deep Learning, Neural Dialogue Generation
Mentor: Prof. Jian Yao.
Research focus: Deep Learning, Remote Sensing
Selected Publications
Conference Papers
DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing
Qian Cao†, Yahui Liu†, Wei Bi, Yi Zhao, Ruihua Song, Xiting Wang, Ruiming Tang, Guorui Zhou, Han Li
Annual Meeting of the Association for Computational Linguistics (ACL), 2026
All-in-One Slider for Attribute Manipulation in Diffusion Models
Weixin Ye, Hongguang Zhu, Wei Wang, Yahui Liu, Mengyu Wang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
Evaluating Text Creativity Across Diverse Domains: A Dataset and Large Language Model Evaluator
Qian Cao, Xinting Wang, Yuzhou Yuan, Yahui Liu, Fang Luo, Ruihua Song
International Conference on Learning Representations (ICLR), 2026
Not Just What's There: Enabling CLIP to Comprehend Negated Visual Descriptions Without Fine-tuning
Junhao Xiao, Zhiyu Wu, Hao Lin, Yahui Liu*, Xiaoran Zhao, Zixu Wang, Zejiang He, Yi Chen
AAAI Conference on Artificial Intelligence (AAAI), 2026
What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning
Gangwei Jiang*, Yahui Liu*, Zhaoyi Li, Qi Wang, Fuzheng Zhang, Linqi Song, Ying Wei, Defu Lian
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search
Linhao Yu, Xinguang Ji, Yahui Liu, Fanheng Kong, Chenxi Sun, Jingyuan Zhang, Hongzhi Zhang, V. W., Fuzheng Zhang, Deyi Xiong
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers
Bin Ren*, Yahui Liu*, Yue Song, Wei Bi, Rita Cucchiara, Nicu Sebe and Wei Wang (*equal contribution)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Efficient Training of Visual Transformers with Small Datasets
Yahui Liu, Enver Sangineto, Wei Bi, Nicu Sebe, Bruno Lepri, Marco De Nadai
Advances in Neural Information Processing Systems (NeurIPS), 2021
Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation
Yahui Liu, Enver Sangineto, Yajing Chen, Linchao Bao, Haoxian Zhang, Nicu Sebe, Bruno Lepri, Wei Wang, Marco De Nadai
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach
Yahui Liu, Marco De Nadai, Deng Cai, Huayang Li, Xavier Alameda-Pineda, Nicu Sebe, and Bruno Lepri
ACM International Conference on Multimedia (ACM MM), 2020
Journal Articles
A Unified Masked Jigsaw Puzzle Framework for Vision and Language Models
Weixin Ye, Wei Wang, Yahui Liu, Yue Song, Bin Ren, Wei Bi, Rita Cucchiara and Nicu Sebe
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Spatial Entropy as An Inductive Bias for Vision Transformers
Elia Peruzzo, Enver Sangineto, Yahui Liu, Marco De Nadai, Wei Bi, Bruno Lepri, Nicu Sebe
Machine Learning, 2024 (Impact Factor: 5.8)
ISF-GAN: An Implicit Style Function for High-Resolution Image-to-Image Translation
Yahui Liu, Yajing Chen, Linchao Bao, Nicu Sebe, Bruno Lepri, Marco De Nadai
IEEE Transactions on Multimedia (TMM), 2022 (Impact Factor: 8.4)
DeepCrack: A Deep Hierarchical Feature Learning Architecture for Crack Segmentation
Yahui Liu, Jian Yao, Rengping Xie, and Li Li
Neurocomputing, 2019 (Impact Factor: 5.5)
RoadNet: Learning to Comprehensively Analyze Road Networks in Complex Urban Scenes From High-Resolution Remotely Sensed Images
Yahui Liu, Jian Yao, Xiaohu Lu, Menghan Xia, Xingbo Wang, and Yuan Liu
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2019 (Impact Factor: 7.5)
Recent News
April 2026
DPWriter was accepted to ACL 2026.
January 2026
CrEval was accepted to ICLR 2026.
November 2025
One paper was accepted to TPAMI.
October 2025
One paper was accepted to AAAI 2026.
June 2025
We released SeqPE for universal positional encoding.
May 2025
We released LCoT2Tree for uncovering structural patterns in Long CoT, which was accepted to EMNLP.
May 2025
We released the UNITE framework for Multimodal Information Retrieval.
May 2025
One paper, MCTS-VCB, was accepted to the ACL main conference.
April 2025
We released Leanabell-Prover, which achieves a state-of-the-art 59.8% pass@32 on MiniF2F-test.