Paolo Rota

Associate Professor @ CIMeC/DISI - University of Trento

Palazzo Fedrigotti - Room 216 - Rovereto
Polo Ferrari - Room 110 - Povo 2

If you are interested in working with or visiting the MHUG group, please fill out this form.

About me

I am Paolo Rota, associate professor at the University of Trento, working in computer vision, machine learning, and multimodal AI. My research focuses on vision-language models and activity recognition, with applications in video analytics and industrial AI.

Recently, I have been exploring topics such as zero-shot action recognition, temporal action localization, and vocabulary-free image classification, with publications at conferences such as CVPR, NeurIPS, and ICCV. I enjoy tackling challenges in open-world recognition and multimodal learning, always looking for ways to improve AI’s practical impact.

Outside of academia, I co-founded Mountain Maps, a startup that uses AI to enhance outdoor navigation and help people explore mountain environments more safely and enjoyably.

News

Shiyao’s paper has been accepted to 3DV 2026!

The work focuses on motion captioning. The goal is a model that describes not only what action is happening but also when it occurs, by accurately identifying its temporal boundaries, hence the title: Dense Motion Captioning.

For more details, visit the project webpage: https://xusy2333.com/demo/

Current Students

Benedetta Liberatori
Co-advised with Elisa Ricci
Vision and Language, Video Understanding
Jiaqi Liu
Co-advised with Nicu Sebe
Image Generation, Pose Transfer
Yan Shu
Co-advised with Nicu Sebe
Remote Sensing, Vision and Language
Shiyao Xu
Co-advised with Gül Varol
3D Human Motion Understanding, Vision and Language
Ester Riccardi
Co-advised with Roberto Bottini
Biological-inspired Media Generation, Brain Decoding

Former Students

Alessandro Conti
Giacomo Zara

Recent Publications

Dense Motion Captioning
Shiyao Xu, Benedetta Liberatori, Gül Varol, Paolo Rota. 3DV 2026.
ConViS-Bench: Estimating Video Similarity Through Semantic Concepts
Benedetta Liberatori, Alessandro Conti, Lorenzo Vaquero, Yiming Wang, Elisa Ricci, Paolo Rota. NeurIPS 2025.
ImageNet-trained CNNs are not biased towards texture: Revisiting feature reliance through controlled suppression
Tom Burgert, Oliver Stoll, Paolo Rota, Begüm Demir. NeurIPS (Oral) 2025.
On Large Multimodal Models as Open-World Image Classifiers
Alessandro Conti, Massimiliano Mancini, Enrico Fini, Yiming Wang, Paolo Rota, Elisa Ricci. ICCV 2025.
Automatic benchmarking of large multimodal models via iterative experiment programming
Alessandro Conti, Enrico Fini, Paolo Rota, Yiming Wang, Massimiliano Mancini, Elisa Ricci. ICIAP 2025.
Multi-focal Conditioned Latent Diffusion for Person Image Synthesis
Jiaqi Liu, Jichao Zhang, Paolo Rota, Nicu Sebe. CVPR 2025.
Simplifying Open-Set Video Domain Adaptation with Contrastive Learning
Giacomo Zara, Victor Guilherme Turrisi da Costa, Subhankar Roy, Paolo Rota, Elisa Ricci. CVIU 2024.
Text-Enhanced Zero-Shot Action Recognition: A Training-Free Approach
Massimo Bosetti, Shibingfeng Zhang, Benedetta Liberatori, Giacomo Zara, Elisa Ricci, Paolo Rota. ICPR 2024.
Test-time zero-shot temporal action localization
Benedetta Liberatori, Alessandro Conti, Paolo Rota, Yiming Wang, Elisa Ricci. CVPR 2024.
AutoLabel: CLIP-based framework for open-set video domain adaptation
Giacomo Zara, Subhankar Roy, Paolo Rota, Elisa Ricci. CVPR 2023.