Ryo Hachiuma

I am a Senior Research Scientist at NVIDIA Research in Taiwan. My research interests include video understanding and vision-language models.

I received my Ph.D. in Engineering from Keio University in 2021, where I was advised by Prof. Hideo Saito. I have also been a visiting researcher at TU Graz and Carnegie Mellon University.

Email  /  CV  /  Google Scholar  /  GitHub  /  Twitter  /  LinkedIn  /  ResearchGate

News

  • I became a Senior Research Scientist at NVIDIA Research.
  • 5 papers have been accepted at CVPR 2026.
  • 3 papers have been accepted at ICRA 2026.
  • 2 papers have been accepted at NeurIPS 2025.
  • 1 paper has been accepted at ICCV 2025.
  • 2 papers have been accepted at CVPR 2025.
  • 1 paper has been accepted at ICLR 2025.
  • 1 paper has been accepted at WACV 2025.
  • 1 paper has been accepted at EMNLP 2024.
  • 1 paper has been accepted at ECCV 2024.
  • 1 paper has been accepted at ICASSP 2024.
  • 2 papers have been accepted at CVPR 2023.
  • I joined NVIDIA Research in Taiwan as a Research Scientist.

Preprints

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception
Zhen Wan, Chao-Han Huck Yang, Jinchuan Tian, Hanrong Ye, Ankita Pasad, Szu-Wei Fu, Arushi Goel, Ryo Hachiuma, Shizhe Diao, Kunal Dhawan, Sreyan Ghosh, Yusuke Hirota, Zhehuai Chen, Rafael Valle, Ehsan Hosseini-Asl, Chenhui Chu, Shinji Watanabe, Yu-Chiang Frank Wang, Boris Ginsburg
arXiv, 2026   [paper]
VIOLA: Towards Video In-Context Learning with Minimal Annotations
Ryo Fujii, Hideo Saito, Ryo Hachiuma
arXiv, 2026   [paper]
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yong Man Ro, Yu-Chiang Frank Wang, Yueh-Hua Wu
arXiv, 2025   [paper]
Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in
Xiaoqian Shen, Min-Hung Chen, Yu-Chiang Frank Wang, Mohamed Elhoseiny, Ryo Hachiuma
arXiv, 2025   [paper]
Kinematics-Guided Reinforcement Learning for Object-Aware 3D Ego-Pose Estimation
Zhengyi Luo*, Ryo Hachiuma*, Ye Yuan, Shun Iwase, Kris M. Kitani  (* equal contribution)
arXiv, 2020   [paper]

Conference Papers

Masking Teacher and Reinforcing Student for Distilling Vision-Language Models
Byung-Kwan Lee, Yu-Chiang Frank Wang, Ryo Hachiuma
CVPR, 2026   [paper]
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation
Chiao-An Yang, Ryo Hachiuma, Sifei Liu, Subhashree Radhakrishnan, Raymond A. Yeh, Yu-Chiang Frank Wang, Min-Hung Chen
CVPR, 2026   [paper]
Interpretable Debiasing of Vision-Language Models for Social Fairness
Na Min An, Yoonna Jang, Yusuke Hirota, Ryo Hachiuma, Isabelle Augenstein, Hyunjung Shim
CVPR, 2026   [paper]
Autoregressive Universal Video Segmentation Model
Miran Heo, Sukjun Hwang, Min-Hung Chen, Yu-Chiang Frank Wang, Albert Gu, Seon Joo Kim, Ryo Hachiuma
CVPR (Findings), 2026   [paper]
Learning from Synthetic Data via Provenance-Based Input Gradient Guidance
Koshiro Nagano, Ryo Fujii, Ryo Hachiuma, Fumiaki Sato, Taiki Sekii, Hideo Saito
CVPR, 2026   [paper]
V2V-GoT: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models and Graph-of-Thoughts
Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen, Stephen F. Smith
ICRA, 2026   [paper]
RealTraj: Towards Real-World Pedestrian Trajectory Forecasting
Ryo Fujii, Hideo Saito, Ryo Hachiuma
ICRA, 2026   [paper]
Unified Reinforcement and Imitation Learning for Vision-Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yong Man Ro, Yu-Chiang Frank Wang, Yueh-Hua Wu
NeurIPS, 2025   [paper]
Towards Predicting Any Human Trajectory In Context
Ryo Fujii, Hideo Saito, Ryo Hachiuma
NeurIPS, 2025   [paper]
Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Yusuke Hirota, Ryo Hachiuma, et al.
ICCV, 2025   [paper]
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang, Yong Man Ro, Yueh-Hua Wu
CVPR, 2025   [paper]
V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models
Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Stephen F. Smith, Yu-Chiang Frank Wang, Min-Hung Chen
ICRA, 2026   [paper]
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo, Min-Hung Chen, De-An Huang, Sifei Liu, Subhashree Radhakrishnan, Seon Joo Kim, Yu-Chiang Frank Wang, Ryo Hachiuma
CVPR, 2025   [paper]
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
Yusuke Hirota, Boyi Li, Ryo Hachiuma, Yueh-Hua Wu, Boris Ivanovic, Marco Pavone, Yejin Choi, Yu-Chiang Frank Wang, Yuta Nakashima, Chao-Han Huck Yang
ACL, 2025   [paper]
Human Preference-Aligned Concept Customization Benchmark via Decomposed Evaluation
Reina Ishikawa, Ryo Fujii, Hideo Saito, Ryo Hachiuma
ICCVW, 2025   [paper]
SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP
Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang, Yuta Nakashima, Yu-Chiang Frank Wang, Ryo Hachiuma
ICLR, 2025   [paper]
CrowdMAC: Masked Crowd Density Completion for Robust Crowd Density Forecasting
Ryo Fujii, Ryo Hachiuma, Hideo Saito
WACV, 2025   [paper]
From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment
Yusuke Hirota, Ryo Hachiuma, Huck Yang, Yuta Nakashima
EMNLP, 2024   [paper]
Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
Masashi Hatano, Ryo Hachiuma, Ryo Fujii, Hideo Saito
ECCV, 2024   [paper]
Weakly Semi-Supervised Tool Detection in Minimally Invasive Surgery Videos
Ryo Fujii, Ryo Hachiuma, Hideo Saito
ICASSP, 2024   [paper]
EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos
Masashi Hatano, Ryo Hachiuma, Hideo Saito
ECCV Workshops, 2024   [paper]
Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
Ryo Hachiuma, Fumiaki Sato, Taiki Sekii
CVPR, 2023   [paper]  [demo]
Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
Fumiaki Sato, Ryo Hachiuma, Taiki Sekii
CVPR, 2023   [paper]
Dynamics-Regulated Kinematic Policy for Egocentric Pose Estimation
Zhengyi Luo, Ryo Hachiuma, Ye Yuan, Kris Kitani
NeurIPS, 2021   [paper]
Toward Unsupervised 3D Point Cloud Anomaly Detection using Variational Autoencoder
Mana Masuda, Ryo Hachiuma, Ryo Fujii, Hideo Saito, Yusuke Sekikawa
ICIP, 2021   [paper]
Deep Selection: A Fully Supervised Camera Selection Network for Surgery Recordings
Ryo Hachiuma, Tomohiro Shimizu, Hideo Saito, Hiroki Kajita, Yoshihumi Takatsume
MICCAI, 2020   [paper]
RGB-D Image Inpainting Using Generative Adversarial Network with a Late Fusion Approach
Ryo Fujii, Ryo Hachiuma, Hideo Saito
AVR, 2020   [paper]
DetectFusion: Detecting and Segmenting Both Known and Unknown Dynamic Objects in Real-time SLAM
Ryo Hachiuma, Christian Pirchheim, Dieter Schmalstieg, Hideo Saito
BMVC, 2019  (Spotlight)   [paper]
Joint Inpainting of RGB and Depth Images by Generative Adversarial Network with a Late Fusion Approach
Ryo Fujii, Ryo Hachiuma, Hideo Saito
ISMAR, 2019   [paper]

Academic Service

Awards:

  • Honorable Mention Award, Essay Competition on the Social Impact of Computer Vision, ICVSS 2019
  • Best Paper Award, Asia Pacific Workshop on Mixed and Augmented Reality (APMAR) 2019

Reviewer: CVPR, ICCV, SIGGRAPH, NeurIPS, MICCAI, ISMAR, IEEE VR, ICIP, IEEE Access, IPSJ Computer Vision and Applications

Competition Results:

Patents

Website design from Jon Barron.