News
- I became a Senior Research Scientist at NVIDIA Research.
- 5 papers have been accepted at CVPR 2026.
- 3 papers have been accepted at ICRA 2026.
- 2 papers have been accepted at NeurIPS 2025.
- 1 paper has been accepted at ICCV 2025.
- 2 papers have been accepted at CVPR 2025.
- 1 paper has been accepted at ICLR 2025.
- 1 paper has been accepted at WACV 2025.
- 1 paper has been accepted at EMNLP 2024.
- 1 paper has been accepted at ECCV 2024.
- 1 paper has been accepted at ICASSP 2024.
- 2 papers have been accepted at CVPR 2023.
- I joined NVIDIA Research in Taiwan as a Research Scientist.
|
|
Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception
Zhen Wan, Chao-Han Huck Yang, Jinchuan Tian, Hanrong Ye, Ankita Pasad, Szu-Wei Fu, Arushi Goel, Ryo Hachiuma, Shizhe Diao, Kunal Dhawan, Sreyan Ghosh, Yusuke Hirota, Zhehuai Chen, Rafael Valle, Ehsan Hosseini-Asl, Chenhui Chu, Shinji Watanabe, Yu-Chiang Frank Wang, Boris Ginsburg
arXiv, 2026
[paper]
|
|
VIOLA: Towards Video In-Context Learning with Minimal Annotations
Ryo Fujii, Hideo Saito, Ryo Hachiuma
arXiv, 2026
[paper]
|
|
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yong Man Ro, Yu-Chiang Frank Wang, Yueh-Hua Wu
arXiv, 2025
[paper]
|
|
Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in
Xiaoqian Shen, Min-Hung Chen, Yu-Chiang Frank Wang, Mohamed Elhoseiny, Ryo Hachiuma
arXiv, 2025
[paper]
|
|
Kinematics-Guided Reinforcement Learning for Object-Aware 3D Ego-Pose Estimation
Zhengyi Luo*, Ryo Hachiuma*, Ye Yuan, Shun Iwase, Kris M. Kitani (* equal contribution)
arXiv, 2020
[paper]
|
|
Masking Teacher and Reinforcing Student for Distilling Vision-Language Models
Byung-Kwan Lee, Yu-Chiang Frank Wang, Ryo Hachiuma
CVPR, 2026
[paper]
|
|
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation
Chiao-An Yang, Ryo Hachiuma, Sifei Liu, Subhashree Radhakrishnan, Raymond A. Yeh, Yu-Chiang Frank Wang, Min-Hung Chen
CVPR, 2026
[paper]
|
|
Interpretable Debiasing of Vision-Language Models for Social Fairness
Na Min An, Yoonna Jang, Yusuke Hirota, Ryo Hachiuma, Isabelle Augenstein, Hyunjung Shim
CVPR, 2026
[paper]
|
|
Autoregressive Universal Video Segmentation Model
Miran Heo, Sukjun Hwang, Min-Hung Chen, Yu-Chiang Frank Wang, Albert Gu, Seon Joo Kim, Ryo Hachiuma
CVPR (Findings), 2026
[paper]
|
|
Learning from Synthetic Data via Provenance-Based Input Gradient Guidance
Koshiro Nagano, Ryo Fujii, Ryo Hachiuma, Fumiaki Sato, Taiki Sekii, Hideo Saito
CVPR, 2026
[paper]
|
|
V2V-GoT: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models and Graph-of-Thoughts
Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen, Stephen F. Smith
ICRA, 2026
[paper]
|
|
RealTraj: Towards Real-World Pedestrian Trajectory Forecasting
Ryo Fujii, Hideo Saito, Ryo Hachiuma
ICRA, 2026
[paper]
|
|
Unified Reinforcement and Imitation Learning for Vision-Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yong Man Ro, Yu-Chiang Frank Wang, Yueh-Hua Wu
NeurIPS, 2025
[paper]
|
|
Towards Predicting Any Human Trajectory In Context
Ryo Fujii, Hideo Saito, Ryo Hachiuma
NeurIPS, 2025
[paper]
|
|
Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Yusuke Hirota, Ryo Hachiuma, et al.
ICCV, 2025
[paper]
|
|
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang, Yong Man Ro, Yueh-Hua Wu
CVPR, 2025
[paper]
|
|
V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models
Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Stephen F. Smith, Yu-Chiang Frank Wang, Min-Hung Chen
ICRA, 2026
[paper]
|
|
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo, Min-Hung Chen, De-An Huang, Sifei Liu, Subhashree Radhakrishnan, Seon Joo Kim, Yu-Chiang Frank Wang, Ryo Hachiuma
CVPR, 2025
[paper]
|
|
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
Yusuke Hirota, Boyi Li, Ryo Hachiuma, Yueh-Hua Wu, Boris Ivanovic, Marco Pavone, Yejin Choi, Yu-Chiang Frank Wang, Yuta Nakashima, Chao-Han Huck Yang
ACL, 2025
[paper]
|
|
Human Preference-Aligned Concept Customization Benchmark via Decomposed Evaluation
Reina Ishikawa, Ryo Fujii, Hideo Saito, Ryo Hachiuma
ICCV Workshops, 2025
[paper]
|
|
SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP
Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang, Yuta Nakashima, Yu-Chiang Frank Wang, Ryo Hachiuma
ICLR, 2025
[paper]
|
|
CrowdMAC: Masked Crowd Density Completion for Robust Crowd Density Forecasting
Ryo Fujii, Ryo Hachiuma, Hideo Saito
WACV, 2025
[paper]
|
|
From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment
Yusuke Hirota, Ryo Hachiuma, Chao-Han Huck Yang, Yuta Nakashima
EMNLP, 2024
[paper]
|
|
Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
Masashi Hatano, Ryo Hachiuma, Ryo Fujii, Hideo Saito
ECCV, 2024
[paper]
|
|
Weakly Semi-Supervised Tool Detection in Minimally Invasive Surgery Videos
Ryo Fujii, Ryo Hachiuma, Hideo Saito
ICASSP, 2024
[paper]
|
|
EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos
Masashi Hatano, Ryo Hachiuma, Hideo Saito
ECCV Workshops, 2024
[paper]
|
|
Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
Ryo Hachiuma, Fumiaki Sato, Taiki Sekii
CVPR, 2023
[paper] [demo]
|
|
Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
Fumiaki Sato, Ryo Hachiuma, Taiki Sekii
CVPR, 2023
[paper]
|
|
Dynamics-Regulated Kinematic Policy for Egocentric Pose Estimation
Zhengyi Luo, Ryo Hachiuma, Ye Yuan, Kris Kitani
NeurIPS, 2021
[paper]
|
|
Toward Unsupervised 3D Point Cloud Anomaly Detection using Variational Autoencoder
Mana Masuda, Ryo Hachiuma, Ryo Fujii, Hideo Saito, Yusuke Sekikawa
ICIP, 2021
[paper]
|
|
Deep Selection: A Fully Supervised Camera Selection Network for Surgery Recordings
Ryo Hachiuma, Tomohiro Shimizu, Hideo Saito, Hiroki Kajita, Yoshifumi Takatsume
MICCAI, 2020
[paper]
|
|
RGB-D Image Inpainting Using Generative Adversarial Network with a Late Fusion Approach
Ryo Fujii, Ryo Hachiuma, Hideo Saito
AVR, 2020
[paper]
|
|
DetectFusion: Detecting and Segmenting Both Known and Unknown Dynamic Objects in Real-time SLAM
Ryo Hachiuma, Christian Pirchheim, Dieter Schmalstieg, Hideo Saito
BMVC, 2019 (Spotlight)
[paper]
|
|
Joint Inpainting of RGB and Depth Images by Generative Adversarial Network with a Late Fusion Approach
Ryo Fujii, Ryo Hachiuma, Hideo Saito
ISMAR, 2019
[paper]
|
Academic Service
Awards:
- Honorable Mention Award, Essay Competition on the Social Impact of Computer Vision, ICVSS 2019
- Best Paper Award, Asia Pacific Workshop on Mixed and Augmented Reality (APMAR) 2019
Reviewer: CVPR, ICCV, SIGGRAPH, NeurIPS, MICCAI, ISMAR, IEEE VR, ICIP, IEEE Access, IPSJ Computer Vision and Applications
Competition Results:
|
Patents
- Recognition device, recognition system, and computer program — Ryo Hachiuma, Taiki Sekii. Konica Minolta Inc. US20260057672A1, 2026.
- Societal Attribute Neutralizer for Debiasing CLIP — Ryo Hachiuma, Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang, Yu-Chiang Wang. NVIDIA Corp. US20260004562A1, 2026.
- Information processing device, information processing method, and program — Taiki Sekii, Fumiaki Sato, Ryo Hachiuma. Konica Minolta Inc. JP2025008875A, 2025.
- Learning data generation device, posture estimation device, and their method and program — Mariko Isogawa, Dan Mikami, Kana Yamamoto, Ryosuke Hori, Ryo Hachiuma, Hideo Saito. Keio University, NTT Inc. JP2022178909A, 2022.
- Posture estimating tool learning device, posture estimation device, method for the same, and program — Mariko Isogawa, Dan Mikami, Kana Yamamoto, Ryosuke Hori, Ryo Hachiuma, Hideo Saito. Keio University, NTT Inc. JP2022100569A, 2022.