I am an Applied Scientist at Amazon Ring in Sunnyvale, CA. Previously, I was a Senior Research Scientist at NVIDIA Research and a Senior Research SDE at the Microsoft AI R&D Center in Taiwan. I have 8+ years of experience specializing in computer vision research, deep learning-based model optimization, and machine learning service integration. My research focuses mainly on cross-modality representation learning, face modeling, and 2D/3D scene understanding. I am interested in revolutionizing machine learning systems from the bottom up, devising better problem-solving methods for challenging tasks, and learning new technologies and tools as the need arises.
I'm interested in computer vision and multi-modal representation learning.
BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL
Yu-Heng Hung,
Kai-Jie Lin,
Yu-Heng Lin,
Cheng Sun,
Chien-Yi Wang*,
Ping-Chun Hsieh*
  (*=equal advising)
International Conference on Machine Learning (ICML) AutoRL Workshop, 2024   (Spotlight)
International Conference on Learning Representations (ICLR), 2025
OpenReview
We present a generalized deep Q-learning framework and propose BOFormer, which substantiates the framework for Multi-Objective Bayesian Optimization (MOBO) via sequence modeling.
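As a hedged illustration of the sequence-modeling view, here is a minimal sketch in which the Q-function conditions on the entire history of queried points and observed objective values rather than on a single Markovian state. The HistoryQNet module, its dimensions, and the way candidates are scored are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn

    class HistoryQNet(nn.Module):
        # Hypothetical non-Markovian Q-network: a Transformer encoder reads
        # the history of (query x, observed objectives y) pairs and scores
        # the next candidate point.
        def __init__(self, x_dim, n_objectives, d_model=64):
            super().__init__()
            self.embed = nn.Linear(x_dim + n_objectives, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.q_head = nn.Linear(d_model + x_dim, 1)

        def forward(self, hist_x, hist_y, candidate_x):
            # hist_x: (B, T, x_dim); hist_y: (B, T, n_objectives)
            tokens = self.embed(torch.cat([hist_x, hist_y], dim=-1))
            summary = self.encoder(tokens)[:, -1]   # history summary token
            return self.q_head(torch.cat([summary, candidate_x], dim=-1))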
DoRA: Weight-Decomposed Low-Rank Adaptation
International Conference on Machine Learning (ICML), 2024   (Oral)
We present DoRA, a new parameter-efficient fine-tuning approach that consistently outperforms LoRA in fine-tuning LLMs without incurring additional inference cost. The improvements are particularly notable at smaller ranks, with a 37.2% improvement over LoRA at rank 8 and a 22.4% improvement at rank 4.
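As a hedged sketch of the weight decomposition behind DoRA: the merged weight is re-expressed as a learned magnitude times a unit direction, and only the low-rank update and the magnitude are trained. The layer below assumes a frozen nn.Linear base; the per-row normalization, initialization, and names are illustrative rather than the official implementation.

    import torch
    import torch.nn as nn

    class DoRALinear(nn.Module):
        # Illustrative DoRA-style layer over a frozen pretrained weight W0.
        def __init__(self, base: nn.Linear, rank=8, alpha=16.0):
            super().__init__()
            self.weight = nn.Parameter(base.weight.detach(), requires_grad=False)
            out_f, in_f = self.weight.shape
            self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(out_f, rank))
            self.scaling = alpha / rank
            # Magnitude vector m, initialized from the pretrained weight norms.
            self.m = nn.Parameter(self.weight.norm(dim=1, keepdim=True).clone())

        def forward(self, x):
            merged = self.weight + self.scaling * (self.lora_B @ self.lora_A)
            direction = merged / merged.norm(dim=1, keepdim=True)  # unit rows
            return x @ (self.m * direction).t()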
MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes
Bor-Shiun Wang,
Chien-Yi Wang*,
Wei-Chen Chiu*
  (*=equal advising)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024  
Project
/
arXiv
/
Code
We propose the Multi-Level Concept Prototypes Classifier (MCPNet), an inherently interpretable model that explains a classifier's predictions via concept prototypes drawn from low-to-high layers of a CNN.
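As a hedged sketch of prototype-based classification across layers: assuming per-class concept prototypes have already been extracted at each chosen layer, a sample is classified by how similar its pooled concept responses are to each class's prototypes, aggregated over layers. The cosine similarity and summation below are illustrative, not MCPNet's exact distance.

    import torch

    def classify_by_prototypes(feats_per_layer, protos_per_layer):
        # feats_per_layer: list of (D_l,) pooled concept responses per layer.
        # protos_per_layer: list of (num_classes, D_l) class prototypes.
        scores = 0.0
        for f, P in zip(feats_per_layer, protos_per_layer):
            scores = scores + torch.nn.functional.cosine_similarity(
                P, f.unsqueeze(0), dim=-1)      # per-class similarity
        return scores.argmax()                  # most similar class overall

    pred = classify_by_prototypes(
        [torch.randn(64), torch.randn(128)],
        [torch.randn(10, 64), torch.randn(10, 128)],
    )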
Probabilistic 3D Multi-Object Cooperative Tracking for Autonomous Driving via Differentiable Multi-Sensor Kalman Filter
Hsu-Kuang Chiu,
Chien-Yi Wang,
Min-Hung Chen,
Stephen F. Smith
IEEE International Conference on Robotics and Automation (ICRA), 2024
Project
/
arXiv
/
Code
We propose a Differentiable Multi-Sensor Kalman Filter for 3D Multi-Object Cooperative Tracking (DMSTrack), designed to estimate the observation noise covariance of each detection from different Connected Autonomous Vehicles (CAVs) so as to better exploit the Kalman filter's theoretical optimality.
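As a hedged sketch of the core update: a small network regresses a per-detection observation covariance R, and because the Kalman update is written in differentiable tensor operations, that regressor can be trained end-to-end from tracking error. The state dimension, feature size, and cov_head below are illustrative assumptions.

    import torch
    import torch.nn as nn

    # Hypothetical covariance head: maps a detection's feature vector to a
    # positive-definite diagonal observation covariance R.
    cov_head = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 7))

    def kalman_update(mu, P, z, H, det_feat):
        # mu: (7,) state; P: (7, 7) covariance; z: (7,) detection.
        R = torch.diag(nn.functional.softplus(cov_head(det_feat)))
        S = H @ P @ H.t() + R                        # innovation covariance
        K = P @ H.t() @ torch.linalg.inv(S)          # Kalman gain
        mu_new = mu + K @ (z - H @ mu)               # state update
        P_new = (torch.eye(P.shape[0]) - K @ H) @ P  # covariance update
        return mu_new, P_new                         # differentiable in cov_head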
We introduce Rapper, a two-stage Reinforced Rationale-Prompted Paradigm for Natural Language Explanation (NLE) in Visual Question Answering (VQA).
Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation
Fu-En Yang,
Chien-Yi Wang,
Yu-Chiang Frank Wang
IEEE International Conference on Computer Vision (ICCV), 2023
arXiv
To leverage the robust representations of large-scale models while enabling efficient personalization for heterogeneous clients, we propose pFedPG, a novel personalized FL framework with client-specific Prompt Generation. It learns a personalized prompt generator at the server that produces client-specific visual prompts, efficiently adapting frozen backbones to local data distributions.
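As a hedged sketch of the server-side idea: a generator keyed by a learned per-client descriptor emits visual prompt tokens, which are prepended to a frozen backbone's patch tokens. The PromptGenerator name, prompt length, and embedding size are illustrative assumptions.

    import torch
    import torch.nn as nn

    class PromptGenerator(nn.Module):
        # Hypothetical server-side generator of client-specific prompts.
        def __init__(self, num_clients, prompt_len=4, dim=768):
            super().__init__()
            self.client_desc = nn.Embedding(num_clients, dim)
            self.to_prompts = nn.Linear(dim, prompt_len * dim)
            self.prompt_len, self.dim = prompt_len, dim

        def forward(self, client_id):
            desc = self.client_desc(client_id)             # (B, dim)
            return self.to_prompts(desc).view(-1, self.prompt_len, self.dim)

    gen = PromptGenerator(num_clients=10)
    prompts = gen(torch.tensor([3]))  # prompts for client 3
    # tokens = torch.cat([prompts, patch_tokens], dim=1)  # into frozen ViT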
QuAVF: Quality-aware Audio-Visual Fusion for Ego4D Talking to Me Challenge
Hsi-Che Lin,
Chien-Yi Wang,
Min-Hung Chen,
Szu-Wei Fu,
Yu-Chiang Frank Wang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, 2023
1st Place Winner @ Ego4D Talking-To-Me (TTM) track
arXiv
We propose the Quality-aware Audio-Visual Fusion (QuAVF) framework, which achieves 67.4% mean average precision (mAP) on the Ego4D talking-to-me (TTM) test set.
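As a hedged sketch of quality-aware fusion: each single-modality branch produces a talking-to-me probability, and the two are combined with weights derived from per-stream quality scores (e.g., face-detection confidence for video). The weighting scheme below is illustrative, not the paper's exact rule.

    import torch

    def quavf_fuse(p_visual, p_audio, q_visual, q_audio):
        # p_*: branch probabilities; q_*: quality scores in [0, 1].
        w = torch.softmax(torch.stack([q_visual, q_audio]), dim=0)
        return w[0] * p_visual + w[1] * p_audio  # quality-weighted prediction

    p = quavf_fuse(torch.tensor(0.8), torch.tensor(0.4),
                   torch.tensor(0.9), torch.tensor(0.2))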
We propose the Geometry-Aware Interaction Network (GAIN), which exploits dense facial landmarks with a spatio-temporal graph convolutional network (ST-GCN) to build a more interpretable and modularized face anti-spoofing model.
RGB-D Face Recognition with Identity-Style Disentanglement and Depth Augmentation
Meng-Tzu Chiu,
Hsun-Ying Cheng,
Chien-Yi Wang,
Shang-Hong Lai
IEEE Transactions on Biometrics, Behavior, and Identity Science (TBIOM), 2023
paper
We propose to augment facial segmentation and depth maps to assist RGB-D face recognition. With multi-modal augmentation and identity-style disentanglement, the proposed RGB-D recognition model achieves superior performance on several benchmarks.
MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition
Fu-En Wang,
Chien-Yi Wang,
Min Sun,
Shang-Hong Lai
37th AAAI Conference on Artificial Intelligence (AAAI), 2023
arXiv
/
code
We propose the MixFair Adapter to estimate and reduce the identity bias of training samples. To push toward ultimate fairness in face recognition, we also propose a new evaluation protocol that assesses the fairness of different approaches on an equal footing.
Generalized Face Anti-Spoofing via Multi-Task Learning and One-Side Meta Triplet Loss
Chu-Chun Chuang,
Chien-Yi Wang,
Shang-Hong Lai
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2023
arXiv
We introduce a multi-task meta-learning framework for learning more generalized features for face anti-spoofing.
PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch Recognition
Chien-Yi Wang,
Yu-Ding Lu,
Shang-Ta Yang,
Shang-Hong Lai
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
arXiv
/
paper
/
supp
/
video
We propose PatchNet, which reformulates face anti-spoofing as a fine-grained patch-type recognition problem.
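As a hedged sketch of the reformulation: local patches are sampled so that global face structure is discarded, and each patch is classified into a fine-grained patch type rather than a binary live/spoof label. The number of patch types, patch size, and CNN head below are illustrative assumptions.

    import torch
    import torch.nn as nn

    def sample_patches(img, num=9, size=32):
        # img: (C, H, W). Random local crops keep capture/material cues
        # while discarding global face structure.
        C, H, W = img.shape
        ys = torch.randint(0, H - size, (num,))
        xs = torch.randint(0, W - size, (num,))
        return torch.stack([img[:, y:y+size, x:x+size] for y, x in zip(ys, xs)])

    num_patch_types = 12  # illustrative, e.g., capture device x material
    patch_classifier = nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(16, num_patch_types),
    )

    patches = sample_patches(torch.rand(3, 112, 112))
    logits = patch_classifier(patches).mean(dim=0)  # aggregate patch votes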
Local-Adaptive Face Recognition via Graph-based Meta-Clustering and Regularized Adaptation
Chien-Yi Wang*,
Wenbin Zhu*,
Kuan Lun Tseng,
Shang-Hong Lai,
Baoyuan Wang
(*=equal contribution)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022  
arXiv
/
paper
/
supp
/
video
We introduce a new problem setup called local-adaptive face recognition (LaFR) and propose graph-based meta-clustering and regularized adaptation modules to address face recognition in unseen environments.
FedFR: Joint Optimization Federated Framework for Generic and Personalized Face Recognition
Chih-Ting Liu*,
Chien-Yi Wang*,
Shao-Yi Chien,
Shang-Hong Lai
  (*=equal contribution)
36th AAAI Conference on Artificial Intelligence (AAAI), 2022   (Oral)
arXiv
/
code (coming soon)
We propose FedFR, a federated learning framework that improves the generic face representation in a privacy-aware manner. The framework also jointly optimizes personalized models for the corresponding clients via the proposed Decoupled Feature Customization module.
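As a hedged sketch of the joint-optimization structure: the shared backbone is aggregated with standard FedAvg, while each client keeps a private customization head that never leaves the device. The client interface below is an illustrative assumption, not the paper's code.

    import torch

    def federated_round(server_backbone, clients):
        # clients: objects with .backbone, a private .local_head, and
        # .train_step() that optimizes both on local data.
        states = []
        for c in clients:
            c.backbone.load_state_dict(server_backbone.state_dict())
            c.train_step()
            states.append(c.backbone.state_dict())
        # FedAvg over backbone weights; local heads stay on the clients.
        avg = {k: torch.stack([s[k].float() for s in states]).mean(0)
               for k in states[0]}
        server_backbone.load_state_dict(avg)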
Disentangled Representation with Dual-stage Feature Learning for Face Anti-spoofing
Yu-Chun Wang,
Chien-Yi Wang,
Shang-Hong Lai
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022
arXiv
/
paper
/
supp
We propose a novel dual-stage disentangled representation learning method that efficiently untangles spoof-related features from irrelevant ones. Unlike previous one-stage FAS disentanglement works, we find that the dual-stage training design improves training stability and effectively encodes features for detecting unseen attack types.
High-Accuracy RGB-D Face Recognition via Segmentation-Aware Face Depth Estimation and Mask-Guided Attention Network
Meng-Tzu Chiu,
Hsun-Ying Cheng,
Chien-Yi Wang,
Shang-Hong Lai
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2021   (Oral)
arXiv
/
poster
/
video
We propose to leverage pseudo facial segmentation and depth maps to assist RGB-D face recognition. With multi-modal augmentation, the proposed mask-guided RGB-D recognition model achieves superior performance on several benchmarks.
We propose a unified representation learning framework to address the Cross Model Compatibility (CMC) problem in visual search applications. The method can be applied to face recognition and person re-identification.
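As a hedged sketch of the cross-model compatibility setup: a lightweight transformation maps the old model's gallery embeddings into the new model's space, so galleries indexed by the old model stay searchable without re-extraction. The MLP and cosine-alignment loss are illustrative of the general recipe, not the paper's exact design.

    import torch
    import torch.nn as nn

    # Hypothetical transformation module from old (256-d) to new (512-d) space.
    align = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 512))

    def compatibility_loss(old_emb, new_emb):
        mapped = nn.functional.normalize(align(old_emb), dim=-1)
        target = nn.functional.normalize(new_emb, dim=-1)
        return (1 - (mapped * target).sum(-1)).mean()  # cosine alignment

    loss = compatibility_loss(torch.randn(8, 256), torch.randn(8, 512))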
We propose a 3D dynamic scene analysis framework as a first step toward driving scene understanding. Given a sequence of synchronized 2D and 3D sensory data, the framework systematically integrates different perception modules to obtain the 3D position, orientation, velocity, and category of traffic participants and the ego car within a reconstructed, semantically labeled 3D traffic scene.
Robust Image Segmentation Using Contour-Guided Color Palette
Xiang Fu,
Chien-Yi Wang,
Chen Chen,
Changhu Wang,
C.-C. Jay Kuo
Proceedings of IEEE International Conference on Computer Vision (ICCV), 2015
paper
/
code
The contour-guided color palette (CCP) is proposed for robust image segmentation. It efficiently integrates contour and color cues of an image.
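As a hedged sketch of the two cues working together: a compact color palette is built by clustering pixel colors, and contour strength then decides where palette labels can be trusted directly versus where region-level merging is needed. The clustering choice and threshold below are illustrative assumptions, not the paper's exact procedure.

    import numpy as np
    from sklearn.cluster import KMeans

    def ccp_palette(image, contour_strength, k=8):
        # image: (H, W, 3) float array; contour_strength: (H, W) edge map.
        pixels = image.reshape(-1, 3)
        labels = KMeans(n_clusters=k, n_init=4).fit_predict(pixels)
        labels = labels.reshape(image.shape[:2])
        mask = contour_strength < contour_strength.mean()  # low-contour pixels
        return labels, mask  # trust labels where mask is True; merge elsewhere

    labels, mask = ccp_palette(np.random.rand(64, 64, 3), np.random.rand(64, 64))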