I am a second-year PhD student in Computer Science at Cornell Tech, Cornell University, working with Andrew Owens. Prior to that, I was a master's student at the University of Michigan (UMich). Feel free to contact me.
We introduce UniTouch, a unified tactile representation for vision-based tactile sensors that is aligned with multiple modalities. We show that powerful models trained on other modalities (e.g., CLIP, LLMs) can now be used to perform tactile sensing tasks zero-shot.
We learn several feature sets in a self-supervised manner using an audio-visual synchronization task, then apply an autoregressive model to each feature set to perform anomaly detection for video forensics.
AVA-AVD: Audio-Visual Speaker Diarization in the Wild
Eric Zhongcong Xu, Zeyang Song, Satoshi Tsutsui, Chao Feng, Mang Ye, Mike Zheng Shou
ACM Multimedia, 2022
project page / arXiv / code
We create the AVA Audio-Visual Diarization (AVA-AVD) dataset to develop diarization methods for in-the-wild videos.
Service
CVPR 2022/2024, WACV 2023, ACM MM 2023, ICCV 2023, ECCV 2024, NeurIPS 2024, ICRA 2025, ICLR 2025, AISTATS 2025, TPAMI.