Call for papers

Previous workshops: 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025

We're inviting submissions! If you're interested in presenting a poster or giving a talk, please submit a short paper or extended abstract to CMT (link available soon) by April 21 at 11:59 PT, using this LaTeX template. Submissions should be anonymous and at most 4 pages, including references (a 1- or 2-page extended abstract is also fine). We will notify authors of decisions by April 30. We welcome both work that has already been accepted at other venues and new, work-in-progress submissions. Accepted papers will appear on this site. Because papers in this workshop are at most 4 pages long, they can also be submitted to next year's CVPR.

We are looking for work that involves vision and sound. For example, the following topics would be in scope:
  • Generative models for audio-visual signals
  • Audio-visual self-supervised learning
  • Embodied audio-visual learning
  • Intuitive physics with sound
  • Audio-visual scene understanding
  • Sound-from-vision and vision-from-sound
  • Semi-supervised learning
  • Audio-visual navigation
  • Video-to-music alignment
  • Video editing and movie trailer generation
  • Material recognition
  • Sound localization
  • Audio-visual speech processing
  • Multimodal architectures

Organizers

Andrew Owens
University of Michigan

Jiajun Wu
Stanford

Arsha Nagrani
Google

Triantafyllos Afouras
Meta

Ruohan Gao
Meta / University of Maryland

Hang Zhao
Tsinghua University

Ziyang Chen
University of Michigan

William Freeman
MIT / Google

Andrew Zisserman
Oxford

Kristen Grauman
UT Austin / Meta

Antonio Torralba
MIT

Jean-Charles Bazin
Meta