Video AI Symposium

30 Sep - 1 Oct 2023

Google DeepMind Offices, London


About

We build on a great tradition of video understanding symposia previously held in Europe (2019, 2022) and the US (2019, 2017), and invite you to come together for a much-needed discussion of the next steps for video understanding. Despite tremendous efforts and the growing number and scale of video datasets, video understanding remains a bottleneck for research.

We thus invite you to a 1.5-day, invitation-only closed event to exchange ideas, establish close collaborations, and recommend new research directions.

The Video AI Symposium will be held in Central London, at the Google DeepMind offices on Saturday 30 Sep and Sunday 1 Oct 2023. The event will bring together 50 researchers for an opportunity to exchange ideas and connect over a mutual interest in video understanding.

Sponsored by: Google DeepMind, Google Research and Meta AI

Confirmed Attendees

Rahul Sukthankar, Google Research
Cordelia Schmid, INRIA and Google Research
Andrew Zisserman, University of Oxford and Google DeepMind
Jitendra Malik, UC Berkeley and Meta AI
William Freeman, MIT and Google Research
Cees Snoek, University of Amsterdam
Alexei Efros, UC Berkeley
Ivan Laptev, INRIA Paris
Michal Irani, Weizmann Institute
Andrea Vedaldi, University of Oxford and Meta
Kristen Grauman, Meta AI and UT Austin
Joao Carreira, Google DeepMind
Dima Damen, University of Bristol and Google DeepMind
Joseph Tighe, Meta AI
Juan Carlos Niebles, Salesforce and Stanford University
Efstratios Gavves, University of Amsterdam
Juergen Gall, University of Bonn
Carl Vondrick, Columbia University
Hilde Kuehne, University of Bonn and MIT-IBM Watson AI Lab
Thomas Kipf, Google DeepMind
Angela Yao, National University of Singapore
Limin Wang, Nanjing University
David Fouhey, New York University
Michael S. Ryoo, Stony Brook University and Google DeepMind
Andrew Owens, University of Michigan
Ishan Misra, Meta AI
Gul Varol, École des Ponts ParisTech
Carl Doersch, Google DeepMind
Rohit Girdhar, Meta AI
Adam Harley, Stanford University
Arsha Nagrani, Google Research
Jean-Baptiste Alayrac, Google DeepMind
Weidi Xie, Shanghai Jiao Tong University
Laura Sevilla, University of Edinburgh
Tengda Han, University of Oxford
Toby Perrett, University of Bristol
Yuki Asano, University of Amsterdam
Hazel Doughty, Leiden University
Ankush Gupta, Google DeepMind
Viorica Patraucean, Google DeepMind
Olivier Henaff, Google DeepMind
Dilara Gokay, Google DeepMind
Adria Recasens, Google DeepMind
Yi Yang, Google DeepMind
Ross Goroshin, Google DeepMind
Karel Lenc, Google DeepMind
Skanda Koppula, Google DeepMind
Mateusz Malinowski, Google DeepMind
Chuhan Zhang, Google DeepMind
Daniel Zoran, Google DeepMind
Yusuf Aytar, Google DeepMind
Pauline Luc, Google DeepMind

Organisers

Local Organisation and Host

Advisors

Program

Saturday 30 Sep

08:45-09:30  Breakfast
09:30-09:45  Introduction
09:45-10:45  Rohit Girdhar: Learning visual representations with minimal supervision
             Limin Wang: Towards building video foundation models
             Christoph Feichtenhofer: Self-Supervised Video Understanding
10:45-11:15  Open Discussion: How to learn from raw videos?
11:15-11:45  Coffee Break
11:45-12:45  Kristen Grauman: See What I See and Hear What I Hear: First-Person Perception and the Future of AR and Robotics
             Michael Ryoo: Video Representations for Robot Learning
             Arsha Nagrani: How can LLMs help with video understanding?
12:45-13:30  Lunch
13:30-14:10  Bill Freeman: Watching videos out of the corner of your eye
             Philipp Krähenbühl: Towards faster video models
             Stratis Gavves: Causal Computer Vision towards Embodied General Intelligence
14:10-14:40  Open Discussion: One dataset to solve it all - from TikTok to robotics
14:40-15:00  Coffee Break
15:00-16:00  Cordelia Schmid: Dense video captioning and beyond
             Cees Snoek: Towards Human-Aligned Video-AI
             Dima Damen: Should we still seek fine-grained perception in video?
16:00-16:20  Coffee Break
16:20-17:00  Carl Vondrick: System 2 and Video
17:00-17:45  Open Discussion: The crisis of downstream tasks... Are current benchmarks a good measure of research progress?
19:00-22:00  Dinner

Sunday 1 Oct

08:45-09:30  Breakfast
09:30-10:30  Joseph Tighe: A new benchmark for an embodied AI assistant
             Adam Harley: Tracking Any Pixel in a Video
             Carl Doersch: Tracking Any Point
10:30-10:45  Coffee Break
10:45-11:45  Gul Varol: Beyond Text Queries for Search: Composed Video Retrieval
             Andrew Owens: Multimodal Learning from the Bottom Up
             Jitendra Malik: Unsolved problems in video understanding
11:45-12:00  Coffee Break
12:00-12:40  Angela Yao: VideoQA in the Time of Large Language Models
             Laura Sevilla-Lara: Video Understanding Using Less Compute and Less Training Data
12:40-13:15  Open Discussion: Camera view vs world view - should video be studied in 3D?
13:15-14:30  Lunch and preparation to leave for the train
14:30        Walk (10 min) to St Pancras Station for the Eurostar
16:31-19:47  Eurostar to Paris Gare du Nord (departure-arrival)