ML Evaluation Standards

The aim of the workshop is to discuss and propose standards for evaluating ML research, in order to better identify promising new directions and to accelerate real progress in the field. This requires understanding which practices add to or detract from the generalizability and reliability of reported results, and what incentives encourage researchers to follow best practices. We may draw inspiration from adjacent scientific fields, from statistics, and from the history of science. Acknowledging that there is no consensus on best practices for ML, the workshop will focus on panel discussions, complemented by a few invited talks representing a variety of perspectives. The call for papers welcomes opinion papers as well as more technical papers on the evaluation of ML methods. We plan to summarize the findings and topics that emerge during the workshop in a short report.

Call for papers

We invite two types of papers – opinion papers (up to 4 pages) stating positions on topics such as those listed below, and methodology papers (up to 8 pages excluding references) about evaluation in ML. Topics may include:

  • Establishing benchmarking standards for ML research
  • Reliable tools/protocols for benchmarking and evaluation
  • Understanding and defining reproducibility for machine learning
  • Meta analyses thoroughly evaluating existing claims across papers
  • Incentives for doing better evaluation and reporting results

Submission Site: https://cmt3.research.microsoft.com/SMILES2022

Speakers

  • Thomas Wolf – Hugging Face Inc.
  • Frank Schneider – University of Tübingen
  • Rotem Dror – University of Pennsylvania
  • James Evans – University of Chicago
  • Melanie Mitchell – Santa Fe Institute
  • Katherine Heller – Google Brain
  • Corinna Cortes – Google Research NYC

Panels

Reproducibility and Rigor in ML

  • Rotem Dror – University of Pennsylvania
  • Sara Hooker – Google Brain
  • Koustuv Sinha – Mila, McGill University
  • Frank Schneider – University of Tübingen
  • Gaël Varoquaux – INRIA

Slow vs Fast Science

  • Chelsea Finn – Stanford University
  • Michela Paganini – DeepMind
  • James Evans – University of Chicago
  • Russell Poldrack – Stanford University
  • Oriol Vinyals – DeepMind

Incentives for Better Evaluation

  • Corinna Cortes – Google Research NYC
  • Yoshua Bengio – Mila, Université de Montréal
  • John Langford – Microsoft Research
  • Kyunghyun Cho – New York University

Organizers

  • Stephanie Chan – DeepMind
  • Rishabh Agarwal – Google Brain
  • Xavier Bouthillier – Mila, Université de Montréal
  • Caglar Gulcehre – DeepMind
  • Jesse Dodge – Allen Institute for AI

For any queries, please reach out to the organizers at ml-eval-iclr2022@googlegroups.com.