The First Workshop on

Video Generative Models: Benchmarks and Evaluation

CVPR 2026

Exploring Challenges and Opportunities in Evaluating and Benchmarking Video Generative Models

June 3–7, 2026 (TBD)

Denver, Colorado, United States

About the Workshop

The rapid advancement of video generative models underscores the critical need for robust evaluation methodologies that rigorously assess instruction adherence, physical plausibility, human fidelity, and creativity. Prevailing metrics and benchmarks, however, remain constrained: they predominantly prioritize semantic alignment while overlooking the subtle yet critical artifacts that persist even in state-of-the-art systems, such as structural distortions, unnatural motion dynamics, and weak temporal coherence.

The VGBE workshop therefore seeks to pioneer next-generation evaluation methodologies that are fine-grained, physically grounded, and aligned with human perception. By establishing multi-dimensional, explainable, and standardized benchmarks, we aim to bridge the gap between generation and assessment, accelerating the maturation of video generative models and enabling their reliable deployment in real-world applications.

Topics of Interest

We invite contributions on topics including (but not limited to):


Novel Metrics and Evaluation Methods

  • Spatiotemporal & Causal Integrity: Quantifying motion realism, object permanence, and causal logic consistency over time.
  • Perceptual Quality Assessment: Learning-based metrics for detecting visual artifacts, hallucinations, and alignment with human subjectivity.
  • Explainable Automated Judges: Leveraging multimodal large language models (MLLMs) for scalable, fine-grained, and interpretable critique.
  • Instruction Adherence Metrics: Rigorous evaluation of prompt fidelity, spatial conditioning, and complex constraint satisfaction (a minimal metric sketch follows this list).
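
To make the instruction-adherence bullet concrete, the following is a minimal sketch of a frame-level prompt-adherence score: it averages CLIP text-image cosine similarity over sampled frames. The checkpoint name and the use of vanilla CLIP are illustrative assumptions, not a metric endorsed by the workshop.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; any CLIP-style dual encoder would serve.
MODEL_ID = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)

def prompt_adherence(frames: list[Image.Image], prompt: str) -> float:
    """Mean cosine similarity between the prompt and sampled video frames."""
    inputs = processor(text=[prompt], images=frames,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    text = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    imgs = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    return (imgs @ text.T).mean().item()  # higher = closer to the prompt

A score like this captures only per-frame semantics; the spatiotemporal and causal criteria above would require motion-aware extensions on top of it.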

Datasets and Benchmarks

  • Narrative & Multi-Shot Suites: Curated datasets assessing character persistence, scene transitions, and long-horizon consistency.
  • Physics-Grounded Challenge Sets: Scenarios isolating fluid dynamics, collisions, and kinematic anomalies to stress-test "World Simulators."
  • Human Preference Data: Large-scale, fine-grained annotations capturing multi-dimensional judgments (e.g., aesthetics vs. realism).
  • Standardized Protocols: Unified data splits and reproducible frameworks to ensure transparent and comparable benchmarking (see the split sketch below).
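
As one way to realize the standardized-protocols bullet, the sketch below pins a train/test split to a hash of the sample ID, so every participant reconstructs the identical partition without distributing index files. The function name and ID format are hypothetical.

import hashlib

def split_of(video_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a video ID to the 'train' or 'test' split."""
    digest = hashlib.sha256(video_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # stable value in [0, 1]
    return "test" if bucket < test_fraction else "train"

assert split_of("clip_00042") == split_of("clip_00042")  # reproducible by construction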

Developing Video Generative Applications in Vertical Domains

  • Domain Adaptation & Personalization: Efficient fine-tuning and Low-Rank Adaptation (LoRA) strategies for specialized verticals (e.g., medical, cinematic); see the LoRA sketch after this list.
  • Simulation for Embodied AI: Leveraging video generative models as world simulators for robotics perception, planning, and Sim2Real transfer.
  • Interactive & Human-in-the-Loop: User-centric frameworks incorporating iterative feedback for creative workflows and gaming.
  • Immersive 4D Generation: Lifting video diffusion priors to synthesize spatially consistent scenes and dynamic assets for AR/VR environments.
  • Deployment Efficiency: Optimizing inference latency, memory footprint, and cost for scalable industrial applications.
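
To illustrate the LoRA strategies in the first bullet, here is a minimal sketch using the peft library on a stand-in module; the target projection names (to_q/to_k/to_v, common in diffusion backbones) and the rank/alpha values are assumptions, not a recommended recipe.

import torch.nn as nn
from peft import LoraConfig, get_peft_model

class TinyAttention(nn.Module):
    """Stand-in for one attention block of a video diffusion backbone."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def forward(self, x):
        # Simplified placeholder computation, not real attention.
        return self.to_q(x) + self.to_k(x) + self.to_v(x)

config = LoraConfig(r=8, lora_alpha=16, target_modules=["to_q", "to_k", "to_v"])
model = get_peft_model(TinyAttention(), config)
model.print_trainable_parameters()  # only the low-rank adapters require grad

Freezing the backbone and training only these low-rank adapters is what makes per-vertical specialization cheap enough to iterate on.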

VGBE 2026 Challenges

Submissions will be evaluated on the test set using the metrics defined in the associated paper, with human evaluation conducted for each task as needed.

Competition Timeline

Competition starts: TBD (Target: February 16, 2026)
Initial results release: TBD (Target: March 20, 2026)
Submission deadline: TBD (Target: April 12, 2026)
Challenge paper deadline: TBD (Target: April 19, 2026)
Final results and winners announced: TBD

Image-to-Video Generation Challenge (TBD)

  • Objective (To be updated): Evaluate the model's ability to synthesize high-fidelity videos given a text prompt and an input image. The focus is on measuring prompt-following capability (instruction adherence) and the visual fidelity and realism of the generated videos.
  • Protocol: A subset of test samples will be released in Phase 1. Participants are required to submit model checkpoints and inference code. Final rankings will be determined by a combination of automated metrics and rigorous human evaluation.

Instruction-Guided Video Editing Challenge (TBD)

  • Objective (To be updated): Assess the model's precision in performing various editing tasks on a given input video. Key criteria include the accuracy of the edit according to the instruction and the preservation of temporal consistency in unedited regions.
  • Protocol: A subset of test samples will be released in Phase 1. Participants are required to submit model checkpoints and inference code. Performance will be evaluated via automated consistency metrics and human preference studies (a hypothetical aggregation sketch follows).
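
The exact blend of automated metrics and human evaluation is still TBD for both challenges; purely as a hypothetical illustration, a weighted aggregation of the two could look like this (the 50/50 weighting and all scores are invented):

def final_score(auto_score: float, human_score: float, w_auto: float = 0.5) -> float:
    """Blend normalized automated and human-evaluation scores into one number."""
    return w_auto * auto_score + (1.0 - w_auto) * human_score

# Invented example entries: (automated score, human score), both in [0, 1].
entries = {"team_a": (0.81, 0.74), "team_b": (0.78, 0.80)}
ranking = sorted(entries, key=lambda t: final_score(*entries[t]), reverse=True)
print(ranking)  # ['team_b', 'team_a'] under this illustrative weighting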

Keynote Speakers (Tentative)

Alan Bovik

The University of Texas at Austin

Professor

Ming-Hsuan Yang

UC Merced & Google DeepMind

Professor,
Research Scientist

Jiajun Wu

Stanford University

Assistant Professor

Wenhu Chen

University of Waterloo & Vector Institute

Assistant Professor

Mike Zheng Shou

National University of Singapore

Assistant Professor

Yan Wang

NVIDIA Research

Research Scientist,
Tech Lead

Zhuang Liu

Princeton University

Assistant Professor

Organizers

Shuo Xing

Texas A&M University

Mingyang Wu

Texas A&M University

Siyuan Yang

Texas A&M University

Shuangyu Xie

UC Berkeley

Kaiyuan Chen

UC Berkeley

Chris Wei Zhou

Cardiff University

Sicong Jiang

Abaka AI

Zihan Wang

2077AI Research Foundation, Abaka AI

Jian Wang

Snap Research

Lin Wang

Nanyang Technological University

Jinyu Zhao

eBay

Soumik Dey

eBay

Yilin Wang

Google/YouTube

Pooja Verlani

Google

Zhengzhong Tu

Texas A&M University

Paper Submission

Important Dates

Submissions Open: January 28, 2026, 08:00 AM UTC
Submissions Due: March 10, 2026, 11:59 PM UTC
Author Notification: TBD
Camera-Ready Due: TBD

Submission Guidelines

We welcome two types of submissions: short papers (2–4 pages) and full papers (5–8 pages). All submissions must follow the CVPR 2026 author guidelines.

Paper Tracks

Full Papers

  • Length: Up to 8 pages (excluding references)
  • Content: No appendix or supplementary material in the main PDF
  • Proceedings: Included in official CVPR proceedings
  • Scope: Full research contributions

Short Papers

  • Length: Up to 4 pages (excluding references)
  • Content: No appendix or supplementary material in the main PDF
  • Proceedings: Not included in the official CVPR proceedings (archived on the workshop website)
  • Scope: Work-in-progress and preliminary results

Workshop Schedule

One-day workshop; detailed schedule TBD.