A Very Big Video Reasoning Suite
We bet that video reasoning will be the next fundamental intelligence paradigm after language reasoning, one in which spatiotemporal, embodied experience of the world can be captured more naturally.
Data Engines
communicating_vessels
GitHub
Prompt
A system of 3 communicating vessels with equal-diameter vertical tubes is filled with water (water-like (low viscosity)), which appears blue in color. As shown in the initial frame, the liquid levels in the tubes are [43, 2, 54] cm respectively. Due to pressure differences between the tubes, the liquid begins to flow through the connecting channels at the bottom. The flow is governed by hydrostatic pressure equalization and damped by viscous resistance with coefficient k=3.64. As the liquid redistributes, the height differences gradually decrease, and the system evolves toward equilibrium. Eventually, through conservation of volume, all tubes reach the same final liquid level, which equals the average of the initial heights. Simulate this settling process from the initial unbalanced state to the final stable equilibrium.
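The settling behavior described in this prompt can be sketched numerically. The snippet below is only an illustration under stated assumptions: the tubes are connected in a chain (each tube to its neighbor), flow through each channel is proportional to the level difference divided by the resistance k, and the equations are integrated with a plain explicit Euler step. The function name simulate_vessels and the dt and steps values are hypothetical and are not taken from the data engine.

```python
def simulate_vessels(heights, k=3.64, dt=0.01, steps=20000):
    """Relax tube levels toward equilibrium with a first-order model (an assumption)."""
    h = [float(x) for x in heights]
    for _ in range(steps):
        # Assumed law: flow through each bottom channel is proportional to the
        # level difference between adjacent tubes, damped by the resistance k.
        flows = [(h[i] - h[i + 1]) / k for i in range(len(h) - 1)]
        for i, q in enumerate(flows):
            # Equal tube diameters: what leaves one tube enters its neighbor,
            # so the total liquid volume is conserved.
            h[i] -= q * dt
            h[i + 1] += q * dt
    return h


initial = [43, 2, 54]                 # initial levels in cm, from the prompt
final = simulate_vessels(initial)
target = sum(initial) / len(initial)  # average of the initial heights: 33.0
```

With these assumptions every tube ends at the average of the initial levels, 33 cm, which matches the equilibrium the prompt states.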
First Frame
Last Frame
Video
select_next_figure_large_small_alternating_sequence
GitHub
Prompt
The scene has two separated areas: a top SEQUENCE area and a bottom CHOICES area. In the SEQUENCE area, the shapes are the same shape and the same color, and their sizes strictly alternate between LARGE and SMALL from left to right. First observe the size-alternation pattern and determine whether the next item should be LARGE or SMALL, then select the one correct option (out of 4) in the CHOICES area that continues the same shape, color, and large/small alternation pattern. Circle the correct option and show the full process step by step.
First Frame
Last Frame
Video
planar_warp_verification
GitHub
Prompt
Transform the blue grid by aligning its four corners to the four corners of the red quadrilateral. Apply a perspective transformation so the grid matches the red outline. Keep all background elements, colored dots, and gray patches unchanged. Output the transformed grid.
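A perspective transformation is fully determined by four corner correspondences. As a minimal sketch, assuming NumPy is available, the 3x3 matrix can be recovered by solving an 8x8 linear system. The corner coordinates below are made up for illustration, and the helper names homography_from_corners and warp_points are hypothetical rather than part of the data engine.

```python
import numpy as np


def homography_from_corners(src, dst):
    """Solve for the 3x3 perspective matrix H mapping four src points to four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the eight unknowns
        # of H (the bottom-right entry is fixed to 1).
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def warp_points(H, pts):
    """Apply H to 2D points using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]


# Hypothetical corners: a unit-square grid and a target quadrilateral.
grid_corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
quad_corners = [(10, 20), (110, 30), (100, 120), (5, 100)]

H = homography_from_corners(grid_corners, quad_corners)
warped = warp_points(H, grid_corners)
```

Any interior grid point can be passed through warp_points in the same way, which is how the full grid would follow its corners onto the quadrilateral.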
First Frame
Last Frame
Video
animal_matching
GitHub
Prompt
Colored animal faces are on the left side of the canvas, and dark outlines of animals are on the right side. Move each colored animal face to its matching outline via the shortest path.
First Frame
Last Frame
Video
color_triple_intersection_red
GitHub
Prompt
A Venn diagram of circles is shown. Identify the region that lies in all three of the first three circles (triple intersection) and color that region red. Do not change anything else.
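The target region can be expressed as a pixel mask: a pixel belongs to the triple intersection only if it lies inside each of the first three circles. The sketch below assumes NumPy, a white 200x200 canvas, and three made-up circles given as (center x, center y, radius); none of these values come from the data engine, and triple_intersection_mask is a hypothetical helper.

```python
import numpy as np


def triple_intersection_mask(shape, circles):
    """Boolean mask of pixels that lie inside all of the first three circles."""
    height, width = shape
    yy, xx = np.mgrid[0:height, 0:width]
    mask = np.ones(shape, dtype=bool)
    for cx, cy, r in circles[:3]:
        # Keep only pixels that are also inside this circle.
        mask &= (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
    return mask


# Hypothetical circles as (center_x, center_y, radius) in pixels.
circles = [(80, 90, 45), (120, 90, 45), (100, 125, 45)]

image = np.full((200, 200, 3), 255, dtype=np.uint8)  # white canvas, RGB
mask = triple_intersection_mask(image.shape[:2], circles)
image[mask] = (255, 0, 0)  # color only the triple intersection red
```

Everything outside the mask is left untouched, in line with the prompt's instruction not to change anything else.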
First Frame
Last Frame
Video
Inference Results
Mirror Reflection - Samples
Ground Truth
First
Final
Model Outputs
VBVR-Wan2.2
CogVideoX 1.5
Kling 2.6
LTX-2
Runway Gen-4
Sora 2
Veo 3
Wan 2.2 I2V
Hunyuan I2V
Leaderboard
Reference
Strong Baseline
Proprietary
Open-source