A Very Big Video Reasoning Suite
We bet on a future in which video reasoning is the next fundamental intelligence paradigm after language reasoning, one where spatiotemporal, embodied world experience can be captured more naturally.
Data Engines
circle_largest_numerical_value
Prompt
The scene shows 6 numbers on a white canvas. First compare the numerical values of all numbers, then draw one red circle around the single largest number. Do not circle any other numbers. Show the complete circling process step by step.
return_to_correct_bin
Prompt
Move each item into the bin that matches its color. Only move items, do not change anything else.
grid_color_sequence
Prompt
The scene shows a 10x10 grid with a green start point, a red end point, and colored cells (orange, yellow, and blue). A purple circular agent is positioned at the green start point. The agent can move to adjacent cells (up, down, left, right). Starting from the green start point, the agent must visit the colored cells in order (orange, then yellow, then blue), taking the shortest path between each consecutive pair of colored cells. The agent is allowed to pass through the red end point when visiting the colored cells if needed. After visiting all colored cells in sequence, the agent must reach the red end point, also following the shortest path.
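The route this prompt describes can be sketched as a chain of shortest-path searches, one per consecutive pair of stops. The sketch below is illustrative only and is not taken from the suite's data engine: the function names and the coordinates in the usage example are hypothetical, and it assumes an open 10x10 grid with no blocked cells, since the prompt mentions none.

```python
from collections import deque


def shortest_path(start, goal, size=10):
    """Breadth-first search over 4-connected cells; returns the cells from start to goal."""
    prev = {start: None}  # maps each visited cell to the cell it was reached from
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    # Walk back from the goal to the start, then reverse.
    path = []
    cell = goal
    while cell is not None:
        path.append(cell)
        cell = prev[cell]
    return path[::-1]


def plan_route(start, waypoints, end, size=10):
    """Visit the waypoints in order, then the end, using a shortest path for every leg."""
    stops = [start, *waypoints, end]
    route = [start]
    for a, b in zip(stops, stops[1:]):
        route.extend(shortest_path(a, b, size)[1:])  # skip the repeated first cell
    return route


# Hypothetical layout as (row, col): green start, then orange, yellow, blue, red end.
route = plan_route((0, 0), [(2, 3), (5, 1), (7, 8)], (9, 9))
print(len(route) - 1)  # 22 moves in total
```

Because intermediate legs are planned independently, a leg may cross the red end cell before the final leg, which the prompt explicitly allows.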
separate_objects_no_spin
Prompt
The scene shows 2 objects on the left side and dashed target outlines on the right side. The dashed target outlines remain completely stationary. Move each object horizontally to the right so that it aligns exactly with and fits within its corresponding dashed target outline.
color_mixing
Prompt
The scene has two colored light sources positioned on the left and right sides, and a mixing zone marked by a white rectangular border in the center. In additive color mixing (light mixing), when two lights overlap, their RGB components add together: result_R = min(color1_R + color2_R, 255), same for G and B, with each channel clamped to 255 maximum. First identify the RGB values of the left light (an RGB(69, 80, 31) colored light) and the right light (an RGB(92, 60, 102) colored light), then calculate the mixed color by adding their RGB components channel by channel. Fill the white-bordered mixing zone in the center with the resulting mixed color and show the full calculation process step by step.
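The calculation this prompt asks for can be checked with a few lines of code. The sketch below applies the clamped per-channel sum stated in the prompt to the two lights it names; the function name `mix_additive` is hypothetical and not part of the suite.

```python
def mix_additive(color1, color2):
    """Add two RGB colors channel by channel, clamping each channel to 255."""
    return tuple(min(a + b, 255) for a, b in zip(color1, color2))


left = (69, 80, 31)     # left light from the prompt
right = (92, 60, 102)   # right light from the prompt

mixed = mix_additive(left, right)
print(mixed)  # (161, 140, 133)
```

For this pair no channel reaches the 255 cap (69 + 92 = 161, 80 + 60 = 140, 31 + 102 = 133), so the clamp only matters for brighter inputs.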
Inference Results
Ball Bounces - Samples
Model Outputs
VBVR-Wan2.2
CogVideoX 1.5
Kling 2.6
LTX-2
Runway Gen-4
Sora 2
Veo 3
Wan 2.2 I2V
Hunyuan I2V
Leaderboard
Reference
Strong Baseline
Proprietary
Open-source