A Very Big Video Reasoning Suite

We are betting that video reasoning will be the next fundamental intelligence paradigm after language reasoning, one in which spatiotemporal, embodied experience of the world can be captured more naturally.

gravity_physics (Knowledge, out-of-domain test set)
Prompt: A ball at height 19.5m with initial downward velocity 4.5 m/s falls under gravity 12.2 m/s² and bounces on ground with elasticity 0.80. Show the full trajectory with velocity arrows (direction and magnitude) updating throughout until the ball stops.
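The physics behind the gravity_physics prompt can be sketched in a few lines. This is only an illustration, not the benchmark's generator: it assumes "elasticity" means the coefficient of restitution (rebound speed divided by impact speed), ignores air resistance, and uses a hypothetical `min_speed` threshold to decide when the ball has stopped.

```python
import math


def first_impact(h0, v0, g):
    """Return (time, speed) at first ground contact for a ball thrown downward.

    h0: initial height (m), v0: initial downward speed (m/s), g: gravity (m/s^2).
    """
    # Energy conservation: v^2 = v0^2 + 2 * g * h0
    speed = math.sqrt(v0 * v0 + 2.0 * g * h0)
    # v = v0 + g * t  =>  t = (v - v0) / g
    return (speed - v0) / g, speed


def bounce_apexes(h0, v0, g, e, min_speed=0.1):
    """Apex height after each bounce, until the rebound speed drops below min_speed.

    e is assumed to be the coefficient of restitution; min_speed is a
    hypothetical stopping threshold, not a value taken from the benchmark.
    """
    _, speed = first_impact(h0, v0, g)
    apexes = []
    while True:
        speed *= e  # each bounce keeps a fraction e of the impact speed
        if speed < min_speed:
            break
        apexes.append(speed * speed / (2.0 * g))  # apex from v^2 = 2 * g * h
    return apexes
```

With the prompt's values (19.5 m, 4.5 m/s, 12.2 m/s², 0.80), the ball first hits the ground at about 22.3 m/s and rebounds to roughly 13.0 m under these assumptions.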
shape_scale_then_outline (Abstraction, in-domain test set)
Prompt: Show the shape first changing size and then becoming outline-only. Both transformations should match the example pattern.
grid_shortest_path (Spatiality, in-domain test set)
Prompt: The scene shows a 10x10 grid with a blue start square (containing a yellow circular agent) and a green end square. Starting from the blue start square, the agent can move to adjacent cells (up, down, left, right). The goal is to move the agent to the green end square along the shortest path.
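One common way to compute a reference path for a task like grid_shortest_path is breadth-first search over 4-connected cells. The sketch below is an assumption about how such a path could be derived, not the benchmark's own solver; the `(row, col)` coordinate convention and the optional `blocked` set are hypothetical, since the prompt does not mention obstacles.

```python
from collections import deque


def shortest_path(start, goal, size=10, blocked=frozenset()):
    """Breadth-first search on a size x size grid with up/down/left/right moves.

    Cells are (row, col) tuples. Returns the list of cells from start to goal
    (inclusive), or None if the goal cannot be reached.
    """
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if in_bounds and nxt not in blocked and nxt not in parents:
                parents[nxt] = cell
                queue.append(nxt)
    return None
```

On an empty grid the path length always equals the Manhattan distance between start and goal plus one, so several equally short paths usually exist; BFS returns one of them.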
combined_objects_spinning (Transformation, training set)
Prompt: The scene shows 2 objects on the left side and dashed target outlines on the right side. The dashed target outlines remain completely stationary. For each object, first rotate it in place to match the orientation of its corresponding dashed target outline, then move it horizontally to the right so that it aligns exactly with and fits within its corresponding dashed target outline.
color_addition (Perception, in-domain test set)
Prompt: Two colored circular balls start at separate positions. They move toward each other at equal speeds until fully overlapping and merging into one. The overlapping region and final merged ball show the additive color mixture of the two original colors.
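The expected end state of color_addition can be illustrated with a small sketch. It assumes 8-bit RGB colors and a clamped channel-wise sum as the additive mixing rule; the benchmark's exact rule is not stated on this page, so both function names and the clamping are hypothetical.

```python
def additive_mix(rgb_a, rgb_b):
    """Channel-wise sum of two 8-bit RGB colors, clamped to 255 (assumed rule)."""
    return tuple(min(255, a + b) for a, b in zip(rgb_a, rgb_b))


def meeting_point(pos_a, pos_b):
    """Midpoint where two balls that start together and move toward each other
    at equal speeds fully overlap."""
    return tuple((a + b) / 2 for a, b in zip(pos_a, pos_b))
```

Under this rule, red `(255, 0, 0)` and green `(0, 255, 0)` merge into yellow `(255, 255, 0)`.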

Inference Results

Task Domains
Dot to Dot (Knowledge, in-domain test set)
Next Figure (Alternating Size) (Abstraction, in-domain test set)
Select Leftmost Shape (Spatiality, out-of-domain test set)
Move Objects to Targets (Transformation, out-of-domain test set)
Highlight Horizontal Lines (Perception, out-of-domain test set)
Model Outputs
VBVR-Wan2.2
CogVideoX 1.5
Kling 2.6
LTX-2
Runway Gen-4
Sora 2
Veo 3
Wan 2.2 I2V
Hunyuan I2V

Leaderboard

Categories: Reference, Strong Baseline, Proprietary, Open-source

Rank  Family        Model                 Score
#1    Human         Human                 97.4%
#2    VBVR          VBVR-Wan2.2           68.5%
#3    Sora 2        Sora 2                54.6%
#4    Veo 3.1       Veo 3.1               48.0%
#5    Runway        Runway Gen-4 Turbo    40.3%
#6    Wan2.2        Wan2.2-I2V-A14B       37.1%
#7    Kling         Kling 2.6             36.9%
#8    LTX-2         LTX-2                 31.3%
#9    CogVideoX     CogVideoX1.5-5B-I2V   27.3%
#9    HunyuanVideo  HunyuanVideo-I2V      27.3%