A Very Big Video Reasoning Suite

We bet that video reasoning is the next fundamental intelligence paradigm after language reasoning, one in which spatiotemporal, embodied experience of the world can be captured more naturally.

Data Engines

identify_chinese_character
Knowledge out-of-domain testset
Find and circle the Chinese character among the displayed characters. Only one character is Chinese. Draw a red circle around it.
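As a rough illustration of the check this task asks for, the sketch below flags a character as Chinese when its code point falls in the CJK Unified Ideographs block (U+4E00 to U+9FFF). This is a simplifying assumption, not the suite's own data-engine logic: the block is shared with Japanese kanji, and extension blocks are ignored. The function names `is_cjk_ideograph` and `find_chinese_index` are hypothetical.

```python
def is_cjk_ideograph(ch: str) -> bool:
    """Return True if ch is one character in the CJK Unified Ideographs block."""
    # Assumption: the basic block U+4E00..U+9FFF is enough for this illustration.
    return len(ch) == 1 and 0x4E00 <= ord(ch) <= 0x9FFF


def find_chinese_index(chars):
    """Return the index of the single Chinese character among chars."""
    hits = [i for i, ch in enumerate(chars) if is_cjk_ideograph(ch)]
    if len(hits) != 1:
        # The task prompt states that exactly one character is Chinese.
        raise ValueError(f"expected exactly one Chinese character, found {len(hits)}")
    return hits[0]


if __name__ == "__main__":
    displayed = ["A", "я", "中", "あ", "한"]
    print(find_chinese_index(displayed))
```

In the video task itself the model must draw the red circle at the corresponding screen position; the index here only identifies which glyph to circle.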
shape_outline_fill
Abstraction in-domain testset
Complete the A:B :: C:? shape-style analogy. Show how the right shape in the second row changes its fill or outline so that it follows the same style transformation used between the first two shapes.
grid_go_through_block
Spatiality in-domain testset
The scene shows a 10x10 grid with a green start square (containing an orange circular agent), a red end square, and multiple purple and yellow rectangular blocks. Starting from the green start square, the agent can move to adjacent cells (up, down, left, right). The goal is to move the agent to the red end square along the shortest path that passes through all purple and yellow blocks (the agent must visit every purple and yellow block before reaching the red end square).
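One way to compute a reference path for this task is a breadth-first search over (cell, visited-blocks) states. The sketch below is illustrative only and rests on assumptions not stated in the prompt: each block occupies a single grid cell, every cell is walkable, and the agent may not step onto the end square until all blocks have been visited. The function name `shortest_path_through` is hypothetical.

```python
from collections import deque


def shortest_path_through(start, goal, blocks, size=10):
    """Shortest 4-connected path from start to goal that visits every block cell.

    Cells are (row, col) tuples. Returns the list of cells on the path,
    or None if no such path exists.
    """
    index = {cell: i for i, cell in enumerate(blocks)}
    full = (1 << len(blocks)) - 1  # bitmask with one bit per block

    def mark(cell, mask):
        # Set the bit for this cell if it is one of the required blocks.
        return mask | (1 << index[cell]) if cell in index else mask

    first = (start, mark(start, 0))
    parent = {first: None}
    queue = deque([first])
    while queue:
        state = queue.popleft()
        (r, c), mask = state
        if (r, c) == goal and mask == full:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            cell = (r + dr, c + dc)
            if not (0 <= cell[0] < size and 0 <= cell[1] < size):
                continue
            new_mask = mark(cell, mask)
            # Assumption: the end square is entered only once all blocks are visited.
            if cell == goal and new_mask != full:
                continue
            nxt = (cell, new_mask)
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


if __name__ == "__main__":
    print(shortest_path_through((0, 0), (0, 3), [(1, 1)], size=4))
```

Because BFS expands states in order of step count, the first time the end square is reached with every block bit set yields a shortest valid path. The state space grows as cells times 2^blocks, which stays small for a 10x10 grid with a handful of blocks.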
track_object_movement
Transformation in-domain testset
The object marked with a green border is the only object that moves. It moves horizontally to align directly below the object with a red star at its center. Track the movement with the green border as the object moves.
color_triple_intersection_red
Perception out-of-domain testset
A Venn diagram of circles is shown. Identify the region that lies in all three of the first three circles (triple intersection) and color that region red. Do not change anything else.
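The target region can be described geometrically: a pixel belongs to the triple intersection when it lies inside each of the first three circles. The sketch below assumes circles are given as hypothetical `(cx, cy, r)` tuples in pixel coordinates and treats boundaries as inside; it is an illustration, not the suite's rendering code.

```python
def in_all_circles(x, y, circles):
    """Return True if point (x, y) lies inside or on every circle."""
    return all((x - cx) ** 2 + (y - cy) ** 2 <= r * r for cx, cy, r in circles)


def triple_intersection_pixels(circles, width, height):
    """List the pixels that fall in all three of the first three circles."""
    first_three = circles[:3]  # any further circles are ignored, as in the prompt
    return [
        (x, y)
        for y in range(height)
        for x in range(width)
        if in_all_circles(x, y, first_three)
    ]


if __name__ == "__main__":
    circles = [(4, 4, 3), (6, 4, 3), (5, 6, 3), (20, 20, 1)]
    region = triple_intersection_pixels(circles, width=12, height=12)
    print(len(region), "pixels to color red")
```

Coloring exactly these pixels red and leaving every other pixel untouched corresponds to the "do not change anything else" constraint in the prompt.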

Inference Results

Task Domains
Glass Refraction
Knowledge in-domain testset
Next Figure (Alternating Size)
Abstraction in-domain testset
Locate Topmost Figure
Spatiality out-of-domain testset
Symbol Edit
Transformation out-of-domain testset
Locate Overlapping Point
Perception out-of-domain testset
Model Outputs
VBVR-Wan2.2
CogVideoX 1.5
Kling 2.6
LTX-2
Runway Gen-4
Sora 2
Veo 3
Wan 2.2 I2V
Hunyuan I2V
Seedance 2.0

Leaderboard

2026-04-28