A Very Big Video Reasoning Suite

We bet on a future in which video reasoning is the next fundamental intelligence paradigm after language reasoning, one where spatiotemporal, embodied experience of the world can be captured more naturally.

Data Engines


circle_largest_numerical_value

GitHub

Knowledge out-of-domain test set

Prompt

The scene shows 6 numbers on a white canvas. First compare the numerical values of all numbers, then draw one red circle around the single largest number. Do not circle any other numbers. Show the complete circling process step by step.

First Frame

Last Frame

Video

return_to_correct_bin

GitHub

Abstraction training set

Prompt

Move each item into the bin that matches its color. Only move items, do not change anything else.

First Frame

Last Frame

Video

grid_color_sequence

GitHub

Spatiality training set

Prompt

The scene shows a 10x10 grid with a green start point, a red end point, and colored cells (orange, yellow, and blue). A purple circular agent is positioned at the green start point. The agent can move to adjacent cells (up, down, left, right). Starting from the green start point, the agent must visit the colored cells in order (orange, then yellow, then blue), taking the shortest path between each consecutive pair of colored cells. The agent is allowed to pass through the red end point when visiting the colored cells if needed. After visiting all colored cells in sequence, the agent must reach the red end point, also following the shortest path.
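The route described in this prompt can be sketched in a few lines. This is a minimal illustration, not the data engine's actual code: it assumes the grid has no obstacles (the prompt mentions none), so any path that only moves toward the target is a shortest path. The names `shortest_leg` and `plan_route` and the example coordinates are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col) on the 10x10 grid


def shortest_leg(src: Cell, dst: Cell) -> List[Cell]:
    """Return one shortest 4-connected path from src to dst, excluding src.

    Assumes an obstacle-free grid, so moving along rows first and then
    along columns always yields a path of Manhattan-distance length.
    """
    r, c = src
    path: List[Cell] = []
    while r != dst[0]:
        r += 1 if dst[0] > r else -1
        path.append((r, c))
    while c != dst[1]:
        c += 1 if dst[1] > c else -1
        path.append((r, c))
    return path


def plan_route(start: Cell, orange: Cell, yellow: Cell, blue: Cell, end: Cell) -> List[Cell]:
    """Visit orange, yellow, blue in order, then finish at the end cell."""
    route = [start]
    for target in (orange, yellow, blue, end):
        route.extend(shortest_leg(route[-1], target))
    return route


# Hypothetical layout, chosen only for illustration.
route = plan_route(start=(0, 0), orange=(2, 3), yellow=(5, 1), blue=(7, 7), end=(9, 9))
```

Each leg is planned independently, which matches the prompt's requirement that the shortest path is taken between each consecutive pair of cells; the red end point gets no special treatment until the final leg, so the agent may pass through it earlier.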

First Frame

Last Frame

Video

separate_objects_no_spin

GitHub

Transformation out-of-domain test set

Prompt

The scene shows 2 objects on the left side and dashed target outlines on the right side. The dashed target outlines remain completely stationary. Move each object horizontally to the right so that it aligns exactly with and fits within its corresponding dashed target outline.

First Frame

Last Frame

Video

color_mixing

GitHub

Perception training set

Prompt

The scene has two colored light sources positioned on the left and right sides, and a mixing zone marked by a white rectangular border in the center. In additive color mixing (light mixing), when two lights overlap, their RGB components add together: result_R = min(color1_R + color2_R, 255), same for G and B, with each channel clamped to 255 maximum. First identify the RGB values of the left light (an RGB(69, 80, 31) colored light) and the right light (an RGB(92, 60, 102) colored light), then calculate the mixed color by adding their RGB components channel by channel. Fill the white-bordered mixing zone in the center with the resulting mixed color and show the full calculation process step by step.
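The calculation this prompt asks for can be checked with a short sketch. It applies the clamped channel-wise sum stated in the prompt to the two RGB values given there; the function name `mix_additive` is a hypothetical helper, not part of the data engine.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def mix_additive(color1: RGB, color2: RGB) -> RGB:
    """Add two RGB colors channel by channel, clamping each channel to 255."""
    r, g, b = (min(a + b, 255) for a, b in zip(color1, color2))
    return (r, g, b)


left = (69, 80, 31)     # left light, from the prompt
right = (92, 60, 102)   # right light, from the prompt

mixed = mix_additive(left, right)
# R: 69 + 92  = 161
# G: 80 + 60  = 140
# B: 31 + 102 = 133
print(mixed)  # (161, 140, 133)
```

None of the channels reaches the 255 limit for this pair, so the clamp has no effect here; it only matters for brighter inputs, for example `mix_additive((200, 10, 0), (100, 10, 0))` gives `(255, 20, 0)`.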

First Frame