Sampled Videos

Dataset Download

To facilitate future research and to ensure full transparency, we release all the videos we sampled and used for VBench evaluation. You can download them on Google Drive.

What Videos Do We Provide?

Potential uses of these videos:

  • Further labeling of video quality
  • Instruction tuning, using our videos and our human preference labels

Below is the folder structure of different models' sampled videos:

t2v_sampled_videos
├── per_dimension
│   ├── cogvideo.zip
│   ├── gen-2-all-dimension.tar.gz
│   ├── lavie.zip
│   ├── modelscope.zip
│   ├── opensora.tar
│   ├── pika-all-dimension.zip
│   ├── show-1.tar.gz
│   ├── videocrafter-1.tar.gz
│   ├── videocrafter-2.tar
│   └── videocrafter-09.zip
└── per_category
    ├── cogvideo.zip
    ├── gen-2-all-category.tar.gz
    ├── lavie.zip
    ├── modelscope.zip
    ├── pika-all-category.zip
    ├── show-1.tar.gz
    ├── videocrafter-0.9.zip
    └── videocrafter-1.zip

How to Download the Videos?

You can use gdown to download files from Google Drive. Below is an example:

  • First, install gdown:
pip install gdown
  • Then, download the archive (zip, tar.gz, or tar) using gdown:
gdown --id <file_id> --output <output_filename>

# Example for videocrafter-1
gdown --id 1FCRj48-Yv7LM7XGgfDCvIo7Kb9EId5KX --output videocrafter-1.tar.gz
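
The archives in the folder listing above come in three formats (.zip, .tar.gz, and .tar), so extraction differs per model. As a minimal sketch, assuming gdown's Python API (`gdown.download` with an `id` argument) and the standard library's `shutil.unpack_archive`, downloading and extracting could be wrapped as follows. The helper names `fetch_archive` and `unpack_archive_to` are illustrative and not part of VBench.

```python
import shutil
from pathlib import Path


def fetch_archive(file_id: str, output: str) -> str:
    """Download one Google Drive file by ID (needs network access and gdown)."""
    import gdown  # imported lazily so the unpacking helper works without gdown

    return gdown.download(id=file_id, output=output, quiet=False)


def unpack_archive_to(archive: str, dest_dir: str) -> Path:
    """Extract a .zip, .tar.gz, or .tar archive into dest_dir and return that path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # shutil chooses the extractor from the file extension.
    shutil.unpack_archive(archive, dest)
    return dest
```

For example, `unpack_archive_to(fetch_archive("<file_id>", "videocrafter-1.tar.gz"), "per_dimension")` would download the archive and extract it under `per_dimension/`.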

What are the Details of the Video Generation Models?

We list the setting for sampling videos from these models.

| Model | Evaluation Party | Release Time | Resolution | FPS | Frame Count | Video Length | Checkpoint | Code Commit ID | Video Format | Sampled Videos (Dimension) | Sampled Videos (Category) | Other Settings |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LaVie | VBench Team | 2023-09-26 | 512x512 | 8 | 16 | 2.0s | - | - | MP4 | Google Drive | Google Drive | |
| LaVie-Interpolation | VBench Team | 2023-09-26 | 512x512 | 24 | 61 | 2.5s | link | - | MP4 | Google Drive | - | |
| ModelScope | VBench Team | 2023-08-12 | 256x256 | 8 | 16 | 2.0s | link | - | MP4 | Google Drive | Google Drive | |
| CogVideo | VBench Team | 2022-05-29 | 480x480 | 10 | 33 | 3.3s | link | - | GIF | Google Drive | Google Drive | |
| VideoCrafter-0.9 | VBench Team | 2023-04-05 | 256x256 | 8 | 16 | 2.0s | link | Commit ID | MP4 | Google Drive | Google Drive | |
| VideoCrafter-1.0 | VBench Team | 2023-10-30 | 1024x576 | 10 | 16 | 1.6s | link | Commit ID | MP4 | Google Drive | Google Drive | |
| Show-1 | VBench Team | 2023-09-27 | 576x320 | 8 | 29 | 3.6s | link | Commit ID | MP4 | Google Drive | Google Drive | |
| Gen-2 | VBench Team | 2023-06-07 | 1408x768 | 24 | 96 | 4.0s | - | - | MP4 | Google Drive | Google Drive | |
| Pika | VBench Team | 2023-06-29 | 1088x640 | 24 | 72 | 3.0s | - | - | MP4 | Google Drive | Google Drive | |
| Open-Sora | VBench Team | 2024-03-18 | 512x512 | 8 | 16 | 2.0s | link | Commit ID | MP4 | Google Drive | - | |
| VideoCrafter-2.0 | VBench Team | 2024-01-18 | 320x512 | 10 | 16 | 1.6s | link | Commit ID | MP4 | Google Drive | - | |
| T2V-Turbo (VC2) | T2V-Turbo Team | 2024-05-29 | 320x512 | 16 | 16 | 1.0s | link | Commit ID | MP4 | - | - | unet_lora.pt is used to turn VideoCrafter-2.0 to T2V-Turbo (VC2) |
| AnimateDiff-V1 | VBench Team | 2023-07-18 | 512x512 | 8 | 16 | 2.0s | T2I backbone SD1.5, Motion Module, LoRA(Realistic Vision 2.0) | Commit ID | MP4 | Google Drive | - | Negative Prompt: We apply the same negative prompt during sampling for all videos: semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck |
| AnimateDiff-V2 | VBench Team | 2023-09-10 | 512x512 | 8 | 16 | 2.0s | T2I backbone SD1.5, Motion Module, LoRA | Commit ID | MP4 | Google Drive | - | Negative Prompt: We apply the same negative prompt during sampling for all videos: semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck |
| Latte-1 | VBench Team | 2024-05-23 | 512x512 | 8 | 16 | 2.0s | link | Commit ID | MP4 | Google Drive | - | |
| OpenSora V1.2 (2s) | OpenSora Team | 2024-06-28 | 854x480 | 24 | 51 | 2s | link | - | MP4 | link | - | eval results & info provided by OpenSora Team |
| HiGen | VBench Team | 2024-03-08 | 448x256 | 8 | 32 | 4.0s | link | Commit ID | MP4 | Google Drive | - | |
| TF-T2V | VBench Team | 2024-04-03 | 448x256 | 8 | 32 | 4.0s | link | Commit ID | MP4 | Google Drive | - | |
| AnimateLCM | VBench Team | 2024-02-26 | 512x512 | 8 | 16 | 2.0s | link | Commit ID | MP4 | Google Drive | - | Negative Prompt: We apply the same negative prompt during sampling for all videos: bad quality, worse quality, low resolution |
| InstructVideo(ModelScope) | VBench Team | 2024-06-17 | 256x256 | 8 | 16 | 2.0s | link | Commit ID | MP4 | Google Drive | - | |
| OpenSora V1.1 | VBench Team | 2024-04-25 | 424x240 | 8 | 64 | 8.0s | link | Commit ID | MP4 | Google Drive | - | |
| OpenSoraPlan V1.1 | VBench Team | 2024-05-27 | 512x512 | 24 | 221 | 9.2s | link | Commit ID | MP4 | Google Drive | - | |
| Mira | VBench Team | 2024-04-01 | 384x240 | 6 | 60 | 10.0s | link | Commit ID | MP4 | Google Drive | - | |
| Pika 1.0 | VBench Team | 2023-12-28 | 1280x720 | 24 | 72 | 3.0s | - | - | MP4 | Google Drive | Google Drive | |
| Gen-3 | VBench Team | 2024-06-17 | 1280x768 | 24 | 256 | 10.7s | - | - | MP4 | Google Drive | Google Drive | |
| Kling | VBench Team | 2024-06-06 | 1280x720 | 30 | 153 | 5.1s | - | - | MP4 | Google Drive | - | high-performance mode (lower sampling cost), not high-quality mode (better quality) |
| Data-Juicer (T2V-Turbo) | Data-Juicer Team | 2024-07-23 | 320x512 | 8 | 16 | 2.0s | - | - | MP4 | - | - | from Data-Juicer Team: based on T2V-Turbo, with Data-Juicer's data and loss enhancement |
| LaVie-2 | LaVie-2 Team | - | 512x512 | 8 | 16 | 2.0s | - | - | MP4 | - | - | info provided by LaVie-2 Team |
| CogVideoX-2B (SAT, prompt-optimized) | VBench Team | 2024-08-06 | 720x480 | 8 | 49 | 6.1s | link | Commit ID | MP4 | Google Drive | Google Drive | applied augmented prompts |
| OpenSora V1.2 (8s) | VBench Team | 2024-06-17 | 1280x720 | 24 | 204 | 8.5s | link | Commit ID | MP4 | Google Drive | Google Drive | |
| CogVideoX-5B (SAT, prompt-optimized) | VBench Team | 2024-08-27 | 720x480 | 8 | 49 | 6.1s | link | Commit ID | MP4 | Google Drive | - | applied augmented prompts |
| Vchitect-2.0-2B | VBench Team | 2024-09-14 | 768x432 | 8 | 40 | 5.0s | link | Commit ID | MP4 | Google Drive | - | - |
| Vchitect-2.0 (VEnhancer) | VBench Team | 2024-09-14 | 1920x1080 | 16 | 79 | 4.9s | - | Commit ID | MP4 | Google Drive | - | - |
| JT-CV-9B | JiuTianCV Team | 2024-09-24 | 2158x1214 | 24 | 51 | 2.1s | - | - | MP4 | - | - | - |
| Data-Juicer (2024-09-23, T2V-Turbo) | Data-Juicer Team | 2024-09-23 | 512x320 | 8 | 16 | 2.0s | link | - | MP4 | - | - | from Data-Juicer Team: based on T2V-Turbo, with Data-Juicer's data and loss enhancement |
| MiniMax-Video-01 | VBench Team | 2024-10-01 | 1280x720 | 25 | 141 | 5.6s | - | - | MP4 | Google Drive | - | - |
| T2V-Turbo-v2 | T2V-Turbo Team | 2024-10-02 | 320x512 | 16 | 8 | 2.0s | - | - | - | - | - | - |
| OpenSoraPlan V1.2 | VBench Team | 2024-07-24 | 1280x720 | 24 | 93 | 3.9s | link | Commit ID | MP4 | Google Drive | - | |
| OpenSoraPlan V1.3 | VBench Team | 2024-10-16 | 640x352 | 18 | 93 | 5.2s | link | Commit ID | MP4 | Google Drive | - | Prompt refiner provided by OpenSoraPlanv1.3 is used. First, download the weights and set path in config. Then, use the original VBench prompts as input. The code will automatically process them and feed the refined prompts into the model. |
| Mochi-1 | VBench Team | 2024-10-22 | 848x480 | 30 | 163 | 5.4s | link | Commit ID | MP4 | Google Drive | - | Default settings from Mochi demo are used |
| CogVideoX1.5-5B (5s SAT prompt-optimized) | VBench Team | 2024-11-08 | 1360x768 | 16 | 84 | 5.3s | link | Commit ID | MP4 | Google Drive | - | applied augmented prompts |
| Vidu | VBench Team | 2024-07-30 | 688x384 | 16 | 124 | 7.8s | - | - | MP4 | Google Drive | Google Drive | - |
| TeleAI-VAST | TeleAI | 2024-12-02 | 480x720 | 5 | 25 | 5.0s | - | - | MP4 | - | - | info provided by TeleAI |
| HunyuanVideo (Open-Source Version) | VBench Team | 2024-12-03 | 1280x720 | 24 | 129 | 5.4s | link | Commit ID | MP4 | Google Drive | - | applied Prompt Rewrite provided by HunyuanVideo, prompt list |
| Jimeng | VBench Team | 2024-05-09 | 1280x720 | 8 | 96 | 12.0s | - | - | MP4 | Google Drive | Google Drive | - |
| LTX-video (5s 768×512) | VBench Team | 2024-11-22 | 768x512 | 25 | 121 | 4.8s | link | Commit ID | MP4 | Google Drive | - | applied augmented prompts |
| CausVid | VBench Team | 2024-12-07 | 640x352 | 12 | 120 | 10.0s | - | - | MP4 | Google Drive | - | - |
| STIV (Apple) | VBench Team | 2024-12-19 | 512x512 | 60 | 60 | 1.0s | - | - | MP4 | Google Drive | - | - |
| CausVid (2025-01-02 5s) | VBench Team | 2025-01-02 | 640x352 | 24 | 120 | 5.0s | - | - | MP4 | Google Drive | - | - |
| Wan2.1 | VBench Team | 2025-01-08 | 1280x720 | 16 | 80 | 5.0s | - | - | MP4 | Google Drive | - | - |
| Luma | VBench Team | 2024-06-13 | 1360x752 | 24 | 121 | 5.0s | - | - | MP4 | Google Drive | Google Drive | - |
| RepVideo | VBench Team | 2025-01-16 | 720x480 | 8 | 49 | 6.1s | - | Code | MP4 | Google Drive | - | - |
| MiracleVision V5 | VBench Team | 2025-01-21 | 720x480 | 24 | 120 | 5.0s | - | - | MP4 | Google Drive | - | - |
| Sora | VBench Team | 2025-01-14 | 854x480 | 30 | 150 | 5.0s | - | - | MP4 | Google Drive | - | - |
| EasyAnimateV5.1 | VBench Team | 2025-01-22 | 672x384 | 8 | 49 | 6.0s | - | Code | MP4 | Google Drive | - | - |
| Wan2.1(2025-02-24) | VBench Team | 2025-02-24 | 1280x720 | 16 | 80 | 5.0s | - | - | MP4 | Google Drive | - | - |
| IPOC | VBench Team | 2025-02-28 | 1360x768 | 16 | 81 | 5.0s | - | - | MP4 | Google Drive | - | - |
| CogVideoX-2B (Diffusers) | VBench Team | 2025-03-03 | 720x480 | 8 | 49 | 6.1s | - | - | MP4 | Google Drive | - | applied augmented prompts |
| CogVideoX-5B (Diffusers) | VBench Team | 2025-03-04 | 720x480 | 8 | 49 | 6.1s | - | - | MP4 | Google Drive | - | applied augmented prompts |
| Step-Video-T2V | VBench Team | 2025-03-13 | 992x544 | 25 | 200 | 8s | - | - | MP4 | Google Drive | - | |
| Open-Sora-2.0 | VBench Team | 2025-03-14 | 1024x576 | 24 | 120 | 5s | - | - | MP4 | Google Drive | - | |
| Wan2.1-T2V-1.3B | VBench Team | 2025-03-20 | 832x480 | 16 | 81 | 5s | - | - | MP4 | Google Drive | - | applied augmented prompts |
| Wan2.1-T2V-1.3B | VBench Team | 2025-03-20 | 832x480 | 16 | 81 | 5s | - | - | MP4 | Google Drive | - | |
| Open-Sora 2.0 (2025-03-18) | VBench Team | 2025-03-31 | 1024x576 | 24 | 120 | 5s | - | - | MP4 | Google Drive | - | |
| AccVideo | VBench Team | 2025-03-31 | 960x544 | 24 | 72 | 3s | - | - | MP4 | Google Drive | - | |
| IPOC (2025-04-14) | VBench Team | 2025-04-14 | 1360x768 | 16 | 81 | 5s | - | - | MP4 | Google Drive | - | |
| Vidu Q1 (2025-04-17) | VBench Team | 2025-04-21 | 1280x720 | 24 | 125 | 5.2s | - | - | MP4 | Google Drive | - | |
| CogVideoX1.5-5B | VBench Team | 2025-04-23 | 1360x768 | 16 | 161 | 10s | - | - | MP4 | Google Drive | - | applied augmented prompts |
| Wan2.1-T2V-1.3B (2025-05-03) | VBench Team | 2025-05-03 | 832x480 | 16 | 81 | 5s | - | - | MP4 | Google Drive | - | applied augmented prompts; guidance_scale=6.0, flow_shift=3.0, num_inference_steps=50, sampler=unipc, negative prompt='色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走' (roughly, in English: vivid tones, overexposed, static, blurry details, subtitles, style, artwork, painting, picture, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, many people in the background, walking backwards) |
| Kling-1.6 | VBench Team | 2025-05-08 | 1280x720 | 24 | 216 | 9s | - | - | MP4 | Google Drive | - | applied augmented prompts |
| Hunyuan Video (2025-05-22) | VBench Team | 2025-05-22 | 1280x720 | 24 | 129 | 5.4s | - | - | MP4 | Google Drive | - | applied Prompt Rewrite provided by HunyuanVideo, prompt list |
| MAGI-T2V-4.5B-distill | VBench Team | 2025-06-11 | 720x720 | 24 | 96 | 4s | - | - | MP4 | Google Drive | - | |
| Wan2.1-T2V-14B | VBench Team | 2025-07-25 | 1280x720 | 16 | 81 | 5s | - | - | MP4 | Google Drive | - | applied augmented prompts |
| JT-CV-9B | VBench Team | 2025-07-30 | 2158x1214 | 24 | 51 | 2.1s | - | - | MP4 | Google Drive | - | applied augmented prompts |
| LanDiff | VBench Team | 2025-08-06 | 720x480 | 8 | 49 | 6s | - | - | MP4 | Google Drive | - | |
| IPOW | VBench Team | 2025-08-06 | 832x480 | 16 | 81 | 5s | - | - | MP4 | Google Drive | - | |
| Veo 3 | VBench Team | 2025-08-06 | 1280x720 | 24 | 192 | 8s | - | - | MP4 | Google Drive | - | |
| MAGI-T2V-24B-distill | VBench Team | 2025-08-06 | 1280x720 | 24 | 96 | 4s | - | - | MP4 | Google Drive | - | |
| LTX-2 (Diffusers) (w/o prompt extend) | VBench Team | 2026-02-04 | 768x512 | 24 | 121 | 5.0s | LTX-2 | Commit ID | MP4 | Google Drive | - | Videos sampled using TI2VidTwoStagesPipeline and ltx-2-19b-dev.safetensors checkpoint |
| Wan2.2-T2V-A14B (w/o prompt extend) | VBench Team | 2026-02-18 | 1280x720 | 16 | 81 | 5.1s | Wan2.2-T2V-A14B | Commit ID | MP4 | Google Drive | - | |
| Wan2.2-T2V-A14B (Qwen prompt extend) | VBench Team | 2026-02-18 | 1280x720 | 16 | 81 | 5.1s | Wan2.2-T2V-A14B | Commit ID | MP4 | Google Drive | - | |
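
The Video Length column is related to the FPS and Frame Count columns by length = frame count / FPS. The short check below recomputes it for a few rows copied from the table. The 0.06 s tolerance is our assumption, chosen because the listed lengths appear to be rounded to one decimal place.

```python
def video_length_seconds(frame_count: int, fps: float) -> float:
    """Duration in seconds of a clip with `frame_count` frames played at `fps`."""
    return frame_count / fps


# (model, FPS, frame count, listed video length in seconds), copied from the table.
rows = [
    ("LaVie", 8, 16, 2.0),
    ("LaVie-Interpolation", 24, 61, 2.5),
    ("CogVideo", 10, 33, 3.3),
    ("Gen-3", 24, 256, 10.7),
]

for model, fps, frames, listed in rows:
    computed = video_length_seconds(frames, fps)
    # Allow for rounding of the listed value to one decimal place.
    assert abs(computed - listed) < 0.06, model
    print(f"{model}: {frames} frames / {fps} FPS = {computed:.2f}s (listed {listed}s)")
```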

How are Files Structured in Google Drive?

1. Sub-Folder Organization

For the models in this section, each archive is divided into subfolders:

  • (1) The per_dimension zip contains 11 subfolders corresponding to videos sampled for evaluating different dimensions.
  • (2) The per_category zip contains 8 subfolders corresponding to videos sampled for evaluating different content categories.

1.1. Single-Stage Outputs

For LaVie, ModelScope, CogVideo, VideoCrafter-0.9, Open-Sora, VideoCrafter-2.0, AnimateDiff-V2, we provide their single-stage outputs.

We take LaVie as an example:

- per_dimension
    - lavie
        - appearance_style   
            - The bund Shanghai, Van Gogh style-0.mp4
            - The bund Shanghai, Van Gogh style-1.mp4
            - ...
        - human_action
            - A person is finger snapping-0.mp4
            - A person is finger snapping-1.mp4
            - ...
        - object_class
            - a dining table-0.mp4
            - a dining table-1.mp4
            - ...
        - scene
            - restaurant-0.mp4
            - restaurant-1.mp4
            - ...
        - subject_consistency
            - a giraffe taking a peaceful walk-0.mp4
            - a giraffe taking a peaceful walk-1.mp4
            - ...
        - temporal_style
            - The bund Shanghai, zoom in-0.mp4
            - The bund Shanghai, zoom in-1.mp4
            - ...
        - color
            - a blue clock-0.mp4
            - a blue clock-1.mp4
            - ...
        - multiple_objects
            - a fire hydrant and a stop sign-0.mp4
            - a fire hydrant and a stop sign-1.mp4
            - ...
        - overall_consistency
            - Yellow flowers swing in the wind-0.mp4
            - Yellow flowers swing in the wind-1.mp4
            - ...
        - spatial_relationship
            - a frisbee on the left of a sports ball, front view-0.mp4
            - a frisbee on the left of a sports ball, front view-1.mp4
            - ...
        - temporal_flickering
            - static view on a desert scene with an oasis, palm trees, and a clear, calm pool of water-0.mp4
            - static view on a desert scene with an oasis, palm trees, and a clear, calm pool of water-1.mp4
            - ...
- per_category
    - lavie # or modelscope, cogvideo, videocrafter-0.9
        - animal  
            - wild rabbit in a green meadow-0.mp4
            - wild rabbit in a green meadow-1.mp4
            - ...
        - architecture
            - water tower on the desert-0.mp4
            - water tower on the desert-1.mp4
            - ...
        - food
            - waffles with whipped cream and fruit-0.mp4
            - waffles with whipped cream and fruit-1.mp4
            - ...
        - human
            - young dancer practicing at home-0.mp4
            - young dancer practicing at home-1.mp4
            - ...
        - lifestyle
            - the interior design of a shopping mall-0.mp4
            - the interior design of a shopping mall-1.mp4
            - ...
        - plant
            - coconut tree near sea under blue sky-0.mp4
            - coconut tree near sea under blue sky-1.mp4
            - ...
        - scenery
            - waterfalls in between mountain-0.mp4
            - waterfalls in between mountain-1.mp4
            - ...
        - vehicles
            - video of yacht sailing in the ocean-0.mp4
            - video of yacht sailing in the ocean-1.mp4
            - ...
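
Each file in the listing above is named `<prompt>-<index>.mp4` and sits inside a dimension or category subfolder. As a minimal sketch under that assumption, the following groups an extracted model folder by subfolder and prompt. The function names and the `suffix` parameter are illustrative (for CogVideo, whose videos are listed as GIF, a different suffix would presumably be needed).

```python
from collections import defaultdict
from pathlib import Path


def parse_video_name(filename: str) -> tuple[str, int]:
    """Split '<prompt>-<index>.mp4' into (prompt, index)."""
    stem = Path(filename).stem
    # Split on the last hyphen only, because prompts may contain hyphens.
    prompt, _, index = stem.rpartition("-")
    return prompt, int(index)


def index_videos(model_dir: str, suffix: str = ".mp4") -> dict[str, dict[str, list[Path]]]:
    """Map subfolder (dimension or category) -> prompt -> sorted video paths."""
    index: dict[str, dict[str, list[Path]]] = defaultdict(lambda: defaultdict(list))
    for video in sorted(Path(model_dir).glob(f"*/*{suffix}")):
        prompt, _ = parse_video_name(video.name)
        index[video.parent.name][prompt].append(video)
    return index
```

For example, `index_videos("per_dimension/lavie")["color"]["a blue clock"]` would list the sampled videos for that prompt.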

1.2. Multi-Stage Outputs (Show-1)

For Show-1, each dimension or category folder contains two subfolders, super1 and super2, holding the videos generated by the last two stages of Show-1. The leaderboard results are computed on the final stage, super2.

- per_dimension
    - show-1
        - appearance_style/{super1/super2}       # subfolder super1 or super2
            - The bund Shanghai, Van Gogh style-0.mp4
            - The bund Shanghai, Van Gogh style-1.mp4
            - ...
        - human_action/{super1/super2}
            - A person is finger snapping-0.mp4
            - A person is finger snapping-1.mp4
            - ...
        - object_class/{super1/super2}
            - a dining table-0.mp4
            - a dining table-1.mp4
            - ...
        - scene/{super1/super2}
            - restaurant-0.mp4
            - restaurant-1.mp4
            - ...
        - subject_consistency/{super1/super2}
            - a giraffe taking a peaceful walk-0.mp4
            - a giraffe taking a peaceful walk-1.mp4
            - ...
        - temporal_style/{super1/super2}
            - The bund Shanghai, zoom in-0.mp4
            - The bund Shanghai, zoom in-1.mp4
            - ...
        - color/{super1/super2}
            - a blue clock-0.mp4
            - a blue clock-1.mp4
            - ...
        - multiple_objects/{super1/super2}
            - a fire hydrant and a stop sign-0.mp4
            - a fire hydrant and a stop sign-1.mp4
            - ...
        - overall_consistency/{super1/super2}
            - Yellow flowers swing in the wind-0.mp4
            - Yellow flowers swing in the wind-1.mp4
            - ...
        - spatial_relationship/{super1/super2}
            - a frisbee on the left of a sports ball, front view-0.mp4
            - a frisbee on the left of a sports ball, front view-1.mp4
            - ...
        - temporal_flickering/{super1/super2}
            - static view on a desert scene with an oasis, palm trees, and a clear, calm pool of water-0.mp4
            - static view on a desert scene with an oasis, palm trees, and a clear, calm pool of water-1.mp4
            - ...
- per_category
    - show-1
        - animal/{super1/super2}
            - wild rabbit in a green meadow-0.mp4
            - wild rabbit in a green meadow-1.mp4
            - ...
        - architecture/{super1/super2}
            - water tower on the desert-0.mp4
            - water tower on the desert-1.mp4
            - ...
        - food/{super1/super2}
            - waffles with whipped cream and fruit-0.mp4
            - waffles with whipped cream and fruit-1.mp4
            - ...
        - human/{super1/super2}
            - young dancer practicing at home-0.mp4
            - young dancer practicing at home-1.mp4
            - ...
        - lifestyle/{super1/super2}
            - the interior design of a shopping mall-0.mp4
            - the interior design of a shopping mall-1.mp4
            - ...
        - plant/{super1/super2}
            - coconut tree near sea under blue sky-0.mp4
            - coconut tree near sea under blue sky-1.mp4
            - ...
        - scenery/{super1/super2}
            - waterfalls in between mountain-0.mp4
            - waterfalls in between mountain-1.mp4
            - ...
        - vehicles/{super1/super2}
            - video of yacht sailing in the ocean-0.mp4
            - video of yacht sailing in the ocean-1.mp4
            - ...

1.3. Multi-Resolution Outputs (VideoCrafter-1)

Under each dimension or category in videocrafter-1, there are two folders corresponding to the two resolution options for videocrafter-1 generated videos, namely 1024x576 and 512x320. The leaderboard currently contains the evaluation results for the 1024x576 resolution.

- per_dimension
    - videocrafter-1
        - appearance_style/{1024x576/512x320}       # subfolder 1024x576 or 512x320
            - The bund Shanghai, Van Gogh style-0.mp4
            - The bund Shanghai, Van Gogh style-1.mp4
            - ...
        - human_action/{1024x576/512x320}
            - A person is finger snapping-0.mp4
            - A person is finger snapping-1.mp4
            - ...
        - object_class/{1024x576/512x320}
            - a dining table-0.mp4
            - a dining table-1.mp4
            - ...
        - scene/{1024x576/512x320}
            - restaurant-0.mp4
            - restaurant-1.mp4
            - ...
        - subject_consistency/{1024x576/512x320}
            - a giraffe taking a peaceful walk-0.mp4
            - a giraffe taking a peaceful walk-1.mp4
            - ...
        - temporal_style/{1024x576/512x320}
            - The bund Shanghai, zoom in-0.mp4
            - The bund Shanghai, zoom in-1.mp4
            - ...
        - color/{1024x576/512x320}
            - a blue clock-0.mp4
            - a blue clock-1.mp4
            - ...
        - multiple_objects/{1024x576/512x320}
            - a fire hydrant and a stop sign-0.mp4
            - a fire hydrant and a stop sign-1.mp4
            - ...
        - overall_consistency/{1024x576/512x320}
            - Yellow flowers swing in the wind-0.mp4
            - Yellow flowers swing in the wind-1.mp4
            - ...
        - spatial_relationship/{1024x576/512x320}
            - a frisbee on the left of a sports ball, front view-0.mp4
            - a frisbee on the left of a sports ball, front view-1.mp4
            - ...
        - temporal_flickering/{1024x576/512x320}
            - static view on a desert scene with an oasis, palm trees, and a clear, calm pool of water-0.mp4
            - static view on a desert scene with an oasis, palm trees, and a clear, calm pool of water-1.mp4
            - ...
- per_category
    - videocrafter-1
        - animal/{1024x576/512x320}
            - wild rabbit in a green meadow-0.mp4
            - wild rabbit in a green meadow-1.mp4
            - ...
        - architecture/{1024x576/512x320}
            - water tower on the desert-0.mp4
            - water tower on the desert-1.mp4
            - ...
        - food/{1024x576/512x320}
            - waffles with whipped cream and fruit-0.mp4
            - waffles with whipped cream and fruit-1.mp4
            - ...
        - human/{1024x576/512x320}
            - young dancer practicing at home-0.mp4
            - young dancer practicing at home-1.mp4
            - ...
        - lifestyle/{1024x576/512x320}
            - the interior design of a shopping mall-0.mp4
            - the interior design of a shopping mall-1.mp4
            - ...
        - plant/{1024x576/512x320}
            - coconut tree near sea under blue sky-0.mp4
            - coconut tree near sea under blue sky-1.mp4
            - ...
        - scenery/{1024x576/512x320}
            - waterfalls in between mountain-0.mp4
            - waterfalls in between mountain-1.mp4
            - ...
        - vehicles/{1024x576/512x320}
            - video of yacht sailing in the ocean-0.mp4
            - video of yacht sailing in the ocean-1.mp4
            - ...
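
Show-1 and VideoCrafter-1 add one extra folder level (stage or resolution) below each dimension or category. A minimal sketch for picking the variant that the leaderboard reports, assuming the extracted folder names match the listings above; the mapping and function name are ours:

```python
from pathlib import Path

# Variant subfolder that the leaderboard reports for each multi-variant model,
# as stated in this README.
LEADERBOARD_VARIANT = {"show-1": "super2", "videocrafter-1": "1024x576"}


def leaderboard_videos(root: str, model: str, subfolder: str) -> list[Path]:
    """List the videos of one dimension/category that the leaderboard evaluates."""
    folder = Path(root) / model / subfolder
    variant = LEADERBOARD_VARIANT.get(model)
    if variant is not None:
        folder = folder / variant  # e.g. show-1/color/super2
    return sorted(folder.glob("*.mp4"))
```

Models without an entry in the mapping fall back to the plain single-stage layout.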

2. Single-Folder Organization (Gen-2, Pika)

Gen-2 and Pika also include videos for "all_dimension" and "all_category", but we have not yet divided the videos into subfolders according to specific dimensions or categories.

- per_dimension
    - gen-2
        - all_dimension
            - Yellow flowers swing in the wind-0.mp4
            - Yellow flowers swing in the wind-1.mp4
            - ...
    - pika
        - all_dimension
            - Yellow flowers swing in the wind-0.mp4
            - Yellow flowers swing in the wind-1.mp4
            - ...
- per_category
    - gen-2
        - all_category
            - young people celebrating new year at the office-0.mp4
            - young people celebrating new year at the office-1.mp4
            - ...
    - pika
        - all_category
            - young people celebrating new year at the office-0.mp4
            - young people celebrating new year at the office-1.mp4
            - ...
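
Because these flat folders are not split by dimension, a reader who needs per-dimension groups has to supply the prompt-to-dimension mapping. The sketch below assumes such a mapping is available, for example built from the subfolders of a model that is already split, such as LaVie; whether the prompt sets match exactly across models is an assumption. The `prompts_by_dimension` argument and the function name are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def group_by_dimension(
    flat_dir: str, prompts_by_dimension: dict[str, set[str]]
) -> dict[str, list[Path]]:
    """Assign each '<prompt>-<index>.mp4' to every dimension that lists its prompt."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for video in sorted(Path(flat_dir).glob("*.mp4")):
        # The prompt is everything before the last hyphen of the file stem.
        prompt = video.stem.rpartition("-")[0]
        for dimension, prompts in prompts_by_dimension.items():
            if prompt in prompts:
                groups[dimension].append(video)
    return dict(groups)
```

A video is added to every dimension whose prompt set contains its prompt, so a prompt shared by several dimensions appears in each of them.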

Human Preference Labels

Available for download at Google Drive.

Each dimension has one annotation file. Each file contains a list of human preference annotations for videos generated from different prompts. During annotation, videos from different models are compared, and human annotators judge which video better matches the prompt with respect to the corresponding dimension.

Data Structure

JSON data is composed of multiple objects, each representing an evaluation instance. Each instance contains the following key-value pairs:

prompt_en: The text prompt for generating the desired video content.

style_en / color_en / object_en / ...: Dimension-related information.

question_en: The question asked to the human annotators / VLM.

videos: This section contains the URLs of the videos from different models.

human_anno: This section holds the human annotations as a nested dictionary. The outer keys are model names (e.g., "modelscope", "lavie"), and the inner keys are the names of the models they are compared against. Each value is the human-assigned score for the outer model's video relative to the inner model's video: 1 if the outer model's video was judged better, 0 if the inner model's video was judged better, and 0.5 if they were judged equal.

For example, human_anno["modelscope"]["lavie"] = 0 indicates that humans judged the LaVie video to be better than the ModelScope video for the given prompt and style.

human_anno["modelscope"]["videocraft"] = 1 indicates that humans judged the ModelScope video to be better than the VideoCrafter video.

human_anno["cogvideo"]["videocraft"] = 0.5 indicates that humans judged the CogVideo video and the VideoCrafter video to be of equal quality.
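
To make the score convention concrete, here is a minimal sketch that turns `human_anno` entries into an average pairwise score per model. The instance below is hypothetical: it uses the keys described above, but the question text, URLs, scores, and the dictionary shape of `videos` are made up for illustration, and `win_ratios` is our own helper name.

```python
import json
from collections import defaultdict

# Hypothetical instance using the keys described above; values are made up.
raw = """
[
  {
    "prompt_en": "The bund Shanghai, Van Gogh style",
    "style_en": "Van Gogh style",
    "question_en": "Which video better matches the style?",
    "videos": {"modelscope": "<url>", "lavie": "<url>", "videocraft": "<url>"},
    "human_anno": {
      "modelscope": {"lavie": 0, "videocraft": 1},
      "lavie": {"videocraft": 0.5}
    }
  }
]
"""


def win_ratios(instances: list[dict]) -> dict[str, float]:
    """Average pairwise score per model: 1 = win, 0.5 = tie, 0 = loss."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for instance in instances:
        for model_a, opponents in instance["human_anno"].items():
            for model_b, score in opponents.items():
                # The score is from model_a's side; model_b gets the complement.
                totals[model_a] += score
                totals[model_b] += 1 - score
                counts[model_a] += 1
                counts[model_b] += 1
    return {model: totals[model] / counts[model] for model in totals}


print(win_ratios(json.loads(raw)))
```

If a file stores both `human_anno[a][b]` and `human_anno[b][a]`, each pair is simply counted twice, which leaves the averages unchanged as long as the two entries agree.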