To facilitate future research and to ensure full transparency, we release all the videos we sampled and used for VBench evaluation. You can download them on Google Drive.
- 8 T2V Models:
  - including lavie, modelscope, cogvideo, videocrafter-0.9, videocrafter-1, show-1, pika, gen-2. More details of models are provided below.
- 2 Suites of Videos for each Model:
  - Per Dimension: The sampled videos for each ability dimension evaluated by VBench. The per-dimension prompts are available under prompts/prompts_per_dimension, and we also provide a combined list of all the dimensions' prompts at prompts/all_dimension.txt.
  - Per Category: The sampled videos for each content category evaluated by VBench. The per-category prompts are available under prompts/prompts_per_category, and we also provide a combined list of all the categories' prompts at prompts/all_category.txt.
Potential uses of these videos:
- Further labeling on video quality
- For Instruction Tuning, using our videos and our human preference labels
Below is the folder structure of different models' sampled videos:
t2v_sampled_videos
├── per_dimension
│ ├── cogvideo.zip
│ ├── gen-2-all-dimension.tar.gz
│ ├── lavie.zip
│ ├── modelscope.zip
│ ├── opensora.tar
│ ├── pika-all-dimension.zip
│ ├── show-1.tar.gz
│ ├── videocrafter-1.tar.gz
│ ├── videocrafter-2.tar
│ └── videocrafter-09.zip
└── per_category
  ├── cogvideo.zip
  ├── gen-2-all-category.tar.gz
  ├── lavie.zip
  ├── modelscope.zip
  ├── pika-all-category.zip
  ├── show-1.tar.gz
  ├── videocrafter-0.9.zip
  └── videocrafter-1.zip
You can utilize gdown to download from Google Drive. Below is an example:
- First, install gdown:
pip install gdown
- Then, download the archive file using gdown:
gdown --id <file_id> --output <output_filename>
# Example for videocrafter-1
gdown --id 1FCRj48-Yv7LM7XGgfDCvIo7Kb9EId5KX --output videocrafter-1.tar.gz
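The downloaded archives come in several formats (.zip, .tar.gz, .tar, as in the folder listing above). The following is a minimal sketch for unpacking them in one pass, assuming all archives sit in one local directory. The helper name `unpack_all` and the one-folder-per-archive output layout are illustrative choices, not part of VBench.

```python
import shutil
from pathlib import Path
from typing import List

# Archive suffixes that appear in the folder listing.
ARCHIVE_SUFFIXES = (".tar.gz", ".tar", ".zip")


def unpack_all(archive_dir: str, output_dir: str) -> List[str]:
    """Unpack every supported archive into output_dir/<archive name without suffix>.

    Hypothetical helper: returns the names of the folders it created.
    """
    unpacked = []
    for archive in sorted(Path(archive_dir).iterdir()):
        suffix = next((s for s in ARCHIVE_SUFFIXES if archive.name.endswith(s)), None)
        if suffix is None:
            continue  # not an archive we know how to handle
        name = archive.name[: -len(suffix)]
        target = Path(output_dir) / name
        target.mkdir(parents=True, exist_ok=True)
        # shutil selects the extractor (zip, tar, gzipped tar) from the extension.
        shutil.unpack_archive(str(archive), str(target))
        unpacked.append(name)
    return unpacked
```

For example, `unpack_all("t2v_sampled_videos/per_dimension", "videos/per_dimension")` would unpack each model's archive into its own folder.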
The table below lists the settings used to sample videos from each model.
| Model | Evaluation Party | Release Time | Resolution | FPS | Frame Count | Video Length | Checkpoint | Code Commit ID | Video Format | Sampled Videos (Dimension) | Sampled Videos (Category) | Other Settings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LaVie | VBench Team | 2023-09-26 | 512x512 | 8 | 16 | 2.0s | - | - | MP4 | Google Drive | Google Drive | |
| LaVie-Interpolation | VBench Team | 2023-09-26 | 512x512 | 24 | 61 | 2.5s | link | - | MP4 | Google Drive | - | |
| ModelScope | VBench Team | 2023-08-12 | 256x256 | 8 | 16 | 2.0s | link | - | MP4 | Google Drive | Google Drive | |
| CogVideo | VBench Team | 2022-05-29 | 480x480 | 10 | 33 | 3.3s | link | - | GIF | Google Drive | Google Drive | |
| VideoCrafter-0.9 | VBench Team | 2023-04-05 | 256x256 | 8 | 16 | 2.0s | link | Commit ID | MP4 | Google Drive | Google Drive | |
| VideoCrafter-1.0 | VBench Team | 2023-10-30 | 1024x576 | 10 | 16 | 1.6s | link | Commit ID | MP4 | Google Drive | Google Drive | |
| Show-1 | VBench Team | 2023-09-27 | 576x320 | 8 | 29 | 3.6s | link | Commit ID | MP4 | Google Drive | Google Drive | |
| Gen-2 | VBench Team | 2023-06-07 | 1408x768 | 24 | 96 | 4.0s | - | - | MP4 | Google Drive | Google Drive | |
| Pika | VBench Team | 2023-06-29 | 1088x640 | 24 | 72 | 3.0s | - | - | MP4 | Google Drive | Google Drive | |
| Open-Sora | VBench Team | 2024-03-18 | 512x512 | 8 | 16 | 2.0s | link | Commit ID | MP4 | Google Drive | - | |
| VideoCrafter-2.0 | VBench Team | 2024-01-18 | 320x512 | 10 | 16 | 1.6s | link | Commit ID | MP4 | Google Drive | - | |
| T2V-Turbo (VC2) | T2V-Turbo Team | 2024-05-29 | 320x512 | 16 | 16 | 1.0s | link | Commit ID | MP4 | - | - | unet_lora.pt is used to turn VideoCrafter-2.0 to T2V-Turbo (VC2) |
| AnimateDiff-V1 | VBench Team | 2023-07-18 | 512x512 | 8 | 16 | 2.0s | T2I backbone SD1.5, Motion Module, LoRA(Realistic Vision 2.0) | Commit ID | MP4 | Google Drive | - | Negative prompt (the same negative prompt is applied during sampling for all videos): semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck |
| AnimateDiff-V2 | VBench Team | 2023-09-10 | 512x512 | 8 | 16 | 2.0s | T2I backbone SD1.5, Motion Module, LoRA | Commit ID | MP4 | Google Drive | - | Negative prompt (the same negative prompt is applied during sampling for all videos): semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck |
| Latte-1 | VBench Team | 2024-05-23 | 512x512 | 8 | 16 | 2.0s | link | Commit ID | MP4 | Google Drive | - | |
| OpenSora V1.2 (2s) | OpenSora Team | 2024-06-28 | 854x480 | 24 | 51 | 2s | link | - | MP4 | link | - | eval results & info provided by OpenSora Team |
| HiGen | VBench Team | 2024-03-08 | 448x256 | 8 | 32 | 4.0s | link | Commit ID | MP4 | Google Drive | - | |
| TF-T2V | VBench Team | 2024-04-03 | 448x256 | 8 | 32 | 4.0s | link | Commit ID | MP4 | Google Drive | - | |
| AnimateLCM | VBench Team | 2024-02-26 | 512x512 | 8 | 16 | 2.0s | link | Commit ID | MP4 | Google Drive | - | Negative prompt (the same negative prompt is applied during sampling for all videos): bad quality, worse quality, low resolution |
| InstructVideo(ModelScope) | VBench Team | 2024-06-17 | 256x256 | 8 | 16 | 2.0s | link | Commit ID | MP4 | Google Drive | - | |
| OpenSora V1.1 | VBench Team | 2024-04-25 | 424x240 | 8 | 64 | 8.0s | link | Commit ID | MP4 | Google Drive | - | |
| OpenSoraPlan V1.1 | VBench Team | 2024-05-27 | 512x512 | 24 | 221 | 9.2s | link | Commit ID | MP4 | Google Drive | - | |
| Mira | VBench Team | 2024-04-01 | 384x240 | 6 | 60 | 10.0s | link | Commit ID | MP4 | Google Drive | - | |
| Pika 1.0 | VBench Team | 2023-12-28 | 1280x720 | 24 | 72 | 3.0s | - | - | MP4 | Google Drive | Google Drive | |
| Gen-3 | VBench Team | 2024-06-17 | 1280x768 | 24 | 256 | 10.7s | - | - | MP4 | Google Drive | Google Drive | |
| Kling | VBench Team | 2024-06-06 | 1280x720 | 30 | 153 | 5.1s | - | - | MP4 | Google Drive | - | high-performance mode (lower sampling cost), not high-quality mode (better quality) |
| Data-Juicer (T2V-Turbo) | Data-Juicer Team | 2024-07-23 | 320x512 | 8 | 16 | 2.0s | - | - | MP4 | - | - | from Data-Juicer Team: based on T2V-Turbo, with Data-Juicer's data and loss enhancement |
| LaVie-2 | LaVie-2 Team | - | 512x512 | 8 | 16 | 2.0s | - | - | MP4 | - | - | info provided by LaVie-2 Team |
| CogVideoX-2B (SAT, prompt-optimized) | VBench Team | 2024-08-06 | 720x480 | 8 | 49 | 6.1s | link | Commit ID | MP4 | Google Drive | Google Drive | applied augmented prompts |
| OpenSora V1.2 (8s) | VBench Team | 2024-06-17 | 1280x720 | 24 | 204 | 8.5s | link | Commit ID | MP4 | Google Drive | Google Drive | |
| CogVideoX-5B (SAT, prompt-optimized) | VBench Team | 2024-08-27 | 720x480 | 8 | 49 | 6.1s | link | Commit ID | MP4 | Google Drive | - | applied augmented prompts |
| Vchitect-2.0-2B | VBench Team | 2024-09-14 | 768x432 | 8 | 40 | 5.0s | link | Commit ID | MP4 | Google Drive | - | - |
| Vchitect-2.0 (VEnhancer) | VBench Team | 2024-09-14 | 1920x1080 | 16 | 79 | 4.9s | - | Commit ID | MP4 | Google Drive | - | - |
| JT-CV-9B | JiuTianCV Team | 2024-09-24 | 2158x1214 | 24 | 51 | 2.1s | - | - | MP4 | - | - | - |
| Data-Juicer (2024-09-23, T2V-Turbo) | Data-Juicer Team | 2024-09-23 | 512x320 | 8 | 16 | 2.0s | link | - | MP4 | - | - | from Data-Juicer Team: based on T2V-Turbo, with Data-Juicer's data and loss enhancement |
| MiniMax-Video-01 | VBench Team | 2024-10-01 | 1280x720 | 25 | 141 | 5.6s | - | - | MP4 | Google Drive | - | - |
| T2V-Turbo-v2 | T2V-Turbo Team | 2024-10-02 | 320x512 | 16 | 8 | 2.0s | - | - | - | - | - | - |
| OpenSoraPlan V1.2 | VBench Team | 2024-07-24 | 1280x720 | 24 | 93 | 3.9s | link | Commit ID | MP4 | Google Drive | - | |
| OpenSoraPlan V1.3 | VBench Team | 2024-10-16 | 640x352 | 18 | 93 | 5.2s | link | Commit ID | MP4 | Google Drive | - | Prompt refiner provided by OpenSoraPlanv1.3 is used. First, download the weights and set path in config. Then, use the original VBench prompts as input. The code will automatically process them and feed the refined prompts into the model. |
| Mochi-1 | VBench Team | 2024-10-22 | 848x480 | 30 | 163 | 5.4s | link | Commit ID | MP4 | Google Drive | - | Default settings from Mochi demo are used |
| CogVideoX1.5-5B (5s SAT prompt-optimized) | VBench Team | 2024-11-08 | 1360x768 | 16 | 84 | 5.3s | link | Commit ID | MP4 | Google Drive | - | applied augmented prompts |
| Vidu | VBench Team | 2024-07-30 | 688x384 | 16 | 124 | 7.8s | - | - | MP4 | Google Drive | Google Drive | - |
| TeleAI-VAST | TeleAI | 2024-12-02 | 480x720 | 5 | 25 | 5.0s | - | - | MP4 | - | - | info provided by TeleAI |
| HunyuanVideo (Open-Source Version) | VBench Team | 2024-12-03 | 1280x720 | 24 | 129 | 5.4s | link | Commit ID | MP4 | Google Drive | - | applied Prompt Rewrite provided by HunyuanVideo, prompt list |
| Jimeng | VBench Team | 2024-05-09 | 1280x720 | 8 | 96 | 12.0s | - | - | MP4 | Google Drive | Google Drive | - |
| LTX-video (5s 768×512) | VBench Team | 2024-11-22 | 768x512 | 25 | 121 | 4.8s | link | Commit ID | MP4 | Google Drive | - | applied augmented prompts |
| CausVid | VBench Team | 2024-12-07 | 640x352 | 12 | 120 | 10.0s | - | - | MP4 | Google Drive | - | - |
| STIV (Apple) | VBench Team | 2024-12-19 | 512x512 | 60 | 60 | 1.0s | - | - | MP4 | Google Drive | - | - |
| CausVid (2025-01-02 5s) | VBench Team | 2025-01-02 | 640x352 | 24 | 120 | 5.0s | - | - | MP4 | Google Drive | - | - |
| Wan2.1 | VBench Team | 2025-01-08 | 1280x720 | 16 | 80 | 5.0s | - | - | MP4 | Google Drive | - | - |
| Luma | VBench Team | 2024-06-13 | 1360x752 | 24 | 121 | 5.0s | - | - | MP4 | Google Drive | Google Drive | - |
| RepVideo | VBench Team | 2025-01-16 | 720x480 | 8 | 49 | 6.1s | - | Code | MP4 | Google Drive | - | - |
| MiracleVision V5 | VBench Team | 2025-01-21 | 720x480 | 24 | 120 | 5.0s | - | - | MP4 | Google Drive | - | - |
| Sora | VBench Team | 2025-01-14 | 854x480 | 30 | 150 | 5.0s | - | - | MP4 | Google Drive | - | - |
| EasyAnimateV5.1 | VBench Team | 2025-01-22 | 672x384 | 8 | 49 | 6.0s | - | Code | MP4 | Google Drive | - | - |
| Wan2.1(2025-02-24) | VBench Team | 2025-02-24 | 1280x720 | 16 | 80 | 5.0s | - | - | MP4 | Google Drive | - | - |
| IPOC | VBench Team | 2025-02-28 | 1360x768 | 16 | 81 | 5.0s | - | - | MP4 | Google Drive | - | - |
| CogVideoX-2B (Diffusers) | VBench Team | 2025-03-03 | 720x480 | 8 | 49 | 6.1s | - | - | MP4 | Google Drive | - | applied augmented prompts |
| CogVideoX-5B (Diffusers) | VBench Team | 2025-03-04 | 720x480 | 8 | 49 | 6.1s | - | - | MP4 | Google Drive | - | applied augmented prompts |
| Step-Video-T2V | VBench Team | 2025-03-13 | 992x544 | 25 | 200 | 8s | - | - | MP4 | Google Drive | - | |
| Open-Sora-2.0 | VBench Team | 2025-03-14 | 1024x576 | 24 | 120 | 5s | - | - | MP4 | Google Drive | - | |
| Wan2.1-T2V-1.3B | VBench Team | 2025-03-20 | 832x480 | 16 | 81 | 5s | - | - | MP4 | Google Drive | - | applied augmented prompts |
| Wan2.1-T2V-1.3B | VBench Team | 2025-03-20 | 832x480 | 16 | 81 | 5s | - | - | MP4 | Google Drive | - | |
| Open-Sora 2.0 (2025-03-18) | VBench Team | 2025-03-31 | 1024x576 | 24 | 120 | 5s | - | - | MP4 | Google Drive | - | |
| AccVideo | VBench Team | 2025-03-31 | 960x544 | 24 | 72 | 3s | - | - | MP4 | Google Drive | - | |
| IPOC (2025-04-14) | VBench Team | 2025-04-14 | 1360x768 | 16 | 81 | 5s | - | - | MP4 | Google Drive | - | |
| Vidu Q1 (2025-04-17) | VBench Team | 2025-04-21 | 1280x720 | 24 | 125 | 5.2s | - | - | MP4 | Google Drive | - | |
| CogVideoX1.5-5B | VBench Team | 2025-04-23 | 1360x768 | 16 | 161 | 10s | - | - | MP4 | Google Drive | - | applied augmented prompts |
| Wan2.1-T2V-1.3B (2025-05-03) | VBench Team | 2025-05-03 | 832x480 | 16 | 81 | 5s | - | - | MP4 | Google Drive | - | applied augmented prompts; guidance_scale=6.0, flow_shift=3.0, num_inference_steps=50, sampler=unipc, negative prompt='色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走' |
| Kling-1.6 | VBench Team | 2025-05-08 | 1280x720 | 24 | 216 | 9s | - | - | MP4 | Google Drive | - | applied augmented prompts |
| Hunyuan Video (2025-05-22) | VBench Team | 2025-05-22 | 1280x720 | 24 | 129 | 5.4s | - | - | MP4 | Google Drive | - | applied Prompt Rewrite provided by HunyuanVideo, prompt list |
| MAGI-T2V-4.5B-distill | VBench Team | 2025-06-11 | 720x720 | 24 | 96 | 4s | - | - | MP4 | Google Drive | - | |
| Wan2.1-T2V-14B | VBench Team | 2025-07-25 | 1280x720 | 16 | 81 | 5s | - | - | MP4 | Google Drive | - | applied augmented prompts |
| JT-CV-9B | VBench Team | 2025-07-30 | 2158x1214 | 24 | 51 | 2.1s | - | - | MP4 | Google Drive | - | applied augmented prompts |
| LanDiff | VBench Team | 2025-08-06 | 720x480 | 8 | 49 | 6s | - | - | MP4 | Google Drive | - | |
| IPOW | VBench Team | 2025-08-06 | 832x480 | 16 | 81 | 5s | - | - | MP4 | Google Drive | - | |
| Veo 3 | VBench Team | 2025-08-06 | 1280x720 | 24 | 192 | 8s | - | - | MP4 | Google Drive | - | |
| MAGI-T2V-24B-distill | VBench Team | 2025-08-06 | 1280x720 | 24 | 96 | 4s | - | - | MP4 | Google Drive | - | |
| LTX-2 (Diffusers) (w/o prompt extend) | VBench Team | 2026-02-04 | 768x512 | 24 | 121 | 5.0s | LTX-2 | Commit ID | MP4 | Google Drive | - | Videos sampled using TI2VidTwoStagesPipeline and ltx-2-19b-dev.safetensors checkpoint |
| Wan2.2-T2V-A14B (w/o prompt extend) | VBench Team | 2026-02-18 | 1280x720 | 16 | 81 | 5.1s | Wan2.2-T2V-A14B | Commit ID | MP4 | Google Drive | - | |
| Wan2.2-T2V-A14B (Qwen prompt extend) | VBench Team | 2026-02-18 | 1280x720 | 16 | 81 | 5.1s | Wan2.2-T2V-A14B | Commit ID | MP4 | Google Drive | - | |
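
In most rows of the table, the Video Length column is the Frame Count divided by the FPS, rounded to one decimal place (for example, 16 frames at 8 FPS give 2.0s). The following is a minimal sketch of that arithmetic; the function name is illustrative, and rows whose values were supplied by other teams may follow a different convention.

```python
def video_length_seconds(frame_count: int, fps: float) -> float:
    """Duration in seconds implied by a frame count and a frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round(frame_count / fps, 1)


# Values taken from the LaVie, Show-1 and Kling rows of the table.
print(video_length_seconds(16, 8))    # LaVie
print(video_length_seconds(29, 8))    # Show-1
print(video_length_seconds(153, 30))  # Kling
```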
For these models,
- (1) The per_dimension zip contains 11 subfolders corresponding to videos sampled for evaluating different dimensions.
- (2) The per_category zip contains 8 subfolders corresponding to videos sampled for evaluating different content categories.
For LaVie, ModelScope, CogVideo, VideoCrafter-0.9, Open-Sora, VideoCrafter-2.0, AnimateDiff-V2, we provide their single-stage outputs.
We take LaVie as an example:
- per_dimension
    - lavie
        - appearance_style
            - The bund Shanghai, Van Gogh style-0.mp4
            - The bund Shanghai, Van Gogh style-1.mp4
            - ...
        - human_action
            - A person is finger snapping-0.mp4
            - A person is finger snapping-1.mp4
            - ...
        - object_class
            - a dining table-0.mp4
            - a dining table-1.mp4
            - ...
        - scene
            - restaurant-0.mp4
            - restaurant-1.mp4
            - ...
        - subject_consistency
            - a giraffe taking a peaceful walk-0.mp4
            - a giraffe taking a peaceful walk-1.mp4
            - ...
        - temporal_style
            - The bund Shanghai, zoom in-0.mp4
            - The bund Shanghai, zoom in-1.mp4
            - ...
        - color
            - a blue clock-0.mp4
            - a blue clock-1.mp4
            - ...
        - multiple_objects
            - a fire hydrant and a stop sign-0.mp4
            - a fire hydrant and a stop sign-1.mp4
            - ...
        - overall_consistency
            - Yellow flowers swing in the wind-0.mp4
            - Yellow flowers swing in the wind-1.mp4
            - ...
        - spatial_relationship
            - a frisbee on the left of a sports ball, front view-0.mp4
            - a frisbee on the left of a sports ball, front view-1.mp4
            - ...
        - temporal_flickering
            - static view on a desert scene with an oasis, palm trees, and a clear, calm pool of water-0.mp4
            - static view on a desert scene with an oasis, palm trees, and a clear, calm pool of water-1.mp4
            - ...
- per_category
    - lavie # or modelscope, cogvideo, videocrafter-0.9
        - animal
            - wild rabbit in a green meadow-0.mp4
            - wild rabbit in a green meadow-1.mp4
            - ...
        - architecture
            - water tower on the desert-0.mp4
            - water tower on the desert-1.mp4
            - ...
        - food
            - waffles with whipped cream and fruit-0.mp4
            - waffles with whipped cream and fruit-1.mp4
            - ...
        - human
            - young dancer practicing at home-0.mp4
            - young dancer practicing at home-1.mp4
            - ...
        - lifestyle
            - the interior design of a shopping mall-0.mp4
            - the interior design of a shopping mall-1.mp4
            - ...
        - plant
            - coconut tree near sea under blue sky-0.mp4
            - coconut tree near sea under blue sky-1.mp4
            - ...
        - scenery
            - waterfalls in between mountain-0.mp4
            - waterfalls in between mountain-1.mp4
            - ...
        - vehicles
            - video of yacht sailing in the ocean-0.mp4
            - video of yacht sailing in the ocean-1.mp4
            - ...
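
The listing above follows a "<prompt>-<index>.<extension>" naming pattern inside each dimension or category folder. The following is a minimal sketch for recovering the prompt and sample index from a file name and grouping a folder's videos by prompt. The function names are illustrative, and the pattern is inferred from the listing rather than from a documented specification.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple

# Video formats listed in the settings table (CogVideo samples are GIFs).
VIDEO_SUFFIXES = {".mp4", ".gif"}


def parse_video_name(filename: str) -> Tuple[str, int]:
    """Split '<prompt>-<index>.<ext>' into (prompt, index)."""
    stem = Path(filename).stem
    # Split on the last hyphen so hyphens inside the prompt are preserved.
    prompt, _, index = stem.rpartition("-")
    return prompt, int(index)


def group_by_prompt(folder: str) -> Dict[str, List[Path]]:
    """Map each prompt to the sampled videos found directly in one folder."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in VIDEO_SUFFIXES:
            continue  # ignore anything that is not a sampled video
        prompt, _ = parse_video_name(path.name)
        groups[prompt].append(path)
    return dict(groups)
```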
For show-1, there are two folders corresponding to the last two stages of show-1 generated videos, namely super1 and super2. The leaderboard results correspond to evaluation on the final stage, namely super2.
- per_dimension
    - show-1
        - appearance_style/{super1/super2} # subfolder super1 or super2
            - The bund Shanghai, Van Gogh style-0.mp4
            - The bund Shanghai, Van Gogh style-1.mp4
            - ...
        - human_action/{super1/super2}
            - A person is finger snapping-0.mp4
            - A person is finger snapping-1.mp4
            - ...
        - object_class/{super1/super2}
            - a dining table-0.mp4
            - a dining table-1.mp4
            - ...
        - scene/{super1/super2}
            - restaurant-0.mp4
            - restaurant-1.mp4
            - ...
        - subject_consistency/{super1/super2}
            - a giraffe taking a peaceful walk-0.mp4
            - a giraffe taking a peaceful walk-1.mp4
            - ...
        - temporal_style/{super1/super2}
            - The bund Shanghai, zoom in-0.mp4
            - The bund Shanghai, zoom in-1.mp4
            - ...
        - color/{super1/super2}
            - a blue clock-0.mp4
            - a blue clock-1.mp4
            - ...
        - multiple_objects/{super1/super2}
            - a fire hydrant and a stop sign-0.mp4
            - a fire hydrant and a stop sign-1.mp4
            - ...
        - overall_consistency/{super1/super2}
            - Yellow flowers swing in the wind-0.mp4
            - Yellow flowers swing in the wind-1.mp4
            - ...
        - spatial_relationship/{super1/super2}
            - a frisbee on the left of a sports ball, front view-0.mp4
            - a frisbee on the left of a sports ball, front view-1.mp4
            - ...
        - temporal_flickering/{super1/super2}
            - static view on a desert scene with an oasis, palm trees, and a clear, calm pool of water-0.mp4
            - static view on a desert scene with an oasis, palm trees, and a clear, calm pool of water-1.mp4
            - ...
- per_category
    - show-1
        - animal/{super1/super2}
            - wild rabbit in a green meadow-0.mp4
            - wild rabbit in a green meadow-1.mp4
            - ...
        - architecture/{super1/super2}
            - water tower on the desert-0.mp4
            - water tower on the desert-1.mp4
            - ...
        - food/{super1/super2}
            - waffles with whipped cream and fruit-0.mp4
            - waffles with whipped cream and fruit-1.mp4
            - ...
        - human/{super1/super2}
            - young dancer practicing at home-0.mp4
            - young dancer practicing at home-1.mp4
            - ...
        - lifestyle/{super1/super2}
            - the interior design of a shopping mall-0.mp4
            - the interior design of a shopping mall-1.mp4
            - ...
        - plant/{super1/super2}
            - coconut tree near sea under blue sky-0.mp4
            - coconut tree near sea under blue sky-1.mp4
            - ...
        - scenery/{super1/super2}
            - waterfalls in between mountain-0.mp4
            - waterfalls in between mountain-1.mp4
            - ...
        - vehicles/{super1/super2}
            - video of yacht sailing in the ocean-0.mp4
            - video of yacht sailing in the ocean-1.mp4
            - ...
Under each dimension or category in videocrafter-1, there are two folders corresponding to the two resolution options for videocrafter-1 generated videos, namely 1024x576 and 512x320. The leaderboard currently contains the evaluation results for the 1024x576 resolution.
- per_dimension
    - videocrafter-1
        - appearance_style/{1024x576/512x320} # subfolder 1024x576 or 512x320
            - The bund Shanghai, Van Gogh style-0.mp4
            - The bund Shanghai, Van Gogh style-1.mp4
            - ...
        - human_action/{1024x576/512x320}
            - A person is finger snapping-0.mp4
            - A person is finger snapping-1.mp4
            - ...
        - object_class/{1024x576/512x320}
            - a dining table-0.mp4
            - a dining table-1.mp4
            - ...
        - scene/{1024x576/512x320}
            - restaurant-0.mp4
            - restaurant-1.mp4
            - ...
        - subject_consistency/{1024x576/512x320}
            - a giraffe taking a peaceful walk-0.mp4
            - a giraffe taking a peaceful walk-1.mp4
            - ...
        - temporal_style/{1024x576/512x320}
            - The bund Shanghai, zoom in-0.mp4
            - The bund Shanghai, zoom in-1.mp4
            - ...
        - color/{1024x576/512x320}
            - a blue clock-0.mp4
            - a blue clock-1.mp4
            - ...
        - multiple_objects/{1024x576/512x320}
            - a fire hydrant and a stop sign-0.mp4
            - a fire hydrant and a stop sign-1.mp4
            - ...
        - overall_consistency/{1024x576/512x320}
            - Yellow flowers swing in the wind-0.mp4
            - Yellow flowers swing in the wind-1.mp4
            - ...
        - spatial_relationship/{1024x576/512x320}
            - a frisbee on the left of a sports ball, front view-0.mp4
            - a frisbee on the left of a sports ball, front view-1.mp4
            - ...
        - temporal_flickering/{1024x576/512x320}
            - static view on a desert scene with an oasis, palm trees, and a clear, calm pool of water-0.mp4
            - static view on a desert scene with an oasis, palm trees, and a clear, calm pool of water-1.mp4
            - ...
- per_category
    - videocrafter-1
        - animal/{1024x576/512x320}
            - wild rabbit in a green meadow-0.mp4
            - wild rabbit in a green meadow-1.mp4
            - ...
        - architecture/{1024x576/512x320}
            - water tower on the desert-0.mp4
            - water tower on the desert-1.mp4
            - ...
        - food/{1024x576/512x320}
            - waffles with whipped cream and fruit-0.mp4
            - waffles with whipped cream and fruit-1.mp4
            - ...
        - human/{1024x576/512x320}
            - young dancer practicing at home-0.mp4
            - young dancer practicing at home-1.mp4
            - ...
        - lifestyle/{1024x576/512x320}
            - the interior design of a shopping mall-0.mp4
            - the interior design of a shopping mall-1.mp4
            - ...
        - plant/{1024x576/512x320}
            - coconut tree near sea under blue sky-0.mp4
            - coconut tree near sea under blue sky-1.mp4
            - ...
        - scenery/{1024x576/512x320}
            - waterfalls in between mountain-0.mp4
            - waterfalls in between mountain-1.mp4
            - ...
        - vehicles/{1024x576/512x320}
            - video of yacht sailing in the ocean-0.mp4
            - video of yacht sailing in the ocean-1.mp4
            - ...
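
Because show-1 and videocrafter-1 add one extra folder level, a script that walks the sampled videos needs to pick the variant that matches the leaderboard (super2 for show-1, 1024x576 for videocrafter-1). The following is a minimal sketch of that path selection. The function name and the root/suite/model/group layout are assumptions based on the listings above, so adjust them to how the archives were unpacked locally.

```python
from pathlib import Path

# Variant subfolder evaluated on the leaderboard, as stated in the text.
LEADERBOARD_VARIANT = {
    "show-1": "super2",
    "videocrafter-1": "1024x576",
}


def leaderboard_dir(root: str, suite: str, model: str, group: str) -> Path:
    """Return the folder holding the leaderboard videos for one dimension or category.

    suite is 'per_dimension' or 'per_category'; group is a dimension or
    category name such as 'scene' or 'animal'.
    """
    base = Path(root) / suite / model / group
    variant = LEADERBOARD_VARIANT.get(model)
    # Single-stage models keep their videos directly under the group folder.
    return base / variant if variant else base
```

For example, `leaderboard_dir("videos", "per_dimension", "show-1", "scene")` points at the `super2` subfolder, while the same call for `lavie` returns the `scene` folder itself.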
Gen-2 and Pika also include videos for "all_dimension" and "all_category", but we have not yet divided these videos into subfolders by specific dimension or category.
- per_dimension
    - gen-2
        - all_dimension
            - Yellow flowers swing in the wind-0.mp4
            - Yellow flowers swing in the wind-1.mp4
            - ...
    - pika
        - all_dimension
            - Yellow flowers swing in the wind-0.mp4
            - Yellow flowers swing in the wind-1.mp4
            - ...
- per_category
    - gen-2
        - all_category
            - young people celebrating new year at the office-0.mp4
            - young people celebrating new year at the office-1.mp4
            - ...
    - pika
        - all_category
            - young people celebrating new year at the office-0.mp4
            - young people celebrating new year at the office-1.mp4
            - ...
The human preference annotations are available for download on Google Drive.
Each dimension has one annotation file. The file contains a list of human preference annotations for videos generated from different prompts. For each prompt, annotators compare videos from different models and judge which video better matches the prompt with respect to the corresponding dimension.
The JSON data consists of multiple objects, each representing one evaluation instance. Each instance contains the following key-value pairs:
- prompt_en: The text prompt for generating the desired video content.
- style_en / color_en / object_en / ...: Dimension-related information.
- question_en: The question asked to the human annotators / VLM.
- videos: The URLs of the videos from different models.
- human_anno: The human annotations, stored as a nested dictionary. The outer keys are model names (e.g., "modelscope", "lavie"), and the inner keys are the names of the models they are compared against. Each value is the human-assigned score for the outer model's video relative to the inner model's video.

For example:
- human_anno["modelscope"]["lavie"] = 0 indicates that humans judged the lavie video to be better than the modelscope video for the given prompt and style.
- human_anno["modelscope"]["videocraft"] = 1 indicates that humans judged the modelscope video to be better than the videocraft video.
- human_anno["cogvideo"]["videocraft"] = 0.5 indicates that humans judged the cogvideo video and the videocraft video to be of equal quality.
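
The pairwise scores can be aggregated into a per-model win rate. The following is a minimal sketch, assuming each instance is a dictionary with a `human_anno` key as described above and that each model pair is recorded once per instance. The function name `win_rates` and the sample instance are illustrative and not taken from the released files.

```python
from collections import defaultdict
from typing import Dict, List


def win_rates(instances: List[dict]) -> Dict[str, float]:
    """Average pairwise score per model (1 = win, 0.5 = tie, 0 = loss)."""
    score: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for instance in instances:
        for model_a, versus in instance["human_anno"].items():
            for model_b, value in versus.items():
                # value is model_a's score against model_b;
                # model_b receives the complementary score.
                score[model_a] += value
                score[model_b] += 1.0 - value
                count[model_a] += 1
                count[model_b] += 1
    return {model: score[model] / count[model] for model in count}


# Hypothetical instance built from the three examples in the text.
example = [{
    "human_anno": {
        "modelscope": {"lavie": 0, "videocraft": 1},
        "cogvideo": {"videocraft": 0.5},
    }
}]
print(win_rates(example))
```

In this sketch a tie contributes half a win to both models, so a model that ties every comparison ends up with a win rate of 0.5.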