Tell Me What Happened: Unifying Text-guided Video Completion
via Multimodal Masked Video Generation


Tsu-Jui Fu¹   Licheng Yu²   Ning Zhang²   Cheng-Yang Fu²
Jong-Chyi Su³   William Yang Wang¹   Sean Bell²
¹UC Santa Barbara   ²Meta   ³NEC Lab
Conference on Computer Vision and Pattern Recognition (CVPR) 2023


Abstract

Generating a video from its first few static frames is challenging because the model must anticipate plausible future frames while keeping them temporally coherent. Besides video prediction, the ability to rewind from the last frame or to infill between the head and tail is also crucial, but these settings have rarely been explored for video completion. Since just a few frames can lead to many different outcomes, a system that can follow natural language to perform video completion may significantly improve controllability. Inspired by this, we introduce a novel task, text-guided video completion (TVC), which asks the model to generate a video from partial frames guided by an instruction. We then propose Multimodal Masked Video Generation (MMVG) to address this TVC task. During training, MMVG discretizes the video frames into visual tokens and masks most of them to perform video completion from any time point. At inference time, a single MMVG model can address all three cases of TVC (video prediction, rewind, and infilling) by applying the corresponding masking conditions. We evaluate MMVG in various video scenarios, including egocentric, animation, and gaming. Extensive experimental results indicate that MMVG is effective in generating high-quality visual appearances with text guidance for TVC.
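To make the three masking conditions concrete, here is a minimal sketch of how the given and missing frames could be marked for each TVC case. It is an illustration under stated assumptions, not the authors' implementation: the function name `tvc_keep_mask`, the task strings, and the frame-level granularity (MMVG itself operates on discrete visual tokens) are all hypothetical.

```python
from typing import List


def tvc_keep_mask(num_frames: int, task: str, k: int = 1) -> List[bool]:
    """Return one flag per frame: True = frame is given, False = frame must be generated.

    Hypothetical helper for illustration only; names and signature are assumptions.
    """
    if num_frames < 2 or not (1 <= k < num_frames):
        raise ValueError("need at least 2 frames and 1 <= k < num_frames")

    if task == "prediction":
        # Keep the first k frames, generate everything after them.
        return [t < k for t in range(num_frames)]
    if task == "rewind":
        # Keep the last k frames, generate everything before them.
        return [t >= num_frames - k for t in range(num_frames)]
    if task == "infilling":
        # Keep k frames at the head and k at the tail, generate the middle.
        if 2 * k >= num_frames:
            raise ValueError("infilling needs at least one frame to generate")
        return [t < k or t >= num_frames - k for t in range(num_frames)]
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    for name in ("prediction", "rewind", "infilling"):
        print(name, tvc_keep_mask(5, name))
```

In this sketch a single function covers all three cases, which mirrors the idea that one model handles prediction, rewind, and infilling by changing only which positions are masked.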



Qualitative Results



Kitchen (128²)
First Frame | w/o Text | Text | TATS | Ours | GT
Image Image
open cupboard
Image Image Image
Image Image
turn on tap
Image Image Image
Image Image
put down cloth
Image Image Image
Image Image
put plate on counter
Image Image Image
Image Image
put down cup
Image Image Image
Image Image
pour water in pot
Image Image Image
Image Image
rinse plate
Image Image Image
Image Image
open fridge
Image Image Image
Image Image
place pan
Image Image Image
Image Image
close lid
Image Image Image

Flintstones (128²)
First Frame | w/o Text | Text | TATS | Ours | GT
Image Image
They are looking at each other while Fred is speaking.
Image Image Image
Image Image
Barney is in the living room, talking.
Image Image Image
Image Image
Wilma is in a car and talking about something.
Image Image Image
Image Image
Fred is talking to Barney and turning his head.
Image Image Image
Image Image
Betty is talking in the living room and nodding.
Image Image Image
Image Image
Fred is walking across a room, talking to himself.
Image Image Image
Image Image
Betty is quickly opening and closing a window blind.
Image Image Image
Image Image
Fred is walking outside while he is looking behind.
Image Image Image
Image Image
Wilma gestures with one hand while talking.
Image Image Image
Image Image
Barney is in the car talking and laughing.
Image Image Image

MUGEN (128²)
First Frame | w/o Text | Text | TATS | Ours | GT
Image Image
Mugen lands on a snail to crush it. It walks right to collect coins and turns left.
Image Image Image
Image Image
Mugen runs from left to right, collects coin, jumps to hit face, and jumps to down.
Image Image Image
Image Image
Mugen walks to the right across a platform. It picks up a gem and a coin before crushing a worm.
Image Image Image
Image Image
Mugen jumps over a gear and then collects a coin.
Image Image Image
Image Image
Mugen jumps and moves left. And then collects three coins and a gem.
Image Image Image
Image Image
Mugen runs from right to left and it collect coins there. It saw a ladybug so it runs from left to right.
Image Image Image
Image Image
Mugen jumps down and up to collect a coin. Then it jumps left from right to collect the coins.
Image Image Image
Image Image
Mugen climbs up the ladder and moves to the right. It jumps again and gets killed.
Image Image Image
Image Image
Mugen walks to the right, gets a gem, and then jumps up to a platform for coins.
Image Image Image
Image Image
Mugen jumps down to the middle floor to knock the worm and jumps down to the ground floor.
Image Image Image

Kitchen (128²)
Last Frame | Text | Ours | GT
Image
turn tap water off
Image Image
Image
open oven
Image Image
Image
put down pepper
Image Image
Image
close drawer
Image Image
Image
add olive oil
Image Image

Flintstones (128²)
Last Frame | Text | Ours | GT
Image
Betty is in the living room reading the paper.
Image Image
Image
Wilma is riding in the car and talking.
Image Image
Image
Barney is talking at the window.
Image Image
Image
Fred stands over the fence to have a glimpse.
Image Image
Image
Wilma is standing in the kitchen. She turns her head then she speaks.
Image Image

MUGEN (128²)
Last Frame | Text | Ours | GT
Image
Mugen runs from right to left and gets the coins.
Image Image
Image
Mugen runs from left to right, collects some coins, and kills a snail. And it jumps down.
Image Image
Image
Mugen jumps to stage. It runs from left to right, collects a coin, and jumps over a worm. Then it jumps up.
Image Image
Image
Mugen jumps over a box and then towards a coin.
Image Image
Image
Mugen jumps up and gets the coins.
Image Image

Kitchen (128²)
First Frame | Last Frame | Text | Ours | GT
Image Image
take plate
Image Image
Image Image
turn down hob
Image Image
Image Image
take lid off
Image Image
Image Image
grab salt
Image Image
Image Image
put plates in sink
Image Image

Flintstones (128²)
First Frame | Last Frame | Text | Ours | GT
Image Image
Barney is trying to sing in a living room.
Image Image
Image Image
Wilma is talking and holds a finger over her lips.
Image Image
Image Image
Fred and Barney speak out loud.
Image Image
Image Image
Wilma looks back at Betty who is speaking then she turns her head.
Image Image
Image Image
Fred and Wilma are dancing across a room.
Image Image

MUGEN (128²)
First Frame | Last Frame | Text | Ours | GT
Image Image
Mugen runs to left. Then collects a coin and a gem.
Image Image
Image Image
Mugen jumps up the stage. It runs from left to right and jumps on a worm.
Image Image
Image Image
Mugen jumps down the ladder and jumps up. It collects a gem.
Image Image
Image Image
Mugen climbs up a ladder. It jumps onto a stack of boxes, drops down, and is killed by a worm.
Image Image
Image Image
Mugen climbs a ladder onto a platform and kills a worm. It then walks to the right and towards a gem.
Image Image

Kitchen (128²)
K Frames | Text | K=1 | K=2 | K=3 | K=4 | GT
Image
open drawer
Image Image Image Image Image
Image
put bowl in saucepan
Image Image Image Image Image
Image
put glove back
Image Image Image Image Image
Image
place pepper shaker
Image Image Image Image Image
Image
rinse knife
Image Image Image Image Image

Flintstones (128²)
K Frames | Text | K=1 | K=2 | K=3 | K=4 | GT
Image
Wilma is talking on the phone.
Image Image Image Image Image
Image
Fred and Barney shake hands across the rock wall.
Image Image Image Image Image
Image
Betty and Wilma are peeking their heads into a room.
Image Image Image Image Image
Image
Betty and Barney are dancing in a room.
Image Image Image Image Image
Image
Fred angrily points at Wilma while he talks to her.
Image Image Image Image Image

MUGEN (128²)
K Frames | Text | K=1 | K=2 | K=3 | K=4 | GT
Image
Mugen runs to the right and jumps to an upper level, where it collects a coin.
Image Image Image Image Image
Image
Mugen walks to the right to get a gem. It then walks past the ladybug and jumps.
Image Image Image Image Image
Image
Mugen runs to the right, jumps onto a platform, and collects coins.
Image Image Image Image Image
Image
Mugen walks. And then it jumps.
Image Image Image Image Image
Image
Mugen goes from the left and collects coins.
Image Image Image Image Image

Flintstones (128²)
First Frame | Text 1 | Output 1 | Text 2 | Output 2
Image
Fred speaks to a phone and laughs.
Image
Fred hangs up a phone and laughs.
Image
Image
Fred is holding a bone beside his head.
Image
Fred is moving a bone up and down beside his head.
Image
Image
Wilma and Betty are walking through a room.
Image
Wilma and Betty are speaking in a room.
Image

MUGEN (128²)
First Frame | Text 1 | Output 1 | Text 2 | Output 2
Image
Mugen jumps from left to right to an upper platform.
Image
Mugen runs right to left and collects a gem.
Image
Image
Mugen runs to the right. It jumps down the ground.
Image
Mugen runs to the right. It jumps landing on a face.
Image
Image
Mugen jumps.
Image
Mugen jumps and gets a gem and coin.
Image

MUGEN (128²)
Last Frame | Text 1 | Output 1 | Text 2 | Output 2
Image
Mugen walks to the right and then jumps over a mouse to collect coins.
Image
Mugen walks to the right and then jumps to collect coins.
Image
Image
Mugen jumps and moves to left.
Image
Mugen jumps down from a ladder and moves to left.
Image
Image
Mugen moves from right to left and climbs onto a ladder.
Image
Mugen moves from left to right and climbs onto a ladder.
Image

MUGEN (128²)
First Frame | Last Frame | Text 1 | Output 1 | Text 2 | Output 2
Image Image
Mugen jumps down a ladder, collects coins, and then jumps down.
Image
Mugen jumps down a ladder, collects coins, and then jumps up and down.
Image
Image Image
Mugen collects coins. Then it runs from left to right.
Image
Mugen collects coins and jumps on face. Then it runs from left to right.
Image
Image Image
Mugen jumps to get the coin. It then keeps walking and drops off the platform.
Image
Mugen jumps to get the coin. It then jumps over a gear and drops off the platform.
Image



UCF-101 (128²)
Image Image Image Image Image
Image Image Image Image Image
Image Image Image Image Image

BAIR (64²)
Image Image Image Image Image
Image Image Image Image Image
Image Image Image Image Image

UCF-101 (128²)
Typing | Writing on Board | Knitting | Pull Up | Mixing
Image Image Image Image Image
Front Crawl | Skiing | Yo Yo | Playing Guitar | Surfing
Image Image Image Image Image

WebVid (384²)
wash hand | cut chicken with knife | boy writes in notebook | sailboat on horizon | cloudscape time-lapse
Image Image Image Image Image
type on laptop keyboard | downtown city with traffic car | cook goulash soup | wind turbine | flame and wood
Image Image Image Image Image
pour coffee into cup | walk outdoors on beach | high-speed railway link | busy city of times square | young team busy discusses
Image Image Image Image Image
green sea turtle swims and relaxes | man trains on exercise machine | flags flatter on strong wind | beautiful girl swings | river flows under old stone bridge
Image Image Image Image Image
rotates apple lollipop | sunset on the baltic sea | woman unveils curtain | elephant shakes his head | hand of man playing guitar
Image Image Image Image Image
designer mixes paint | underwater of coral reef | drive on empty road | child rides a bike | shave his face in bathroom
Image Image Image Image Image

TGIF (384²)
shove food in his mouth | tilt her head | hug each other | surf in the ocean | sing a song
Image Image Image Image Image
perform in front of audiences | skateboard down the hill | move his lips | talk and smile | stick her tongue out
Image Image Image Image Image
kiss each other | ice skating | snowboard on a mountain | walk | palm tree is blowing
Image Image Image Image Image
cut a mini pizza | swing his club | slap their hands together | flying above the clouds | run along a track
Image Image Image Image Image