VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models

Zhen Xing¹, Qi Dai², Zihao Zhang¹, Hui Zhang², Han Hu², Zuxuan Wu¹, Yu-Gang Jiang¹
¹Fudan University, ²Microsoft Research Asia

Long Video Translation


Columns: Input | Ours | w/o LVT
Recolor the gray video.
Turn the video to Van Gogh Style.

Video Instruct Diffusion


Turn the video to Sketch Style.
Transform the video to Animate/Oil Painting Style.
Turn the video to WaterColor/3D Style.
Edit the video to reflect the style of the target image.
Apply Green to the pixels of the man holding the bike while maintaining the current state of other pixels.
Introduce a range of colors to the gray video.
Remove the applied haze from this video.
Apply inpainting algorithms to recover the missing video.
Withdraw the applied haze from this video.
Mark the pixels of the girl riding the horse in Red and leave the rest unchanged.
For the horse doing high jumps, set its pixels to Green and let the others remain the same.
Improve the quality of this fuzzy video.
Convert the grayscale clip into a colorful masterpiece.