VIDiff: Translating Videos via Multi-Modal Instructions with Diff usion Models

Zhen Xing¹, Qi Dai², Zihao Zhang¹, Hui Zhang², Han Hu², Zuxuan Wu¹, Yu-Gang Jiang¹

¹ Fudan University, ²MicroSoft Research Asia

Paper arXiv Code

Long Video Translation

	Input	Ours	w/o LVT
Recolor the gray video.
Turn the video to Van Gogh Style.

Video Instruct Diffusion

Turn the video to Sketch Style.
Transform the video to Animate/ Oil Painting Style.
Turn the video to WaterColor/ 3D Style.
Edit the video to reflect the style of the target image.
Apply Green to the pixels of the man holding the bike while maintaining the current state of other pixels.
Introduce a range of colors to the gray video.
Remove the applied haze from this video.
Apply inpainting algorithms to recover the missing video.
Withdraw the applied haze from this video.
Mark the pixels of the girl riding the horse in Red and leave the rest unchanged.
For the the horse doing high jumps, set its pixels to Green and let the others remain the same
Improve the quality of this fuzzy video.
Convert the grayscale clip into a colorful masterpiece.