
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation

SIGGRAPH 2025 Conference Proceedings
¹The Chinese University of Hong Kong, ²Adobe Research, ³Monash University

Inputs Camera: Static Camera: Dolly out Camera: Orbit right + Pedestal up
Object Motions

 


Abstract

This paper presents a method that allows users to design cinematic video shots in the context of image-to-video generation. Shot design, a critical aspect of filmmaking, involves meticulously planning both camera movements and object motions in a scene. However, enabling intuitive shot design in modern image-to-video generation systems presents two main challenges: first, effectively capturing user intentions on the motion design, where both camera movements and scene-space object motions must be specified jointly; and second, representing motion information that can be effectively utilized by a video diffusion model to synthesize the image animations.

To address these challenges, we introduce MotionCanvas, a method that integrates user-driven controls into image-to-video (I2V) generation models, allowing users to control both object and camera motions in a scene-aware manner. By connecting insights from classical computer graphics and contemporary video generation techniques, we demonstrate the ability to achieve 3D-aware motion control in I2V synthesis without requiring costly 3D-related training data. MotionCanvas enables users to intuitively depict scene-space motion intentions, and translates them into spatiotemporal motion-conditioning signals for video diffusion models. We demonstrate the effectiveness of our method on a wide range of real-world image content and shot-design scenarios, highlighting its potential to enhance the creative workflows in digital content creation and adapt to various image and video editing applications.
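The idea of turning a scene-space camera move into a screen-space motion signal can be illustrated with classical pinhole projection. The sketch below is not the authors' implementation. It assumes a known depth for the tracked pixel, a pinhole camera with focal length `f` and principal point `(cx, cy)`, and a camera that only translates. All function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def unproject(u: float, v: float, depth: float,
              f: float, cx: float, cy: float) -> Vec3:
    """Lift a pixel with known depth to a 3D point in the first camera's frame."""
    return ((u - cx) * depth / f, (v - cy) * depth / f, depth)


def project(p: Vec3, f: float, cx: float, cy: float) -> Point2D:
    """Pinhole projection of a camera-space 3D point to pixel coordinates."""
    x, y, z = p
    return (f * x / z + cx, f * y / z + cy)


def camera_point_track(pixel: Point2D, depth: float,
                       camera_positions: Sequence[Vec3],
                       f: float, cx: float, cy: float) -> List[Point2D]:
    """2D track of one static scene point as the camera translates.

    camera_positions are camera centres expressed in the first camera's
    frame. Rotation is omitted to keep the sketch short (an assumption).
    """
    world = unproject(pixel[0], pixel[1], depth, f, cx, cy)
    track = []
    for c in camera_positions:
        # Express the static point relative to the moved camera centre.
        rel = (world[0] - c[0], world[1] - c[1], world[2] - c[2])
        track.append(project(rel, f, cx, cy))
    return track


# Hypothetical "dolly in": the camera moves forward along +z over three frames.
dolly_in = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.5), (0.0, 0.0, 1.0)]
track = camera_point_track((400.0, 300.0), depth=2.0,
                           camera_positions=dolly_in,
                           f=500.0, cx=320.0, cy=240.0)
```

In this example the off-centre point drifts away from the principal point as the camera moves forward, which is the screen-space pattern expected of a dolly in. A dense set of such tracks is one plausible form of spatiotemporal signal that does not require 3D training data, since only a depth estimate for the input image is assumed.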

 


Comparison & Showcases

Effectiveness in Cinematic Shot Design (Joint camera and object motion control in a 3D-scene-aware manner).

Camera motion Object global motion Object local motion
[ Pedestal up + Dolly in ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Static ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Roll clockwise ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Tilting up ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Dolly in ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Trucking right + Pedestal up ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Orbit right ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Panning left + Tilting up ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Dolly in ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Pedestal down + Tilting up ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Panning left + Dolly in ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Panning left ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Trucking right ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Static ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Trucking left + Dolly out ]
DragAnything MOFA-Video MotionCanvas (Ours)


Camera motion Object global motion Object local motion
[ Panning left + Pedestal up + Orbit right ]
DragAnything MOFA-Video MotionCanvas (Ours)

 


Applications

Shot Design with Joint Camera and Object Control.

Inputs Camera: Trucking right Camera: Zoom in Camera: Roll clockwise
Inputs Camera: Static Camera: Dolly in Camera: Diagonal bottom-right
Camera: Dolly out Camera: Orbit left Camera: Pedestal up
Camera: Orbit left Camera: [Trucking left + Pedestal up] Camera: Dolly in
Camera: Dolly out Camera: Dolly in Camera: Trucking left

 


Long Videos with Complex Trajectories.

Input image Motion control signal Result sample #1 Result sample #2

Input image Result sample #1 Result sample #2

 


Object Local Motion Control.

[Gallery: four examples, each showing four input images and the corresponding results]

 


Additional Applications

Motion Transfer.

Input source video
Transfer results

Video Editing.

Input video
Editing results

 


Comparisons with Baseline Methods

Camera Motion Control.

Input camera control MotionCtrl CameraCtrl Ours
[ Dolly in + Zoom out ] (Dolly zoom)
[ Trucking right ]

Object Motion Control.

Inputs DragAnything MOFA-Video
Camera TrackDiffusion Ours
[ Static ]

Inputs DragAnything MOFA-Video
Camera TrackDiffusion Ours
[ Trucking right ]

Camera Motion Control on DAVIS.

Reference MotionCtrl CameraCtrl Ours

 


Ablation Study

Camera Motion Representation.

Input Gauss. Map Plücker Point Traj Coeff. (Ours)
[ Dolly out + Panning right ]
Input Gauss. Map Plücker Point Traj Coeff. (Ours)
[ Roll clockwise + Zoom out ]

Bounding Box Conditioning.

Input Ours (coord) Ours
Input Ours (coord) Ours

Additional Analysis

Effect of Point Track Density on Camera Motion Control

Density=0.1 Density=0.4 Density=0.7 Density=1.0
Input point track
Results

Effect of Text Prompt.
- We show camera motion control of "dolly in" with different levels of text detail.

"A man."
"A man crossing a stream."
"A man with a red backpack steps over a stream in a mountain valley."
"A man wearing a blue flannel shirt, hiking boots, and a red backpack carefully steps across a rocky stream in a picturesque valley surrounded by rugged mountains."
"A man crossing a stream. It is raining."
"A man crossing a stream and turning around."

Essentiality of Camera-aware and Camera-object-aware Transformations

Inputs Preview w/o transform w/ transform (Ours)
Camera-aware transformation
Camera-object-aware transformation

 


Large Camera-motion Results

MotionCanvas 32-frame version.


 


Legend of Camera Motions

[Figure: legend of camera motions]