Inversion-Free Image Editing
with Natural Language

1 University of Michigan  2 University of California, Berkeley

* Equal contribution    Correspondence

CVPR 2024

Image

We present an inversion-free editing (InfEdit) method that allows for consistent editing at both the semantic and spatial levels, handling intricate modifications without compromising the image's integrity and without requiring explicit inversion. In extensive experiments, InfEdit performs strongly on complex editing tasks while keeping the workflow fast (less than 3 seconds on a single A40 GPU), which suggests potential for real-time applications.


Image
Image

A painting of a waterfall
[+and angels] in the mountains

Image
Image

A woman in a coat
[+and dress] is dancing

Image
Image

[+Oil painting of] a lake with mountains in the background

Image
Image

A woman in a [white -> red]
dress sitting on a chair with flowers

Image
Image

A man in a white shirt standing in front of [trees -> mountains]

Image
Image

A light brown bear
[sitting -> standing] on the ground

Image
Image

[Muffin -> Chihuahua]

Image
Image

A football with [OSU -> UMich] logo

Image
Image

A [blue droplet -> red fire] emoji with a [smiling -> angry] face with yellow dot
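
Several captions above mark words added by the edit with `[+...]`. As a rough illustration of how such a caption maps to a source prompt and a target prompt, here is a minimal sketch. The helper name `split_prompts` and the parsing rule are assumptions made for this example, not part of InfEdit, and only the `[+...]` addition marker is handled.

```python
import re

# Hypothetical helper: matches the "[+words]" addition marker used in the captions.
ADDITION = re.compile(r"\[\+(.*?)\]")


def split_prompts(annotated: str) -> tuple[str, str]:
    """Return (source_prompt, target_prompt) for a caption with [+...] markers."""
    # Source prompt: drop the added words entirely.
    source = ADDITION.sub("", annotated)
    # Target prompt: keep the added words, drop only the brackets and plus sign.
    target = ADDITION.sub(lambda m: m.group(1), annotated)
    # Collapse any doubled spaces left behind by the removal.
    return " ".join(source.split()), " ".join(target.split())


source, target = split_prompts(
    "A painting of a waterfall [+and angels] in the mountains"
)
print(source)
print(target)
```

The source prompt describes the input image and the target prompt describes the desired result; the pair is what a text-guided editor would consume.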

Experiments

InfEdit in various complex image editing tasks:


Comparison

Comparison with inversion-based methods:

Image

Performance in image editing: DDCM matches or exceeds other algorithms, with LCM and UAC bringing further improvement. Notably, it runs about an order of magnitude faster.

Image

Qualitative examples: InfEdit vs prior methods. InfEdit attains editing goals with the best consistency with source images.

Comparison with existing methods:

Image

Qualitative examples: InfEdit vs prior methods. InfEdit attains editing goals with the best consistency with source images.

More Results

Method

We aim to eliminate the inversion process and introduce the Denoising Diffusion Consistent Model (DDCM), a sampling strategy that enables virtual inversion. DDCM leverages a diffusion process that significantly enhances consistency throughout the image generation phases, ensuring fidelity and speed in transforming and refining visual content.

Image Image
Image
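
One way to read "virtual inversion" is through the standard diffusion forward process: rather than inverting the source image to recover a noise latent, draw fresh noise, form the noisy latent in closed form, and reuse that same noise when estimating the clean latent, which then reproduces the source exactly. The following is a minimal sketch under that assumption, not the authors' implementation. The function names, the flat list standing in for an image latent, and the value of `alpha_bar` are all illustrative.

```python
import math
import random


def add_noise(z0, alpha_bar, eps):
    """Standard forward process: z_t = sqrt(alpha_bar) * z0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(z0, eps)]


def estimate_z0(zt, alpha_bar, eps):
    """Clean-latent estimate implied by z_t and a noise value eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(zt, eps)]


random.seed(0)
z0_src = [random.gauss(0.0, 1.0) for _ in range(16)]  # toy stand-in for a source latent
alpha_bar = 0.3                                       # assumed cumulative noise level
eps = [random.gauss(0.0, 1.0) for _ in range(16)]     # freshly drawn noise, no inversion

z_t = add_noise(z0_src, alpha_bar, eps)
# Reusing the drawn noise makes the estimate consistent with the source latent.
z0_rec = estimate_z0(z_t, alpha_bar, eps)

max_err = max(abs(r - s) for r, s in zip(z0_rec, z0_src))
print(max_err)
```

The reconstruction error is at the level of floating-point rounding, which is the sense in which no explicit inversion pass is needed to stay faithful to the source in this toy setting.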

We also present Unified Attention Control (UAC) for tuning-free image editing through natural language that integrates cross-attention and self-attention control within a unified framework.

Image Image
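
To make "self-attention control" concrete, the sketch below shows one common form used in tuning-free editing: the target branch keeps its own queries but attends to keys and values taken from the source branch, so the output is assembled from source content. This is a generic illustration and not necessarily how UAC is implemented. The toy feature vectors are assumptions, and the learned query, key, and value projections of a real model are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out


# Toy per-token features for the two branches (illustrative values only).
src_feats = [[1.0, 0.0], [0.0, 1.0]]
tgt_feats = [[0.5, 0.5], [1.0, 1.0]]

# Ordinary self-attention: the target branch attends to itself.
plain = attention(tgt_feats, tgt_feats, tgt_feats)

# Controlled self-attention: target queries, source keys and values.
controlled = attention(tgt_feats, src_feats, src_feats)
print(controlled)
```

Each row of `controlled` is a weighted average of the source value vectors, which is why this kind of control tends to preserve the appearance of the source image while the queries follow the target.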

BibTeX

@inproceedings{xu2023infedit,
  title={Inversion-Free Image Editing with Natural Language}, 
  author={Sihan Xu and Yidong Huang and Jiayi Pan and Ziqiao Ma and Joyce Chai},
  booktitle={Conference on Computer Vision and Pattern Recognition 2024},
  year={2024}
}