
SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow

[09.28.2025] The paper has been accepted to the NeurIPS 2025 Creative AI Track. This page will be further updated soon.

[Paper][Website]

Introduction

SPICE is an image editing workflow that provides state-of-the-art image editing quality and supports most popular diffusion models (Flux Dev, SDXL, SD 1.5, and more). SPICE provides a great user experience for beginners and pros alike.

Below, we will provide examples of editing results, installation instructions, and tutorials. To achieve the highest editing quality, our workflow is designed to involve a few novel steps and concepts. If you find them confusing, we are happy to update this README file for better clarity. Please do not hesitate to create a GitHub issue or contact the authors of this repository.

Examples

SPICE produces high-quality and high-complexity image editing results, with little user expertise required. It is a versatile workflow that supports multiple editing needs.

Multi-Purpose Editing

SPICE is good at inpainting, outpainting, structural editing, and detail enhancement. The example below shows how 40 editing steps are performed iteratively to generate an image from an empty canvas. Due to its complexity, the image could not possibly be generated in a single text-to-image step. The characters are inspired by the video game Darkest Dungeon 2.

iterative-editing.mp4

Gestures

SPICE can generate or fix complicated gestures. In the example below, the user wants to generate the Chinese number gesture for 6. The left image is generated by GPT-4o. Although the prompt explicitly asks for the Chinese number gesture for 6, GPT-4o misinterprets the prompt and generates a gesture for 5. The right image is edited by SPICE, where the gesture is correct.

Image Image

Generate Text

SPICE can generate text. In the example below, the user wants to add the word "SCRANTON" to the clock tower. The left image is the original photo. The right image is edited by SPICE.

Image Image

Fix Text

SPICE can also fix text. The image below is generated using the prompt from one of OpenAI's official examples. The left image is generated by GPT-4o, where some words are misspelled (VIGLATORS and TOM-AWAY). The right image is edited by SPICE, where the misspellings have been fixed.

Image Image

Add Occluded Objects

SPICE can handle complicated object occlusions. In the example below, the user wants to add a black backpack onto the bench. The left image is the original image, and the right image is generated by SPICE. Models that are known to fail on this task include GPT-4o, Gemini 2.0 Flash, Doubao SeedEdit, UltraEdit, MGIE, and MagicQuill.

Image Image

Fix Failures

SPICE can fix its own failures. In the example below, the user wants to change the background from a river to a desert. The left image is a failed result after the first editing step. The right image is the fixed result after 3 additional editing steps.

Image Image

Adapt to Different Styles

Last but not least, SPICE adapts to any art style that is supported by a base model or a LoRA. The two examples below are both iteratively generated and refined by SPICE. The characters are from Touhou Project and Hades 2.

Image Image

Installation

The instructions below apply to Linux only, but installing on Windows is very similar. If you need further assistance with other systems, feel free to create a GitHub issue. We are happy to help.

Stable Diffusion Models

If you are using SD 1.5, SDXL, or any checkpoint derived from these two models, we recommend using SPICE in Stable Diffusion Web UI Automatic1111 or Stable Diffusion Web UI Forge. In these two Web UIs, no extra code is needed to use SPICE. Please follow the instructions below; for the curious, a sketch of the equivalent programmatic pipeline follows the list.

  1. Install the ControlNet extension and one Canny edge ControlNet model for the base model you are using.

  2. Activate the options as shown in the screenshots.

instruction-webui-1 instruction-webui-2 instruction-webui-3

  3. Start editing.
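
For reference, here is a minimal sketch of what the Web UI does with these settings, using the diffusers library. It is an illustration only, not part of SPICE: the model IDs, file names, Canny thresholds, and step count are assumptions that you should adapt to your own setup.

    # Minimal sketch of Canny-guided inpainting with diffusers, assuming an
    # SD 1.5 checkpoint and the lllyasviel/control_v11p_sd15_canny ControlNet.
    # File names ("input.png", "mask.png") and parameters are illustrative.
    import cv2
    import numpy as np
    import torch
    from PIL import Image
    from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline

    image = Image.open("input.png").convert("RGB")  # image with the color hint painted on
    mask = Image.open("mask.png").convert("L")      # white = region to edit

    # Canny edges of the hinted image steer the structure of the edit.
    edges = cv2.Canny(cv2.cvtColor(np.array(image), cv2.COLOR_RGB2GRAY), 100, 200)
    control = Image.fromarray(np.stack([edges] * 3, axis=-1))

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")

    result = pipe(
        prompt="a prompt describing the final image",
        image=image,
        mask_image=mask,
        control_image=control,
        num_inference_steps=30,
    ).images[0]
    result.save("edited.png")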

Flux Models

If you are using Flux, we recommend using SPICE in ComfyUI. Black Forest Labs has released a Canny edge ControlNet for Flux [dev] but not for Flux [schnell], so we currently support SPICE only on Flux [dev]. In other words, SPICE supports any checkpoint derived from Flux [dev] and any LoRA based on Flux [dev].

  1. Use the following steps to install ComfyUI and the necessary nodes.
    conda create --name comfyui python=3.12
    conda activate comfyui
    pip install pyyaml
    pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu124
    git clone https://github.com/comfyanonymous/ComfyUI.git
    cd ComfyUI
    pip install -r requirements.txt
    cd custom_nodes
    git clone https://github.com/ltdrdata/ComfyUI-Manager.git
    cd ..
    mv extra_model_paths.yaml.example extra_model_paths.yaml
    [Edit the content of extra_model_paths.yaml so that it points to folders where you plan to save your models.]
    python main.py
    [Drag and drop the spice.json workflow into the UI]
    [In the UI, open the manager panel and install missing custom nodes.]
    [In the manager, manually install ComfyUI-LogicUtils; for some reason, this package is not installed automatically.]
    [Shut down the UI.]
    python main.py
    
  2. Download models. You will need two text encoders, one VAE, one base model, one LoRA, and the Canny edge ControlNet model. To download the text encoders, the VAE, and the base model, please follow this guide. For the LoRA, you can download the Midjourney Dreamlike Fantasy FLUX LoRA or any other LoRA you prefer. To download the Canny edge ControlNet model, please follow the official guide. A scripted alternative for these downloads is sketched after this list.
  3. Start editing.
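
If you prefer to script the downloads in step 2, the sketch below uses the huggingface_hub package. The repository IDs and file names are assumptions based on the official releases at the time of writing; cross-check them against the guides linked above. Note that the Flux [dev] repositories are gated, so you must accept the license and log in with a Hugging Face account first.

    # Sketch: fetch the Flux [dev] files with huggingface_hub.
    # Repo IDs and file names are assumptions; verify them against the
    # guides linked above. The FLUX.1 repositories are gated, so run
    # `huggingface-cli login` (or pass token=...) after accepting the license.
    from huggingface_hub import hf_hub_download

    hf_hub_download("comfyanonymous/flux_text_encoders", "clip_l.safetensors",
                    local_dir="models/clip")
    hf_hub_download("comfyanonymous/flux_text_encoders", "t5xxl_fp16.safetensors",
                    local_dir="models/clip")
    hf_hub_download("black-forest-labs/FLUX.1-dev", "ae.safetensors",
                    local_dir="models/vae")
    hf_hub_download("black-forest-labs/FLUX.1-dev", "flux1-dev.safetensors",
                    local_dir="models/unet")
    hf_hub_download("black-forest-labs/FLUX.1-Canny-dev", "flux1-canny-dev.safetensors",
                    local_dir="models/controlnet")
    # Download the LoRA of your choice separately.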

Other Models

The two sections above cover the majority of popular model checkpoints. However, if there is a different model that you want to use, please do not hesitate to create a GitHub issue. We will add further instructions as requested.

Tutorial

SPICE is easy to use. To edit an image, a user sketches both a hint and a mask, and provides a prompt that describes the final image. No prompt engineering or prompt enhancement is required. Given these inputs, a two-stage denoising process generates a localized edit in the desired region.

workflow-webpage
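
As a rough programmatic analogue, the two stages can be pictured as two passes of the Canny-guided inpainting sketch from the installation section: a first pass that synthesizes the masked region from the hint, and a second, gentler pass that refines it. This is only a schematic of the idea, not the actual spice.json graph, and the strength values are illustrative assumptions.

    # Schematic analogue of the two-stage denoise. `pipe`, `mask`, and
    # `control` are the objects built in the earlier diffusers sketch.
    # Strength values are illustrative assumptions, not SPICE's settings.
    import torch

    def two_stage_edit(pipe, hinted_image, mask, control, prompt, seed=0):
        gen = torch.Generator("cuda").manual_seed(seed)
        # Stage 1: fully re-denoise the masked region; edges guide its structure.
        draft = pipe(prompt=prompt, image=hinted_image, mask_image=mask,
                     control_image=control, strength=1.0, generator=gen).images[0]
        # Stage 2: a low-strength pass refines detail while keeping the new structure.
        final = pipe(prompt=prompt, image=draft, mask_image=mask,
                     control_image=control, strength=0.4, generator=gen).images[0]
        return final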

We will show two editing steps as an example. We start with the model-generated image below. The image is generated with the WAI illustrious model. The positive prompt is "masterpiece,best quality, amazing quality, 2girls, hakurei reimu, kirisame marisa, sitting side by side, full body". The negative prompt is "bad quality, worst quality, worst detail, sketch, censor, nsfw".

reimu-marisa-original

The image is full of mistakes, such as the hand of the character on the left and the headwear of the character on the right.

Image Image

There are more errors, but we will address these two for now. For both editing steps, we show the first-shot result, with the random seed fixed at 3639167428 to prevent cherry-picking.

In the first step, we use a color picker to pick the color of the arm. Then, we roughly draw a hint (left). Next, we use a mask to cover the region to be edited (middle). Note that a context dot (upper right corner) is provided to enlarge the editing bounding box. The mask is drawn directly in the Web UI, and the black-and-white image is the mask exported from the Web UI. Although we decide to change the pose of the character, there is no need to change the prompt. After the generation finishes, the error is fixed as expected (right).

Image Image Image
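
The context dot works because, in the Web UI's masked-only inpainting mode, the editing bounding box is fit around every masked pixel: a small dot far from the main region stretches the box and pulls more surrounding context into the edit. The sketch below illustrates the idea; it is our illustration, not code from this repository.

    # How a context dot enlarges the editing bounding box: the box is fit
    # around every nonzero mask pixel, so a stray dot stretches it.
    # Illustrative only; the sizes and coordinates are made up.
    import numpy as np

    def mask_bbox(mask: np.ndarray, pad: int = 8):
        """Return (left, top, right, bottom) around all nonzero mask pixels."""
        ys, xs = np.nonzero(mask)
        left = max(int(xs.min()) - pad, 0)
        top = max(int(ys.min()) - pad, 0)
        right = min(int(xs.max()) + pad, mask.shape[1] - 1)
        bottom = min(int(ys.max()) + pad, mask.shape[0] - 1)
        return left, top, right, bottom

    mask = np.zeros((512, 512), dtype=np.uint8)
    mask[300:400, 100:200] = 255   # main region to edit
    print(mask_bbox(mask))         # (92, 292, 207, 407)
    mask[20, 480] = 255            # context dot near the upper right corner
    print(mask_bbox(mask))         # (92, 12, 488, 407) -- the box now spans far more context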

We now move to the second step. Again, we pick colors from the surrounding regions. This time, we want the headwear to be consistent with the one that appears in Touhou Juuouen.

Image

It would be hard to specify the exact color and shape we want with a prompt. With SPICE, however, we can get both exactly. Again, we first pick surrounding colors and roughly draw the shape we want (left). Then, we use a mask to cover the region that we want to edit (middle). After one step, we get an accurate edited result (right).

Image Image Image

Feel free to experiment with different hyperparameters and different levels of detail in the color hints you provide. While we only show two example steps, all errors in this image can be fixed similarly.

Here is another example. The character is Chirizuka Ubame from Touhou Kinjoukyou.

chirizuka-ubame.mp4

Citation

@misc{tang2025spicesynergisticpreciseiterative,
      title={SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow}, 
      author={Kenan Tang and Yanhong Li and Yao Qin},
      year={2025},
      eprint={2504.09697},
      archivePrefix={arXiv},
      primaryClass={cs.GR},
      url={https://arxiv.org/abs/2504.09697}, 
}
