Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 

readme.md

Stable diffusion

This is an reimplement of training stable diffusion in LiBai

Environment

Before running the scripts, make sure to install the library's training dependencies:

To make sure you can successfully run the latest versions of the example scripts, we highly recommend installing from source and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements.

Install libai

libai installation, refer to Installation instructions

# create conda env
conda create -n libai python=3.8 -y
conda activate libai

# install oneflow nightly, [PLATFORM] could be cu117 or cu102
python3 -m pip install --pre oneflow -f https://staging.oneflow.info/branch/master/[PLATFORM]

# install libai
git clone https://github.com/Oneflow-Inc/libai.git
cd libai
pip install pybind11
pip install -e .
  • All available [PLATFORM]:

    Platform CUDA Driver Version Supported GPUs
    cu117 >= 450.80.02 GTX 10xx, RTX 20xx, A100, RTX 30xx
    cu102 >= 440.33 GTX 10xx, RTX 20xx
    cpu N/A N/A

Install onediff

Important

To make sure you can train stable diffusion in LiBai, please install onediff

git clone https://github.com/Oneflow-Inc/diffusers.git onediff
cd onediff
python3 -m pip install "torch<2" "transformers>=4.26" "diffusers[torch]==0.15.0"
python3 -m pip uninstall accelerate -y
python3 -m pip install -e .

Notes

  • You need to register a Hugging Face account token and login with huggingface-cli login
python3 -m pip install huggingface_hub
  • If no command available in the PATH, it might be in the $HOME/.local/bin
 ~/.local/bin/huggingface-cli login

Start training

Training

  • Downloading Demo dataset

    mkdir mscoco && cd mscoco
    wget https://oneflow-static.oss-cn-beijing.aliyuncs.com/libai/Stable_diffusion/00000.tar
    mkdir 00000
    tar -xvf 00000.tar -C 00000/
  • running command

    set your datapath and features in projects/Stable_Diffusion/configs/config.py

        dataloader.train = LazyCall(build_nlp_train_loader)(
            dataset=[
                # set data path
                LazyCall(TXTDataset)(
                    foloder_name="/path/to/mscoco/00000",
                    ...,
                )
            ]
        )
    
        train.update(
            dict(
                ...,
                # set checkpointing or not
                activation_checkpoint=dict(enabled=True), # or False
    
                # set zero stage
                zero_optimization=dict(
                    enabled=True, # or False
                    stage=2, # Highly recommand stage=2, stage=1 or 3 is also supported 
                ),
    
                # set amp training
                amp=dict(enabled=True), # or False
            )
        )
    
        # set learning rate 
        optim.lr = 1e-3

    running with 4 GPU

    bash tools/train.sh projects/Stable_Diffusion/train_net.py projects/Stable_Diffusion/configs/config.py 4
    

DreamBooth

DreamBooth is a method to personalize text2image models like stable diffusion given just a few(3~5) images of a subject.

  • Downloading Dataset

    Download images from here and save them in a directory (such as /path/to/demo_dog/). This will be our training data.

  • DreamBooth Training

    set your datapath and features in projects/Stable_Diffusion/configs/dreambooth_config.py

        dataloader.train = LazyCall(build_nlp_train_loader)(
            dataset=[
                # set data path
                LazyCall(DreamBoothDataset)(
                    instance_data_root="/path/to/demo_dog/",
                    instance_prompt="a photo of sks dog",
                    ...,
                )
            ]
        )
    
        train.update(
            dict(
                ...,
                # set checkpointing or not
                activation_checkpoint=dict(enabled=True), # or False
    
                # set zero stage
                zero_optimization=dict(
                    enabled=True, # or False
                    stage=2, # Highly recommand stage=2, stage=1 or 3 is also supported 
                ),
    
                # set amp training
                amp=dict(enabled=True), # or False
            )
        )
    
        # set learning rate 
        optim.lr = 1e-3

    running with 4 GPU

    bash tools/train.sh projects/Stable_Diffusion/train_net.py projects/Stable_Diffusion/configs/dreambooth_config.py 4
    
  • Training DreamBooth with prior-preservation loss

    Prior-preservation is used to avoid overfitting and language-drift. Refer to the paper to learn more about it. For prior-preservation we first generate images using the model with a class prompt and then use those during training along with our data. According to the paper, it's recommended to generate num_epochs * num_samples images for prior-preservation. 200-300 works well for most cases.

    • Firstly we need to generate prior-images using the model with a class prompt, here is an example, it will generate 200 prior-images :

      bash projects/Stable_Diffusion/generate.sh

      # generate.sh
      export MODEL_NAME="CompVis/stable-diffusion-v1-4" # choose model type
      export CLASS_DIR="/path/to/prior_dog/" # set data save path
      export CLASS_PROMPT="a photo of dog" # set class prompt
      
      python3 projects/Stable_Diffusion/generate_prior_image.py \
      --pretrained_model_name_or_path=$MODEL_NAME \
      --class_data_dir=$CLASS_DIR \
      --class_prompt="$CLASS_PROMPT" \
      --num_class_images=200 \   # set num_images
    • Secondly, set your datapath and features in projects/Stable_Diffusion/configs/prior_preservation_config.py

          dataloader.train = LazyCall(build_nlp_train_loader)(
              dataset=[
                  LazyCall(DreamBoothDataset)(
                      instance_data_root="/path/to/demo_dog/",
                      instance_prompt="a photo of sks dog",
                      class_data_root="/path/to/prior_dog/",
                      class_prompt="a photo of dog",
                      ...,
                  )
              ]
          )
      
          optim.lr = 2e-6 # set learning rate 
          model.train_text_encoder = True # train text_encoder or not, could be False
          train.train_iter=2000 # set train_iter
          train.log_period=10
  • Training Dreamboth with lora

    Low-Rank Adaption of Large Language Models was first introduced by Microsoft in LoRA: Low-Rank Adaptation of Large Language Models by Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

    In a nutshell, LoRA allows to adapt pretrained models by adding pairs of rank-decomposition matrices to existing weights and only training those newly added weights. This has a couple of advantages:

    • Previous pretrained weights are kept frozen so that the model is not prone to catastrophic forgetting
    • Rank-decomposition matrices have significantly fewer parameters than the original model, which means that trained LoRA weights are easily portable.
    • LoRA attention layers allow to control to which extent the model is adapted towards new training images via a scale parameter.

    set your datapath and features in projects/Stable_Diffusion/configs/lora_config.py

        dataloader.train = LazyCall(build_nlp_train_loader)(
            dataset=[
                # set data path
                LazyCall(DreamBoothDataset)(
                    instance_data_root="/path/to/demo_dog/",
                    instance_prompt="a photo of sks dog",
                    ...,
                )
            ]
        )
    
        train.update(
            dict(
                ...,
                # set checkpointing or not
                activation_checkpoint=dict(enabled=True), # or False
    
                # set zero stage
                zero_optimization=dict(
                    enabled=True, # or False
                    stage=2, # Highly recommand stage=2, stage=1 or 3 is also supported 
                ),
    
                # set amp training
                amp=dict(enabled=True), # or False
            )
        )
    
        # set learning rate 
        optim.lr = 5e-4

    running with 4 GPU

    bash tools/train.sh projects/Stable_Diffusion/train_net.py projects/Stable_Diffusion/configs/lora_config.py
    
    
    
    

Inference with trained model

model will be saved in train.output_dir in config.py,

  • with lora: the model output save dir will be like this:

    output/stable_diffusion
    ├── config.yaml
    ├── log.txt
    ├── model_sd_for_inference
    │   └── pytorch_lora_weights.bin
    

    Here we can use onediff to inference our trained lora-model in Libai

    import oneflow as flow
    flow.mock_torch.enable()
    from onediff import OneFlowStableDiffusionPipeline
    from typing import get_args
    from diffusers.models.attention_processor import AttentionProcessor
    
    for processor_type in get_args(AttentionProcessor):
        processor_type.forward = processor_type.__call__
    
    model_path = "CompVis/stable-diffusion-v1-4"
    pipe = OneFlowStableDiffusionPipeline.from_pretrained(
        model_path,
        use_auth_token=True,
        revision="fp16",
        torch_dtype=flow.float16,
    )
    pipe.unet.load_attn_procs("output/stable_diffusion/model_sd_for_inference/")
    
    pipe = pipe.to("cuda")
    
    for i in range(100):
        prompt = "a photo of sks dog"
        with flow.autocast("cuda"):
            images = pipe(prompt).images
            for j, image in enumerate(images):
                image.save(f"{i}.png")
  • without lora the model output save dir will be like this:

    output/stable_diffusion
    ├── config.yaml
    ├── last_checkpoint
    ├── metrics.json
    ├── model_final
    │   ├── graph
    │   ├── lr_scheduler
    │   └── model
    ├── model_sd_for_inference
    │   ├── feature_extractor
    │   │   └── preprocessor_config.json
    │   ├── model_index.json
    │   ├── safety_checker
    │   │   ├── config.json
    │   │   └── pytorch_model.bin
    │   ├── scheduler
    │   │   └── scheduler_config.json
    │   ├── text_encoder
    │   │   ├── config.json
    │   │   └── pytorch_model.bin
    │   ├── tokenizer
    │   │   ├── merges.txt
    │   │   ├── special_tokens_map.json
    │   │   ├── tokenizer_config.json
    │   │   └── vocab.json
    │   ├── unet
    │   │   ├── config.json
    │   │   └── diffusion_pytorch_model.bin
    │   └── vae
    │       ├── config.json
    │       └── diffusion_pytorch_model.bin
    

    Here we can use onediff to inference our trained model in Libai

    import oneflow as flow
    flow.mock_torch.enable()
    from onediff import OneFlowStableDiffusionPipeline
    
    model_path = "output/stable_diffusion/model_sd_for_inference/"
    pipe = OneFlowStableDiffusionPipeline.from_pretrained(
        model_path,
        use_auth_token=True,
        revision="fp16",
        torch_dtype=flow.float16,
    )
    
    pipe = pipe.to("cuda")
    
    for i in range(100):
        prompt = "a photo of sks dog"
        with flow.autocast("cuda"):
            images = pipe(prompt).images
            for j, image in enumerate(images):
                image.save(f"{i}.png")