Evaluating Compositional Generalisation in VLMs and Diffusion Models

This repo is for the paper Evaluating Compositional Generalisation in VLMs and Diffusion Models.

This work uses the Diffusion-Classifier model proposed in the paper Your Diffusion Model is Secretly a Zero-Shot Classifier.

Create environment

conda env create -f environment.yml

Install CLIP

pip install git+https://github.com/openai/CLIP.git

Load dataset

You can download the dataset from https://drive.google.com/file/d/14U4azHV6FHI8yeALWfgKvnH40OpDvbz1/view?usp=sharing
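Once the archive is unpacked, it can help to confirm that the folders referenced by the example commands in this README are present before starting a long run. The following is only a convenience sketch and not part of the repo: the function name `missing_paths` is hypothetical, it assumes the archive was extracted to a `cobi2_datasets` folder, and the list covers only paths that appear in the examples in this README.

```python
from pathlib import Path

# Paths relative to the dataset root, copied from the example commands in this
# README. The real archive may contain more folders than these.
EXPECTED = [
    "single_object/train",
    "single_object/ID_val",
    "single_object/single_prompts.csv",
    "two_object/ood_val",
    "two_object/two_object_prompts/two_obj_val",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumption: the archive was extracted to ./cobi2_datasets
    missing = missing_paths("cobi2_datasets")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected paths found.")
```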

Fine-tune CLIP

python clip_finetune.py --data_path <path_to_training_images> --dataset <single|two_object|relational> --seed <seed_number> --save_path <saved_weights.pt>

For example: python clip_finetune.py --data_path "cobi2_datasets/single_object/train" --dataset "single" --seed 1 --save_path "single_ft.pt"
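To fine-tune with several seeds, the command above can be generated in a loop. The sketch below only builds and prints the commands; the helper name `finetune_commands` and the `<dataset>_ft_seed<seed>.pt` naming scheme for the saved weights are assumptions made for illustration, while the flags are the ones shown above.

```python
import shlex


def finetune_commands(data_path, dataset, seeds):
    """Build one clip_finetune.py command (as an argument list) per seed."""
    commands = []
    for seed in seeds:
        commands.append([
            "python", "clip_finetune.py",
            "--data_path", data_path,
            "--dataset", dataset,
            "--seed", str(seed),
            # Hypothetical naming scheme for the saved weights.
            "--save_path", f"{dataset}_ft_seed{seed}.pt",
        ])
    return commands


if __name__ == "__main__":
    for cmd in finetune_commands("cobi2_datasets/single_object/train", "single", [1, 2, 3]):
        # Print only; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

Keeping each command as an argument list avoids shell-quoting problems if a path contains spaces.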

Fine-tune Diffusion-Classifier

python train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
  --output_dir=<save_folder_name> \
  --revision="fp16" \
  --seed=1 \
  --resolution=512 \
  --train_batch_size=1 \
  --train_text_encoder \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-6 \
  --lr_scheduler="polynomial" \
  --lr_warmup_steps=0 \
  --num_class_images=30 \
  --sample_batch_size=4 \
  --max_train_steps=3000 \
  --save_interval=3000 \
  --data_type=<single|two_object|relational> \
  --folder_path=<path_to_training_images>

Run models on data

CLIP

For example, the following command runs frozen CLIP on the ID val split of the single object data:

python clip_predict.py --image_folder 'cobi2_datasets/single_object/ID_val/' --output_file 'results/single/clip/clip_single_idval_s1.csv' --dataset single

To run a fine-tuned model on the same split, pass its weights with --model_path:

python clip_predict.py --image_folder 'cobi2_datasets/single_object/ID_val/' --output_file 'results/single/clip/clip_single_idval_s1' --dataset single --model_path models/clip/single_object/seed_1_single.pt

For the two object and relational datasets, the prompt path must also be specified:

python clip_predict.py --image_folder 'cobi2_datasets/two_object/ood_val/' --output_file 'clip_two_oodval_frz' --dataset two_object --prompt_path "cobi2_datasets/two_object/two_object_prompts/two_obj_val"

Diffusion Classifier

You can use the following command to run Diffusion Classifier on the ID val single data split:

python diffusion-classifier/eval_prob_adaptive.py \
--dataset clevr \
--split test \
--n_trials 1 \
--to_keep 5 1 \
--n_samples 75 200 \
--loss l1 \
--prompt_path cobi2_datasets/single_object/single_prompts.csv \
--seed 1 \
--dataset_path 'cobi2_datasets/single_object/ID_val/' \
--output_file 'single_object_id_val.csv' \
--data_split 'id_val' \
--data_type 'single'
#--model_path 'models/single_object/seed_1.pt'

The --model_path flag (commented out in the command above) can be used to load a fine-tuned model.

Two object and relational predictions can be run using the corresponding folder of prompt files, for example:

python diffusion-classifier/eval_prob_adaptive.py \
--dataset clevr \
--split test \
--n_trials 1 \
--to_keep 5 1 \
--n_samples 75 200 \
--loss l1 \
--prompt_path cobi2_datasets/two_object/two_object_prompts/two_obj_val \
--seed 1 \
--dataset_path 'cobi2_datasets/two_object/ood_val/' \
--output_file 'two_object_ood_val' \
--data_split 'ood_val' \
--data_type 'two_object'
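The paired values in --to_keep 5 1 and --n_samples 75 200 suggest a staged pruning of candidate prompts. The toy sketch below illustrates one reading of that idea, assuming each stage scores every surviving prompt with the given number of noisy error measurements and keeps the lowest-error ones. It is not the repo's implementation: `adaptive_classify` and `error_fn` are hypothetical names, the error values are made up, and the real script may reuse measurements across stages or differ in other details.

```python
import random


def adaptive_classify(prompts, error_fn, to_keep=(5, 1), n_samples=(75, 200)):
    """Prune candidate prompts stage by stage and return the best survivor.

    error_fn(prompt) is a stand-in for one noisy denoising-error measurement
    of the image under that prompt; lower means a better match.
    """
    candidates = list(prompts)
    for keep, samples in zip(to_keep, n_samples):
        # Average `samples` error measurements for every surviving prompt.
        mean_error = {
            p: sum(error_fn(p) for _ in range(samples)) / samples
            for p in candidates
        }
        # Keep the `keep` prompts with the lowest mean error.
        candidates = sorted(candidates, key=mean_error.get)[:keep]
    return candidates[0]


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy stand-in: each prompt has a made-up expected error plus noise.
    toy_mean = {
        "a photo of a blue sphere": 0.2,
        "a photo of a red cube": 0.5,
        "a photo of a green cylinder": 0.6,
    }
    best = adaptive_classify(toy_mean, lambda p: rng.gauss(toy_mean[p], 0.1))
    print(best)
```

Under this reading, the first stage spends few samples on many prompts and the last stage spends more samples on the few that remain.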

Viewing Results

The results are saved as CSV files of predictions. To compute the accuracy from these outputs, use the `print_acc.py` script:

python print_acc.py --output_file "results/single/clip/clip_single_idval_s1.csv" --dataset "single"

For the two object and relational datasets, the dataset split must also be specified with --datatype: idval, idtest, val, test, idval_gen, idtest_gen, val_gen or test_gen.

python print_acc.py --output_file "two_object_ood_val" --dataset "two_object" --datatype "val"
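For a quick sanity check outside of `print_acc.py`, accuracy can also be computed directly from a predictions file. The sketch below is a generic illustration rather than the script's actual logic: the column names `label` and `prediction` are assumptions, since the CSV layout is not documented here, so adjust them to match the real header.

```python
import csv


def accuracy(csv_path, label_col="label", pred_col="prediction"):
    """Fraction of rows whose predicted value equals the true label.

    The default column names are assumptions; pass the real ones if they differ.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError(f"{csv_path} contains no prediction rows")
    correct = sum(row[label_col].strip() == row[pred_col].strip() for row in rows)
    return correct / len(rows)
```

Calling `accuracy("results/single/clip/clip_single_idval_s1.csv")` would then return a value between 0 and 1, provided the file uses those column names.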

Plotting t-SNE of CLIP embeddings

python tsne_plots.py --model_path "relational_finetuned_clip.pt"

Generating fine-tuned Stable Diffusion images


python generate_sd_images.py --model_path "single_finetuned_dc/10" --prompt "a photo of a blue sphere"
