Code accompanying the paper "How can embedding models bind concepts?"
- Setup
- Data
- Extracting CLIP embeddings
- Probe suite
- Interventions
- Scene complexity analysis
- Amortized training
Requirements: Python 3.10, uv
uv sync
uv will create a virtualenv and install all dependencies (including CLIP from source).
The scripts below assume the following variables are set:
CODE_REPO=/mnt/lustre/work/oh/owl661/mobpub/mob_project
OUTPUT_ROOT=$CODE_REPO/data/clip_checks_public
# Synthetic 20x20 text dataset
TEXT_DATASET_20X20=/mnt/lustre/work/oh/owl661/mob_project/data/clip_checks/objs2_concepts2_values20_nodedup_max1000000000_actual160400_mixed_20260104-111651_False/text_dataset.pkl
# Raw image sources (only needed for re-extraction)
RAW_DATA_ROOT=$CODE_REPO/src/mob/multi_obj_clip_analysis
mkdir -p $OUTPUT_ROOT/{clevr,clevr2d,pug_spare}
Raw image inputs and the synthetic text dataset are available on Dropbox.
The pipeline expects you to run get_clip_embeddings.py (section 3).
This script generates all derived files under $OUTPUT_ROOT, including
dataset.pkl, metadata.json, labels.csv, and embedding .pkl files.
for DS in clevr clevr2d pug_spare; do
python "$CODE_REPO/src/mob/clip_embeddings/get_clip_embeddings.py" \
--mode image \
--dataset "$DS" \
--model_name clip-ViT-B/32 \
--raw_data_root "$RAW_DATA_ROOT" \
--output_root "$OUTPUT_ROOT"
done
This produces a unified export layout:
- $OUTPUT_ROOT/clevr/{dataset.pkl,metadata.json,labels.csv,clevr_clip-ViT-B_32_embeddings.pkl}
- $OUTPUT_ROOT/clevr2d/{dataset.pkl,metadata.json,labels.csv,clevr2d_clip-ViT-B_32_embeddings.pkl}
- $OUTPUT_ROOT/pug_spare/{dataset.pkl,metadata.json,labels.csv,Desert_clip-ViT-B_32_embeddings.pkl}
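A quick way to confirm that extraction finished is to check for the files in the layout above. The following is a minimal sketch, not part of the repository: the helper names `expected_files` and `missing_files` are hypothetical, and the per-dataset file prefixes are copied from the listed layout rather than taken from the extraction script.

```python
from pathlib import Path

# Embedding-file prefix per dataset, copied from the export layout listed above.
# Note that pug_spare uses the prefix "Desert" rather than the dataset name.
EMBEDDING_PREFIX = {"clevr": "clevr", "clevr2d": "clevr2d", "pug_spare": "Desert"}
COMMON_FILES = ("dataset.pkl", "metadata.json", "labels.csv")


def expected_files(output_root, dataset, model_tag="clip-ViT-B_32"):
    """Return the paths the image-mode export is expected to produce (hypothetical helper)."""
    base = Path(output_root) / dataset
    names = list(COMMON_FILES)
    names.append(f"{EMBEDDING_PREFIX[dataset]}_{model_tag}_embeddings.pkl")
    return [base / name for name in names]


def missing_files(output_root, dataset, model_tag="clip-ViT-B_32"):
    """Return the expected files that do not exist on disk yet."""
    return [p for p in expected_files(output_root, dataset, model_tag) if not p.is_file()]
```

Calling `missing_files(os.environ["OUTPUT_ROOT"], "clevr")` after the extraction loop should return an empty list if all four files were written.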
Raw inputs expected at:
- $RAW_DATA_ROOT/datasets/: CLEVR/CLEVR2D (*_labels.pkl, *_images.pkl)
- $RAW_DATA_ROOT/pug_spare_dataset/: PUG_SPARE (PUG_SPARE.csv + world folders)
Operates on the synthetic 20×20 text dataset independently of the image datasets.
python "$CODE_REPO/src/mob/clip_embeddings/get_clip_embeddings.py" \
--mode text \
--dataset_path "$TEXT_DATASET_20X20" \
--model_name clip-ViT-B/32
Output: $(dirname "$TEXT_DATASET_20X20")/clip_clip-ViT-B_32_text_embeddings.pkl
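The internal structure of the embedding pickle is not documented here, so it can help to inspect the file before using it. The sketch below makes no assumption about that structure and only reports what it finds. `describe_pickle` is a hypothetical helper, not a repository function. Unpickling executes code, so only load files you trust, and objects saved as NumPy arrays or torch tensors need those libraries importable.

```python
import pickle
from pathlib import Path


def describe_pickle(path):
    """Load a pickle and return a short summary of its top-level structure."""
    with Path(path).open("rb") as fh:
        obj = pickle.load(fh)
    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        # Show at most ten keys so large dictionaries stay readable.
        summary["keys"] = sorted(map(str, obj))[:10]
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    if hasattr(obj, "shape"):
        # Array-like objects (for example NumPy arrays) expose a shape.
        summary["shape"] = tuple(obj.shape)
    return summary
```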
Linear probes on object identities and concepts, fitted on frozen CLIP features with a controlled scene-level train/test split. Also includes four subtraction variants with estimated concept/object contributions removed.
Script: src/mob/clip_embeddings/run_probe_suite.py
# Derived dataset paths (produced in section 3a)
DATASET_CLEVR=$OUTPUT_ROOT/clevr/dataset.pkl
DATASET_CLEVR2D=$OUTPUT_ROOT/clevr2d/dataset.pkl
DATASET_PUG_SPARE=$OUTPUT_ROOT/pug_spare/dataset.pkl
for DS_PATH in "$DATASET_CLEVR" "$DATASET_CLEVR2D" "$DATASET_PUG_SPARE"; do
DS_NAME=$(basename "$(dirname "$DS_PATH")")
if [[ "$DS_NAME" == "pug_spare" ]]; then
EMB_PATH="$OUTPUT_ROOT/$DS_NAME/Desert_clip-ViT-B_32_embeddings.pkl"
else
EMB_PATH="$OUTPUT_ROOT/$DS_NAME/${DS_NAME}_clip-ViT-B_32_embeddings.pkl"
fi
python "$CODE_REPO/src/mob/clip_embeddings/run_probe_suite.py" \
--dataset-path "$DS_PATH" \
--embedding-path "$EMB_PATH"
done
--embedding-path accepts either:
- a base embedding .pkl file, or
- a directory containing exactly one *_embeddings.pkl file.
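The two accepted forms of --embedding-path can be illustrated with a small resolver. This is a sketch of the documented behaviour under stated assumptions, not the code in run_probe_suite.py: the function name `resolve_embedding_path` and the choice of exceptions are hypothetical.

```python
from pathlib import Path


def resolve_embedding_path(path):
    """Return the embedding file for a file path, or for a directory holding exactly one."""
    p = Path(path)
    if p.is_file():
        return p
    if p.is_dir():
        matches = sorted(p.glob("*_embeddings.pkl"))
        if len(matches) != 1:
            raise ValueError(
                f"expected exactly one *_embeddings.pkl in {p}, found {len(matches)}"
            )
        return matches[0]
    raise FileNotFoundError(p)
```

Under this reading, a directory holding embeddings for several models is ambiguous, and the file has to be passed explicitly.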
Output: <embedding_dir>/probe_suite_<timestamp>/results.json. Each entry includes summary_metrics (train/test accuracies per pack) and dataset_metadata.
The train/test split is scene-level with a fixed seed. Trained embeddings are saved as trained_embeddings_dim512_simdot_fit_conceptsTrue_fit_objectsTrue_<train-ratio>__<embedding_name>.pt.
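Later sections pass these artifact paths by hand, so building and parsing the file name programmatically can reduce typos. The sketch below follows the name template given above. The helper names are hypothetical, and the fixed middle part of the name (`simdot_fit_conceptsTrue_fit_objectsTrue`) is assumed to match the default settings.

```python
import re


def probe_artifact_name(embedding_name, dim=512, train_ratio=0.4):
    """Build the artifact file name from the template in the text (hypothetical helper)."""
    return (
        f"trained_embeddings_dim{dim}_simdot_fit_conceptsTrue_fit_objectsTrue_"
        f"{train_ratio}__{embedding_name}.pt"
    )


_PATTERN = re.compile(
    r"^trained_embeddings_dim(?P<dim>\d+)_simdot_fit_conceptsTrue_fit_objectsTrue_"
    r"(?P<ratio>[0-9.]+)__(?P<embedding>.+)\.pt$"
)


def parse_probe_artifact_name(name):
    """Recover (dim, train_ratio, embedding_name) from an artifact file name."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a probe artifact name: {name}")
    return int(match["dim"]), float(match["ratio"]), match["embedding"]
```

For example, `probe_artifact_name("clevr_clip-ViT-B_32_embeddings.pkl")` reproduces the CLEVR file name used in the scene complexity analysis.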
Interventions evaluate whether a steered scene embedding retrieves the intended control scene and preserves concept-object structure under probes.
Script: src/mob/clip_embeddings/run_interventions.py
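To make the retrieval criterion concrete, here is a toy sketch of one possible intervention. It assumes additive steering (subtract one object direction, add another) and dot-product retrieval, and every name and vector in it is illustrative. run_interventions.py may steer and score differently.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def steer(scene, remove, add):
    """Additive steering (an assumption): subtract one object direction, add another."""
    return [s - r + a for s, r, a in zip(scene, remove, add)]


def retrieve(query, candidates):
    """Index of the candidate with the highest dot-product similarity to the query."""
    return max(range(len(candidates)), key=lambda i: dot(query, candidates[i]))


# Toy 3-d object directions; in this toy bank each scene is the sum of its objects.
red_cube, blue_ball, green_cone = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
scene_bank = [
    [1.0, 1.0, 0.0],  # red_cube + blue_ball (source scene)
    [1.0, 0.0, 1.0],  # red_cube + green_cone (intended control scene)
    [0.0, 1.0, 1.0],  # blue_ball + green_cone (distractor)
]

# Swap blue_ball for green_cone in the source scene, then check retrieval.
steered = steer(scene_bank[0], remove=blue_ball, add=green_cone)
hit = retrieve(steered, scene_bank) == 1
```

In this toy setup the steered embedding lands exactly on the control scene, so retrieval succeeds. With real embeddings the match is approximate, which is why retrieval accuracy is an informative measure.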
Before running interventions, make sure you have:
- Dataset folder with dataset.pkl under $OUTPUT_ROOT/<dataset>/
- Scene embeddings for the same dataset/model
- Probe artifact (trained_embeddings_*.pt) for that embedding file (optional)
- Single-object embeddings if using --object-embedding-mode single_object
If you pass --probe without --probe-path, run_interventions.py will try to resolve probes from the embedding artifact and, if missing, train probes internally via train_embeddings.py.
Single-object embeddings can be produced with get_clip_embeddings.py from a single-object image dictionary:
python "$CODE_REPO/src/mob/clip_embeddings/get_clip_embeddings.py" \
--mode image \
--dataset clevr \
--model_name dinov2-vitb14 \
--single_object_images_path "$OUTPUT_ROOT/clevr/CLEVR_posfix_images_single.pkl"
Example for CLEVR + DINOv2 ViT-B/14 + single-object bank:
python "$CODE_REPO/src/mob/clip_embeddings/run_interventions.py" \
--dataset clevr \
--dataset-path "$OUTPUT_ROOT/clevr" \
--embedding-path "$OUTPUT_ROOT/clevr/clevr_dinov2-vitb14_embeddings.pkl" \
--probe \
--probe-path "$OUTPUT_ROOT/clevr/embeddings/trained_embeddings_dim768_simdot_fit_conceptsTrue_fit_objectsTrue_0.4__clevr_dinov2-vitb14_embeddings.pkl.pt" \
--object-embedding-mode single_object \
--single-object-embeddings-path "$OUTPUT_ROOT/clevr/CLEVR_posfix_images_single_dinov2-vitb14_embeddings.pkl" \
--output-json
--output-json without a value auto-generates a descriptively named file under <dataset-path>/interventions/.
Supported object embedding modes:
- avg_scene_position_independent (obj-avg)
- avg_scene_position_dependent (pos-avg)
- single_object (requires --single-object-embeddings-path; not supported for PUG:SPARE).
Measures probe accuracy as a function of training-set size under an object-level split: probes are trained on scenes whose objects all belong to the train set and tested on scenes made up entirely of unseen objects. This is stricter than the scene-level split in section 4.
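The object-level split can be sketched in a few lines. This is an illustration of the idea, not the logic in approximate_complexity_scenes.py: the function name, the `train_fraction` parameter, and the decision to drop scenes that mix seen and unseen objects are all assumptions.

```python
import random


def object_level_split(scenes, train_fraction=0.8, seed=0):
    """Split scenes so that train and test scenes share no objects.

    `scenes` maps a scene id to the collection of object ids it contains.
    Scenes that mix train and held-out objects are dropped (an assumption).
    """
    objects = sorted({obj for objs in scenes.values() for obj in objs})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(objects)
    n_train = round(train_fraction * len(objects))
    train_objects = set(objects[:n_train])

    train, test, dropped = [], [], []
    for scene_id, objs in scenes.items():
        n_seen = sum(obj in train_objects for obj in objs)
        if n_seen == len(objs):
            train.append(scene_id)    # every object was seen in training
        elif n_seen == 0:
            test.append(scene_id)     # every object is unseen
        else:
            dropped.append(scene_id)  # mixed scene, used in neither set
    return train, test, dropped, train_objects


# Toy data: every unordered pair of 10 object ids forms one two-object scene.
toy_scenes = {f"scene_{a}_{b}": (a, b) for a in range(10) for b in range(a + 1, 10)}
train_ids, test_ids, dropped_ids, train_objs = object_level_split(toy_scenes)
```

With 10 objects and a 0.8 fraction, the 45 toy scenes split into 28 train, 1 test, and 16 dropped. This shows how much data an object-level split can discard compared with a scene-level one.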
Script: src/mob/clip_embeddings/approximate_complexity_scenes.py
Launcher: src/mob/clip_embeddings/run_analyses.sh
The launcher sweeps training fractions (0.1–0.9) across learning rates.
Available profiles:
| Profile | Description |
|---|---|
| regular_mlp | Non-multiplicative concat MLP + hidden-width sweep |
| mult_linear | Multiplicative probe + linear head |
| sum_mult_linear | Multiplicative + sum probe + linear head |
EMB_CLEVR="$OUTPUT_ROOT/clevr/embeddings/trained_embeddings_dim512_simdot_fit_conceptsTrue_fit_objectsTrue_0.4__clevr_clip-ViT-B_32_embeddings.pkl.pt"
EMB_CLEVR2D="$OUTPUT_ROOT/clevr2d/embeddings/trained_embeddings_dim512_simdot_fit_conceptsTrue_fit_objectsTrue_0.4__clevr2d_clip-ViT-B_32_embeddings.pkl.pt"
EMB_PUG_SPARE="$OUTPUT_ROOT/pug_spare/embeddings/trained_embeddings_dim512_simdot_fit_conceptsTrue_fit_objectsTrue_0.4__Desert_clip-ViT-B_32_embeddings.pkl.pt"
# 1) regular MLP
for EMB in "$EMB_CLEVR" "$EMB_CLEVR2D" "$EMB_PUG_SPARE"; do
"$CODE_REPO/src/mob/clip_embeddings/run_analyses.sh" \
--lrs 1e-2 \
--profiles regular_mlp \
--regular-hidden-specs 1024 \
--embeddings "$EMB"
done
# 2) multiplicative linear (no hidden MLP)
for EMB in "$EMB_CLEVR" "$EMB_CLEVR2D" "$EMB_PUG_SPARE"; do
"$CODE_REPO/src/mob/clip_embeddings/run_analyses.sh" \
--lrs 1e-2 \
--profiles mult_linear \
--embeddings "$EMB"
done
# 3) sum + multiplicative linear (no hidden MLP)
for EMB in "$EMB_CLEVR" "$EMB_CLEVR2D" "$EMB_PUG_SPARE"; do
"$CODE_REPO/src/mob/clip_embeddings/run_analyses.sh" \
--lrs 1e-2 \
--profiles sum_mult_linear \
--embeddings "$EMB"
done
Small transformer that decomposes scene embeddings into per-object embeddings. Uses an object-level generalization split via --train-object-fraction.
Scripts: src/mob/clip_embeddings/amortization/{train_clean.py,models.py,scenes.py}
mkdir -p "$OUTPUT_ROOT/amortized"
python "$CODE_REPO/src/mob/clip_embeddings/amortization/train_clean.py" \
--epochs 50000 \
--model-d-out 64 \
--model-d-model 256 \
--model-num-heads 4 \
--model-num-layers 6 \
--model-lr 3e-5 \
--sim-type cos \
--train-batch-size 512 \
--probe-concepts true \
--probe-objects true \
--max-num-objects 2 \
--num-concepts 2 \
--num-vals-per-concept 20 \
--test-every-steps 200 \
--test-num-batches 10 \
--train-object-fraction 0.8 \
--use-wandb false \
--save-model true \
--working-dir "$OUTPUT_ROOT/amortized"
Convert amortization checkpoints (model_best_test_objects.pt) to complexity-analysis inputs:
OUT_ROOT_AMORT="$OUTPUT_ROOT/amortization"
AMORT_CKPT_1="/mnt/lustre/work/oh/owl661/mob_project/experiments/configs/2026-02-28_14:08:08.146297---f4734b959a3416e52933---2235050/models/model_best_test_objects.pt"
for CKPT in "$AMORT_CKPT_1"; do
RUN_NAME=$(basename "$(dirname "$(dirname "$CKPT")")" | tr '-' '_')
python "$CODE_REPO/src/mob/clip_embeddings/extract_dataset_and_embeddings_from_pretrained_clike.py" \
--model-path "$CKPT" \
--output-root "$OUT_ROOT_AMORT" \
--output-name "$RUN_NAME" \
--objects-source all \
--max-objs 400 \
--batch-size 1024
done
This writes, for each converted checkpoint:
- $OUT_ROOT_AMORT/<run_name>/dataset.pkl
- $OUT_ROOT_AMORT/<run_name>/metadata.json
- $OUT_ROOT_AMORT/<run_name>/scene_embeddings.pkl
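The run name is derived from the checkpoint path by taking the grandparent directory name and replacing hyphens with underscores. The sketch below mirrors that shell pipeline in Python so the expected output paths can be computed ahead of time. Both helper names are hypothetical.

```python
from pathlib import PurePosixPath


def run_name_from_checkpoint(ckpt_path):
    """Mirror `basename "$(dirname "$(dirname "$CKPT")")" | tr '-' '_'`."""
    return PurePosixPath(ckpt_path).parent.parent.name.replace("-", "_")


def converted_outputs(out_root, ckpt_path):
    """Paths of the three files written for one converted checkpoint."""
    run_dir = PurePosixPath(out_root) / run_name_from_checkpoint(ckpt_path)
    return [run_dir / name for name in ("dataset.pkl", "metadata.json", "scene_embeddings.pkl")]
```

Every hyphen is replaced, including those in the date, so a run directory starting with `2026-02-28` becomes `2026_02_28` in the output path.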
Run complexity on a converted amortization embedding:
EMB_AMORT="/mnt/lustre/work/oh/owl661/mobpub/mob_project/data/clip_checks_public/amortization/2026_02_28_14:08:08.146297___f4734b959a3416e52933___2235050/scene_embeddings.pkl"
"$CODE_REPO/src/mob/clip_embeddings/run_analyses.sh" \
--lrs 1e-2 \
--profiles regular_mlp \
--regular-hidden-specs 1024 \
--embeddings "$EMB_AMORT"
"$CODE_REPO/src/mob/clip_embeddings/run_analyses.sh" \
--lrs 1e-2 \
--profiles mult_linear \
--embeddings "$EMB_AMORT"
"$CODE_REPO/src/mob/clip_embeddings/run_analyses.sh" \
--lrs 1e-2 \
--profiles sum_mult_linear \
--embeddings "$EMB_AMORT"