NearID: Identity Representation Learning via Near-identity Distractors

NearID produces identity-aware image embeddings that remain stable across background and context changes while rejecting near-identity distractors: visually similar but different instances placed in the same context. It is designed for evaluating identity preservation in personalized image generation.

Quick Start

pip install nearid
# or install from source:
# pip install -e .

from transformers import AutoModel, AutoImageProcessor
from PIL import Image

model = AutoModel.from_pretrained("Aleksandar/nearid-siglip2", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("Aleksandar/nearid-siglip2")

inputs = processor(images=Image.open("photo.jpg"), return_tensors="pt")
embedding = model.get_image_features(**inputs)  # [1, 1152], L2-normalised

Results

Near-Identity Discrimination & Alignment (Table 1)

Scoring Model	NearID SSR	NearID PA	MTG MO	MTG MOpair	MTG SSR	MTG PA	DB++ MH
CLIP ViT-L/14	10.31	20.92	0.239	0.484	0.0	0.0	0.493
DINOv2 ViT-L/14	20.43	34.55	0.324	0.519	0.0	0.0	0.492
SigLIP2 (backbone)	30.74	48.81	0.180	0.366	0.0	0.0	0.516
VSM	32.13	46.70	0.394	0.445	7.0	24.5	0.190
NearID (Ours)	99.17	99.71	0.465	0.486	35.0	46.5	0.545

SSR and PA are averaged across seven inpainting settings, three of which were excluded from training. MO and MOpair are metric-to-oracle correlations; MH is the metric-to-human correlation (Fisher-z averaged).
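The note above says the MH correlation is Fisher-z averaged. As a minimal sketch of what that kind of averaging usually means (the function name and the sample values are hypothetical, and the evaluation code in this repository may differ in detail): each correlation is mapped with arctanh, the transformed values are averaged, and the mean is mapped back with tanh.

```python
import math


def fisher_z_average(correlations):
    """Average correlation coefficients in Fisher z-space and map back to r."""
    # atanh is undefined at |r| = 1, so clip slightly inside the open interval.
    eps = 1e-7
    zs = [math.atanh(max(-1 + eps, min(1 - eps, r))) for r in correlations]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical per-subset correlations, for illustration only.
rs = [0.2, 0.5, 0.8]
print(fisher_z_average(rs))
print(sum(rs) / len(rs))  # plain arithmetic mean, for comparison
```

Averaging in z-space gives more weight to strong correlations than a plain mean does, which is why the two printed values differ.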

Installation

Inference only:

pip install -e .

Training & evaluation:

conda env create -f environment.yaml
conda activate nearid
pip install -e ".[all]"

Usage

Pairwise Similarity

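import torch

# Reuses `model`, `processor` and `Image` from Quick Start; the file names are placeholders.
img_a, img_b = Image.open("a.jpg"), Image.open("b.jpg")

with torch.no_grad():  # inference only, no gradients needed
    emb_a = model.get_image_features(**processor(images=img_a, return_tensors="pt"))
    emb_b = model.get_image_features(**processor(images=img_b, return_tensors="pt"))

similarity = (emb_a @ emb_b.T).item()  # cosine similarity (embeddings are L2-normalised)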
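The dot product above can be read as a cosine similarity because the embeddings are L2-normalised. A small dependency-free sketch of that equivalence, using made-up 3-dimensional vectors rather than real NearID embeddings (all helper names here are illustrative):

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalise(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors, for illustration only.
a, b = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
a_hat, b_hat = l2_normalise(a), l2_normalise(b)

# After normalisation, the plain dot product equals the cosine of the originals.
print(math.isclose(dot(a_hat, b_hat), cosine(a, b)))  # True
```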

Batch Inference

# `image_paths` is a list of image file paths supplied by the caller.
images = [Image.open(p) for p in image_paths]
inputs = processor(images=images, return_tensors="pt", padding=True)
embeddings = model.get_image_features(**inputs)  # [B, 1152]

sim_matrix = embeddings @ embeddings.T
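One common use of such a similarity matrix is to look up, for each image, the most similar other image. The sketch below works on a plain nested list with hypothetical values so it runs without a model; the function name is illustrative and not part of the nearid package.

```python
def nearest_neighbours(sim_matrix):
    """For each row, return the index of the most similar *other* item."""
    result = []
    for i, row in enumerate(sim_matrix):
        # Skip the diagonal: every item is trivially most similar to itself.
        candidates = [(score, j) for j, score in enumerate(row) if j != i]
        result.append(max(candidates)[1])
    return result


# Hypothetical 3x3 similarity matrix: images 0 and 1 show the same instance.
sim = [
    [1.00, 0.91, 0.35],
    [0.91, 1.00, 0.40],
    [0.35, 0.40, 1.00],
]
print(nearest_neighbours(sim))  # [1, 0, 1]
```

On a tensor, masking the diagonal and taking a row-wise argmax should give the same result.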

Architecture

Property	Value
Base model	`google/siglip2-so400m-patch14-384`
Backbone	SigLIP2 SO400M ViT/14 @ 384px (frozen)
Pooling head	Multi-head Attention Pooling (MAP), initialised from SigLIP2 (trained)
Embedding dim	1152
Total parameters	~428M
Trainable parameters	~15M (head-only)
Input resolution	384 x 384
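To make the split in the table concrete, here is a rough PyTorch sketch of a frozen backbone feeding a trainable attention-pooling head. It is an assumption-laden illustration, not the NearID implementation: the class name, the toy sizes, and the `nn.Linear` stand-in for the ViT are all hypothetical, and the layout (a learned query token, multi-head attention, a residual MLP) follows the general pattern of attention pooling rather than the exact SigLIP2 head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionPoolingHead(nn.Module):
    """Pools patch tokens [B, N, D] into one L2-normalised vector [B, D]."""

    def __init__(self, dim, num_heads, mlp_dim):
        super().__init__()
        self.probe = nn.Parameter(torch.randn(1, 1, dim) * 0.02)  # learned query
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_dim), nn.GELU(), nn.Linear(mlp_dim, dim)
        )

    def forward(self, tokens):
        probe = self.probe.expand(tokens.shape[0], -1, -1)  # one query per image
        pooled, _ = self.attn(probe, tokens, tokens)        # [B, 1, D]
        pooled = pooled + self.mlp(self.norm(pooled))       # residual MLP
        return F.normalize(pooled[:, 0], dim=-1)            # [B, D], unit length


# Toy sizes so the sketch runs quickly; the table above lists the real ones.
backbone = nn.Linear(32, 64)      # hypothetical stand-in for the frozen ViT
backbone.requires_grad_(False)    # frozen: excluded from gradient updates
head = AttentionPoolingHead(dim=64, num_heads=4, mlp_dim=128)

patches = torch.randn(2, 10, 32)      # [B, N, patch_features]
embeddings = head(backbone(patches))  # [2, 64]

trainable = sum(p.numel() for p in head.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in backbone.parameters())
print(embeddings.shape, trainable, frozen)
```

Training only the head keeps the optimiser state small and leaves the backbone features untouched, which is the trade-off the "head-only" row describes.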

Training

Train with the NearID loss (extended InfoNCE with near-identity distractor ranking):

accelerate launch -m training.train \
    --loss_config "infonce_ext:1.0" \
    --head_type map --head_out_dim 1152 \
    --lr 1e-4 --epochs 11 --data.batch_size 128

See docs/TRAINING.md for the full guide.
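For intuition about the loss named above, the sketch below shows one generic way to extend InfoNCE with hard negatives: near-identity distractor similarities are appended to the denominator alongside the usual in-batch negatives. This is an assumption for illustration only; the function name, the temperature of 0.07, and the sample similarities are hypothetical, and the actual `infonce_ext` loss in this repository may add ranking terms or weighting that are not shown here.

```python
import math


def info_nce_with_distractors(pos_sim, batch_neg_sims, distractor_sims, temperature=0.07):
    """Per-anchor InfoNCE where distractors act as additional negatives."""
    logits = [pos_sim] + list(batch_neg_sims) + list(distractor_sims)
    scaled = [s / temperature for s in logits]
    m = max(scaled)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_denominator - scaled[0]  # equals -log softmax of the positive


# Hypothetical cosine similarities for a single anchor.
easy = info_nce_with_distractors(0.90, [0.10, 0.20], [])
hard = info_nce_with_distractors(0.90, [0.10, 0.20], [0.85])
print(easy, hard)
```

A distractor that scores almost as high as the positive raises the loss sharply, which pushes the head to separate same-context lookalikes from true positives.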

Evaluation

# Step 1: compute similarities
python -m evaluation.sim_test \
    --mode fullneg --model "Aleksandar/nearid-siglip2" \
    --ds "Aleksandar/NearID" --ds_neg "path/to/negatives" \
    --output_folder runs/evals/ --batch_size 64

# Step 2: aggregate tables
python -m evaluation.gen_tables --root runs/evals/ --overlap primary

See docs/EVALUATION.md for the full guide.

Datasets

The NearID benchmark consists of multi-view positives and near-identity distractors generated by an ensemble of inpainting pipelines. All datasets are released under CC-BY-4.0.

Dataset	Description	HuggingFace
NearID	Multi-view positives (anchor + positive views)	`Aleksandar/NearID`
NearID-Flux	Near-identity distractors via FLUX.1	`Aleksandar/NearID-Flux`
NearID-Flux_1024	FLUX.1 @ 1024px	`Aleksandar/NearID-Flux_1024`
NearID-FluxC	FLUX.1 Canny-guided	`Aleksandar/NearID-FluxC`
NearID-FluxC_1024	FLUX.1 Canny-guided @ 1024px	`Aleksandar/NearID-FluxC_1024`
NearID-PowerPaint	PowerPaint inpainting	`Aleksandar/NearID-PowerPaint`
NearID-Qwen	Qwen-based inpainting	`Aleksandar/NearID-Qwen`
NearID-Qwen_1328	Qwen-based @ 1328px	`Aleksandar/NearID-Qwen_1328`
NearID-SDXL	Stable Diffusion XL inpainting	`Aleksandar/NearID-SDXL`
NearID-SDXL_1024	SDXL @ 1024px	`Aleksandar/NearID-SDXL_1024`

from datasets import load_dataset

positives = load_dataset("Aleksandar/NearID")
negatives = load_dataset("Aleksandar/NearID-Flux")
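Once anchors, positives and distractors have been embedded, a simple sanity check is whether each anchor scores its positive above its near-identity distractor. The sketch below is a hypothetical helper operating on precomputed similarity lists; it is not the definition of SSR or PA used in the Results table, and it makes no assumption about the dataset column names.

```python
def ranking_accuracy(pos_sims, neg_sims):
    """Fraction of triplets in which the positive outscores the distractor."""
    if len(pos_sims) != len(neg_sims):
        raise ValueError("expected one distractor similarity per positive similarity")
    wins = sum(p > n for p, n in zip(pos_sims, neg_sims))
    return wins / len(pos_sims)


# Hypothetical similarities: sim(anchor, positive) and sim(anchor, distractor).
pos = [0.92, 0.88, 0.61, 0.95]
neg = [0.40, 0.90, 0.35, 0.50]
print(ranking_accuracy(pos, neg))  # 0.75
```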

Model Zoo

Model	HuggingFace Hub	SSR	PA	MH
NearID (SigLIP2 + MAP)	`Aleksandar/nearid-siglip2`	99.17	99.71	0.545

Citation

@article{cvejic2026nearid,
  title={NearID: Identity Representation Learning via Near-identity Distractors},
  author={Cvejic, Aleksandar and Abdal, Rameen and Eldesokey, Abdelrahman and Ghanem, Bernard and Wonka, Peter},
  journal={arXiv preprint arXiv:2604.01973},
  year={2026}
}

Acknowledgements

This work was supported by King Abdullah University of Science and Technology (KAUST) and Snap Inc.

License

Code & model weights: Apache License 2.0. See LICENSE.
Datasets: CC-BY-4.0. Derived from SynCD (MIT License).
