CellMapper
k-NN-based mapping of cells across representations to transfer labels, embeddings and expression values. Works for millions of cells, on CPU and GPU, across molecular modalities, between spatial and non-spatial data, for arbitrary query and reference datasets. Using faiss to compute k-NN graphs, CellMapper takes about 30 seconds to transfer cell type labels from 1.5M cells to 1.5M cells on a single RTX 4090 with 60 GB CPU memory.
Inspired by previous tools, including scanpy's ingest and the HNOCA-tools packages. Check out the docs to learn more, in particular our tutorials.
Key use cases
Transfer cell type labels and expression values from dissociated to spatial datasets.
Transfer embeddings between arbitrary query and reference datasets.
Compute presence scores for query datasets in large reference atlases.
Identify niches in spatial datasets by contextualizing latent spaces in spatial coordinates.
Evaluate the results of transferring labels, embeddings and feature spaces using a variety of metrics.
The core idea of CellMapper is to separate the method (a k-NN graph with some kernel applied to obtain a mapping matrix) from the application (mapping across arbitrary representations), which keeps the tool flexible and fast. The tool currently supports pynndescent, sklearn, faiss and rapids for neighborhood search, implements a variety of graph kernels, and is closely integrated with AnnData objects.
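To make the method/application split concrete, here is a minimal NumPy sketch of the general idea, not CellMapper's actual implementation. The function names (`knn_mapping_matrix`, `transfer_labels`), the brute-force neighbor search, and the Gaussian kernel with a per-cell bandwidth are all illustrative assumptions; in practice the neighbor search would be handled by one of the backends listed above.

```python
import numpy as np


def knn_mapping_matrix(query_emb, ref_emb, k=3):
    """Row-normalized (n_query x n_ref) mapping matrix from a kernel on k-NN distances."""
    # Brute-force squared Euclidean distances (stand-in for a real k-NN backend).
    d2 = ((query_emb[:, None, :] - ref_emb[None, :, :]) ** 2).sum(axis=-1)
    # Indices and distances of the k nearest reference cells per query cell.
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    nn_d2 = np.take_along_axis(d2, nn_idx, axis=1)
    # Assumed kernel: Gaussian, bandwidth set by the distance to the k-th neighbor.
    sigma2 = np.maximum(nn_d2[:, -1:], 1e-12)
    weights = np.exp(-nn_d2 / sigma2)
    weights /= weights.sum(axis=1, keepdims=True)
    # Scatter the k weights per row into a dense matrix (sparse in real use).
    mapping = np.zeros((query_emb.shape[0], ref_emb.shape[0]))
    np.put_along_axis(mapping, nn_idx, weights, axis=1)
    return mapping


def transfer_labels(mapping, ref_labels):
    """Weighted vote over reference labels; returns labels and their vote share."""
    categories, codes = np.unique(ref_labels, return_inverse=True)
    one_hot = np.eye(len(categories))[codes]
    probs = mapping @ one_hot
    return categories[probs.argmax(axis=1)], probs.max(axis=1)


# Toy data: two well-separated groups of reference cells in a 2D joint embedding.
ref_emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
ref_labels = np.array(["T cell"] * 3 + ["B cell"] * 3)
query_emb = np.array([[0.05, 0.05], [5.05, 5.05]])

mapping = knn_mapping_matrix(query_emb, ref_emb, k=3)
labels, confidence = transfer_labels(mapping, ref_labels)
# The same matrix transfers continuous values, e.g. embeddings: mapping @ ref_emb
transferred_emb = mapping @ ref_emb
```

Once the mapping matrix exists, each application is a single matrix product: a one-hot label matrix yields label probabilities, while an embedding or expression matrix yields weighted averages.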
Installation
You need Python 3.11 or newer installed on your system. If you don't have Python installed, we recommend installing uv.
There are two alternative options to install cellmapper:
Install the latest release from PyPI:
pip install cellmapper
Install the latest development version:
pip install git+https://github.com/quadbio/cellmapper.git@main
Getting started
This package assumes that you have query and reference AnnData objects, with a joint embedding computed and stored in .obsm. We implement some baseline approaches to compute joint embeddings (PCA and a fast reimplementation of CCA), but we typically expect you to provide a pre-computed joint embedding from a task-specific representation learning tool. Examples include GimVI or ENVI for spatial mapping, GLUE, MIDAS and MOFA+ for modality translation, and scVI, scANVI and scArches for query-to-reference mapping; this is just a small selection.
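For illustration, a PCA-style joint embedding can be sketched in plain NumPy as follows. This is a generic baseline under stated assumptions, not CellMapper's built-in routine: the function name `joint_pca` is hypothetical, and both matrices are assumed to share the same features in the same order.

```python
import numpy as np


def joint_pca(query_X, ref_X, n_components=2):
    """Project query and reference cells onto shared principal components."""
    # Stack both datasets; columns (features) must match between them.
    stacked = np.vstack([query_X, ref_X])
    centered = stacked - stacked.mean(axis=0, keepdims=True)
    # SVD of the centered matrix; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    joint = centered @ vt[:n_components].T
    n_query = query_X.shape[0]
    # Split back into the query and reference parts.
    return joint[:n_query], joint[n_query:]


# Toy expression matrices: 5 query cells and 8 reference cells, 10 shared genes.
rng = np.random.default_rng(0)
query_X = rng.normal(size=(5, 10))
ref_X = rng.normal(size=(8, 10))
query_joint, ref_joint = joint_pca(query_X, ref_X, n_components=3)
```

The two returned arrays would then be stored under the same key, e.g. `.obsm["X_joint"]`, in the query and reference objects. Plain PCA on stacked data does not correct for batch or modality effects, so it is best treated as a quick baseline.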
With a joint embedding in .obsm["X_joint"] at hand, the simplest way to use CellMapper is as follows:
from cellmapper import CellMapper

# query and reference are AnnData objects with a joint embedding in .obsm["X_joint"]
cmap = CellMapper(query, reference).map(
    use_rep="X_joint", obs_keys="celltype", obsm_keys="X_umap", layer_key="X"
)
This will transfer data from the reference to the query dataset, including celltype labels stored in reference.obs, a UMAP embedding stored in reference.obsm, and expression values stored in reference.X.
There are many ways to customize this, e.g. by choosing how k-NN graphs are computed and how they are turned into mapping matrices. We also implement a few methods to evaluate whether your k-NN transfer was successful. The tool further implements a self-mapping mode (only a query object, no reference), which is useful for spatial contextualization and data denoising. Check out the docs to learn more.
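The idea behind self-mapping can be sketched without any reference dataset: each cell is mapped onto its own neighbors, and values are averaged over that neighborhood. The snippet below is a conceptual NumPy illustration with a hypothetical `self_smooth` helper and uniform weights; it does not reflect CellMapper's API or its choice of kernel.

```python
import numpy as np


def self_smooth(rep, values, k=3):
    """Average `values` over each cell's k nearest neighbors in `rep` (self included)."""
    # Brute-force squared distances between all pairs of cells.
    d2 = ((rep[:, None, :] - rep[None, :, :]) ** 2).sum(axis=-1)
    # Each cell is its own nearest neighbor (distance 0), so it is part of the average.
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    # values[nn_idx] has shape (n_cells, k, n_features); average over the k neighbors.
    return values[nn_idx].mean(axis=1)


# Toy data: two groups of three cells each, with one noisy feature.
rep = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
values = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
smoothed = self_smooth(rep, values, k=3)
```

If `rep` holds an expression-derived embedding, the result is a denoised version of `values`. If `rep` holds spatial coordinates and `values` a latent space, each cell instead receives the average latent state of its spatial neighborhood, which is one way to read "spatial contextualization".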
Release notes
See the changelog.
Contact
If you found a bug, please use the issue tracker.
Citation
Please use our Zenodo entry to cite this software.