I am a Principal Researcher at Microsoft Research in Cambridge, MA.
My research focuses on developing new AI methods to understand and design biology, with the ultimate aim of realizing precision biomedicines to improve human health.
To this end, I co-lead Project Ex Vivo, a collaborative effort between Microsoft and the Broad Institute focused on defining, engineering, and targeting cell states in cancer.
My work bridges the computational and experimental worlds to create new diagnostic and therapeutic biotechnologies and to achieve new insights into cancer.
Formerly Ava Soleimany. *Denotes co-first authorship. †Denotes corresponding authors.
We introduce a hierarchical cross-entropy loss for single-cell cell type annotation, finding that this simple modification significantly improves out-of-distribution performance without added computational cost.
We evaluate the performance of single-cell foundation models in "zero-shot" settings where they are used without any further training, finding that they are outperformed by simpler methods.
We introduce a new deep learning framework, MICON, that models chemical compounds as treatments that induce counterfactual transformations of cell phenotypes for improved representation learning in microscopy-based morphological profiling.
We develop a method for protecting against over-clustering in the analysis of single-cell RNA sequencing data by controlling for the impact of using the same data twice when performing differential expression analysis.
We present CleaveNet, an end-to-end AI pipeline for the design of peptide-based protease substrates, enabling generation of peptides guided by a target cleavage profile for the design of efficient and selective substrates.
We systematically investigate the consequences of training dataset composition on the behavior of deep learning models of single-cell transcriptomics, focusing on human hematopoiesis as a tractable model system and including cells from adult and developing tissues, disease states, and perturbation atlases.
We investigate the effect of pre-training dataset size and diversity on the performance of single-cell foundation models on both zero-shot and fine-tuned tasks, finding that current methods plateau in performance with pre-training datasets that are only a fraction of the full size.
We take a deep dive into scBERT, a recently developed transformer model for single-cell RNA-sequencing data, to develop a deeper understanding of the potential benefits and limitations of single-cell foundation models.
We develop EvoDiff, a general-purpose discrete diffusion framework over protein sequences that combines evolutionary-scale data with the distinct conditioning capabilities of diffusion models for controllable protein generation in sequence space.
Scalable, compressed phenotypic screening using pooled perturbations
Nuo Liu,
Walaa E. Kattan,
Benjamin E. Mead,
Conner Kummerlowe,
Thomas Cheng,
Sarah Ingabire,
Jamie H. Cheah,
Christian K. Soule,
Anita Vrcic,
Jane K. McIninch,
Sergio Triana,
Manuel Guzman,
Tyler T. Dao,
Joshua M. Peters,
Kristen E. Lowder,
Lorin Crawford,
Ava P. Amini,
Paul C. Blainey,
William C. Hahn,
Brian Cleary,
Bryan Bryson,
Peter S. Winter,
Srivatsan Raghavan,
Alex K. Shalek
Nature Biotechnology, 2024
pdf
We establish a method of pooling perturbations, like chemical compounds, followed by computational deconvolution to reduce required sample size, labor, and cost in high-throughput phenotypic screens.
Mutation and cell state compatibility is required and targetable in Ph+ acute lymphoblastic leukemia minimal residual disease
Peter S. Winter*,
Michelle L. Ramseier*,
Andrew W. Navia*,
Sachit Saksena,
Haley Strouf,
Nezha Senhaji,
Alan DenAdel,
Mahnoor Mirza,
Hyun Hwan An,
Laura Bilal,
Peter Dennis,
Catherine S. Leahy,
Kay Shigemori,
Jennyfer Galves-Reyes,
Ye Zhang,
Foster Powers,
Nolawit Mulugeta,
Alejandro J. Gupta,
Nicholas Calistri,
Alex Van Scoyk,
Kristen Jones,
Huiyun Liu,
Kristen E. Stevenson,
Siyang Ren,
Marlise R. Luskin,
Charles P. Couturier,
Ava P. Amini,
Srivatsan Raghavan,
Robert J. Kimmerling,
Mark M. Stevens,
Lorin Crawford,
David M. Weinstock,
Scott R. Manalis,
Alex K. Shalek,
Mark A. Murakami
bioRxiv, 2024
pdf
Using patient-derived xenograft (PDX) models and clinical trial specimens of acute lymphoblastic leukemia (ALL), we examined how genetic and transcriptional features co-evolve to drive progression during prolonged tyrosine kinase inhibitor response, uncovering a landscape of cooperative mutational and transcriptional escape mechanisms that differ from those causing resistance to first-generation inhibitors.
To understand how the features learned in pretraining protein language models (PLMs) relate to and are useful for downstream tasks, we perform a systematic analysis of transfer learning using PLMs, conducting 370 experiments across a comprehensive suite of factors including different downstream tasks, architectures, model sizes, model depths, and pretraining time.
We developed a non-parametric infinite mixture model that leverages Bayesian sparse priors to identify marker genes while simultaneously performing clustering on single-cell expression data.
We present FoldingDiff, a diffusion-based generative model that generates protein backbone structures via a procedure inspired by the natural folding process.
Priming agents transiently reduce the clearance of cell-free DNA to improve liquid biopsies
Carmen Martin-Alonso*,
Shervin Tabrizi*,
Kan Xiong*,
Timothy Blewett,
Sainetra Sridhar,
Andjela Crnjac,
Sahil Patel,
Zhenyi An,
Ahmet Bekdemir,
Douglas Shea,
Shih-Ting Wang,
Sergio Rodriguez-Aponte,
Christopher A. Naranjo,
Justin Rhoades,
Jesse D. Kirkpatrick,
Heather E. Fleming,
Ava P. Amini,
Todd R. Golub,
J. Christopher Love†,
Sangeeta N. Bhatia†,
Viktor A. Adalsteinsson†
Science, 2024
pdf /
MIT press /
general press
We develop intravenous priming agents that are given prior to a blood draw to increase the
abundance of cell-free DNA in circulation, improving the sensitivity of liquid biopsy cancer diagnostic assays.
A commentary on a recent image-inspired diffusion model for generating new protein structures.
Low protease activity in B cell follicles promotes retention of intact antigens after immunization
Aereas Aung,
Ang Cui,
Laura Maiorino,
Ava P. Amini,
Justin R. Gregory,
Maurice Bukenya,
Yiming Zhang,
Heya Lee,
Christopher A. Cottrell,
Duncan M. Morgan,
Murillo Silva,
Heikyung Suh,
Jesse D. Kirkpatrick,
Parastoo Amlashi,
Tanaka Remba,
Leah M. Froehle,
Shuhao Xiao,
Wuhbet Abraham,
Josetta Adams,
J. Christopher Love,
Phillip Huyett,
Douglas S. Kwon,
Nir Hacohen,
William R. Schief,
Sangeeta N. Bhatia,
Darrell J. Irvine
Science, 2023
pdf /
MIT press /
general press
We discover "sanctuaries" within lymph nodes that have low proteolytic activity and
act as a safe haven for vaccines, and demonstrate that this heterogeneity
can be exploited to enhance vaccine-induced antibody responses.
We engineer an integrated set of methods for measuring specific protease activities
across the organismal, tissue, and cellular scales, and unify these methods into a
methodological hierarchy that powers new biological insights into cancer.
We build Protease Activity Analysis (PAA), a Python software
package with a collection of data analytic and machine learning
tools for analyzing protease activity data.
We develop a sensor-based, ML-driven system to diagnose pneumonia and
classify its etiology, using machine learning to classify directly from
molecular barcodes.
We establish a sensor-based, ML-driven diagnostic for noninvasive, real-time monitoring of disease in a preclinical model
of lymphangioleiomyomatosis (LAM), a rare lung disease.
We design an easily applicable, non-invasive formulation to deliver diagnostic nanosensors through
the skin, enabling a sustained release diagnostic monitoring system for detecting thrombosis.
We design a sense-and-respond system that integrates a synthetic
gene circuit and nanotechnology detection tools for tumor-specific expression
of heterologous biomarkers.
We engineer a new class of enzyme activity probes that can be applied to fresh-frozen tissue sections to spatially localize
protease activity, enabling new insights into the biology of protease dysregulation.
We optimize the immunogenicity of peptide-based antitumor vaccines in mice
by tuning their pharmacokinetics via fusion of the peptide epitopes to
protein carriers.
Review detailing how integrating techniques from multiple disciplines has yielded engineered diagnostics that are selectively
activated in disease states, highlighting their potential to realize the goals of precision medicine.
We develop a generalizable algorithm for mitigating hidden biases within training data
by leveraging learned latent distributions to adaptively re-weight the importance of certain data points during training.
I am an organizer and lecturer for Introduction to Deep Learning (6.S191), MIT’s official introductory course on deep learning foundations and applications.
Together with Alexander Amini, I have organized and developed all aspects of the course, including developing the curriculum, teaching the lectures, creating software labs, and collaborating with industry sponsors.
All materials can be found online on the course website.
I am a co-founder and director for Momentum AI, an outreach program that teaches AI and machine learning to under-resourced and under-served high school
students from the greater Boston area. Our two-week capstone program is a free, project-based deep dive into AI and is held on MIT's campus.