I am a PhD student in machine learning at MIT, where I am fortunate to be advised by Ankur Moitra. I am also a member of the wonderful Atomic Architects, led by Tess Smidt. Currently, I am working on guided generation of small molecules as a summer intern at Prescient Design.
I will soon be on the job market!
Previously, I was a research intern at D. E. Shaw Research, working on tokenization of molecules for LLMs, and with the Open Catalyst team at Meta FAIR, studying equivariant architectures for chemistry applications.
Before graduate school, I was a research analyst at the Center for Computational Mathematics of the Flatiron Institute in New York, where I developed algorithms at the interface of equivariant deep learning and signal processing for cryo-EM.
Broadly, I enjoy developing theoretically principled tools for deep learning (often in scientific domains), with a focus on both understanding and imposing structure in neural representations.
Finally, I co-founded the Boston Symmetry Group, which hosts a recurring workshop for researchers interested in symmetries in machine learning. Follow us on Twitter, shoot us an email, or join our mailing list if you're interested in attending!
I work on harnessing structural ansatzes for improved generalization and interpretability of machine learning pipelines.
Much of my PhD work has focused on a particular, strong structural assumption: group symmetry, i.e. "equivariant machine learning", and its applications to the sciences. These days, I am focused on developing tools that are theoretically principled at a high level yet, crucially, well-engineered and practically performant.
Here are a few high-level questions I've been thinking about recently (or at least, as of the last time I updated this website, on 8/11/25):
In the age of LLMs, what is the future of equivariant learning? (Here are some slides from a recent talk I gave, offering some perspective on this.)
How can we probe how a network "thinks" by discovering structure in its hidden representations?
What is the role of equivariance, e.g. to permutations, in large language models (LLMs)? To what extent is equivariance learned?
What is the right way to tokenize geometric objects? How does tokenization transcend mere compression? What properties are desirable in a tokenization scheme?
How can we make canonicalization work, in theory and in practice, as an approach for enforcing symmetries in black-box models?
How much hot chocolate can I consume at a single research institution?
[Note: extensions of this work are in submission!] We propose a simple classifier test for detecting whether a distribution of point clouds is rotationally aligned, versus isotropically oriented. In essence, we split the dataset into two halves, rotate one half, and then check the test accuracy of a classifier trained to distinguish between them. In applying this test to point cloud datasets (QM9, OC20, MD17), we surprisingly find that they are extremely aligned! This has implications for our understanding of how, and when, equivariant methods (including augmentation and canonicalization) succeed.
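For intuition, here is a minimal sketch of the test (an illustrative reimplementation, not the code from the paper), assuming the dataset is an (n, k, 3) numpy array of equally sized point clouds with consistent point ordering:

    import numpy as np
    from scipy.spatial.transform import Rotation
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def alignment_test(clouds, seed=0):
        """Split the dataset in half, randomly rotate one half, and report how well
        a classifier can tell the two halves apart. Accuracy near 0.5 suggests
        isotropic orientations; accuracy well above 0.5 suggests alignment."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(clouds))
        a, b = clouds[idx[: len(idx) // 2]], clouds[idx[len(idx) // 2 :]]

        # Apply an independent uniformly random rotation to each cloud in the second half.
        rots = Rotation.random(num=len(b), random_state=seed)
        b = np.stack([r.apply(x) for r, x in zip(rots, b)])

        # Naive features: flattened coordinates (a stronger test would use permutation-invariant features).
        X = np.concatenate([a.reshape(len(a), -1), b.reshape(len(b), -1)])
        y = np.concatenate([np.zeros(len(a)), np.ones(len(b))])
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=seed)
        return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

A linear classifier on raw coordinates is only a weak detector: accuracy far above chance is already evidence of alignment, while accuracy near chance is not conclusive on its own.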
Equivariant models can't break symmetries: they can only map symmetric inputs (e.g. squares) to symmetric outputs (e.g. objects with the same symmetry as a square). We propose a sample-efficient probabilistic framework for breaking symmetries, e.g. in generative models' latent spaces, by combining equivariant networks with canonicalization-based positional encodings.
We demonstrate that, perhaps surprisingly, there is no continuous canonicalization (or even efficiently implementable frame) for many symmetry groups. We introduce a notion of weighted frames to circumvent this issue.
We propose machine learning approaches, equivariant with respect to the non-compact group SL(2,R) of area-preserving linear transformations, for learning to solve polynomial optimization problems.
We observe that many popular positional encodings (sinusoidal, RoPE, graph PEs, etc.) can be interpreted as algebraic group representations, which formalizes some of their desirable properties (invariance to global translation, etc.). This also suggests a simple framework for building positional encodings with new invariances, such as to the special Euclidean group.
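As a toy illustration of the group-representation view (my own numpy check, not code from the paper): shifting the position of a sinusoidal encoding is the same as multiplying it by a block-diagonal rotation matrix, i.e. translations act linearly on the encoding space, exactly the structure RoPE exploits for relative positions.

    import numpy as np

    def sinusoidal_pe(x, freqs):
        """Interleaved (cos, sin) sinusoidal encoding of a scalar position x."""
        return np.concatenate([[np.cos(w * x), np.sin(w * x)] for w in freqs])

    def translation_rep(t, freqs):
        """Block-diagonal 2x2 rotations rho(t); a group representation, since rho(s + t) = rho(s) rho(t)."""
        R = np.zeros((2 * len(freqs), 2 * len(freqs)))
        for i, w in enumerate(freqs):
            c, s = np.cos(w * t), np.sin(w * t)
            R[2 * i : 2 * i + 2, 2 * i : 2 * i + 2] = [[c, -s], [s, c]]
        return R

    freqs = 1.0 / 10000.0 ** (np.arange(4) / 4)
    x, t = 3.0, 5.0
    # Equivariance to translation: PE(x + t) == rho(t) @ PE(x).
    assert np.allclose(sinusoidal_pe(x + t, freqs), translation_rep(t, freqs) @ sinusoidal_pe(x, freqs))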
We present a framework for automatically identifying and captioning coherent patterns of errors made by any trained model. The key? Keeping it simple: linear classifiers in a shared vision-language embedding space.
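A rough sketch of the flavor of this approach (illustrative only, with hypothetical inputs: precomputed image embeddings from a shared vision-language model such as CLIP, a binary indicator of the trained model's errors, and a pool of candidate captions with their text embeddings):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def caption_error_direction(image_embs, is_error, captions, text_embs, top_k=3):
        """Fit a linear 'error vs. correct' classifier in the shared embedding space,
        then describe the error direction by ranking candidate captions whose
        (unit-normalized) text embeddings best align with the classifier weights."""
        clf = LogisticRegression(max_iter=1000).fit(image_embs, is_error)
        w = clf.coef_[0] / np.linalg.norm(clf.coef_[0])
        scores = text_embs @ w
        return [captions[i] for i in np.argsort(-scores)[:top_k]]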
We define a family of distance pseudometrics for comparing learned data representations, directly inspired by transfer learning. In particular, we define a distance between two representations based on how differently (worst-case over all downstream, bounded linear predictive tasks) they perform under ridge regression.
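The paper derives a closed-form pseudometric; purely for intuition, here is a crude Monte Carlo lower bound on a quantity of this flavor, assuming hypothetical featurizers phi and psi and sampling random unit-norm linear tasks rather than taking a true worst case:

    import numpy as np
    from sklearn.linear_model import Ridge

    def predictive_discrepancy(phi, psi, X, n_tasks=200, lam=1.0, seed=0):
        """Estimate how differently ridge regressions fit on two representations
        phi(X) and psi(X) predict the same bounded linear tasks y = X @ w, ||w|| = 1."""
        rng = np.random.default_rng(seed)
        A, B = phi(X), psi(X)
        worst = 0.0
        for _ in range(n_tasks):
            w = rng.normal(size=X.shape[1])
            w /= np.linalg.norm(w)                      # a bounded linear task
            y = X @ w
            pred_a = Ridge(alpha=lam).fit(A, y).predict(A)
            pred_b = Ridge(alpha=lam).fit(B, y).predict(B)
            worst = max(worst, float(np.sqrt(np.mean((pred_a - pred_b) ** 2))))
        return worst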
We extend Barron’s Theorem for efficient approximation to invariant neural networks, in the cases of invariance to a permutation subgroup or the rotation group.
We prove that any nearly low-rank Toeplitz positive semidefinite matrix has a low-rank approximation that is itself Toeplitz, and give a sublinear query complexity algorithm for finding it.
We characterize the implicit bias of linear group-convolutional networks trained by gradient descent. In particular, we show that the learned linear function is biased towards low-rank matrices in Fourier space.
By using a maximum-likelihood objective coupled with a deep decoder prior for images, we achieve superior image reconstruction for holographic phase retrieval, including under several challenging realistic conditions. To our knowledge, this is the first dataset-free machine learning approach for holographic phase retrieval.
We establish the minimax regret of switching-constrained online convex optimization, a realistic optimization framework where algorithms must act in real-time to minimize cumulative loss, but are penalized if they are too erratic.
By building new, randomized "ruler" sampling constructions, we show how to use sublinear sparse Fourier transform algorithms for sample-efficient, low-rank Toeplitz covariance estimation.
Service
Organizer, Boston Symmetry Day, Fall 2023 - Present
Teaching Assistant, 6.S966 Symmetry and its Applications to Machine Learning, Spring 2023