Upcoming Seminar Presentations
All seminars are on Tuesdays [ 8:30 am PT ] = [ 11:30 am ET ] = [ 4:30 pm London ] = [ 5:30 pm Paris ] = [ 11:30 pm Beijing ]
Subscribe to our mailing list and calendar for an up-to-date schedule!
Tuesday, March 31, 2026
Speaker: Ye He (Georgia Tech) [Zoom Link]
Title: Diffusion Model’s Generalization via Data-Dependent Ridge Manifolds
Abstract: When a diffusion model is not memorizing the training samples, what does it generate, and why? In this talk, I will describe a quantitative framework for understanding the distribution produced by a learned diffusion model through a data-driven geometric object: a log-density ridge manifold of the smoothed training distribution. This manifold acts as a backbone for generation and reveals a three-stage inference behavior: trajectories first reach the ridge, then align in normal directions, and finally slide along tangent directions. This perspective allows us to quantify how training error influences generation in different directions, and to explain when inter-mode generations arise. I will also present a random feature example in which the model’s inductive bias can be decomposed explicitly into architectural bias and optimization error, and tracked along the inference dynamics. Experiments on synthetic multimodal distributions and MNIST latent diffusion support the theory in both low- and high-dimensional settings.
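As a concrete reference point for the abstract above: the "smoothed training distribution" has a closed-form score (gradient of the log-density), and a log-density ridge is defined through this score and its Hessian. Below is a minimal sketch of that score for a Gaussian smoothing kernel; the function name `smoothed_score`, the toy data, and `sigma` are illustrative choices, not the speaker's code.

```python
import numpy as np

def smoothed_score(x, data, sigma):
    # Score (gradient of the log-density) of the Gaussian-smoothed
    # empirical distribution p_sigma = (1/n) * sum_i N(x_i, sigma^2 I),
    # evaluated at a query point x. A log-density ridge of p_sigma is
    # defined from this gradient together with the Hessian of log p_sigma.
    diffs = data - x                                   # shape (n, d)
    logw = -np.sum(diffs**2, axis=1) / (2 * sigma**2)
    w = np.exp(logw - logw.max())                      # softmax weights
    w /= w.sum()
    return (w @ diffs) / sigma**2

# Usage: two training points in 2D; near a point, the score points toward it.
data = np.array([[0.0, 0.0], [4.0, 0.0]])
score = smoothed_score(np.array([0.1, 0.0]), data, sigma=0.5)
```

With a single training point the weights collapse to 1 and the score reduces to (x_i - x) / sigma^2, which is a quick sanity check on the formula.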
Tuesday, April 7, 2026
Speaker: Yifan Chen (UCLA) [Zoom Link]
Title: Affine Invariant Samplers and Flows: Analysis and New Algorithms
Abstract: The Goodman–Weare affine invariant ensemble sampler is widely used for sampling from complex probability distributions, owing to its simplicity and robustness to ill-conditioning, and has been popularized by the "emcee" package. In this talk, I will first characterize its scaling limit as an affine-invariant gradient flow on the space of probability measures. Building on this perspective, I will introduce a family of affine-invariant gradient and Hamiltonian flows that give rise to unbiased ensemble samplers with provably improved high-dimensional scaling compared to Goodman–Weare. For settings demanding further scalability, I will also discuss approximate samplers based on variational inference, driven by affine-invariant gradient flows over Gaussian and mixture families. Theoretical guarantees and empirical results demonstrate robustness to both dimension and condition number, suggesting a broadly applicable affine-invariant framework and a promising generic toolkit for sampling in high-dimensional, ill-conditioned problems.
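For readers unfamiliar with the Goodman–Weare sampler referenced above, here is a minimal sketch of its stretch move, the update that emcee popularized. This is not the speaker's code: the standard Gaussian target, the walker count, and the stretch scale a = 2 are illustrative choices, and the sweep below uses the original sequential update rather than emcee's parallel half-ensemble variant.

```python
import numpy as np

def log_prob(x):
    # Illustrative target: standard Gaussian log-density (up to a constant).
    return -0.5 * np.sum(x**2)

def stretch_move(walkers, log_p, a=2.0, rng=None):
    """One sequential sweep of the Goodman-Weare stretch move."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = walkers.shape
    for k in range(n):
        # Pick a complementary walker j != k from the current ensemble.
        j = rng.integers(n - 1)
        j += (j >= k)
        # Draw the stretch factor z ~ g(z) proportional to 1/sqrt(z) on [1/a, a].
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
        proposal = walkers[j] + z * (walkers[k] - walkers[j])
        # Affine-invariant Metropolis acceptance rule.
        log_accept = (d - 1) * np.log(z) + log_p(proposal) - log_p(walkers[k])
        if np.log(rng.random()) < log_accept:
            walkers[k] = proposal
    return walkers

# Usage: 32 walkers in 5 dimensions; discard a burn-in, then collect samples.
rng = np.random.default_rng(0)
walkers = rng.normal(size=(32, 5))
history = []
for t in range(1500):
    walkers = stretch_move(walkers, log_prob, rng=rng)
    if t >= 500:
        history.append(walkers.copy())
samples = np.concatenate(history)  # (32 * 1000, 5) correlated draws
```

Note that the proposal only ever moves a walker along the line through two current walkers, which is why the move, and its acceptance rule, is invariant under affine reparameterizations of the target.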
Tuesday, April 14, 2026
Speaker: Bohan Zhou (UCSB) [Zoom Link]
Title: Accelerating MCMC on Discrete State Spaces
Abstract: Recent years have seen growing interest in the deep connections between optimization and sampling. In particular, Langevin dynamics for sampling can be interpreted as the gradient flow of the relative entropy in the space of probability distributions. A natural question is whether such connections can be extended to discrete state spaces. Building on a new interpretation of MCMC as a gradient flow with respect to the graphical Wasserstein metric, we propose a class of Nesterov-type algorithms to accelerate MCMC sampling on graphs. The corresponding continuous-time formulation can be viewed as a damped Hamiltonian flow in probability space. We establish theoretical results on convergence and acceleration for certain user-specified settings, and present numerical examples demonstrating improved accuracy and convergence speed of sampling on multimodal distributions and real datasets.
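The continuous-state Langevin dynamics mentioned at the start of the abstract can be sketched in a few lines; the discrete-graph acceleration is the subject of the talk itself. This is a hedged toy, assuming a standard Gaussian target; the step size, particle count, and initialization are arbitrary illustrative choices.

```python
import numpy as np

def grad_log_p(x):
    # Gradient of the log-density of an illustrative N(0, 1) target.
    return -x

# Unadjusted Langevin dynamics: x' = x + h * grad log p(x) + sqrt(2h) * noise.
# As h -> 0, the particle density follows the gradient flow of the relative
# entropy (KL divergence to the target) in Wasserstein space.
rng = np.random.default_rng(1)
x = 4.0 + rng.normal(size=10_000)   # particles initialized far from the target
h = 0.01                            # step size
for _ in range(2000):
    x = x + h * grad_log_p(x) + np.sqrt(2 * h) * rng.normal(size=x.size)
```

After enough steps the particle cloud relaxes toward the target; the fixed step size leaves an O(h) bias, which Metropolis-adjusted variants remove.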
Tuesday, April 21, 2026
Speaker: Noah Golowich (UT Austin) [Zoom Link]
Title: Understanding Parallel Reasoning in Language Model Inference
Abstract: Efficiently sampling from a complex probability distribution is a fundamental problem across machine learning and theoretical computer science. It has become increasingly pertinent in recent years with the rise of generative AI, as sophisticated sampling procedures from large language models (LLMs) have been proposed to solve challenging reasoning problems spanning domains such as mathematics and coding. For the most part, however, we lack a principled understanding of the accuracy–cost tradeoffs for such procedures. In this talk, we propose a formalization for such tasks as the problem of producing a sample from a target probability measure, given an oracle which yields approximate density estimates for the target measure. Depending on the context, this oracle may be interpreted as an approximate verifier or a *process reward model* for a particular language modeling task. This setup is closely related to the problem of reducing sampling to approximate counting studied in seminal works of Jerrum, Valiant & Vazirani (1986) and Jerrum & Sinclair (1989).
Generalizing results from existing literature, we establish provable guarantees for the Sequential Monte Carlo algorithm and related particle filtering approaches, which have recently found success empirically in the context of both language modeling and diffusion. In particular, our theory identifies a few properties of the oracle which suffice for efficient sampling. We conduct experiments to show that these properties indeed correlate with sampling performance for certain language modeling tasks.
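A minimal sketch of the Sequential Monte Carlo scheme the paragraph refers to, in a toy setting: particles extend symbol sequences, an oracle supplies approximate log-density estimates of prefixes, and multinomial resampling focuses computation on high-density prefixes. The oracle, the bit-string target, and all parameters here are hypothetical stand-ins, not the paper's setup.

```python
import numpy as np

def oracle(prefix):
    # Hypothetical process-reward oracle: approximate log-density of a prefix.
    # Here the target over length-8 bit strings is p(x) ~ exp(2 * sum(x)).
    return 2.0 * sum(prefix)

def smc(n_particles=500, length=8, rng=None):
    """Sequential Monte Carlo with multinomial resampling after each step."""
    rng = np.random.default_rng() if rng is None else rng
    particles = [[] for _ in range(n_particles)]
    for _ in range(length):
        # Extend every particle with a uniformly proposed next symbol.
        bits = rng.integers(2, size=n_particles)
        particles = [p + [int(b)] for p, b in zip(particles, bits)]
        # Incremental importance weight = change in the oracle's log-density.
        logw = np.array([oracle(p) - oracle(p[:-1]) for p in particles])
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Resample particles in proportion to their weights.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = [list(particles[i]) for i in idx]
    return particles

# Usage: under this target each bit is 1 with probability e^2/(1+e^2) ~ 0.88.
parts = smc(rng=np.random.default_rng(0))
frac_ones = float(np.mean([np.mean(p) for p in parts]))
```

The quality of the final samples hinges on how well the oracle's incremental estimates track the true conditional densities, which is the kind of oracle property the talk's theory isolates.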
The efficacy of such sampling algorithms, however, is limited by the relationship between the underlying LLM and the particular sampling task at hand, which has motivated the framework of Test-Time Training (TTT). In particular, TTT updates a model's weights in response to partial generations and reward feedback received at inference time. In the latter half of the talk, we will discuss some provable benefits of TTT in the context of our sampling framework.
Based on https://arxiv.org/pdf/2603.07887 (joint work with Fan Chen, Dhruv Rohatgi, Raghav Singhal, Carles Domingo-Enrich, Dylan J. Foster, and Akshay Krishnamurthy); and upcoming joint work with Ankur Moitra and Dhruv Rohatgi.