Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Abstract
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
Abstract
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution [1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
Abstract
Styled Handwritten Text Generation (HTG) has recently received attention from the computer vision and document analysis communities, which have developed several solutions, either GAN- or diffusion-based, that achieved promising results. Nonetheless, these strategies fail to generalize to novel styles and have technical constraints, particularly in terms of maximum output length and training efficiency. To overcome these limitations, in this work, we propose a novel framework for text image generation, dubbed Emuru. Our approach leverages a powerful text image representation model (a variational autoencoder) combined with an autoregressive Transformer. This design enables the generation of styled text images conditioned on textual content and style examples, such as specific fonts or handwriting styles. We train our model solely on a diverse, synthetic dataset of English text rendered in over 100,000 typewritten and calligraphy fonts, which gives it the capability to reproduce unseen styles (both fonts and users' handwriting) in a zero-shot manner. To the best of our knowledge, Emuru is the first autoregressive model for HTG, and the first designed specifically for generalization to novel styles. Moreover, our model generates images without background artifacts, which are easier to use for downstream applications. Extensive evaluation on both typewritten and handwritten, any-length text image generation scenarios demonstrates the effectiveness of our approach.
Optimization by Decoded Quantum Interferometry
Stephen Jordan
Mary Wootters
Alexander Schmidhuber
Robbie King
Sergei Isakov
Nature, 646 (2025), pp. 831-836
Abstract
Achieving superpolynomial speedups for optimization has long been a central goal for quantum algorithms. Here we introduce Decoded Quantum Interferometry (DQI), a quantum algorithm that uses the quantum Fourier transform to reduce optimization problems to decoding problems. For approximating optimal polynomial fits over finite fields, DQI achieves a superpolynomial speedup over known classical algorithms. The speedup arises because the problem's algebraic structure is reflected in the decoding problem, which can be solved efficiently. We then investigate whether this approach can achieve speedup for optimization problems that lack algebraic structure but have sparse clauses. These problems reduce to decoding LDPC codes, for which powerful decoders are known. To test this, we construct a max-XORSAT instance where DQI finds an approximate optimum significantly faster than general-purpose classical heuristics, such as simulated annealing. While a tailored classical solver can outperform DQI on this instance, our results establish that combining quantum Fourier transforms with powerful decoding primitives provides a promising new path toward quantum speedups for hard optimization problems.
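To make the classical baseline mentioned above concrete, here is a minimal sketch of simulated annealing applied to a toy max-XORSAT instance (maximize the number of satisfied XOR clauses). The instance, schedule, and parameters are illustrative and are not those constructed in the paper.

```python
import math
import random

def max_xorsat_anneal(clauses, n_vars, steps=20000, t0=2.0, t1=0.01, seed=0):
    """Simulated annealing for max-XORSAT.

    Each clause is (variable_indices, parity): it is satisfied when the XOR
    of the chosen variables equals `parity`.
    """
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_vars)]

    def satisfied(clause, assignment):
        idxs, parity = clause
        return sum(assignment[i] for i in idxs) % 2 == parity

    def score(assignment):
        return sum(satisfied(c, assignment) for c in clauses)

    best, best_score = list(x), score(x)
    cur_score = best_score
    for step in range(steps):
        # Geometric temperature schedule from t0 down to t1.
        t = t0 * (t1 / t0) ** (step / max(steps - 1, 1))
        i = rng.randrange(n_vars)
        x[i] ^= 1                      # propose a single-bit flip
        new_score = score(x)
        delta = new_score - cur_score
        if delta >= 0 or rng.random() < math.exp(delta / t):
            cur_score = new_score      # accept the move
            if cur_score > best_score:
                best, best_score = list(x), cur_score
        else:
            x[i] ^= 1                  # reject: undo the flip
    return best, best_score

# Toy instance: 3 variables, 4 XOR clauses (illustrative only).
clauses = [([0, 1], 1), ([1, 2], 0), ([0, 2], 1), ([0, 1, 2], 0)]
assignment, n_sat = max_xorsat_anneal(clauses, n_vars=3)
print(f"satisfied {n_sat}/{len(clauses)} clauses with assignment {assignment}")
```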
Tighter Privacy Analysis for Truncated Poisson Sampling
Arun Ganesh
(2025)
Abstract
We give a new privacy amplification analysis for truncated Poisson sampling, a Poisson sampling variant that truncates a batch if it exceeds a given maximum batch size.
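A minimal sketch of the sampling variant described above, assuming each record is included independently with probability p and the batch is capped at a maximum size; the specific truncation rule shown here (keeping a uniformly random subset of an over-full batch) is one natural choice and may differ from the paper's definition.

```python
import random

def truncated_poisson_sample(n, p, max_batch_size, rng=random):
    """Poisson-sample indices 0..n-1 with probability p each, then truncate.

    If more than `max_batch_size` indices are sampled, keep a uniformly
    random subset of size `max_batch_size` (one possible truncation rule).
    """
    batch = [i for i in range(n) if rng.random() < p]
    if len(batch) > max_batch_size:
        batch = sorted(rng.sample(batch, max_batch_size))
    return batch

# Example: sample from 1,000 records with p = 0.01 and a cap of 8.
print(truncated_poisson_sample(n=1000, p=0.01, max_batch_size=8, rng=random.Random(0)))
```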
Abstract
Computer use agents (CUAs) need to plan long-horizon task workflows grounded in diverse, ever-changing applications and environments, but learning is hindered by the scarcity of large-scale, high-quality training data. Existing datasets are small, domain-specific, and costly to annotate, while current synthetic data generation methods often yield brittle, simplistic, or misaligned task demonstrations.
We introduce Watch & Learn (W&L), a framework that transforms human demonstration videos available on the Internet into executable UI trajectories at scale. Inspired by robotics, we train an inverse dynamics model that accurately predicts user actions from consecutive screens, bypassing the need for complex heuristics. To scale to the web, we curate a large state-transition corpus and design a retrieval framework that identifies relevant video tutorials, enabling automatic conversion of raw videos into structured UI trajectories without requiring manual annotations. Beyond training data, we show that the generated UI trajectories can also serve as in-context exemplars, providing CUAs with long-horizon priors and domain-specific knowledge at inference time.
On the challenging OSWorld and Mind2Web benchmarks, UI trajectories extracted with W&L consistently improve both general-purpose and state-of-the-art frameworks when used in-context, and deliver stronger gains for open-source models when used in training. These results highlight web-scale human demonstration videos as a practical and scalable foundation for advancing CUAs towards real-world deployment.
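For illustration, the inverse-dynamics idea described above can be sketched as a model that predicts the user action from a pair of consecutive screenshots. The architecture, action vocabulary, and tensor shapes below are toy placeholders, not the W&L model.

```python
import torch
import torch.nn as nn

class ToyInverseDynamicsModel(nn.Module):
    """Predicts the UI action that transformed screen_t into screen_t+1.

    Illustrative only: a small CNN over the channel-concatenated screen pair,
    with one head for the action type and one for the click location.
    """

    def __init__(self, num_action_types=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.action_head = nn.Linear(32, num_action_types)  # e.g. click/type/scroll/drag
        self.location_head = nn.Linear(32, 2)                # normalized (x, y) target

    def forward(self, screen_before, screen_after):
        features = self.encoder(torch.cat([screen_before, screen_after], dim=1))
        return self.action_head(features), self.location_head(features)

# Example: a batch of two 3x224x224 screenshot pairs.
model = ToyInverseDynamicsModel()
before = torch.rand(2, 3, 224, 224)
after = torch.rand(2, 3, 224, 224)
action_logits, location = model(before, after)
print(action_logits.shape, location.shape)  # torch.Size([2, 4]) torch.Size([2, 2])
```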
GovSCH: An Open-Source Schema for Authoring Cybersecurity and AI Governance Documents
New America (2025)
Abstract
The increasing complexity of cybersecurity and artificial intelligence (AI) executive orders, frameworks, and policies has made translating high-level directives into actionable implementation a persistent challenge. Policymakers, framework authors, and engineering teams often lack a unified approach for interpreting and operationalizing these documents, resulting in inefficiencies, misalignment, and delayed compliance. While existing standards such as the Open Security Controls Assessment Language (OSCAL) address control-level specifications, no standardized, machine-readable format exists for authoring and structuring high-level governance documents. This gap hinders collaboration across disciplines and obscures critical directives’ underlying intent and rationale.
This report introduces Governance Schema (GovSCH), an open-source schema designed to standardize the authoring and translation of cybersecurity and AI governance documents into a consistent, machine-readable format. By analyzing prior executive orders, regulatory frameworks, and policies, GovSCH identifies common structures and authoring practices to create an interoperable model that bridges policymakers, regulatory framework authors, and engineering teams. This approach enables more precise articulation of policy intent, improves transparency, and accelerates the technical implementation of regulations. Ultimately, GovSCH aims to enhance collaboration, standardization, and efficiency in cybersecurity and AI governance.
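Since GovSCH is described as a machine-readable format for high-level directives, the sketch below illustrates the general idea of pairing a directive's text with its intent and rationale in structured form. All field names and values here are hypothetical and are not the actual GovSCH schema.

```python
from dataclasses import dataclass, field

@dataclass
class DirectiveEntry:
    """Hypothetical record for one high-level directive (field names are illustrative)."""
    directive_id: str
    source_document: str
    text: str
    intent: str                      # why the directive exists
    rationale: str                   # supporting reasoning or risk addressed
    responsible_roles: list = field(default_factory=list)
    related_controls: list = field(default_factory=list)  # e.g. pointers into OSCAL catalogs

example = DirectiveEntry(
    directive_id="EO-XXXX-sec4(a)",
    source_document="Hypothetical Executive Order on AI Safety",
    text="Agencies shall inventory AI systems used in safety-critical functions.",
    intent="Establish visibility into high-risk AI deployments.",
    rationale="Inventories are a precondition for risk assessment and oversight.",
    responsible_roles=["agency CIO"],
    related_controls=["hypothetical-control-ref-1"],
)
print(example.directive_id)
```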
Safe Coding: Rigorous Modular Reasoning about Software Safety
Queue, 23(5) (2025)
Abstract
Many persistent and dangerous software vulnerabilities, including memory safety violations and code injection, arise from a common root cause: Developers inadvertently violate the implicit safety preconditions of widely-used programming constructs. These preconditions—such as pointer validity, array-access bounds, and the trustworthy provenance of code fragments to be evaluated as SQL, HTML, or JavaScript—are traditionally the developer's responsibility to ensure. In complex systems, meeting these obligations often relies on non-local, whole-program invariants that are notoriously difficult to reason about correctly, leading to vulnerabilities that are difficult to detect after the fact.
This article introduces Safe Coding, a collection of software design patterns and practices designed to cost-effectively provide a high degree of assurance against entire classes of such vulnerabilities. The core principle of Safe Coding is to shift responsibility for safety from individual developers to the programming language, software libraries, and frameworks. This is achieved by systematically eliminating the direct use of risky operations—those with complex safety preconditions—in application code. Instead, these operations are encapsulated within safe abstractions: modules with public APIs that are safe by design, whose implementations fully ensure all module-internal safety preconditions through a combination of local runtime checks and by elevating safety preconditions into type invariants.
Safe Coding facilitates a modular and compositional approach to whole-program safety: Difficult reasoning is localized to the implementation of safe abstractions, which undergo focused expert scrutiny. The composition of these abstractions with the majority of the codebase (which is kept free of risky operations) is then automatically verified by the language’s type checker. This form of compositional reasoning, drawing from patterns used in formal software verification, can be viewed as a semi-formal approach that balances rigor with broad applicability to large industrial codebases. We discuss the successful application of these practices at Google, where they have nearly eliminated vulnerabilities such as Cross-Site Scripting (XSS) and SQL injection, and their critical role in ensuring memory safety in Rust, collectively demonstrating a favorable cost-assurance tradeoff for achieving software safety at scale.
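As an illustration of the safe-abstraction pattern described above, the sketch below uses SQL injection as the example: application code can only obtain a query value through an API that keeps untrusted input in the parameter channel, so the database layer can rely on a type invariant instead of auditing every call site. The names are illustrative, Python stands in for the statically typed languages the article targets, and this is not Google's production library.

```python
class SafeSqlQuery:
    """A query whose text can only come from trusted templates (illustrative sketch).

    Type invariant: `_template` is always a developer-written constant, and
    untrusted values are only ever carried as bound parameters, never spliced
    into the query text. Application code is expected to use `build_query`
    rather than constructing instances directly.
    """

    def __init__(self, template: str, params: tuple):
        self._template = template
        self._params = params

    @property
    def template(self) -> str:
        return self._template

    @property
    def params(self) -> tuple:
        return self._params


def build_query(template: str, *params) -> SafeSqlQuery:
    """Safe constructor: the template is a fixed string containing only placeholders."""
    if template.count("?") != len(params):
        raise ValueError("placeholder count must match the number of bound values")
    return SafeSqlQuery(template, params)


def run(query: SafeSqlQuery):
    # A database layer that only accepts SafeSqlQuery can rely on the invariant
    # above instead of re-validating strings at every call site. Here we just
    # show what would be handed to a parameterized execution API.
    print("execute:", query.template, "with", query.params)


# Untrusted input flows only into the parameter channel, never the query text.
user_input = "alice'; DROP TABLE users; --"
run(build_query("SELECT * FROM users WHERE name = ?", user_input))
```

In the statically typed settings the article discusses, module visibility and the type checker enforce that application code cannot forge such values; Python is used here only to keep the example short.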
A Fine-grained Characterization of PAC Learnability
Marco Bressan
Nataly Brukhim
Nicolo Cesa-Bianchi
Emmanuel Esposito
Shay Moran
Maximilian Thiessen
COLT (2025)
Abstract
In the multiclass PAC setting, even when full learnability is unattainable, meaningful information can often be extracted to guide predictions. However, classical learning theory has mainly focused on the dichotomy "learnable vs. non-learnable", leaving notions of partial learnability largely unexplored. Indeed, even for a non-learnable class, a learner may still achieve partial success—for example, by making reliable predictions whenever the true label belongs to a fixed subset of the label space, even if it fails otherwise. Similarly, the rigid nature of PAC learnability makes it impossible to distinguish between classes where one can achieve favorable trade-offs between, say, false-positive and false-negative rates, and classes where such trade-offs are fundamentally unattainable. In a nutshell, standard PAC learnability precludes a fine-grained exploration of learnability.
To overcome this limitation, we develop a fine-grained theory of PAC learnability. For any hypothesis class \(\mathcal{H}\), given a loss function (which quantifies the penalty for predicting \(\hat{y}\) instead of the true label \(y\)) and a target loss threshold \(z\), our theory determines whether it is possible to achieve a loss of at most \(z\). In contrast, classical PAC learning considers only the special case of the zero-one loss and \(z = 0\), corresponding to a near perfect classification guarantee.
We give a complete characterization of all attainable guarantees, captured by a \emph{finite family} of combinatorial dimensions, which we term the \emph{\(J\)-cube dimensions} of \(\mathcal{H}\). These dimensions are defined for every subset \(J\) of at least two labels. This extends the fundamental theorem of realizable PAC learning based on the VC dimension. In fact, our results hold in a more general multi-objective setting where we fully characterize the Pareto frontier of guarantees attainable for the class \(\mathcal{H}\).
Circadian rhythm of heart rate and activity: a cross-sectional study
Maryam Khalid
Logan Schneider
Aravind Natarajan
Conor Heneghan
Karla Gleichauf
Chronobiology International (2025)
Abstract
Background: Circadian rhythms are commonly observed in a number of physiological processes. Consumer wearable devices have made it possible to obtain continuous time series data from a large number of individuals. We study circadian rhythms from measurements of heart rate, movement, and sleep, from a cohort of nearly 20,000 participants over the course of 30 days.
Methods: Participation was restricted to Fitbit users of age 21 years or older residing in the United States or Canada. Participants were enrolled through a recruitment banner shown on the Fitbit App. The advertisement was shown to 531,359 Fitbit users, and 23,239 enrolled in the program. Of these, we obtained heart rate data from 19,350 participants. We obtained the underlying circadian rhythm from the heart rate time series by modeling it as a sum over the first two Fourier harmonics. The first Fourier harmonic accounts for the 24-hour rhythmicity, while the second harmonic accounts for non-sinusoidal perturbations.
Findings: We observe a circadian rhythm in both heart rate and acceleration. From the diurnal modulation, we obtain the following circadian parameters: (i) amplitude of modulation, (ii) bathyphase, (iii) acrophase, (iv) non-sinusoidal fraction, and (v) fraction of the day when the heart rate is greater than the mean. The amplitude, bathyphase, and acrophase depend on sex and decrease with age. The wake time, on average, follows the bathyphase by 2.4 hours. In most individuals, the circadian rhythm of heart rate lags the circadian rhythm of activity.
Interpretation: Circadian metrics for heart rate and activity can be reliably obtained from commercially available wearable devices. Distributions of circadian metrics can be valuable tools for individual-level interpretation.
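A minimal sketch of the two-harmonic model described in the Methods above: fit a mean plus 24-hour and 12-hour sinusoids to a heart rate time series by least squares, then read off the amplitude, acrophase, and bathyphase. The synthetic data and parameter choices are illustrative only.

```python
import numpy as np

def fit_two_harmonics(t_hours, heart_rate):
    """Least-squares fit of hr(t) ~ mean + 24 h and 12 h Fourier harmonics."""
    w = 2 * np.pi / 24.0
    design = np.column_stack([
        np.ones_like(t_hours),
        np.cos(w * t_hours), np.sin(w * t_hours),         # first harmonic (24 h)
        np.cos(2 * w * t_hours), np.sin(2 * w * t_hours), # second harmonic (12 h)
    ])
    coef, *_ = np.linalg.lstsq(design, heart_rate, rcond=None)
    mean, a1, b1, a2, b2 = coef
    amplitude = np.hypot(a1, b1)                   # amplitude of the 24 h modulation
    acrophase = (np.arctan2(b1, a1) / w) % 24.0    # clock time of the first-harmonic peak
    model = design @ coef
    bathyphase = t_hours[np.argmin(model)] % 24.0  # time of the fitted minimum
    return mean, amplitude, acrophase, bathyphase

# Synthetic example: 3 days of 5-minute samples with a 24 h rhythm plus noise.
t = np.arange(0, 72, 5 / 60.0)
hr = 65 + 6 * np.cos(2 * np.pi * (t - 14) / 24) + np.random.default_rng(0).normal(0, 2, t.size)
print(fit_two_harmonics(t, hr))
```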
XR Blocks: Accelerating Human-Centered AI + XR Innovation
Nels Numan
Evgenii Alekseev
Alex Cooper
Min Xia
Scott Chung
Jeremy Nelson
Xiuxiu Yuan
Jolica Dias
Tim Bettridge
Benjamin Hersh
Michelle Huynh
Konrad Piascik
Ricardo Cabello
Google, XR, XR Labs (2025)
Abstract
We are on the cusp where Artificial Intelligence (AI) and Extended Reality (XR) are converging to unlock new paradigms of interactive computing. However, a significant gap exists between the ecosystems of these two fields: while AI research and development is accelerated by mature frameworks like PyTorch and benchmarks like LMArena, prototyping novel AI-driven XR interactions remains a high-friction process, often requiring practitioners to manually integrate disparate, low-level systems for perception, rendering, and interaction. To bridge this gap, we present XR Blocks, a cross-platform framework designed to accelerate human-centered AI + XR innovation. XR Blocks provides a modular architecture with plug-and-play components for core abstractions in AI + XR: user, world, peers; interface, context, and agents. Crucially, it is designed with the mission of "minimum code from idea to reality", enabling rapid prototyping of complex AI + XR apps. Built upon accessible technologies (WebXR, three.js, TensorFlow, Gemini), our toolkit lowers the barrier to entry for XR creators. We demonstrate its utility through a set of open-source templates, samples, and advanced demos, empowering the community to quickly move from concept to interactive prototype.
Quasiparticle-induced decoherence of a driven superconducting qubit
Mykola Kishmar
Pavel Kurilovich
Vlad Kurilovich
Thomas Connolly
Andrey Klots
Igor Aleiner
arXiv (2025)
Abstract
We develop a theory for two quasiparticle-induced decoherence mechanisms of a driven superconducting qubit. In the first mechanism, an existing quasiparticle (QP) tunnels across the qubit’s Josephson junction while simultaneously absorbing a qubit excitation and one (or several) photons from the drive. In the second mechanism, a qubit transition occurs during the non-linear absorption process converting multiple drive quanta into a pair of new QPs. Both mechanisms can remain significant in gap-engineered qubits whose coherence is insensitive to QPs without the drive. Our theory establishes a fundamental limitation on the fidelity of microwave qubit operations—such as readout and gates—stemming from QPs.
CASE: An Agentic AI Framework for Enhancing Scam Intelligence in Digital Payments
Jose Estevez
Shankey Poddar
Aviral Suri
Lorenzo Gatto
Zijun Kan
Diksha Bansal
Bill Cheung
2025
Abstract
The proliferation of digital payment platforms has transformed commerce, offering unmatched convenience and accessibility globally. However, this growth has also attracted malicious actors, leading to a corresponding increase in sophisticated social engineering scams. These scams are often initiated and orchestrated on multiple surfaces outside the payment platform, making user and transaction-based signals insufficient for a complete understanding of the scam's methodology and underlying patterns; without this understanding, timely prevention is very difficult. This paper presents CASE (Conversational Agent for Scam Elucidation), a novel Agentic AI framework that addresses this problem by collecting and managing user scam feedback in a safe and scalable manner. A conversational agent is uniquely designed to proactively interview potential victims to elicit intelligence in the form of a detailed conversation. The conversation transcripts are then consumed by another AI system that extracts information and converts it into structured data for downstream usage in automated and manual enforcement mechanisms. Using Google's Gemini family of LLMs, we implemented this framework on Google Pay (GPay) India. By augmenting our existing features with this new intelligence, we have observed a 21% uplift in the volume of scam enforcements. The architecture and its robust evaluation framework are highly generalizable, offering a blueprint for building similar AI-driven systems to collect and manage scam intelligence in other sensitive domains.
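A toy sketch of the two-stage flow described above: a conversational step that elicits details from the reporter, followed by an extraction step that turns the transcript into structured fields. The call_llm function is a placeholder for whatever model client is used, and the prompts and field names are illustrative, not the CASE implementation.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM API (e.g. a Gemini model); not a real client."""
    raise NotImplementedError("wire up your model client here")

INTERVIEW_SYSTEM_PROMPT = (
    "You are interviewing a user who reported a suspected payment scam. "
    "Ask one short, empathetic follow-up question at a time about how the scam "
    "started, which platforms were involved, and what the user was asked to do."
)

EXTRACTION_PROMPT_TEMPLATE = (
    "Extract the following fields from the interview transcript as JSON: "
    "contact_channel, claimed_identity, requested_action, amount, red_flags.\n"
    "Transcript:\n{transcript}"
)

def next_interview_turn(history: list[dict]) -> str:
    """Produce the agent's next question given the conversation so far."""
    conversation = "\n".join(f"{turn['role']}: {turn['text']}" for turn in history)
    return call_llm(f"{INTERVIEW_SYSTEM_PROMPT}\n\n{conversation}\nagent:")

def extract_structured_report(history: list[dict]) -> dict:
    """Convert the finished transcript into structured data for enforcement pipelines."""
    transcript = "\n".join(f"{turn['role']}: {turn['text']}" for turn in history)
    return json.loads(call_llm(EXTRACTION_PROMPT_TEMPLATE.format(transcript=transcript)))
```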
Abstract
Visual in-context learning (VICL), as a new paradigm in computer vision, allows the model to rapidly adapt to various tasks with only a handful of prompts and examples. While effective, the existing VICL paradigm exhibits poor generalizability under distribution shifts. In this work, we propose test-time visual in-context tuning (VICT), a method that can learn adaptive VICL models on the fly with a single test sample. Specifically, we flip the role between task prompts and the test sample and use a cycle consistency loss to reconstruct the original task prompt output. Our key insight is that a model should be aware of a new test distribution if it can successfully recover the original task prompts. Extensive experiments on seven representative vision tasks with 15 corruptions demonstrate that our VICT can improve the generalizability of VICL to unseen new domains.
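A toy sketch of the test-time tuning loop described above: the prompt pair and the test sample swap roles, and a cycle-consistency loss on the reconstructed prompt output drives a few gradient steps on a copy of the model before the final prediction. The model, loss, and hyperparameters are placeholders, not the VICT implementation.

```python
import copy
import torch
import torch.nn as nn

class ToyInContextModel(nn.Module):
    """Maps (prompt_input, prompt_output, query_input) -> predicted query output."""

    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels * 3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, prompt_in, prompt_out, query_in):
        return self.net(torch.cat([prompt_in, prompt_out, query_in], dim=1))

def test_time_tune(model, prompt_in, prompt_out, test_in, steps=5, lr=1e-4):
    """Adapt a copy of the model on a single test sample via cycle consistency."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        # Forward: predict the test output from the given prompt pair.
        pred_test_out = adapted(prompt_in, prompt_out, test_in)
        # Flip roles: use (test_in, predicted output) as the prompt and ask the
        # model to reconstruct the original prompt output from prompt_in.
        recon_prompt_out = adapted(test_in, pred_test_out, prompt_in)
        loss = nn.functional.mse_loss(recon_prompt_out, prompt_out)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Final prediction with the adapted weights.
    with torch.no_grad():
        return adapted(prompt_in, prompt_out, test_in)

model = ToyInContextModel()
prompt_in, prompt_out, test_in = (torch.rand(1, 3, 64, 64) for _ in range(3))
print(test_time_tune(model, prompt_in, prompt_out, test_in).shape)
```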
Faster electronic structure quantum simulation by spectrum amplification
Guang Hao Low
Robbie King
Dominic Berry
Qiushi Han
Albert Eugene DePrince III
Alec White
Rolando Somma
Physical Review X, 15 (2025), 041016
Abstract
The most advanced techniques using fault-tolerant quantum computers to estimate the ground-state energy of a chemical Hamiltonian involve compression of the Coulomb operator through tensor factorizations, enabling efficient block encodings of the Hamiltonian. A natural challenge of these methods is the degree to which block-encoding costs can be reduced. We address this challenge through the technique of spectral amplification, which magnifies the spectrum of the low-energy states of Hamiltonians that can be expressed as sums of squares. Spectral amplification enables estimating ground-state energies with significantly improved cost scaling in the block-encoding normalization factor \(\Lambda\), reducing it to just \(\sqrt{2\Lambda E_{\mathrm{gap}}}\), where \(E_{\mathrm{gap}} \ll \Lambda\) is the lowest energy of the sum-of-squares Hamiltonian. To achieve this, we show that sum-of-squares representations of the electronic structure Hamiltonian are efficiently computable by a family of classical simulation techniques that approximate the ground-state energy from below. In order to further optimize, we also develop a novel factorization that provides a trade-off between the two leading Coulomb integral factorization schemes—namely, double factorization and tensor hypercontraction—that when combined with spectral amplification yields a factor of 4 to 195 speedup over the state of the art in ground-state energy estimation for models of iron-sulfur complexes and a CO2-fixation catalyst.
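As a quick check on the size of the improvement quoted above, replacing \(\Lambda\) by \(\sqrt{2\Lambda E_{\mathrm{gap}}}\) corresponds to a relative saving of
\[
\frac{\Lambda}{\sqrt{2\Lambda E_{\mathrm{gap}}}} \;=\; \sqrt{\frac{\Lambda}{2 E_{\mathrm{gap}}}} \;\gg\; 1
\qquad \text{whenever } E_{\mathrm{gap}} \ll \Lambda .
\]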