We provide algorithmic versions of the Polynomial Freiman-Ruzsa theorem of Gowers, Green, Manners, and Tao (Ann. of Math., 2025). In particular, we give a polynomial-time algorithm that, given a set $A \subseteq \mathbb{F}_2^n$ with doubling constant $K$, returns a subspace $V \subseteq \mathbb{F}_2^n$ of size $|V| \leq |A|$ such that $A$ can be covered by $2K^C$ translates of $V$, for a universal constant $C>1$. We also provide efficient algorithms for several "equivalent" formulations of the Polynomial Freiman-Ruzsa theorem, such as the polynomial Gowers inverse theorem, the classification of approximate Freiman homomorphisms, and quadratic structure-vs-randomness decompositions. Our algorithmic framework is based on a new and optimal version of the Quadratic Goldreich-Levin algorithm, which we obtain using ideas from quantum learning theory. This framework fundamentally relies on a connection between quadratic Fourier analysis and symplectic geometry, first speculated on by Green and Tao (Proc. of Edinb. Math. Soc., 2008), which we make explicit in this paper.
This thesis develops a decision-theoretic framework for extracting thermodynamic work from temporal correlations in quantum systems. We model a classical agent -- lacking quantum memory -- performing adaptive work extraction through continuous inference and decision-making under uncertainty. By introducing $\rho^*$-ideal protocols, we demonstrate that exploiting memory effects allows adaptive strategies to surpass non-adaptive bounds. We formalize this via the Time-Ordered Free Energy (TOFE), a novel upper bound for causal, adaptive operations that reveals a thermodynamic gap linked to adaptive ordered discord. Additionally, we tackle work extraction from unknown sources using reinforcement learning. By adapting multi-armed bandit algorithms, we show an agent can simultaneously learn an unknown i.i.d. quantum state and extract work, achieving polylogarithmic cumulative dissipation that significantly outperforms standard tomography. Overall, this work lays the foundation for predictive and learning-based quantum thermodynamics.
Assembling large-scale, defect-free Rydberg atom arrays is a key technology for neutral-atom quantum computation. Dynamic holographic optical tweezers enable the assembly and reconfiguration of such arrays, but phase mismatches between successive holograms can induce destructive interference and transient trap loss during spatial-light-modulator refresh. In this work, we introduce the weighted-projective Gerchberg--Saxton (WPGS) algorithm, a phase-stable approach to dynamic hologram updates for large-scale Rydberg atom-array reconfiguration. By enforcing inter-frame trap-phase continuity while retaining weighted intensity equalization, WPGS suppresses refresh-induced transient degradation. The phase-difference distribution between consecutive holograms further provides a simple diagnostic of transient robustness. Moreover, enforcing the phase constraint reduces the number of iterations required at each update step, thereby accelerating hologram generation. Numerical simulations of 2D and 3D reconfiguration with more than $10^3$ traps, including multilayer assembly and interlayer transport, show robust transient intensities and significantly faster updates than conventional methods. These results establish inter-frame phase continuity as a practical design principle for dynamic holographic control and scalable neutral-atom array reconfiguration.
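For context, the weighted Gerchberg-Saxton baseline that WPGS modifies can be sketched in a few lines. The sketch below is a generic weighted-GS loop (assumed trap-on-grid geometry, hypothetical function names); WPGS's actual constraint acts on the phases at the trap sites, so seeding each update from the previous frame's SLM phase here is only a crude stand-in for that idea, not the paper's projection step.

```python
# Minimal weighted Gerchberg-Saxton loop for a tweezer array (assumed trap-on-
# grid geometry; hypothetical names).  WPGS's actual constraint acts on the
# phases at the trap sites; seeding from the previous frame's SLM phase below
# is only a crude stand-in for that idea.
import numpy as np

def weighted_gs(trap_mask, n_iter=30, seed_phase=None):
    """trap_mask: boolean 2D array marking desired trap positions in the focal plane."""
    shape = trap_mask.shape
    weights = trap_mask.astype(float)
    if seed_phase is None:
        seed_phase = np.random.default_rng(0).uniform(0, 2 * np.pi, shape)
    slm_phase = seed_phase
    for _ in range(n_iter):
        focal = np.fft.fft2(np.exp(1j * slm_phase))      # SLM plane -> focal plane
        amp = np.abs(focal)
        # Weighted intensity equalization over the trap sites.
        weights[trap_mask] *= amp[trap_mask].mean() / np.maximum(amp[trap_mask], 1e-12)
        # Keep the focal-plane phase, impose the weighted target amplitude.
        focal = weights * np.exp(1j * np.angle(focal))
        slm_phase = np.angle(np.fft.ifft2(focal))
    return slm_phase

# Usage: an 8x8 trap grid in a 256x256 plane; the second frame reuses the first
# frame's phase as its starting point.
mask = np.zeros((256, 256), dtype=bool)
mask[96:160:8, 96:160:8] = True
phase0 = weighted_gs(mask)
phase1 = weighted_gs(mask, seed_phase=phase0)
```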
The literature provides several bounds for quantum local recovery, which essentially involve the number of message qudits, the distance, the length, and the locality of the codes involved. We give a family of $J$-affine variety codes that result in impure CSS codes. These quantum codes exceed several of the above-mentioned bounds that apply to pure quantum locally recoverable codes. We also discuss a connection between bounds on quantum local recovery and bounds on weight-constrained stabilizer codes.
Dynamic quantum circuits with mid-circuit measurements (MCMs) and feed-forward operations play a crucial role in various applications, such as quantum error correction and quantum algorithms. With advancements in quantum hardware enabling the implementation of MCM and feed-forward loops, the use of dynamic circuits has become increasingly prevalent. There is a significant need for a benchmarking framework specially designed for dynamic circuits to capture their unique properties, as current benchmarking tools are designed primarily for unitary circuits and cannot be trivially extended to dynamic circuits. We propose dynamarq, a scalable and hardware-agnostic benchmarking framework for dynamic circuits. We collect a set of dynamic circuit benchmarks spanning various applications and propose a broad set of circuit features to characterize the structure of these dynamic circuits. We run them on two IBM quantum processors and the Quantinuum Helios-1E emulator, and propose scalable, application-dependent fidelity scores for each benchmark based on hardware execution results. We perform statistical modeling to identify correlations between circuit features and fidelity scores, and demonstrate highly accurate fidelity prediction using our model. Our model parameters are also transferable across hardware backends and calibration cycles. Our framework facilitates the understanding of dynamic circuit structures and provides insights for designing and optimizing dynamic circuits to achieve high execution fidelity on quantum hardware.
Recently, Yamaguchi and Kempf [Phys. Rev. Lett. 136:010801, arXiv:2501.02757] proved that encrypted qubits can be cloned. In this work, we generalize the encrypted cloning protocol and prove that it also applies to higher-order quantum systems. Given that a straightforward generalization of the protocol using the exponential of the shift and phase operators fails to satisfy the unitary requirement for a quantum gate, we propose a different approach. We introduce a new operator to be used in the encryption process and show that it is unitary. We adapt the decryption operator from the reference paper to fit in the framework of multi-level quantum systems. We analyze the circuit implementation of the proposed operators and show that the overhead imposed by larger dimensions scales linearly with qudit dimension.
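For concreteness, the shift and phase operators referred to above are the standard qudit generalizations of the Pauli $X$ and $Z$; their defining relations are textbook facts (and are not the new encryption operator introduced in the abstract):
\[
  X\,|j\rangle = |\,j+1 \bmod d\,\rangle, \qquad
  Z\,|j\rangle = \omega^{\,j}\,|j\rangle, \qquad
  \omega = e^{2\pi i/d}, \qquad
  X^d = Z^d = \mathbb{1}, \quad ZX = \omega\, XZ .
\]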
A common view in monitored quantum dynamics is that local measurements suppress entanglement growth. We show that this intuition can fail in a one-dimensional spinful fermionic chain governed by a BCS Hamiltonian with pairing strength $\Delta$ and subject to continuous, on-site, spin-resolved charge measurements at rate $\gamma$. Using free-fermion simulations and quasiparticle analysis, we show that pairing suppresses entanglement growth, while measurements suppress pairing. Their competition yields measurement-enhanced entanglement: for $\Delta>0$, the steady-state entanglement $S_s$ increases with $\gamma$ over a finite interval $0<\gamma<\gamma_{\rm peak}$. This occurs because stronger measurements suppress pairing correlations, which would otherwise suppress entanglement growth. Using a nonlinear sigma-model calculation and free-fermion simulations, we provide evidence that for $\Delta>0$ and small but finite $\gamma$, the steady-state entanglement scales as $S_s\sim \ln^2 L$. This implies that, in this setting, measurement-enhanced entanglement does not persist in the thermodynamic limit.
We present a systematic numerical investigation of the "entanglement-geometry-gravity" chain in random tensor networks (RTN) established by the ER=EPR conjecture and Jacobson's thermodynamic derivation. First, we verify the kinematic foundation: the entanglement first law $\delta\langle K\rangle=\delta S$ (slope=1.000), the encoding of geometry by mutual information (correlation=0.92), and the locality of holographic perturbations (3.3x). We also confirm that gravitational dynamics (JT gravity) does not emerge, identifying a sharp kinematics-dynamics boundary. Second, and more importantly, we discover that many-body localization (MBL) is the mechanism that protects emergent holographic geometry from thermalization. Replacing Haar-random evolution (geometry lifetime $t\sim6$) with an XXZ Hamiltonian plus on-site disorder, we observe a finite-size crossover at disorder strength $W_c\approx10-12$ above which mutual-information-lattice correlations persist indefinitely ($r>0.5$ for $t>50$). We map the full parameter space: the optimal regime is a near-Ising anisotropy $\Delta\approx50$ with $W=30$ yielding $r=0.779\pm0.002$ (confirmed by a fine scan over $\Delta\in[30,70]$); only holographic (RTN) initial states sustain geometry, while product, Néel, and Bell-pair states do not. MBL preserves the spatial structure of entanglement (adjacent/non-adjacent MI ratio ~2.6-4.2x vs. 1.0x in the thermal phase), rather than its total amount. A comparison with classical cellular automata reveals that MBL uniquely breaks the entanglement-structure trade-off imposed by quantum monogamy: classical systems achieve spatial structure only at the cost of negligible mutual information, while MBL sustains both.
Contextuality and measurement incompatibility are two fundamental aspects of nonclassicality, and their manifestations in observed quantum correlations are often deeply interconnected. Recently, measurement incompatibility has been studied in connection with nonlocality, particularly in terms of their robustness under various quantum channels. This line of investigation helps establish a connection between the channels that break nonlocality and those that break incompatibility. In this study, we focus on an asymmetric bipartite Bell scenario involving three and four inputs on Alice's and Bob's sides, respectively, with each of these inputs having dichotomous outcomes. Under the assumption of locality, the observed statistics in this asymmetric scenario obey the Elegant Bell inequality (EBI). Here, we use a different version of the EBI that relies on the assumption of preparation noncontextuality. By taking the violation of this noncontextual version of the EBI as a witness of preparation contextuality, we establish a connection between the channels that break contextuality and the channels that break triple-wise measurement incompatibility. Our results suggest that any channel which breaks EBI contextuality will also break Clauser-Horne-Shimony-Holt (CHSH) nonlocality; however, the reverse does not hold. We also show that a depolarising channel that breaks N-wise incompatibility can also break a certain form of contextuality, witnessed by a generalised inequality involving N measurements on one wing of a bipartite Bell scenario.
We give an example of a function and a collision kernel for which the entropy production increases in time when the function evolves under the space-homogeneous Boltzmann equation. The collision kernel is not any of the physically motivated kernels commonly used in the literature. In this particular setting, our result disproves a conjecture of McKean from 1966.
Apr 07 2026
cs.DS arXiv:2604.04752v1
We show that every directed graph $G$ with $n$ vertices and $m$ edges admits a directed acyclic graph (DAG) with $m^{1+o(1)}$ edges, called a DAG projection, that can either $(1+1/\text{polylog}(n))$-approximate distances between all pairs of vertices $(s,t)$ in $G$, or $n^{o(1)}$-approximate maximum flow between all pairs of vertex subsets $(S,T)$ in $G$. Previous similar results suffer an $\Omega(\log n)$ approximation factor for distances [Assadi, Hoppenworth, Wein, STOC'25] [Filtser, SODA'26] and, for maximum flow, no prior result of this type is known. Our DAG projections admit $m^{1+o(1)}$-time constructions. Further, they admit almost-optimal parallel constructions, i.e., algorithms with $m^{1+o(1)}$ work and $m^{o(1)}$ depth, assuming such algorithms for approximate shortest paths or maximum flow on DAGs, even when the input $G$ is not a DAG. DAG projections immediately transfer results on DAGs, which are usually simpler and more efficient, to general directed graphs. As examples, we improve the state-of-the-art of $(1+\epsilon)$-approximate distance preservers [Hoppenworth, Xu, Xu, SODA'25] and single-source minimum cut [Cheung, Lau, Leung, SICOMP'13], and obtain simpler constructions of $(n^{1/3},\epsilon)$-hop-sets [Kogan, Parter, SODA'22] [Bernstein, Wein, SODA'23] and combinatorial max flow algorithms [Bernstein, Blikstad, Saranurak, Tu, FOCS'24] [Bernstein, Blikstad, Li, Saranurak, Tu, FOCS'25]. Finally, via DAG projections, we reduce major open problems on almost-optimal parallel algorithms for exact single-source shortest paths (SSSP) and maximum flow to easier settings: (1) From exact directed SSSP to exact undirected ones, (2) From exact directed SSSP to $(1+1/\text{polylog}(n))$-approximation on DAGs, and (3) From exact directed maximum flow to $n^{o(1)}$-approximation on DAGs.
The phenomenon of interaction-free measurement (IFM) enables the probabilistic detection of an absorbing object with reduced photon absorption. We report the experimental implementation of a simultaneous IFM of multiple objects using a single quantum probe on Quandela's cloud-based Ascella photonic processor. We demonstrate sequential IFM of up to 5 objects using a single photon, significantly extending the original IFM scheme for a single object. The experimental error-mitigated results confirm the theoretical predictions for this sequential IFM setup, and demonstrate a practical approach to scaling IFM to more complex quantum interrogation tasks.
We study the Subset Balancing problem: given $\mathbf{x} \in \mathbb{Z}^n$ and a coefficient set $C \subseteq \mathbb{Z}$, find a nonzero vector $\mathbf{c} \in C^n$ such that $\mathbf{c}\cdot\mathbf{x} = 0$. The standard meet-in-the-middle algorithm runs in time $\tilde{O}(|C|^{n/2})=\tilde{O}(2^{n\log |C|/2})$, and recent improvements (SODA 2022, Chen, Jin, Randolph, and Servedio; STOC 2026, Randolph and Węgrzycki) beyond this barrier apply mainly when $d$ is constant. We give a reduction from Subset Balancing with $C = \{-d, \dots, d\}$ to a single instance of $\mathrm{SVP}_{\infty}$ in dimension $n+1$, which yields a deterministic algorithm with running time $\tilde{O}((6\sqrt{2\pi e})^n) \approx \tilde{O}(2^{4.632n})$, and a randomized algorithm with running time $\tilde{O}(2^{2.443n})$ (here $\tilde{O}$ suppresses $\operatorname{poly}(n)$ factors). We also show that for sufficiently large $d$, Subset Balancing is solvable in polynomial time. More generally, we extend the box constraint $[-d,d]^n$ to an arbitrary centrally symmetric convex body $K \subseteq \mathbb{R}^n$ with a deterministic $\tilde{O}(2^{c_K n})$-time algorithm, where $c_K$ depends only on the shape of $K$. We further study the Generalized Subset Sum problem of finding $\mathbf{c} \in C^n$ such that $\mathbf{c} \cdot \mathbf{x} = \tau$. For $C = \{-d, \dots, d\}$, we reduce the worst-case problem to a single instance of $\mathrm{CVP}_{\infty}$. Although no general single exponential time algorithm is known for exact $\mathrm{CVP}_{\infty}$, we show that in the average-case setting, for both $C = \{-d, \dots, d\}$ and $C = \{-d, \dots, d\} \setminus \{0\}$, the embedded instance satisfies a bounded-distance promise with high probability. This yields a deterministic algorithm running in time $\tilde{O}((18\sqrt{2\pi e})^n) \approx \tilde{O}(2^{6.217n})$.
Apr 07 2026
cs.DS arXiv:2604.04466v1
We consider graph property testing in $p$-degenerate graphs under the random neighbor oracle model (Czumaj and Sohler, FOCS 2019). In this framework, a tester explores a graph by sampling uniform neighbors of vertices, and a property is testable with one-sided error if its query complexity is independent of the graph size. It is known that one-sided error testable properties for minor-closed families are exactly those that can be defined by forbidden subgraphs of bounded size. However, the much broader class of $p$-degenerate graphs allows for high-degree "hubs" that can structurally hide forbidden subgraphs from local exploration. In this work, we provide a complete structural characterization of all properties testable with one-sided error in $p$-degenerate graphs. We show that testability is fundamentally determined by the connectivity of the forbidden structures: a property is testable if and only if its violations cannot be fragmented across disjoint high-degree neighborhoods. Our results define the exact structural boundary for testability under these constraints, accounting for both the connectivity of individual forbidden subgraphs and the collective behavior of the properties they define.
Characterizing quantum states is essential for validating quantum devices, yet conventional quantum state tomography becomes prohibitively expensive as system size grows. Direct tomography offers a distinct route by enabling selective access to individual complex density-matrix elements, with a particular advantage for sparse target states and some verification tasks. Here we introduce a direct quantum state tomography scheme combining strong-measurement estimation with a fan-out coupling architecture. It enables mutually commuting interactions between system qubits and a single meter qubit, thereby achieving constant circuit depth, independent of system size. Notably, the involutory fan-out coupling reduces to the identity under repetition, enabling straightforward noise scaling for quantum error mitigation. We experimentally validate the scheme on a superconducting quantum processor via the IBM Quantum Platform, demonstrating four-qubit state reconstruction and single-circuit GHZ-state fidelity estimation up to 20 qubits with error mitigation. Consistent results with standard tomography and improved efficiency establish our scheme as a promising approach to reconstructing full quantum states and scalable verification tasks.
Quantum clock synchronization (QCS) aims to establish a shared temporal reference between distant nodes by exploiting uniquely quantum phenomena such as entanglement, single-photon interference, and quantum correlations. In contrast to classical synchronization and time-transfer techniques, which are limited by signal propagation delays, atmospheric disturbances, and oscillator drift, QCS protocols offer the potential to surpass classical precision bounds and enhance resilience against adversarial manipulations. As precise and secure time synchronization underpins distributed quantum networks, navigation systems, and emerging quantum Internet infrastructures, understanding QCS principles, capabilities, and implementation challenges has become increasingly important. This survey provides a unified and critical overview of the rapidly growing QCS research landscape, highlighting fundamentals, protocol types, enabling resources, performance constraints, security considerations, and practical implementations of QCS. We first introduce the theoretical underpinnings of QCS, including entanglement-assisted time transfer, Hong-Ou-Mandel interference-based synchronization, and quantum slow-clock transport. We then categorize the main QCS protocols, ranging from ticking-qubit and entanglement-based schemes to time-of-arrival correlation methods, conveyor-belt synchronization, and quantum-enhanced two-way time transfer. This organization clarifies the relationships between protocol families and their achievable precision advantages over classical methods. Key quantum resources such as spontaneous parametric down-conversion-based entangled photon pairs, Greenberger-Horne-Zeilinger and W multipartite states, squeezed and frequency-entangled light, quantum frequency combs, and quantum memories are reviewed in the context of scalability and robustness.
Apr 07 2026
cs.CC arXiv:2604.04188v1
In the noisy $k$-XOR problem, one is given $y \in \mathbb{F}_2^M$ and must distinguish between $y$ uniform and $y = A x + e$, where $A$ is the adjacency matrix of a $k$-left-regular bipartite graph with $N$ variables and $M$ constraints, $x\in \mathbb{F}_2^N$ is random, and $e$ is noise with rate $\eta$. Lower bounds in restricted computational models such as Sum-of-Squares and low-degree polynomials are closely tied to the expansion of $A$, leading to conjectures that expansion implies hardness. We show that such conjectures are false by constructing an explicit family of graphs with near-optimal expansion for which noisy $k$-XOR is solvable in polynomial time. Our construction combines two powerful directions of work in pseudorandomness and coding theory that have not been previously put together. Specifically, our graphs are based on the lossless expanders of Guruswami, Umans and Vadhan (JACM 2009). Our key insight is that by an appropriate interpretation of the vertices of their graphs, the noisy XOR problem turns into the problem of decoding Reed-Muller codes from random errors. Then we build on a powerful body of work from the 2010s on correcting Reed-Muller codes from large amounts of random errors. Putting these together yields our construction. Concretely, we obtain explicit families for which noisy $k$-XOR is polynomial-time solvable at constant noise rate $\eta = 1/3$ for graphs with $M = 2^{O(\log^2 N)}$, $k = (\log N)^{O(1)}$, and $(N^{1-\alpha}, 1-o(1))$-expansion. Under standard conjectures on Reed--Muller codes over the binary erasure channel, this extends to families with $M = N^{O(1)}$, $k=(\log N)^{O(1)}$, expansion $(N^{1-\alpha}, 1-o(1))$ and polynomial-time algorithms at noise rate $\eta = N^{-c}$.
We introduce the notion of a dismagicker: a non-Clifford unitary gate designed to reduce the non-stabilizerness (also called magic) of quantum many-body states. Although both entanglement and non-stabilizerness are fundamental quantum resources, they require distinct control strategies. While disentanglers (unitary operations that lower entanglement) are well established in tensor network methods, an analogous concept for non-stabilizerness suppression has been largely missing. In this work, we define a dismagicker as a non-Clifford unitary operation that actively suppresses non-stabilizerness, steering states toward classically simulatable stabilizer states. We develop an optimization method for constructing dismagickers within the Matrix Product States framework. Our numerical results show that the non-stabilizerness reduction procedure, when combined with entanglement reduction steps based on Clifford circuits, significantly improves the accuracy of both classical simulation of many-body systems and quantum state preparation on quantum devices. Dismagickers enrich our toolkit for the manipulation of many-body states by unifying non-stabilizerness and entanglement reduction.
We prove a homotopy invariance result for the first cohomology group of the special unitary group $\mathrm{SU}_3(F[t])$ with coefficients in irreducible representations of $\mathrm{PGL}_2(F)$. The main theorem establishes that this cohomology is naturally isomorphic to the corresponding cohomology of $\mathrm{PGL}_2(F)$.
Apr 07 2026
cs.DS arXiv:2604.03831v1
We study the Nearest Neighbor Search (NNS) problem in a high-dimensional setting where data lies in a low-dimensional subspace and is corrupted by Gaussian noise. Specifically, we consider a semi-random model in which $n$ points from an unknown $k$-dimensional subspace of $\mathbb{R}^d$ ($k \ll d$) are perturbed by zero-mean $d$-dimensional Gaussian noise with variance $\sigma^2$ per coordinate. Assuming the second-nearest neighbor is at least a factor $(1+\varepsilon)$ farther from the query than the nearest neighbor, and given only the noisy data, our goal is to recover the nearest neighbor in the uncorrupted data. We prove three results. First, for $\sigma \in O(1/k^{1/4})$, simply performing SVD denoises the data and provably recovers the correct nearest neighbor of the uncorrupted data. Second, for $\sigma \gg 1/k^{1/4}$, the nearest neighbor in the uncorrupted data is not even identifiable from the noisy data in general, giving a matching lower bound and showing the necessity of this threshold for NNS. Third, for $\sigma \gg 1/\sqrt{k}$, the noise magnitude $\sigma\sqrt d$ significantly exceeds inter-point distances in the unperturbed data, and the nearest neighbor in the noisy data generally differs from that in the uncorrupted data. Thus, the first and third results together imply that SVD can identify the correct nearest neighbor even in regimes where naive nearest neighbor search on the noisy data fails. Compared to \citep{abdullah2014spectral}, our result does not require $\sigma$ to be at least an inverse polynomial in the ambient dimension $d$. Our analysis uses perturbation bounds for singular spaces together with Gaussian concentration and spherical symmetry. We also provide empirical results on real datasets supporting our theory.
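A minimal sketch of the SVD-denoising step (with the subspace dimension $k$ assumed known and all names hypothetical): project the noisy points onto the span of their top-$k$ right singular vectors, then run nearest-neighbor search on the projections.

```python
# SVD denoising for nearest-neighbor search under Gaussian noise: project the
# noisy points onto the span of the top-k right singular vectors, then search
# in that subspace.  k is assumed known here; names are illustrative.
import numpy as np

def svd_denoise(points, k):
    """points: (n, d) noisy data; returns the rank-k projection of each point."""
    mean = points.mean(axis=0)
    X = points - mean
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:k].T @ Vt[:k]          # (d, d) projector onto the estimated subspace
    return X @ P + mean

def nearest_neighbor(points, query):
    d2 = ((points - query) ** 2).sum(axis=1)
    return int(np.argmin(d2))

# Toy example: n points near a k-dim subspace of R^d, plus Gaussian noise.
rng = np.random.default_rng(1)
n, d, k, sigma = 500, 200, 5, 0.3
clean = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) / np.sqrt(k)
noisy = clean + sigma * rng.normal(size=(n, d))
query = clean[0] + 0.01 * rng.normal(size=d)

denoised = svd_denoise(noisy, k)
print("NN on noisy data:   ", nearest_neighbor(noisy, query))
print("NN on denoised data:", nearest_neighbor(denoised, query))
```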
Classical simulation of quantum circuits remains indispensable for algorithm development, hardware validation, and error analysis in the noisy intermediate-scale quantum (NISQ) era. However, state-vector simulation faces exponential memory scaling, with an n-qubit system requiring O(2^n) complex amplitudes, and existing simulators often lack the flexibility to exploit heterogeneous computing resources at runtime. This paper presents a GPU-accelerated quantum circuit simulation framework that introduces three contributions: (1) an empirical backend selection algorithm that benchmarks CuPy, PyTorch-CUDA, and NumPy-CPU backends at runtime and selects the optimal execution path based on measured throughput; (2) a directed acyclic graph (DAG) based gate fusion engine that reduces circuit depth through automated identification of fusible gate sequences, coupled with adaptive precision switching between complex64 and complex128 representations; and (3) a memory-aware fallback mechanism that monitors GPU memory consumption and gracefully degrades to CPU execution when resources are exhausted. The framework integrates with Qiskit, Cirq, PennyLane, and Amazon Braket through a unified adapter layer. Benchmarks on an NVIDIA A100-SXM4 (40 GiB) GPU demonstrate speedups of 64x to 146x over NumPy CPU execution for state-vector simulation of circuits with 20 to 28 qubits, with speedups exceeding 5x from 16 qubits onward. Hardware validation on an IBM quantum processing unit (QPU) confirms Bell state fidelity of 0.939, a five-qubit Greenberger-Horne-Zeilinger (GHZ) state fidelity of 0.853, and circuit depth reduction from 42 to 14 gates through the fusion pipeline. The system is designed for portability across NVIDIA consumer and data-center GPUs, requiring no vendor-specific compilation steps.
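The backend-selection idea can be illustrated with a minimal, hypothetical sketch: time a representative state-vector operation on each backend that imports successfully, and keep the fastest. Function and variable names below are illustrative and are not the framework's actual API.

```python
# Hypothetical sketch of runtime backend selection: time a representative
# single-qubit gate application on each available backend and keep the fastest.
# Availability is probed with try/except; names are illustrative only.
import time
import numpy as np

def _numpy_probe(n_qubits=20):
    state = np.zeros(2**n_qubits, dtype=np.complex64)
    state[0] = 1.0
    h = np.array([[1, 1], [1, -1]], dtype=np.complex64) / np.sqrt(2)
    t0 = time.perf_counter()
    state = (h @ state.reshape(2, -1)).reshape(-1)   # apply H to the leading qubit
    return time.perf_counter() - t0

def select_backend():
    timings = {"numpy-cpu": _numpy_probe()}
    try:
        import cupy as cp
        t0 = time.perf_counter()
        s = cp.zeros(2**20, dtype=cp.complex64)
        s[0] = 1.0
        h = cp.asarray([[1, 1], [1, -1]], dtype=cp.complex64) / np.sqrt(2)
        s = (h @ s.reshape(2, -1)).reshape(-1)
        cp.cuda.Stream.null.synchronize()
        timings["cupy"] = time.perf_counter() - t0
    except Exception:
        pass
    try:
        import torch
        if torch.cuda.is_available():
            t0 = time.perf_counter()
            s = torch.zeros(2**20, dtype=torch.complex64, device="cuda")
            s[0] = 1.0
            h = torch.tensor([[1, 1], [1, -1]], dtype=torch.complex64, device="cuda") / (2 ** 0.5)
            s = (h @ s.reshape(2, -1)).reshape(-1)
            torch.cuda.synchronize()
            timings["torch-cuda"] = time.perf_counter() - t0
    except Exception:
        pass
    return min(timings, key=timings.get), timings

backend, timings = select_backend()
print("selected:", backend, timings)
```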
Apr 07 2026
cs.CC arXiv:2604.03805v1
Alice and Bob are given $n$-bit integer pairs $(x,y)$ and $(a,b)$, respectively, and they must decide if $y=ax+b$. We prove that the randomised communication complexity of this Point--Line Incidence problem is $\Theta(\log n)$. This confirms a conjecture of Cheung, Hatami, Hosseini, and Shirley (CCC 2023) that the complexity is super-constant, and gives the first example of a communication problem with constant support-rank but super-constant randomised complexity.
The Sinkhorn--Knopp (SK) algorithm is a cornerstone method for matrix scaling and entropically regularized optimal transport (EOT). Despite its empirical efficiency, existing theoretical guarantees to achieve a target marginal accuracy $\varepsilon$ deteriorate severely in the presence of outliers, bottlenecked either by the global maximum regularized cost $\eta\|C\|_\infty$ (where $\eta$ is the regularization parameter and $C$ the cost matrix) or the matrix's minimum-to-maximum entry ratio $\nu$. This creates a fundamental disconnect between theory and practice. In this paper, we resolve this discrepancy. For EOT, we introduce the novel concept of well-boundedness, a local bulk mass property that rigorously isolates the well-behaved portion of the data from extreme outliers. We prove that governed by this fundamental notion, SK recovers the target transport plan for a problem of dimension $n$ in $O(\log n - \log \varepsilon)$ iterations, completely independent of the regularized cost $\eta\|C\|_\infty$. Furthermore, we show that a virtually cost-free pre-scaling step eliminates the dimensional dependence entirely, accelerating convergence to a strictly dimension-free $O(\log(1/\varepsilon))$ iterations. Beyond EOT, we establish a sharp phase transition for general $(\boldsymbol{u},\boldsymbol{v})$-scaling governed by a critical matrix density threshold. We prove that when a matrix's density exceeds this threshold, the iteration complexity is strictly independent of $\nu$. Conversely, when the density falls below this threshold, the dependence on $\nu$ becomes unavoidable; in this sub-critical regime, we construct instances where SK requires $\Omega(n/\varepsilon)$ iterations.
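For reference, the SK iteration analyzed here is the standard one: alternately rescale the rows and columns of the Gibbs kernel $K=\exp(-\eta C)$ until both marginals match to accuracy $\varepsilon$. A minimal sketch follows; the paper's pre-scaling step and well-boundedness analysis are not reproduced.

```python
# Standard Sinkhorn-Knopp iteration for entropically regularized OT:
# alternately rescale rows and columns of K = exp(-eta * C) until the
# marginals match r and c to l1 accuracy eps.
import numpy as np

def sinkhorn(C, r, c, eta, eps=1e-6, max_iter=10_000):
    K = np.exp(-eta * C)
    u = np.ones_like(r)
    v = np.ones_like(c)
    for _ in range(max_iter):
        u = r / (K @ v)
        v = c / (K.T @ u)
        P = u[:, None] * K * v[None, :]
        if np.abs(P.sum(1) - r).sum() + np.abs(P.sum(0) - c).sum() < eps:
            break
    return P

# Toy usage on a random 50x50 cost matrix with uniform marginals.
rng = np.random.default_rng(0)
n = 50
C = rng.random((n, n))
r = c = np.full(n, 1.0 / n)
P = sinkhorn(C, r, c, eta=50.0)
print("row-marginal error:", np.abs(P.sum(1) - r).sum())
```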
Apr 07 2026
hep-th arXiv:2604.03720v1
According to the correspondence principle of Horowitz and Polchinski, many black holes in string theory are continuously deformed to usual quantum systems involving D-branes and fundamental strings when the string coupling becomes sufficiently small. Therefore if we consider a configuration in space-time where the dilaton varies over an appropriate range, then a black hole moving in such a background will smoothly transition from the black hole state to a normal quantum state whose microstates are not hidden behind an event horizon. The possible obstruction to this mechanism comes from the fact that if the dilaton varies too fast then the adiabatic approximation may break down and / or the ambient space-time itself may collapse to a black hole and get hidden from the asymptotic observer. On the other hand, if the dilaton varies too slowly then the time that it takes for the black hole to travel the required distance will exceed the evaporation time of the black hole. We show that by choosing the background appropriately these obstructions can be avoided and a gentle motion towards the weak coupling region will convert the black hole into a normal quantum state without an event horizon.
A single photon in a superposition of $d$ modes naturally encodes a $d$-dimensional quantum system, a so-called qudit. We show that such superpositions can be leveraged to achieve a quantum speed-up of remote state preparation (RSP): a primitive for several quantum network protocols. For a superposition over $d\geq 2$ modes, the photon state can encode up to $\log_2(d)$ qubits, which we exploit in a proposed reflection-based RSP protocol with multiple variations. For single-qubit RSP, we achieve a performance comparable to the best known existing schemes but with reduced requirements for phase stabilization. For many-qubit RSP the achievable success rates remain high despite needing exponentially many temporal modes, since only one photon needs to be transmitted and detected to prepare multiple qubits. By simultaneously preparing many qubits at once, we bypass limited qubit lifetimes and improve fidelities beyond what is achievable with existing RSP protocols.
High-gain spontaneous parametric down-conversion (SPDC) produces bright squeezed vacuum with rich high-dimensional entanglement, but its output is inherently multimodal and non-perturbative, making the full modal characterization a major computational bottleneck. We propose a physics-guided deep neural network that reconstructs the source's modal fingerprint: the high-dimensional correlation signature across radial and azimuthal indices. We designed a FiLM-modulated convolutional architecture that predicts the joint (m,l) distribution, and training is driven by a hybrid loss that couples data-driven metrics (JSD, KL, MSE, Wasserstein) with a soft orbital-angular-momentum (OAM) conservation term, providing an essential inductive bias toward physically consistent solutions. Across gain regimes, our method achieves high-fidelity reconstruction with average JSD of 1.96e-3, WEMD of 1.54e-3, and KL divergence of 7.85e-3, delivering an approximate 128-fold speedup over full numerical simulation and more than 30% accuracy gains over U-Net baselines. These results demonstrate that physics-guided learning, via a soft OAM-conservation regularizer and physically generated training targets, enables rapid and data-efficient modal characterization. Compared with traditional numerical simulation, our mesh-free method has demonstrated good generalization with limited or contaminated training data and has enabled fast "online" prediction of the quantum dynamics of a high-dimensional entanglement system for real-world experimental implementation.
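FiLM modulation itself is a generic mechanism: a conditioning vector predicts per-channel scale and shift applied to convolutional features. A minimal sketch of such a block follows; it is illustrative only, and the paper's architecture, conditioning inputs, and hybrid loss are not reproduced.

```python
# Minimal FiLM-modulated convolutional block (feature-wise linear modulation):
# a conditioning vector (for instance, a gain parameter) predicts per-channel
# scale gamma and shift beta applied to the convolutional features.  This is
# generic FiLM, not the paper's exact architecture or loss.
import torch
import torch.nn as nn

class FiLMBlock(nn.Module):
    def __init__(self, in_ch, out_ch, cond_dim):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.film = nn.Linear(cond_dim, 2 * out_ch)   # -> (gamma, beta)
        self.act = nn.ReLU()

    def forward(self, x, cond):
        h = self.conv(x)
        gamma, beta = self.film(cond).chunk(2, dim=-1)
        # Broadcast the per-channel modulation over the spatial dimensions.
        h = gamma[..., None, None] * h + beta[..., None, None]
        return self.act(h)

# Usage: a batch of 4 single-channel field maps, conditioned on a 3-dim vector.
block = FiLMBlock(in_ch=1, out_ch=16, cond_dim=3)
x = torch.randn(4, 1, 64, 64)
cond = torch.randn(4, 3)
print(block(x, cond).shape)   # torch.Size([4, 16, 64, 64])
```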
Apr 07 2026
math.NT arXiv:2604.03327v1
In this note, we evaluate a series for $1/\pi$ conjectured by Sun. Our proof uses the Cauchy product and hypergeometric transformations. From this result, we derive two additional analogous series for $1/\pi$ involving polynomials of degree $3$. Further identities can be proved using our method; these are presented in a table at the end of the note.
Neural quantum states are powerful variational wavefunctions, but it remains unclear which many-body states can be represented efficiently by modern additive architectures. We introduce Walsh complexity, a basis-dependent measure of how broadly a wavefunction is spread over parity patterns. States with an almost uniform Walsh spectrum require exponentially large Walsh complexity from any good approximant. We show that shallow additive feed-forward networks cannot generate such complexity in the tame regime, e.g. polynomial activations with subexponential parameter scaling. As a concrete example, we construct a simple dimerized state prepared by a single layer of disjoint controlled-$Z$ gates. Although it has only short-range entanglement and a simple tensor-network description, its Walsh complexity is maximal. Full-cube fits across system size and depth are consistent with the complexity bound: for polynomial activations, successful fitting appears only once depth reaches a logarithmic scale in $N$, whereas activation saturation in $\tanh$ produces a sharp threshold-like jump already at depth $3$. Walsh complexity therefore provides an expressibility axis complementary to entanglement and clarifies when depth becomes an essential resource for additive neural quantum states.
Apr 07 2026
cs.CV arXiv:2604.04934v1
We present Vanast, a unified framework that generates garment-transferred human animation videos directly from a single human image, garment images, and a pose guidance video. Conventional two-stage pipelines treat image-based virtual try-on and pose-driven animation as separate processes, which often results in identity drift, garment distortion, and front-back inconsistency. Our model addresses these issues by performing the entire process in a single unified step to achieve coherent synthesis. To enable this setting, we construct large-scale triplet supervision. Our data generation pipeline includes generating identity-preserving human images in alternative outfits that differ from garment catalog images, capturing full upper and lower garment triplets to overcome the single-garment-posed video pair limitation, and assembling diverse in-the-wild triplets without requiring garment catalog images. We further introduce a Dual Module architecture for video diffusion transformers to stabilize training, preserve pretrained generative quality, and improve garment accuracy, pose adherence, and identity preservation while supporting zero-shot garment interpolation. Together, these contributions allow Vanast to produce high-fidelity, identity-consistent animation across a wide range of garment types.
Apr 07 2026
cs.CV arXiv:2604.04933v1
Scene-level point cloud understanding remains challenging due to diverse geometries, imbalanced category distributions, and highly varied spatial layouts. Existing methods improve object-level performance but rely on static network parameters during inference, limiting their adaptability to dynamic scene data. We propose PointTPA, a Test-time Parameter Adaptation framework that generates input-aware network parameters for scene-level point clouds. PointTPA adopts a Serialization-based Neighborhood Grouping (SNG) to form locally coherent patches and a Dynamic Parameter Projector (DPP) to produce patch-wise adaptive weights, enabling the backbone to adjust its behavior according to scene-specific variations while maintaining a low parameter overhead. Integrated into the PTv3 structure, PointTPA demonstrates strong parameter efficiency by introducing two lightweight modules of less than 2% of the backbone's parameters. Despite this minimal parameter overhead, PointTPA achieves 78.4% mIoU on ScanNet validation, surpassing existing parameter-efficient fine-tuning (PEFT) methods across multiple benchmarks, highlighting the efficacy of our test-time dynamic network parameter adaptation mechanism in enhancing 3D scene understanding. The code is available at https://github.com/H-EmbodVis/PointTPA.
Apr 07 2026
cs.CL arXiv:2604.04932v1
The misuse of large language models (LLMs) requires precise detection of synthetic text. Existing works mainly follow binary or ternary classification settings, which can only distinguish pure human/LLM text or collaborative text at best. This remains insufficient for the nuanced regulation, as the LLM-polished human text and humanized LLM text often trigger different policy consequences. In this paper, we explore fine-grained LLM-generated text detection under a rigorous four-class setting. To handle such complexities, we propose RACE (Rhetorical Analysis for Creator-Editor Modeling), a fine-grained detection method that characterizes the distinct signatures of creator and editor. Specifically, RACE utilizes Rhetorical Structure Theory to construct a logic graph for the creator's foundation while extracting Elementary Discourse Unit-level features for the editor's style. Experiments show that RACE outperforms 12 baselines in identifying fine-grained types with low false alarms, offering a policy-aligned solution for LLM regulation.
Apr 07 2026
cs.CV arXiv:2604.04931v1
Local feature matching has long been a fundamental component of 3D vision systems such as Structure-from-Motion (SfM), yet progress has lagged behind the rapid advances of modern data-driven approaches. The newer approaches, such as feed-forward reconstruction models, have benefited extensively from scaling dataset sizes, whereas local feature matching models are still only trained on a few mid-sized datasets. In this paper, we revisit local feature matching from a data-driven perspective. In our approach, which we call LoMa, we combine large and diverse data mixtures, modern training recipes, scaled model capacity, and scaled compute, resulting in remarkable gains in performance. Since current standard benchmarks mainly rely on collecting sparse views from successful 3D reconstructions, the evaluation of progress in feature matching has been limited to relatively easy image pairs. To address the resulting saturation of benchmarks, we collect 1000 highly challenging image pairs from internet data into a new dataset called HardMatch. Ground truth correspondences for HardMatch are obtained via manual annotation by the authors. In our extensive benchmarking suite, we find that LoMa makes outstanding progress across the board, outperforming the state-of-the-art method ALIKED+LightGlue by +18.6 mAA on HardMatch, +29.5 mAA on WxBS, +21.4 (1m, 10$^\circ$) on InLoc, +24.2 AUC on RUBIK, and +12.4 mAA on IMC 2022. We release our code and models publicly at https://github.com/davnords/LoMa.
Large reasoning models rely on long chain-of-thought generation to solve complex problems, but extended reasoning often incurs substantial computational cost and can even degrade performance due to overthinking. A key challenge is determining when the model should stop reasoning and produce the final answer. In this work, we study the confidence of intermediate answers during reasoning and observe two characteristic behaviors: correct reasoning trajectories often reach high-confidence answers early, while incorrect rollouts tend to produce long, unproductive reasoning traces and exhibit less reliable confidence dynamics. Motivated by these observations, we propose CoDE-Stop (Confidence Dynamics Early Stop), an early stopping method that leverages the dynamics of intermediate answer confidence to decide when to terminate reasoning, requiring no additional training and easily integrating into existing models. We evaluate CoDE-Stop on diverse reasoning and science benchmarks across multiple models. Compared to prior early stopping methods, it achieves a more favorable accuracy-compute tradeoff and reduces total token usage by 25-50% compared to standard full-length reasoning. In addition, we provide analyses of confidence dynamics during reasoning, offering insights into how confidence changes in both correct and incorrect trajectories.
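A minimal sketch of such a confidence-dynamics early-stopping loop is given below; the confidence signal, thresholds, and probing interface are illustrative placeholders rather than CoDE-Stop's actual criterion.

```python
# Illustrative early-stopping loop over chain-of-thought segments: after each
# reasoning chunk, probe an intermediate answer and its confidence, and stop
# once confidence stays high for a few consecutive probes.  The confidence
# signal and thresholds below are placeholders, not CoDE-Stop's exact rule.
def generate_with_early_stop(model, prompt, chunk_tokens=256, max_chunks=16,
                             conf_threshold=0.9, patience=2):
    trace, streak, answer = prompt, 0, None
    for _ in range(max_chunks):
        trace += model.generate(trace, max_new_tokens=chunk_tokens)
        # Probe: current best answer plus a confidence score in [0, 1]
        # (e.g., the probability the model assigns to that answer).
        answer, conf = model.probe_answer(trace)
        streak = streak + 1 if conf >= conf_threshold else 0
        if streak >= patience:
            break   # confidence has stabilized at a high value: stop reasoning
    return answer

# Dummy model for a runnable demo: confidence ramps up over chunks.
class DummyModel:
    def __init__(self):
        self.step = 0
    def generate(self, trace, max_new_tokens):
        self.step += 1
        return f" [chunk {self.step}]"
    def probe_answer(self, trace):
        return "42", min(1.0, 0.3 + 0.2 * self.step)

print(generate_with_early_stop(DummyModel(), "Q: ..."))
```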
Apr 07 2026
cs.CV arXiv:2604.04929v1
Most vision-language models (VLMs) apply a large language model (LLM) as the decoder, where the response tokens are generated sequentially through autoregression. Therefore, the number of output tokens can be the bottleneck of the end-to-end latency. However, different models may require vastly different numbers of output tokens to achieve comparable performance. In this work, we conduct a comprehensive analysis of the latency across different components of VLMs on simulated data. The experiment shows that a large model with fewer output tokens can be more efficient than a small model with a long output sequence. The empirical study on diverse real-world benchmarks confirms the observation that a large model can achieve better or comparable performance as a small model with significantly fewer output tokens. To leverage the efficiency of large models, we propose a multi-agent inference framework that keeps large models with short responses but transfers the key reasoning tokens from the small model when necessary. The comparison on benchmark tasks demonstrates that by reusing the reasoning tokens from small models, it can help approach the performance of a large model with its own reasoning, which confirms the effectiveness of our proposal.
We present a general construction of embedded minimal and constant mean curvature surfaces in $\mathbb{S}^n$ and one-phase free boundaries joined by a smooth interpolation by capillary hypersurfaces. This framework recovers all known families and produces new minimal surfaces in the sphere with rich topological structures as sphere bundles over base spaces which include space-form products, projective planes over division algebras, Stiefel manifolds, complex quadrics, and twisted products and quotients of Lie subgroups of $SO(n)$. We show these bundles are non-trivial and study their homotopy types using topological obstructions, including characteristic classes and tools from $K$-theory and stable homotopy theory. Finally, we prove uniqueness results for the rotationally invariant capillary CMC problem.
In this paper, we construct a novel global bounded cochain extension operator for differential forms on Lipschitz domains. Building upon the classical universal extension of Hiptmair, Li, and Zou, our construction restores global commutativity with the exterior derivative in the natural $H\Lambda^k(\Omega)$ setting. The construction applies to domains and ambient extension sets of arbitrary topology, with strict commutation holding on the orthogonal complement of harmonic forms, as dictated by the underlying topological obstruction. This provides a missing analytical tool for the rigorous foundation of Cut Finite Element Methods (CutFEM). We also obtain continuous uniform Poincaré inequalities and lower bounds for the first Neumann eigenvalue on non-convex domains.
Email remains a central communication medium, yet its long-standing design and interface conventions continue to enable deceptive attacks. This research note presents a structured list of 42 email-based deception techniques, documented with 64 concrete example implementations, organized around the sender, link, and attachment security indicators as well as techniques targeting the email rendering environment. Building on a prior systematic literature review, we consolidate previously reported techniques with newly developed example implementations and introduce novel deception techniques identified through our own examination. Rather than assessing effectiveness or real-world severity, each entry explains the underlying mechanism in isolation, separating the high-level deception goal from its concrete technical implementation. The documented techniques serve as modular building blocks and a structured reference for future work on countermeasures across infrastructure, email client design, and security awareness, supporting researchers as well as developers, operators, and designers working in these areas.
Apr 07 2026
cs.CV arXiv:2604.04925v1
In this paper, we explore the design space of procedural rules for multi-view stereo (MVS). We demonstrate that we can generate effective training data using SimpleProc: a new, fully procedural generator driven by a very small set of rules using Non-Uniform Rational Basis Splines (NURBS), as well as basic displacement and texture patterns. At a modest scale of 8,000 images, our approach achieves superior results compared to manually curated images (at the same scale) sourced from games and real-world objects. When scaled to 352,000 images, our method yields performance comparable to--and in several benchmarks, exceeding--models trained on over 692,000 manually curated images. The source code and the data are available at https://github.com/princeton-vl/SimpleProc.
Pre-trained diffusion models have enabled significant advancements in All-in-One Restoration (AiOR), offering improved perceptual quality and generalization. However, diffusion-based restoration methods primarily rely on fine-tuning or Control-Net style modules to leverage the pre-trained diffusion model's priors for AiOR. In this work, we show that these pre-trained diffusion models inherently possess restoration behavior, which can be unlocked by directly learning prompt embeddings at the output of the text encoder. Interestingly, this behavior is largely inaccessible through text prompts and text-token embedding optimization. Furthermore, we observe that naive prompt learning is unstable because the forward noising process using degraded images is misaligned with the reverse sampling trajectory. To resolve this, we train prompts within a diffusion bridge formulation that aligns training and inference dynamics, enforcing a coherent denoising path from noisy degraded states to clean images. Building on these insights, we introduce our lightweight learned prompts on the pre-trained WAN video model and FLUX image models, converting them into high-performing restoration models. Extensive experiments demonstrate that our approach achieves competitive performance and generalization across diverse degradations, while avoiding fine-tuning and restoration-specific control modules.
In this paper, we develop a stratification-based semantics for Signal Temporal Logic (STL) in which each atomic predicate is interpreted as a membership test in a stratified space. This perspective reveals a novel correspondence principle between stratification theory and STL, showing that most STL formulas can be viewed as inducing a stratification of space-time. The significance of this interpretation is twofold. First, it offers a fresh theoretical framework for analyzing the structure of the embedding space generated by deep reinforcement learning (DRL) and relates it to the geometry of the ambient decision space. Second, it provides a principled framework that both enables the reuse of existing high-dimensional analysis tools and motivates the creation of novel computational techniques. To ground the theory, we (1) illustrate the role of stratification theory in Minigrid games and (2) apply numerical techniques to the latent embeddings of a DRL agent playing such a game where the robustness of STL formulas is used as the reward. In the process, we propose computationally efficient signatures that, based on preliminary evidence, appear promising for uncovering the stratification structure of such embedding spaces.
Elephant random walks were studied recently in \cite{mukherjee2025elephant} on the groups $\mathbb{Z}^{*d_1} * \mathbb{Z}_2^{*d_2}$ whose Cayley graphs are infinite $d$-regular trees with $d = 2d_1 + d_2$. It was found that for $d \ge 3$, the elephant walk is ballistic with the same asymptotic speed $\frac{d - 2}{d}$ as the simple random walk and the memory parameter appears only in the rate of convergence to the limiting speed. In the $d = 2$ case, there are two such groups, both having the bi-infinite path as their Cayley graph. For $(d_1, d_2) = (1, 0)$, the walk is the usual elephant random walk on $\mathbb{Z}$, which exhibits anomalous diffusion. In this article, we study the other case, namely $(d_1, d_2) = (0, 2)$, which corresponds to the infinite dihedral group $D_\infty \cong \mathbb{Z}_2 * \mathbb{Z}_2$. Unlike the classical ERW on $\mathbb{Z}$, which is a time-inhomogeneous Markov chain, the ERW on $D_{\infty}$ is non-Markovian. We show that the first and second order behaviours of the signed location of the walker agree with those of the simple symmetric random walk on $\mathbb{Z}$, with the memory parameter essentially manifesting itself via a lower order correction term that can be written as an explicit functional of the elephant walk on $\mathbb{Z}$. Our result demonstrates that unlike the simple random walk, the elephant walk is sensitive to local algebraic relations. Indeed, although $D_{\infty}$ is virtually abelian, containing $\mathbb{Z}$ as a finite-index subgroup, the involutive nature of its generators effectively neutralises memory, thereby ruling out any potential superdiffusive behaviour, in contrast to the superdiffusion observed on its abelian cousin $\mathbb{Z}$.
Extended reasoning in large language models (LLMs) creates severe KV cache memory bottlenecks. Leading KV cache compression methods estimate KV importance using attention scores from recent post-RoPE queries. However, queries rotate with position during RoPE, making representative queries very few, leading to poor top-key selection and unstable reasoning. To avoid this issue, we turn to the pre-RoPE space, where we observe that Q and K vectors are highly concentrated around fixed non-zero centers and remain stable across positions -- Q/K concentration. We show that this concentration causes queries to preferentially attend to keys at specific distances (e.g., nearest keys), with the centers determining which distances are preferred via a trigonometric series. Based on this, we propose TriAttention to estimate key importance by leveraging these centers. Via the trigonometric series, we use the distance preference characterized by these centers to score keys according to their positions, and also leverage Q/K norms as an additional signal for importance estimation. On AIME25 with 32K-token generation, TriAttention matches Full Attention reasoning accuracy while achieving 2.5x higher throughput or 10.7x KV memory reduction, whereas leading baselines achieve only about half the accuracy at the same efficiency. TriAttention enables OpenClaw deployment on a single consumer GPU, where long context would otherwise cause out-of-memory with Full Attention.
We study physics-informed neural networks (PINNs) as numerical tools for the optimal control of semilinear partial differential equations. We first recall the classical direct and indirect viewpoints for optimal control of PDEs, and then present two PINN formulations: a direct formulation based on minimizing the objective under the state constraint, and an indirect formulation based on the first-order optimality system. For a class of semilinear parabolic equations, we derive the state equation, the adjoint equation, and the stationarity condition in a form consistent with continuous-time Pontryagin-type optimality conditions. We then specialize the framework to an Allen-Cahn control problem and compare three numerical approaches: (i) a discretize-then-optimize adjoint method, (ii) a direct PINN, and (iii) an indirect PINN. Numerical results show that the PINN parameterization has an implicit regularizing effect, in the sense that it tends to produce smoother control profiles. They also indicate that the indirect PINN more faithfully preserves the PDE constraint and optimality structure and yields a more accurate neural approximation than the direct PINN.
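As a point of reference for the indirect formulation, the first-order optimality system for a generic tracking-type objective $J(y,u)=\tfrac12\|y-y_d\|_{L^2}^2+\tfrac{\lambda}{2}\|u\|_{L^2}^2$ with a semilinear parabolic state equation takes the standard form below (an assumed template with homogeneous boundary conditions; the Allen-Cahn problem treated in the paper may use a different objective):
\begin{align*}
  &\text{state:}        && \partial_t y - \Delta y + f(y) = u, && y(0)=y_0,\\
  &\text{adjoint:}      && -\partial_t p - \Delta p + f'(y)\,p = y - y_d, && p(T)=0,\\
  &\text{stationarity:} && \lambda\,u + p = 0 .
\end{align*}
In this template, the indirect PINN would parameterize $(y,p,u)$ by networks and penalize the residuals of all three equations, whereas the direct PINN minimizes $J$ subject only to the state-equation residual.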
We show that the Schur-complement reduction of a chemical reaction network (CRN) from Hirono et al. is the categorical complement of the stoichiometric arrow in the arrow category $[\mathbf{A}_2,\mathbf{Vect}]$. This identifies the ambient category in which topological reduction of chemical reaction networks is functorial and explains the reduced stoichiometric matrix as a universal diagrammatic construction. We further define a reconstruction functor from a restricted subcategory of $[\mathbf{A}_2, \mathbf{Vect}]$ back to CRNs and prove an adjunction with the stoichiometric functor.
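For readers unfamiliar with the linear-algebraic input, the Schur complement eliminating an invertible block $D$ of a block map is the standard reduction below; the identification of this reduction with a categorical complement in $[\mathbf{A}_2,\mathbf{Vect}]$ is the paper's contribution.
\[
  S=\begin{pmatrix} A & B\\ C & D\end{pmatrix},\qquad D \text{ invertible}
  \quad\Longrightarrow\quad
  S/D \;=\; A - B\,D^{-1} C .
\]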
Chaoran Chen, Zhiping Zhang, Zeya Chen, Eryue Xu, Yinuo Yang, Ibrahim Khalilov, Simret A Gebreegziabher, Yanfang Ye, Ziang Xiao, Yaxing Yao, Tianshi Li, Toby Jia-Jun Li Apr 07 2026
cs.HC arXiv:2604.04918v1
LLM-powered computer-use agents (CUAs) are shifting users from direct manipulation to supervisory coordination. Existing oversight mechanisms, however, have largely been studied as isolated interface features, making broader oversight strategies difficult to compare. We conceptualize CUA oversight as a structural coordination problem defined by delegation structure and engagement level, and use this lens to compare four oversight strategies in a mixed-methods study with 48 participants in a live web environment. Our results show that oversight strategy more reliably shaped users' exposure to problematic actions than their ability to correct them once visible. Plan-based strategies were associated with lower rates of agent problematic-action occurrence, but not equally strong gains in runtime intervention success once such actions became visible. On subjective measures, no single strategy was uniformly best, and the clearest context-sensitive differences appeared in trust. Qualitative findings further suggest that intervention depended not only on what controls users retained, but on whether risky moments became legible as requiring judgment during execution. These findings suggest that effective CUA oversight is not achieved by maximizing human involvement alone. Instead, it depends on how supervision is structured to surface decision-critical moments and support their recognition in time for meaningful intervention.
What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) show such broad visual reasoning is within reach, but the recipe behind them remains unclear, locked behind proprietary reinforcement learning (RL) pipelines with non-public data. We introduce Vero, a family of fully open VLMs that matches or exceeds existing open-weight models across diverse visual reasoning tasks. We scale RL data and rewards across six broad task categories, constructing Vero-600K, a 600K-sample dataset from 59 datasets, and designing task-routed rewards that handle heterogeneous answer formats. Vero achieves state-of-the-art performance, improving over four base models by 3.7-5.5 points on average across VeroEval, our suite of 30 challenging benchmarks. Starting from Qwen3-VL-8B-Instruct, Vero outperforms Qwen3-VL-8B-Thinking on 23 of 30 benchmarks without additional proprietary thinking data. When trained from the same base model, Vero-600K exceeds existing RL datasets across task categories. Systematic ablations reveal that different task categories elicit qualitatively distinct reasoning patterns that transfer poorly in isolation, suggesting that broad data coverage is the primary driver of strong RL scaling. All data, code, and models are released.
Apr 07 2026
cs.LG arXiv:2604.04916v1
Extreme weather events, such as severe storms, hurricanes, snowstorms, and ice storms, which are exacerbated by climate change, frequently cause widespread power outages. These outages halt industrial operations, impact communities, damage critical infrastructure, profoundly disrupt economies, and have far-reaching effects across various sectors. To mitigate these effects, the University of Connecticut and Eversource Energy Center have developed an outage prediction modeling (OPM) system to provide pre-emptive forecasts for electric distribution networks before such weather events occur. However, existing predictive models in the system do not incorporate the spatial effect of extreme weather events. To this end, we develop Spatially Aware Hybrid Graph Neural Networks (SA-HGNN) with contrastive learning to enhance the OPM predictions for extreme weather-induced power outages. Specifically, we first encode spatial relationships of both static features (e.g., land cover, infrastructure) and event-specific dynamic features (e.g., wind speed, precipitation) via the SA-HGNN encoder. Next, we leverage contrastive learning to handle the imbalance problem associated with different types of extreme weather events and generate location-specific embeddings by minimizing intra-event distances between similar locations while maximizing inter-event distances across all locations. Thorough empirical studies in four utility service territories, i.e., Connecticut, Western Massachusetts, Eastern Massachusetts, and New Hampshire, demonstrate that SA-HGNN can achieve state-of-the-art performance for power outage prediction.
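A generic version of the contrastive objective described here, pulling together embeddings of locations from the same event type and pushing apart those from different event types, could look like the following sketch; the temperature, batching, and masking choices are illustrative, not the paper's exact loss.

```python
# Generic supervised contrastive loss over location embeddings: embeddings of
# locations from the same event type are pulled together, different event
# types are pushed apart.  Choices below are illustrative only.
import torch
import torch.nn.functional as F

def event_contrastive_loss(z, event_labels, temperature=0.1):
    """z: (n, d) location embeddings; event_labels: (n,) event-type ids."""
    z = F.normalize(z, dim=1)
    sim = z @ z.T / temperature                      # (n, n) cosine similarities
    n = z.shape[0]
    mask_self = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask_self, float("-inf"))  # exclude self-pairs
    same = (event_labels[:, None] == event_labels[None, :]) & ~mask_self
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-probability of positives (same-event pairs) for each anchor.
    pos_counts = same.sum(1).clamp(min=1)
    loss = -(log_prob * same.float()).sum(1) / pos_counts
    return loss[same.any(1)].mean()                  # anchors with >=1 positive

# Toy usage: 8 embeddings, 2 event types.
z = torch.randn(8, 16, requires_grad=True)
labels = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
print(event_contrastive_loss(z, labels))
```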
Apr 07 2026
cs.HC arXiv:2604.04915v1
Wearable devices increasingly support stress detection, while LLMs enable conversational mental health support. However, designing systems that meaningfully connect wearable-triggered stress events with generative dialogue remains underexplored, particularly from a design perspective. We present EmBot, a functional mobile application that combines wearable-triggered stress detection with LLM-based conversational support for daily stress management. We used EmBot as a design probe in semi-structured interviews with 15 mental health experts to examine their perspectives and surface early design tensions and considerations that arise from wearable-triggered conversational support, informing the future design of such systems for daily stress management and mental health support.
Deep reinforcement learning (DRL) has shown remarkable performance on complex control problems in systems and networking, including adaptive video streaming, wireless resource management, and congestion control. For safe deployment, however, it is critical to reason about how agents behave across the range of system states they encounter in practice. Existing verification-based methods in this domain primarily focus on point properties, defined around fixed input states, which offer limited coverage and require substantial manual effort to identify relevant input-output pairs for analysis. In this paper, we study symbolic properties, that specify expected behavior over ranges of input states, for DRL agents in systems and networking. We present a generic formulation for symbolic properties, with monotonicity and robustness as concrete examples, and show how they can be analyzed using existing DNN verification engines. Our approach encodes symbolic properties as comparisons between related executions of the same policy and decomposes them into practically tractable sub-properties. These techniques serve as practical enablers for applying existing verification tools to symbolic analysis. Using our framework, diffRL, we conduct an extensive empirical study across three DRL-based control systems, adaptive video streaming, wireless resource management, and congestion control. Through these case studies, we analyze symbolic properties over broad input ranges, examine how property satisfaction evolves during training, study the impact of model size on verifiability, and compare multiple verification backends. Our results show that symbolic properties provide substantially broader coverage than point properties and can uncover non-obvious, operationally meaningful counterexamples, while also revealing practical solver trade-offs and limitations.
Apr 07 2026
cs.CV arXiv:2604.04913v1
Anticipating diverse future states is a central challenge in video world modeling. Discriminative world models produce a deterministic prediction that implicitly averages over possible futures, while existing generative world models remain computationally expensive. Recent work demonstrates that predicting the future in the feature space of a vision foundation model (VFM), rather than a latent space optimized for pixel reconstruction, requires significantly fewer world model parameters. However, most such approaches remain discriminative. In this work, we introduce DeltaTok, a tokenizer that encodes the VFM feature difference between consecutive frames into a single continuous "delta" token, and DeltaWorld, a generative world model operating on these tokens to efficiently generate diverse plausible futures. Delta tokens reduce video from a three-dimensional spatio-temporal representation to a one-dimensional temporal sequence, for example yielding a 1,024x token reduction with 512x512 frames. This compact representation enables tractable multi-hypothesis training, where many futures are generated in parallel and only the best is supervised. At inference, this leads to diverse predictions in a single forward pass. Experiments on dense forecasting tasks demonstrate that DeltaWorld forecasts futures that more closely align with real-world outcomes, while having over 35x fewer parameters and using 2,000x fewer FLOPs than existing generative world models. Code and weights: https://deltatok.github.io.