We track OpenAI, DeepMind, Anthropic, and 17 other labs daily, with AI-powered summaries, trend charts, and a weekly digest.
We read everything so you don't have to. One email, zero noise.
Temporal dynamics in video can be learned and directly manipulated, enabling speed-conditioned video generation and temporal super-resolution.
Bridging the gap between image generation and detection, UniGenDet leverages a unified framework to enhance both fidelity and interpretability in generated images.
The trajectory of gradient descent is not random; it is systematically forced toward the critical threshold of $2/\eta$, revealing a hidden structure in neural network optimization.
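A toy illustration of where that threshold comes from (our own sketch, not the paper's experiment): on a quadratic with curvature $\lambda$, gradient descent with step size $\eta$ updates $w \leftarrow (1 - \eta\lambda)\,w$, which is stable exactly when $\lambda < 2/\eta$.

```python
# Toy sketch (not the paper's setup): gradient descent on f(w) = 0.5 * lam * w**2
# updates w <- (1 - eta * lam) * w, which is stable iff lam < 2 / eta.
eta = 0.1                           # step size; stability threshold is 2 / eta = 20
for lam in (15.0, 19.9, 20.1):      # curvatures below, just below, and above 2/eta
    w = 1.0
    for _ in range(200):
        w *= 1 - eta * lam          # exact gradient descent step
    print(f"lam={lam:>5}: |w| after 200 steps = {abs(w):.2e}")
# 15.0 converges quickly, 19.9 is barely stable, 20.1 diverges
```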
TPGO allows multi-agent systems to learn from their own optimization history, leading to unprecedented self-improvement in performance.
AI-driven summaries of public consultations can systematically exclude dissenting voices, raising concerns about biased policy recommendations even when individual outputs seem reasonable.
Automated identification of individual animals can only be effective if it aligns with ecological questions and data practices, not just algorithmic accuracy.
Current remote sensing change captioning datasets lack fine-grained, localized semantic reasoning; RSRCC fills this gap with 126k change-specific questions.
Stop penalizing your ANN search algorithm for missing ground-truth neighbors that were never semantically relevant in the first place – Semantic Recall offers a more nuanced and effective way to measure retrieval quality.
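One way the idea could be operationalized (a minimal sketch under our own assumptions; the paper's exact definition may differ): credit any retrieved item at least as close to the query as the k-th true neighbor, instead of only exact ground-truth IDs.

```python
import numpy as np

def semantic_recall(dists, retrieved_ids, k):
    """dists: distance from the query to every corpus item;
    retrieved_ids: item ids returned by the ANN index.
    Any hit no farther than the k-th true neighbor counts, so an
    equally close 'wrong' id is not scored as a miss."""
    threshold = np.sort(dists)[k - 1]   # distance of the k-th true neighbor
    hits = sum(dists[i] <= threshold for i in retrieved_ids[:k])
    return hits / k
```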
Users who actively participate in an AI agent's spreadsheet execution not only improve task outcomes, but also gain a deeper understanding and feel more ownership over the results.
Directly embedding quantile tokens into input sequences leads to sharper and more accurate distribution predictions, outperforming traditional methods by a substantial margin.
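The paper's mechanism is quantile tokens in the input sequence; the sketch below only illustrates the pinball loss that quantile predictions are conventionally trained with (our assumption about the training signal, not a detail from the paper).

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Quantile (pinball) loss: minimized when q is the tau-quantile of y."""
    diff = y - q
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# The minimizer over q recovers the empirical quantile:
y = np.random.default_rng(0).normal(size=10_000)
grid = np.linspace(-3, 3, 601)
best = grid[np.argmin([pinball_loss(y, q, 0.9) for q in grid])]
print(best, np.quantile(y, 0.9))    # both close to 1.28
```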
Pocket-sized VLA models can now achieve state-of-the-art robot manipulation performance by pre-training on a curated multimodal dataset and injecting manipulation-relevant representations into the action space.
LLMs are poised to flip the script on personalization, giving users unprecedented control over their data and how it's used across platforms.
Continuous benchmarking of protein function prediction models is now possible, enabling faster iteration and more robust performance tracking as annotations evolve.
A low-cost, compact sensor provides continuous vision-tactile feedback, enabling robots to "see" and "feel" their way through dexterous manipulation tasks.
LVLMs can self-detect and correct object hallucinations by focusing on specific image regions, offering a simple, training-free fix.
Deterministic decoding can outperform stochastic self-consistency in constrained domains by systematically exploring high-probability reasoning traces, leading to better performance with less computation.
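A hedged sketch of what systematic exploration can look like (the toy model and every name below are ours, not the paper's): best-first search over prefixes emits complete sequences in strictly decreasing probability order, because extending a prefix can only lower its probability.

```python
import heapq, math

def next_token_probs(prefix):
    # Toy stand-in for a language model: same 3 options after any prefix.
    return {"a": 0.6, "b": 0.3, "<eos>": 0.1}

def top_traces(k=5, max_len=4):
    """Enumerate the k most probable complete sequences, deterministically."""
    heap = [(0.0, ())]                 # (negative log-prob, prefix)
    done = []
    while heap and len(done) < k:
        neg_lp, seq = heapq.heappop(heap)
        if (seq and seq[-1] == "<eos>") or len(seq) == max_len:
            done.append((math.exp(-neg_lp), seq))   # complete trace
            continue
        for tok, p in next_token_probs(seq).items():
            heapq.heappush(heap, (neg_lp - math.log(p), seq + (tok,)))
    return done

for prob, seq in top_traces():
    print(f"{prob:.4f}  {' '.join(seq)}")
```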
Continual learning for LLM agents hits a wall: scaling models doesn't reliably improve skill generation, and self-feedback can lead to recursive drift.
LLMs can reason more effectively by directly tracking their own belief in the correct answer throughout the reasoning process, enabling more targeted policy updates.
Achieve superhuman dexterity: ALAS unlocks robust long-horizon task completion by decoupling environment understanding from motor control, enabling generalization across diverse human-scene interaction scenarios.
MLLMs still struggle to integrate diverse data for clinical reasoning, as evidenced by their poor performance on a new ophthalmology benchmark spanning image quality assessment to diagnosis.
Sampling plausible configurations of digital twins can reveal multiple valid parameterizations, enhancing model adaptation in complex natural systems.
Extracting temporal geometry from generative models can boost reinforcement learning performance by over 2x without changing the optimal policy.
Imagine slashing the human effort needed to go from hypothesis to submission-ready ML theory paper by orders of magnitude.
Forget complex fixed-point machinery: this work offers a dramatically simpler and more efficient route from external regret to $\Phi$-regret minimization.
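For context, a standard definition (not this paper's contribution): $\Phi$-regret compares the learner's losses to those of the best strategy modification $\phi \in \Phi$ applied to its own play,

$$\mathrm{Reg}_{\Phi}(T) \;=\; \max_{\phi \in \Phi} \sum_{t=1}^{T} \Big( \ell_t(x_t) - \ell_t\big(\phi(x_t)\big) \Big),$$

with external regret recovered when $\Phi$ contains only constant maps, and swap regret when it contains all functions on the action set.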
Entropy regularization makes planning provably easy: SmoothCruiser achieves polynomial sample complexity in MDPs where standard methods fail.
TurboQuant's claimed advantages over RaBitQ in quantization don't hold up under rigorous, reproducible comparison, raising questions about its practical utility.
Multilingual LLMs exhibit a surprising "American bias," even when prompted in other languages, and instruction tuning makes it worse.
Bridging the offline-streaming gap in ASR is now more achievable: a single RNN-Transducer model can deliver high accuracy in both settings, thanks to a novel consistency regularization technique.
Forget chasing the biggest LLM – this benchmark reveals that smaller models (<2B params) can deliver 3x better energy efficiency and faster ROI in real-world industry deployments.
Cyclic equalizability, a concept relevant to card-based cryptography, boils down to having identical Parikh vectors.
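Since the characterization reduces to symbol counts, checking it is a one-liner (a sketch with example words we made up; `parikh_vector` is a hypothetical helper, not from the paper):

```python
from collections import Counter

def parikh_vector(word: str) -> Counter:
    """Parikh vector of a word: how many times each symbol occurs,
    ignoring order entirely."""
    return Counter(word)

# Per the paper's characterization, two words are cyclically equalizable
# iff their Parikh vectors coincide:
print(parikh_vector("aabcc") == parikh_vector("cacba"))  # True: same counts
print(parikh_vector("aabcc") == parikh_vector("aabbc"))  # False
```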
Get the performance boost of expensive sampling-based RL policies for a fraction of the compute by learning to prune action candidates early in the diffusion denoising process.
LLMs can fix 26% more bugs when given access to intermediate runtime states during program repair, proving that even the best models struggle to infer root causes from just failure symptoms.
Contact-aware reconstruction transforms how we achieve realistic human-scene interactions in 3D environments, correcting artifacts that have plagued previous methods.
Training-free diffusion models can now harmonize satellite imagery across diverse domains, enabling scalable remote-sensing synthesis without retraining.
Uncover misleading half-truths by pitting a Politician agent against a Scientist agent in a debate moderated by a Judge, revealing what's left unsaid.
LLMs still struggle to reason in context when cultural and linguistic nuances are involved, achieving only 44% accuracy on a new grounded benchmark spanning 14 languages.
Achieve state-of-the-art person re-identification with only 20% of the data by explicitly teaching the model to "think" before matching identities.
Stop settling for fragmented land cover predictions: SSDM leverages global geospatial embeddings to guide local feature extraction, achieving state-of-the-art performance in high-resolution remote sensing mapping.
Multi-event video generation gets a 33% quality boost with TS-Attn, a training-free attention mechanism that dynamically aligns video content with complex temporal prompts.
Stop training MoEs from scratch: "expert upcycling" lets you expand existing models with duplicated experts and targeted fine-tuning, slashing training costs by 32% without sacrificing performance.
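A minimal PyTorch sketch of the upcycling recipe as commonly described (the class name, top-1 routing, and gating details are our choices; the paper's architecture may differ): copy the pre-trained FFN into each expert, add a fresh router, then fine-tune.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpcycledMoE(nn.Module):
    """Hypothetical sketch: expand one dense FFN into a top-1 MoE layer."""
    def __init__(self, pretrained_ffn: nn.Module, d_model: int, n_experts: int = 4):
        super().__init__()
        # Each expert starts as an exact copy of the pre-trained FFN.
        self.experts = nn.ModuleList(
            [copy.deepcopy(pretrained_ffn) for _ in range(n_experts)]
        )
        self.router = nn.Linear(d_model, n_experts)   # the only new weights

    def forward(self, x):                             # x: (n_tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        top_gate, top_idx = gates.max(dim=-1)         # top-1 routing
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():                            # tokens routed to expert e
                out[mask] = top_gate[mask, None] * expert(x[mask])
        return out
```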
LLMs don't see cities neutrally; their perception is skewed towards a culturally uneven baseline, favoring Western perspectives.
LLM agents suffer from a human-like cognitive bias, Actor-Observer Asymmetry, leading them to make inconsistent judgments about their own and others' failures.
GAAP offers a deterministic, trust-minimized approach to AI agent security, safeguarding user data even when models are compromised or prompts are injected.
End-to-end training of Vision-Language-Action models just got a whole lot easier: VLA Foundry unifies LLM, VLM, and VLA training in a single open-source framework.
Freezing a Stable Diffusion backbone and injecting CLIP and BLIP features lets you beat the state-of-the-art in zero-shot sketch-based 3D shape retrieval, without any costly retraining.
VLMs can be significantly boosted on embodied tasks by mid-training on a carefully curated subset of VLM data that is highly aligned with the VLA domain, rivaling the performance of much larger models.
DPP-based Monte Carlo integration can offer variance reduction, but choosing the right DPP—fixed vs. tailored to the integrand—determines whether you get a biased but faster converging estimator or an unbiased but standard-rate estimator.
Neural operators can achieve uniform convergence rates for approximating solution maps across diverse geometric domains, challenging traditional assumptions about shape-dependent PDE solutions.
FUSE achieves verification quality on par with semi-supervised methods, all without needing any labeled data.
LLMs waste compute on tokens that have already "figured it out" – DASH selectively skips these tokens during prefill, speeding things up without retraining or sacrificing accuracy.
WorldMark enables fair, apples-to-apples comparisons of interactive video models, leveling the playing field for researchers and practitioners alike.
Extracting actionable insights from noisy customer incidents at scale is now possible: TingIS achieves a 95% discovery rate for high-priority incidents with just minutes of latency.
Identity encoders can now maintain consistency across diverse artistic styles, achieving human-level performance in recognizing faces even in heavily stylized formats.
Automated expert-level evaluation across 10,000 cases surfaced clinical blind spots in AI systems that small-scale testing cannot see, and it should become standard practice for uncovering serious failures and putting safety guardrails in place before clinical deployment exposes patients to risk.
SpanDec achieves state-of-the-art NER accuracy with significantly improved throughput, proving that you don't need to exhaustively process every possible span to achieve top performance.
Ditch the fixed trade-offs: ParetoSlider lets you smoothly navigate competing generative goals in diffusion models at inference time, without retraining.
Generative training doesn't just make images prettier; it can actually boost a model's spatial reasoning skills.
A new global dataset reveals intricate deployment patterns and operational dynamics of offshore wind infrastructure, enabling unprecedented temporal analysis.
Vision-based tactile signals in the VTOUCH dataset significantly enhance bimanual manipulation capabilities, paving the way for more effective robotic interactions.
Exact attention over billion-token sequences is now possible on a single GPU, thanks to a novel streaming approach that avoids out-of-memory errors without approximation.
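A hedged sketch of the key trick such streaming methods rely on (online softmax rescaling; function names and chunking are ours): process keys and values one chunk at a time while keeping the result mathematically exact.

```python
import numpy as np

def streaming_attention(q, key_chunks, value_chunks):
    """Exact single-query attention, one chunk at a time. A running max m
    and running denominator l are rescaled as new chunks arrive (online
    softmax), so memory stays bounded by the chunk size."""
    d = q.shape[-1]
    m, l = -np.inf, 0.0
    acc = np.zeros(value_chunks[0].shape[-1])
    for K, V in zip(key_chunks, value_chunks):
        s = K @ q / np.sqrt(d)            # logits for this chunk
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)         # rescale previous accumulators
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ V
        m = m_new
    return acc / l

# Sanity check against materializing all logits at once:
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=8), rng.normal(size=(1000, 8)), rng.normal(size=(1000, 4))
w = np.exp(K @ q / np.sqrt(8)); full = (w / w.sum()) @ V
assert np.allclose(streaming_attention(q, np.split(K, 10), np.split(V, 10)), full)
```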
Ditch sparse contact cues: LEXIS-Flow uses a learned manifold of interaction signatures to capture dense, continuous proximity between humans and objects, leading to more realistic 3D HOI reconstructions.
Gauge-equivariant GNNs unlock the ability to learn intrinsically nonlocal observables in lattice gauge theories by directly embedding non-Abelian symmetries into message passing.
Fixed-width attention spans can give you better grammar and human-like reading patterns, especially when you're short on training data.
Ditch the GNN training: this label propagation method matches or beats GNN accuracy while being far more computationally efficient, even on tricky heterophilous graphs.
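A minimal sketch of the classical propagation scheme such methods build on (the symmetric normalization, clamping, and hyperparameters are our assumptions; the paper's variant may differ):

```python
import numpy as np

def label_propagation(A, y0, labeled, alpha=0.9, iters=50):
    """A: (n, n) symmetric adjacency matrix; y0: (n, c) float one-hot
    labels with zero rows for unlabeled nodes; labeled: (n,) bool mask."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
    y = y0.copy()
    for _ in range(iters):
        y = alpha * (S @ y) + (1 - alpha) * y0   # diffuse, then pull to seeds
        y[labeled] = y0[labeled]                 # clamp observed labels
    return y.argmax(axis=1)                      # predicted class per node
```

No gradients and no training loop: the whole method is a handful of matrix-vector products, which is where the efficiency gap over GNN training comes from.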
Short-term A/B test metrics can be misleading: this paper shows how to accurately estimate long-term value changes by modeling treatment effects as a decaying function learned from multiple cohorts.
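A toy sketch of the modeling idea (the exponential form, horizon, and every number below are invented for illustration; the paper learns the decay from multiple experiment cohorts):

```python
import numpy as np

# Hypothetical cohort measurements: treatment lift observed at different
# cohort ages t (weeks since exposure). All numbers are made up.
t = np.array([1.0, 2.0, 4.0, 8.0])
lift = np.array([0.050, 0.041, 0.027, 0.012])

# Fit lift(t) = delta0 * exp(-t / tau) by log-linear least squares.
slope, intercept = np.polyfit(t, np.log(lift), 1)
delta0, tau = np.exp(intercept), -1.0 / slope

# Extrapolate the cumulative long-term effect over one year (52 weeks):
# the integral of delta0 * exp(-t / tau) from 0 to 52.
cumulative = delta0 * tau * (1 - np.exp(-52.0 / tau))
print(f"delta0={delta0:.4f}, tau={tau:.1f} weeks, 1-year cumulative={cumulative:.3f}")
```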
Open-source MLLMs can now achieve state-of-the-art accuracy on complex tabular reasoning tasks, even outperforming models 18x their size, by explicitly penalizing visual hallucinations and shortcut guessing through process-supervised RL.
Get 82x faster Bayesian inference for equipment monitoring by replacing MCMC with neural nets trained on simulated data.
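A hedged sketch of the general recipe (amortized simulation-based inference; the damped-oscillation simulator and every name below are stand-ins we invented, and a point estimate stands in for the paper's full Bayesian posterior):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 64)

def simulate(theta):
    """Toy sensor model: damped oscillation plus measurement noise."""
    damping, freq = theta
    return np.exp(-damping * t) * np.sin(2 * np.pi * freq * t) \
        + 0.05 * rng.normal(size=t.size)

# Train once on prior draws; afterwards "inference" is a single forward pass.
thetas = rng.uniform([0.5, 1.0], [5.0, 8.0], size=(5000, 2))
signals = np.stack([simulate(th) for th in thetas])
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500).fit(signals, thetas)

estimate = net.predict(simulate((2.0, 4.0))[None, :])
print(estimate)   # should land near (2.0, 4.0), with no MCMC in the loop
```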
Forget fine-tuning behemoth LLMs for every new task – this paper shows how a tiny, nimble model generating smart supplements can unlock surprisingly strong agentic performance from frozen giants.
Unlock 10x faster simulation-based inference in hierarchical models by training on single-site simulations and assembling synthetic multi-site data.
Whitening neuroimaging features can transform linear models from black boxes into interpretable tools for understanding brain pathology.
Differentially private federated learning gets a boost: PINA achieves 2.9% higher accuracy than state-of-the-art methods by using a novel two-stage approach with privacy-preserving initialization and normality-driven aggregation.
GNNs can slash storm surge forecast errors by over 70%, offering a faster and more accurate alternative to traditional numerical models for coastal disaster prediction.
Geometry-aware optimization can dramatically improve LLM alignment by ensuring fairer trade-offs among conflicting human values.
Bayesian mixture-of-experts models can achieve robust density and parameter estimation with adaptive expert selection, fundamentally reshaping our approach to complex probabilistic modeling.
Current MLLMs fail to detect covert advertisements, revealing a critical gap in social media moderation that could mislead consumers and pose ethical risks.
Individual prosumers can now effectively coordinate in electricity markets, boosting overall market performance through a novel hierarchical MARL framework.
Reinforcement learning's Temporal Difference value estimation offers a surprisingly effective and theoretically grounded approach to calibrating uncertainty in vision-language-action models for robotics.
VDC achieves high-dimensional density estimation with remarkable speed and accuracy, transforming the landscape of copula modeling.
Explicit dropout achieves superior performance without the randomness of traditional methods, offering a clearer path to regularization control in Transformer models.
Decentralized learning can match centralized performance by sharing only Gibbs measures, not datasets, opening new avenues for privacy-preserving collaboration.
GNNs can predict network traffic flow with surprising accuracy, particularly in pinpointing connection endpoints.
LLMs can pinpoint mental states but falter at predicting dialogue trajectories, revealing a critical gap in their reasoning capabilities.
Ditch the expensive energy calculations: this new ML-DFT approach learns directly from ground-state densities, achieving state-of-the-art accuracy with improved runtime scaling.
A groundbreaking dataset suite reveals the intricate dynamics of decentralized prediction markets, offering unparalleled insights into collective forecasting behavior.
Conditional risk calibration reveals a unique perspective on uncertainty quantification that could transform how we approach decision-making in machine learning.
Counterintuitively, using only measured nodes to define the GNN topology slashes training time by 6x and boosts fault localization accuracy by 11% in power distribution grids.
AMM price prediction accuracy jumps 56% by explicitly modeling the uncertainty in block intervals, revealing the critical role of on-chain event timing.
R2IF improves function-calling accuracy by up to 34.62%, bridging the gap between reasoning and decision-making in LLMs.
Tabular anomaly detection gets a serious upgrade: uLEAD-TabPFN leverages frozen PFNs to model complex feature dependencies, outperforming existing methods by a significant margin, especially in high-dimensional spaces.
Achieve more reliable and interpretable virtual cell perturbation predictions by combining knowledge-driven multimodal modeling with evidence retrieval.
Energy-dissipation principles can revolutionize how we infer potential functions in noisy, incomplete data environments, achieving remarkable robustness in generalized diffusion processes.
Force-feeding physics to LSTMs slashes battery thermal runaway prediction errors by over 80%, making your next e-bike less likely to explode.
Transfer learning gets a boost: SMART sidesteps restrictive assumptions and data sharing limitations by transferring spectral information between tasks, leading to improved accuracy and robustness.
Forget retraining: LEVER lets you snap together pre-trained RL policies at inference time, matching or beating from-scratch performance in some cases.
World models can navigate blood vessels autonomously with higher success rates than standard RL, paving the way for safer robotic stroke treatments.
By cleverly hedging between Cover's and Robbins' betting strategies, you can achieve almost-sure $O(\ln \ln n)$ regret without sacrificing the $O(\ln n)$ worst-case guarantee.
Even the best large vision-language models struggle with multi-image reasoning, scoring only 50% on a new benchmark designed to challenge their capabilities.
All evaluated language models exhibit vulnerabilities to a novel adversarial attack, underscoring the urgent need for improved security measures in AI systems.
SiPeR reveals how integrating scene dynamics with Bayesian inference can dramatically enhance the relevance of conversational recommendations in real-world contexts.
Current audio-language models are surprisingly bad at controlling and interpreting subtle vocal cues, failing in nearly half of situational dialogue scenarios.