ASPLOS 2026 Program Overview
Pittsburgh, PA — March 24–26, 2026
See the Detailed Program below
Start: 8:30 AM | Lunch: 12:00 PM | End: 6:00 PM | Presentations: 25 min | 167 unique papers
Day 1: Tuesday, March 24
| Time | Track A | Track B | Track C | Track D |
|---|---|---|---|---|
| 8:00 – 8:20 AM | Opening Remarks | |||
| 8:30 – 10:35 AM | 1A: LLM Serving: Throughput Optimization | 1B: LLM Serving: Latency & Scheduling | 1C: Quantum Computing: Compilation | 1D: CXL & Memory Fabric |
| 10:35 – 10:55 AM | Coffee Break | |||
| 10:55 – 11:55 AM | Keynote 1 — Partha Ranganathan (Google) | |||
| 12:00 – 1:30 PM | Lunch | |||
| 12:00 – 6:00 PM | Poster Session | |||
| 1:30 – 3:35 PM | 2A: LLM Training Systems | 2B: Speculative Decoding | 2C: GPU Systems & Scheduling | 2D: DRAM Reliability & Security |
| 3:35 – 3:55 PM | Coffee Break | |||
| 3:55 – 6:00 PM | 3A: LLM Attention & KV Cache | 3B: Mixture-of-Experts & Efficient Inference | 3C: 3D Gaussian Splatting & Rendering | 3D: Trusted Execution Environments |
| 6:30 – 7:30 PM | WACI Session | |||
| 6:30 – 8:30 PM | Business Meeting | |||
Day 2: Wednesday, March 25
| Time | Track A | Track B | Track C | Track D |
|---|---|---|---|---|
| 8:30 – 10:35 AM | 4A: ML Training & Monitoring | 4B: ML Compilers & Tensor Programs | 4C: Quantum Error Correction | 4D: Processing-in-Memory |
| 10:35 – 10:55 AM | Coffee Break | |||
| 10:55 – 11:55 AM | Keynote 2 — Ion Stoica (UC Berkeley) | |||
| 12:00 – 1:30 PM | Lunch | |||
| 12:00 – 9:00 PM | Poster Session | |||
| 1:30 – 3:35 PM | 5A: Generative Model Serving | 5B: On-Device & Edge AI | 5C: Formal Verification | 5D: Storage & Caching |
| 3:35 – 3:55 PM | Coffee Break | |||
| 3:55 – 6:00 PM | 6A: Neural Network Acceleration | 6B: Hardware Design Languages | 6C: Graph & Sparse Computing | 6D: Disaggregated Memory Systems |
| 6:00 – 9:00 PM | Award Ceremony/Banquet | |||
Day 3: Thursday, March 26
| Time | Track A | Track B | Track C | Track D |
|---|---|---|---|---|
| 8:30 – 10:10 AM | 7A: Fully Homomorphic Encryption | 7B: Compilers & Code Generation | 7C: Testing & Fuzzing | 7D: Processor Microarchitecture |
| 10:10 – 10:30 AM | Coffee Break | |||
| 10:30 – 11:30 AM | Keynote 3 — Hillery Hunter (IBM) | |||
| 12:00 – 1:30 PM | Lunch | |||
| 1:30 – 3:10 PM | 8A: GPU Programming | 8B: Serverless & Cloud Networking | 8C: Memory Hierarchy & Performance | 8D: Reconfigurable Architectures |
| 3:10 – 3:30 PM | Coffee Break | |||
| 3:30 – 5:10 PM | 9A: Systems Profiling & Optimization | 9B: Quantum & Emerging Computing | 9C: Reliability & Fault Tolerance | 9D: Network & Cloud Infrastructure |
| 5:15 – 5:30 PM | Closing Remarks | |||
ASPLOS 2026 Detailed Program
Day 1: Tuesday, March 24
8:00 – 8:20 AM EDT: Opening Remarks
8:30 – 10:35 AM EDT
Towards High-Goodput LLM Serving with Prefill-decode Multiplexing
Bullet: Boosting GPU Utilization for LLM Serving via Dynamic Spatial-Temporal Orchestration
QoServe: Breaking the Silos of LLM Inference Serving
Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads
XY-Serve: End-to-End Versatile Production Serving for Dynamic LLM Workloads
PAT: Accelerating LLM Decoding via Prefix-Aware Attention with Resource Efficient Multi-Tile Kernel
ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression
BlendServe: Optimizing Offline Inference with Resource-Aware Batching
BAT: Efficient Generative Recommender Serving with Bipartite Attention
MoE-APEX: An Efficient MoE Inference System with Adaptive Precision Expert Offloading
PowerMove: Optimizing Compilation for Neutral Atom Quantum Computers with Zoned Architecture
Reconfigurable Quantum Instruction Set Computers for High Performance Attainable on Hardware
QTurbo: A Robust and Efficient Compiler for Analog Quantum Simulation
Reducing T Gates with Unitary Synthesis
Borrowing Dirty Qubits in Quantum Programs
HybridTier: An Adaptive and Lightweight CXL-Memory Tiering System
vCXLGen: Automated Synthesis and Verification of CXL Bridges for Heterogeneous Architectures
CXLMC: Model Checking CXL Shared Memory Programs
A Programming Model for Disaggregated Memory over CXL
Cxlalloc: Safe and Efficient Memory Allocation for a CXL Pod
10:35 – 10:55 AM EDT: Coffee Break
10:55 – 11:55 AM EDT: Keynote 1 by Partha Ranganathan (Google)
Abstract TBA
12:00 – 1:30 PM EDT: Lunch
12:00 – 6:00 PM EDT: Poster Session
1:30 – 3:35 PM EDT
SNIP: An Adaptive Mixed Precision Framework for Subbyte Large Language Model Training
Fine-grained and Non-intrusive LLM Training Monitoring via Microsecond-level Traffic Measurement
SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips
DIP: Efficient Large Multimodal Model Training with Dynamic Interleaved Pipeline
Dynamic Sparsity in Large-Scale Video DiT Training
DFVG: A Heterogeneous Architecture for Speculative Decoding with Draft-on-FPGA and Verify-on-GPU
SwiftSpec: Disaggregated Speculative Decoding and Fused Kernels for Low-Latency LLM Inference
SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs
SpecProto: A Parallelizing Compiler for Speculative Decoding of Large Protocol Buffers Data
EARTH: An Efficient MoE Accelerator with Entropy-Aware Speculative Prefetch and Result Reuse
gShare: Efficient GPU Sharing with Aggressive Scheduling in Multi-tenant FaaS platform
GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management
Asynchrony and GPUs: Bridging this Dichotomy for I/O with AGIO
MSCCL++: Rethinking GPU Communication Abstractions for AI Inference
Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums
RowArmor: Efficient and Comprehensive Protection Against DRAM Disturbance Errors
APT: Securing Against DRAM Read Disturbance via Adaptive Probabilistic In-DRAM Trackers
STRAW: Stress-Aware WL-Based Read Disturbance Management for High-Density NAND Flash Memory
Trust-V: Toward Secure and Reliable Storage for Trusted Execution Environments
Optimizer-Friendly Instrumentation for Event Quantification with PRUE Algorithm
3:35 – 3:55 PM EDT: Coffee Break
3:55 – 6:00 PM EDT
I/O Analysis is All You Need: An I/O Analysis for Long-Sequence Attention
REPA: Reconfigurable PIM for the Joint Acceleration of KV Cache Offloading and Processing
STARC: Selective Token Access with Remapping and Clustering for Efficient LLM Decoding on PIM Systems
Mugi: Value Level Parallelism For Efficient LLMs
TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill & Decode Inference
LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training
oFFN: Outlier and Neuron-aware Structured FFN for Fast yet Accurate LLM Inference
MoDM: Efficient Serving for Image Generation via Mixture-of-Diffusion Models
Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
FastTTS: Accelerating Test-Time Scaling for Edge LLM Reasoning
GS-Scale: Unlocking Large-Scale 3D Gaussian Splatting Training via Host Offloading
Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration
CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting
Nebula: Infinite-Scale 3D Gaussian Splatting in VR via Collaborative Rendering and Accelerated Stereo Rasterization
AGS: Accelerating 3D Gaussian Splatting SLAM via CODEC-Assisted Frame Covisibility Detection
Detecting Inconsistencies in ARM CCA’s Formally Verified Specification
WorksetEnclave: Towards Optimizing Cold Starts in Confidential Serverless with Workset-Based Enclave Restore
TEEM³: Core-Independent and Cooperating Trusted Execution Environments
WAVE: Leveraging Architecture Observation for Privacy-Preserving Model Oversight
Compass: Navigating the Design Space of Taint Schemes for RTL Security Verification
6:30 – 7:30 PM EDT: WACI Session
6:30 – 8:30 PM EDT: Business Meeting
Day 2: Wednesday, March 25
8:30 – 10:35 AM EDT
T-Control: An Efficient Dynamic Tensor Rematerialization System for DNN Training
NotebookOS: A Replicated Notebook Platform for Interactive Training with On-Demand GPUs
DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads
Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context
LAIKA: Machine Learning-Assisted In-Kernel APU Acceleration
FuseFlow: A Fusion-Centric Compilation Framework for Sparse Deep Learning on Streaming Dataflow
Trinity: Three-Dimensional Tensor Program Optimization via Tile-level Equality Saturation
RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators
Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using F2
Streaming Tensor Program: A streaming abstraction for dynamic parallelism
AlphaSyndrome: Tackling the Syndrome Measurement Circuit Scheduling Problem for QEC Codes
PropHunt: Automated Optimization of Quantum Syndrome Measurement Circuits
iSwitch: QEC on Demand via In-Situ Encoding of Bare Qubits for Ion Trap Architectures
Architecting Scalable Trapped Ion Quantum Computers using Surface Codes
Accelerating Computation in Quantum LDPC Code
DARTH-PUM: A Hybrid Processing-Using-Memory Architecture
PUSHtap: PIM-based In-Memory HTAP with Unified Data Storage Format
CoGraf: Fully Accelerating Graph Applications with Fine-Grained PIM
Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference
A Cost-Effective Near-Storage Processing Solution for Offline Inference of Long-Context LLMs
10:35 – 10:55 AM EDT: Coffee Break
10:55 – 11:55 AM EDT: Keynote 2 by Ion Stoica (UC Berkeley)
Abstract TBA
12:00 – 1:30 PM EDT: Lunch
12:00 – 9:00 PM EDT: Poster Session
1:30 – 3:35 PM EDT
TetriServe: Efficiently Serving Mixed DiT Workloads
Segment Only Where You Look: Leveraging Human Gaze Behavior for Efficient Computer Vision Applications in Augmented Reality
Compositional AI Beyond LLMs: System Implications of Neuro-Symbolic-Probabilistic Architectures
It Takes Two to Entangle
A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving
Neuralink: Fast on-Device LLM Inference with Neuron Co-Activation Linking
Lifetime-Aware Design of Item-Level Intelligence
FlashMem: Supporting Modern DNN Workloads on Mobile with GPU Memory Hierarchy Optimizations
ASDR: Exploiting Adaptive Sampling and Data Reuse for CIM-based Instant Neural Rendering
BitRed: Taming Non-Uniform Bit-Level Sparsity with a Programmable RISC-V ISA for DNN Acceleration
Graphiti: Formally Verified Out-of-Order Execution in Dataflow Circuits
Highly Automated Verification of Security Properties for Unmodified System Software
SylQ-SV: Scaling Symbolic Execution of Hardware Designs with Query Caching
Once-for-All: Skeleton-Guided SMT Solver Fuzzing with LLM-Synthesized Generators
LPO: Discovering Missed Peephole Optimizations with Large Language Models
Nemo: A Low-Write-Amplification Cache for Tiny Objects on Log-Structured Flash Devices
ICARUS: Criticality and Reuse based Instruction Caching for Datacenter Applications
CacheMind: From Miss Rates to Why – Natural-Language, Trace-Grounded Reasoning for Cache Replacement
Toasty: Speeding up network I/O with cache-warm buffers
Hitchhike: Efficient Request Submission via Deferred Enforcement of Address Contiguity
3:35 – 3:55 PM EDT: Coffee Break
3:55 – 6:00 PM EDT
History Doesn’t Repeat Itself but Rollouts Rhyme: Accelerating Reinforcement Learning with HistoRL
Hardwired-Neuron Language Processing Units as General-Purpose Cognitive Substrates
Voyager: Input-Adaptive Algebraic Transformations for High-Performance Graph Neural Networks
CREATE: Cross-Layer Resilience Characterization and Optimization for Efficient yet Reliable Embodied AI Systems
Syno: Structured Synthesis for Neural Operators
Parameterized Hardware Design with Latency-Abstract Interfaces
Anvil: A General-Purpose Timing-Safe Hardware Description Language
Rage Against the State Machine: Type-Stated Hardware Peripherals for Increased Driver Correctness
RTeAAL Sim: Using Tensor Algebra to Represent and Accelerate RTL Simulation
Sequential Specifications for Precise Hardware Exceptions
TempGraph: An Efficient Chain-driven Temporal Graph Computing Framework on the GPU
Leveraging Sparsity to Accelerate Automata Processing
SLAWS: Spatial Locality Analysis and Workload Orchestration for Sparse Matrix Multiplication
Efficient Temporal Graph Network Training via Unified Redundancy Elimination
Understanding Query Optimization Bugs in Graph Database Systems
Efficient Remote Memory Ordering for Non-Coherent Systems
CPU-Oblivious Offloading of Failure-Atomic Transactions for Disaggregated Memory
PIPM: Partial and Incremental Page Migration for Multi-host CXL Disaggregated Shared Memory
CREST: High-Performance Contention Resolution for Disaggregated Transactions
Understanding and Optimizing Database Pushdown on Disaggregated Storage
6:00 – 9:00 PM EDT: Award Ceremony/Banquet
Day 3: Thursday, March 26
8:30 – 10:10 AM EDT
A Framework for Developing and Optimizing Fully Homomorphic Encryption Programs on GPUs
HEPIC: Private Inference over Homomorphic Encryption with Client Intervention
Falcon: Algorithm-Hardware Co-Design for Efficient Fully Homomorphic Encryption Accelerator
Maverick: Rethinking TFHE Bootstrapping on GPUs via Algorithm-Hardware Co-Design
COGENT: Adaptable Compiler Toolchain for Tagging RISC-V Binaries
Finding Reusable Instructions via E-Graph Anti-Unification
LOOPRAG: Enhancing Loop Transformation Optimization with Retrieval-Augmented Large Language Models
Evaluating Compiler Optimization Impacts on zkVM Performance
DejaVuzz: Disclosing Transient Execution Bugs with Dynamic Swappable Memory and Differential Information Flow Tracking assisted Processor Fuzzing
Signal Breaker: Fuzzing Digital Signal Processors
Scaling Automated Database System Testing
SEVI: Silent Data Corruption of Vector Instructions in Hyper-Scale Datacenters
Co-Exploration of RISC-V Processor Microarchitectures and FreeRTOS Extensions for Lower Context Switch Latency
Chips Need DIP: Time-Proportional Per-Instruction Cycle Stacks at Dispatch
Arm Weak Memory Consistency on Apple Silicon: What Is It Good For?
A Data-Driven Dynamic Execution Orchestration Architecture
10:10 – 10:30 AM EDT: Coffee Break
10:30 – 11:30 AM EDT: Keynote 3 by Hillery Hunter (IBM)
Abstract TBA
12:00 – 1:30 PM EDT: Lunch
1:30 – 3:10 PM EDT
cuJSON: A Highly Parallel JSON Parser for GPUs
CHERI-SIMT: Implementing Capability Memory Protection in GPGPUs
Lobster: A GPU-Accelerated Framework for Neurosymbolic Programming
ReliaFHE: Resilient Design for Fully Homomorphic Encryption Accelerators
Lambda-trim: Reducing Monetary and Performance Cost of Serverless Cold Starts with Cost-driven Application Debloating
Skyler: Static Analysis for Predicting API-Driven Costs in Serverless Applications
Enabling fast networking in the public cloud
SG-IOV: Socket-Granular I/O Virtualization for SmartNIC-Based Container Networks
PACT: A Criticality-First Design for Tiered Memory
CounterPoint: Using Hardware Event Counters to Refute and Refine Microarchitectural Assumptions
Performance Predictability in Heterogeneous Memory
PF-LLM: Large Language Model Hinted Hardware Prefetching
Neura: A Unified Framework for Hierarchical and Adaptive CGRAs
Transforming Torus Fabrics for Efficient Multi-tenant ML
The Configuration Wall: Characterization and Elimination of Accelerator Configuration Overhead
Static Analysis for Efficient Streaming Tokenization
3:10 – 3:30 PM EDT: Coffee Break
3:30 – 5:10 PM EDT
Arancini: A Hybrid Binary Translator for Weak Memory Model Architectures
Wax: Optimizing Data Center Applications With Stale Profile
M2XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization
Gouda: A Swift Fully Homomorphic Encryption Library Designed for GPU Architectures
COMPAS: A Distributed Multi-Party SWAP Test for Parallel Quantum Algorithms
TreeVQA: A Tree-Structured Execution Framework for Shot Reduction in Variational Quantum Algorithms
CHEHAB RL: Learning to Optimize Fully Homomorphic Encryption Computations
CEMU: Enabling Full-System Emulation of Computational Storage beyond Hardware Limits
Fault Escaping: Improving Robustness of DPU Enhanced Platform with Mutual Assisted VM Recovery
Shields Up! Software Radiation Protection for Commodity Hardware in Space
TierX: A Simulation Framework for Multi-tier BCI System Design Evaluation and Exploration
PrioriFI: More Informed Fault Injection for Edge Neural Networks
TiNA: Tiered Network Buffer Architecture for Fast Networking in Chiplet-based CPU
An MLIR Lowering Pipeline for Stencils at Wafer-Scale
JOSer: Just-In-Time Object Serialization for Heavy Java Serialization Workloads
Wave: Offloading Resource Management to SmartNIC Cores
5:15 – 5:30 PM EDT: Closing Remarks
The program page was generated and formatted using Professor Saugata Ghose's (UIUC) Conference Program Generator.