Program

ASPLOS 2026 Program Overview

Pittsburgh, PA — March 24-26, 2026

Start: 8:30 AM | Lunch: 12:00 PM | End: 6:00 PM | Presentations: 25 min | 167 unique papers

Day 1: Tuesday, March 24

Time | Track A | Track B | Track C | Track D
8:00 – 8:20 AM | Opening Remarks
8:30 – 10:35 AM | 1A: LLM Serving: Throughput Optimization | 1B: LLM Serving: Latency & Scheduling | 1C: Quantum Computing: Compilation | 1D: CXL & Memory Fabric
10:35 – 10:55 AM | Coffee Break
10:55 – 11:55 AM | Keynote 1 — Partha Ranganathan (Google)
12:00 – 1:30 PM | Lunch
12:00 – 6:00 PM | Poster Session
1:30 – 3:35 PM | 2A: LLM Training Systems | 2B: Speculative Decoding | 2C: GPU Systems & Scheduling | 2D: DRAM Reliability & Security
3:35 – 3:55 PM | Coffee Break
3:55 – 6:00 PM | 3A: LLM Attention & KV Cache | 3B: Mixture-of-Experts & Efficient Inference | 3C: 3D Gaussian Splatting & Rendering | 3D: Trusted Execution Environments
6:30 – 7:30 PM | WACI Session
6:30 – 8:30 PM | Business Meeting

Day 2: Wednesday, March 25

Time | Track A | Track B | Track C | Track D
8:30 – 10:35 AM | 4A: ML Training & Monitoring | 4B: ML Compilers & Tensor Programs | 4C: Quantum Error Correction | 4D: Processing-in-Memory
10:35 – 10:55 AM | Coffee Break
10:55 – 11:55 AM | Keynote 2 — Ion Stoica (UC Berkeley)
12:00 – 1:30 PM | Lunch
12:00 – 9:00 PM | Poster Session
1:30 – 3:35 PM | 5A: Generative Model Serving | 5B: On-Device & Edge AI | 5C: Formal Verification | 5D: Storage & Caching
3:35 – 3:55 PM | Coffee Break
3:55 – 6:00 PM | 6A: Neural Network Acceleration | 6B: Hardware Design Languages | 6C: Graph & Sparse Computing | 6D: Disaggregated Memory Systems
6:00 – 9:00 PM | Award Ceremony/Banquet

Day 3: Thursday, March 26

Time | Track A | Track B | Track C | Track D
8:30 – 10:10 AM | 7A: Fully Homomorphic Encryption | 7B: Compilers & Code Generation | 7C: Testing & Fuzzing | 7D: Processor Microarchitecture
10:10 – 10:30 AM | Coffee Break
10:30 – 11:30 AM | Keynote 3 — Hillery Hunter (IBM)
12:00 – 1:30 PM | Lunch
1:30 – 3:10 PM | 8A: GPU Programming | 8B: Serverless & Cloud Networking | 8C: Memory Hierarchy & Performance | 8D: Reconfigurable Architectures
3:10 – 3:30 PM | Coffee Break
3:30 – 5:10 PM | 9A: Systems Profiling & Optimization | 9B: Quantum & Emerging Computing | 9C: Reliability & Fault Tolerance | 9D: Network & Cloud Infrastructure
5:15 – 5:30 PM | Closing Remarks

ASPLOS 2026 Detailed Program



Day 1: Tuesday, March 24

8:00 – 8:20 AM EDT: Opening Remarks

8:30 – 10:35 AM EDT

1A: LLM Serving: Throughput Optimization

Towards High-Goodput LLM Serving with Prefill-decode Multiplexing
Weihao Cui (Shanghai Jiao Tong University), Yukang Chen (Shanghai Jiao Tong University), Han Zhao (Shanghai Jiao Tong University), Ziyi Xu (Shanghai Jiao Tong University), Xiaoze Fan (Shanghai Jiao Tong University), Xusheng Chen (University of Hong Kong), Yangjie Zhou (National University of Singapore), Shixuan Sun (Shanghai Jiao Tong University), Bingsheng He (National University of Singapore), Quan Chen (Shanghai Jiao Tong University)

Bullet: Boosting GPU Utilization for LLM Serving via Dynamic Spatial-Temporal Orchestration
Zejia Lin (Sun Yat-sen University), Hongxin Xu (Sun Yat-sen University), Guanyi Chen (Sun Yat-sen University), Zhiguang Chen (Sun Yat-sen University), Yutong Lu (Sun Yat-sen University), Xianwei Zhang (Sun Yat-sen University)

QoServe: Breaking the Silos of LLM Inference Serving
Kanishk Goel (Microsoft Research India), Jayashree Mohan (Microsoft Research India), Nipun Kwatra (Microsoft Research India), Ravi Anupindi (Microsoft Research India), Ram Ramjee (Microsoft Research India)

Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads
Mert Hidayetoglu (Snowflake), Aurick Qiao (Snowflake), Michael Wyatt (Snowflake), Jeff Rasley (Snowflake), Yuxiong He (Snowflake), Samyam Rajbhandari (Snowflake)

XY-Serve: End-to-End Versatile Production Serving for Dynamic LLM Workloads
Mingcong Song (Huawei Technologies Co., Ltd.), Xinru Tang (Tsinghua University, BNRist), Fengfan Hou (Huawei Technologies Co., Ltd.), Jing Li (Huawei Technologies Co., Ltd.), Wei Wei (Huawei Technologies Co., Ltd.), Yipeng Ma (Huawei Technologies Co., Ltd.), Runqiu Xiao (Huawei Technologies Co., Ltd.), Hongjie Si (Huawei Technologies Co., Ltd.), Dingcheng Jiang (Tsinghua University, BNRist), Shouyi Yin (Tsinghua University, BNRist and Shanghai AI Laboratory), Yang Hu (Tsinghua University, BNRist), Guoping Long (Huawei Technologies Co., Ltd.)

1B: LLM Serving: Latency & Scheduling

PAT: Accelerating LLM Decoding via Prefix-Aware Attention with Resource Efficient Multi-Tile Kernel
Jinjun Yi (Tianjin University), Zhixin Zhao (Tianjin University), Yitao Hu (Tianjin University), Ke Yan (Tianjin University), Weiwei Sun (Tianjin University), Hao Wang (Stevens Institute of Technology), Laiping Zhao (Tianjin University), Yuhao Zhang (Tianjin University), Wenxin Li (Tianjin University), Keqiu Li (Tianjin University)

ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression
Ruibo FAN (Data Science and Analytics Thrust, HKUST(GZ)), Xiangrui YU (Data Science and Analytics Thrust, HKUST(GZ)), Xinglin Pan (Data Science and Analytics Thrust, HKUST(GZ)), Zeyu Li (Data Science and Analytics Thrust, HKUST(GZ)), Weile Luo (Data Science and Analytics Thrust, HKUST(GZ)), QIANG WANG (Harbin Institute of Technology (Shenzhen)), Wei Wang (Hong Kong University of Science and Technology), Xiaowen Chu (Data Science and Analytics Thrust, HKUST(GZ))

BlendServe: Optimizing Offline Inference with Resource-Aware Batching
Yilong Zhao (University of California, Berkeley), Shuo Yang (University of California, Berkeley), Kan Zhu (University of Washington), Lianmin Zheng (UC Berkeley), Baris Kasikci (University of Washington), Yifan Qiao (UC Berkeley), Yang Zhou (UC Berkeley and UC Davis), Jiarong Xing (UC Berkeley and Rice University), Ion Stoica (UC Berkeley)

BAT: Efficient Generative Recommender Serving with Bipartite Attention
Jie Sun (Zhejiang University), Shaohang Wang (The University of Hong Kong), Zimo Zhang (Zhejiang University), Zhengyu Liu (Alibaba Group), Yunlong Xu (Alibaba Group), Peng Sun (Alibaba Group), Bingsheng He (National University of Singapore), Fei Wu (Zhejiang University), Zeke Wang (Zhejiang University), Bo Zhao (Aalto University)

MoE-APEX: An Efficient MoE Inference System with Adaptive Precision Expert Offloading
Peng Tang (Shanghai Jiaotong University), Jiacheng Liu (CUHK), Xiaofeng Hou (Shanghai Jiao Tong University), Yifei Pu (Shanghai Jiao Tong University), Jing Wang (Shanghai Jiao Tong University), Pheng-Ann Heng (CUHK), Chao Li (Shanghai Jiao Tong University), Minyi Guo (Shanghai Jiao Tong University)

1C: Quantum Computing: Compilation

PowerMove: Optimizing Compilation for Neutral Atom Quantum Computers with Zoned Architecture
Jixuan Ruan (University of California San Diego), Xiang Fang (University of California, San Diego), Hezi Zhang (University of California San Diego), Ang Li (PNNL and UW), Travis Humble (Quantum Science Center, Oak Ridge National Laboratory), Yufei Ding (UCSD)

Reconfigurable Quantum Instruction Set Computers for High Performance Attainable on Hardware
Zhaohui Yang (The Hong Kong University of Science and Technology), Dawei Ding (Tsinghua University), Qi Ye (Tsinghua University), Cupjin Huang (Alibaba Group USA), Jianxin Chen (Tsinghua University), Yuan Xie (The Hong Kong University of Science and Technology)

QTurbo: A Robust and Efficient Compiler for Analog Quantum Simulation
Junyu Zhou (University of Pennsylvania), Yuhao Liu (University of Pennsylvania), Shize Che (University of Pennsylvania), Anupam Mitra (Lawrence Berkeley National Laboratory), Efekan Kökcü (Lawrence Berkeley National Laboratory), Ermal Rrapaj (Lawrence Berkeley National Laboratory), Costin Iancu (Lawrence Berkeley National Laboratory), Gushu Li (University of Pennsylvania)

Reducing T Gates with Unitary Synthesis
Tianyi Hao (University of Wisconsin-Madison), Amanda Xu (University of Wisconsin-Madison), Swamit Tannu (University of Wisconsin-Madison)

Borrowing Dirty Qubits in Quantum Programs
Bonan Su (Department of Computer Science and Technology, Tsinghua University), Li Zhou (Institute of Software, Chinese Academy of Sciences), Yuan Feng (Tsinghua University), Mingsheng Ying (University of Technology, Sydney)

1D: CXL & Memory Fabric

HybridTier: An Adaptive and Lightweight CXL-Memory Tiering System
Kevin Song (University of Toronto), Jiacheng Yang (University of Toronto), Zixuan Wang (University of California San Diego), Jishen Zhao (UCSD), Sihang Liu (Waterloo), Gennady Pekhimenko (Univ. of Toronto / CentML)

vCXLGen: Automated Synthesis and Verification of CXL Bridges for Heterogeneous Architectures
Anatole Lefort (Technical University of Munich), Julian Pritzi (Technical University of Munich), Nicolò Carpentieri (Technical University of Munich), David Schall (Technical University of Munich), Simon Dittrich (Technical University of Munich), Soham Chakraborty (TU Delft), Nicolai Oswald (NVIDIA), Pramod Bhatotia (Technical University of Munich)

CXLMC: Model Checking CXL Shared Memory Programs
Simon Guo (University of California, Irvine), Conan Truong (University of California, Irvine), Brian Demsky (University of California, Irvine)

A Programming Model for Disaggregated Memory over CXL
Gal Assa (Technion), Moritz Lumme (ETH Zürich), Lucas Bürgi (ETH Zürich), Michal Friedman (ETH Zürich), Ori Lahav (Tel Aviv University)

Cxlalloc: Safe and Efficient Memory Allocation for a CXL Pod
Newton Ni (UT Austin), Yan Sun (UIUC), Zhiting Zhu (UT Austin/NVIDIA), Emmett Witchel (UT Austin)

10:35 – 10:55 AM EDT: Coffee Break

10:55 – 11:55 AM EDT: Keynote 1 by Partha Ranganathan (Google)

Abstract TBA

12:00 – 1:30 PM EDT: Lunch

12:00 – 6:00 PM EDT: Poster Session

1:30 – 3:35 PM EDT

2A: LLM Training Systems

SNIP: An Adaptive Mixed Precision Framework for Subbyte Large Language Model Training
Yunjie Pan (University of Michigan), Jiecao Yu (Meta Platforms, Inc.), Yongyi Yang (University of Michigan, Ann Arbor), Hanmei Yang (University of Massachusetts Amherst), Zhaoxia (Summer) Deng (Meta), Scott Mahlke (University of Michigan)

Fine-grained and Non-intrusive LLM Training Monitoring via Microsecond-level Traffic Measurement
Yibo Xiao (Nanjing University), Hao Zheng (Nanjing University, China), Haifeng Sun (National University of Singapore), Qingkai Meng (Nanjing University), Jiong Duan (Nanjing University), Xiaohe Hu (Infrawaves), Rong Gu (Nanjing University), Guihai Chen (Nanjing University), Chen Tian (Nanjing University)

SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips
Xinyu Lian (University of Illinois Urbana Champaign), Masahiro Tanaka (Microsoft), Olatunji Ruwase (Snowflake), Minjia Zhang (University of Illinois Urbana Champaign)

DIP: Efficient Large Multimodal Model Training with Dynamic Interleaved Pipeline
Zhenliang Xue (Shanghai Jiao Tong University), Hanpeng Hu (StepFun), Xing Chen (StepFun), Yimin Jiang (StepFun), Yixin Song (Shanghai Jiao Tong University, Zenergize AI), Zeyu Mi (Shanghai Jiao Tong University, Zenergize AI), Yibo Zhu (StepFun), Daxin Jiang (StepFun), Yubin Xia (Shanghai Jiao Tong University), Haibo Chen (Shanghai JiaoTong University)

Dynamic Sparsity in Large-Scale Video DiT Training
Xin Tan (The Chinese University of Hong Kong), Yuetao Chen (The Chinese University of Hong Kong), Yimin Jiang (Unaffiliated), Xing Chen (StepFun), Kun Yan (StepFun), Nan Duan (StepFun), Yibo Zhu (StepFun), Daxin Jiang (StepFun), Hong Xu (The Chinese University of Hong Kong)

2B: Speculative Decoding

DFVG: A Heterogeneous Architecture for Speculative Decoding with Draft-on-FPGA and Verify-on-GPU
Shaoqiang Lu (Shanghai Jiao Tong University, Shanghai, China and Eastern Institute of Technology, Ningbo, China), Yangbo Wei (Shanghai Jiao Tong University, Shanghai, China and Eastern Institute of Technology, Ningbo, China), Junhong Qian (Southeast University, Nanjing, China), Dongge Qin (Southeast University, Nanjing, China), Shiji Gao (Southeast University, Nanjing, China), Yizhi Ding (Southeast University, Nanjing, China), Qifan Wang (Shanghai Jiao Tong University, Shanghai, China and Eastern Institute of Technology, Ningbo, China), Chen Wu (Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China), Xiao Shi (Southeast University, Nanjing, China), Lei He (Eastern Institute of Technology, Ningbo, China)

SwiftSpec: Disaggregated Speculative Decoding and Fused Kernels for Low-Latency LLM Inference
Ziyi Zhang (ByteDance Seed & University of Chicago), Ziheng Jiang (ByteDance Seed), Chengquan Jiang (ByteDance Seed), Menghan Yu (ByteDance Seed), Size Zheng (ByteDance Seed), Haibin Lin (ByteDance Seed), Xin Liu (ByteDance Seed), Henry Hoffmann (University of Chicago)

SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs
Jiaming Xu (Shanghai Jiao Tong University; SII), Jiayi Pan (Shanghai Jiao Tong University; Infinigence-AI), Hanzhen Wang (Shanghai Jiao Tong University), Yongkang Zhou (Shanghai Jiao Tong University; SII), Jiancai Ye (Shanghai Jiao Tong University), Yu Wang (Tsinghua University), Guohao Dai (Shanghai Jiao Tong University; Infinigence-AI; SII)

SpecProto: A Parallelizing Compiler for Speculative Decoding of Large Protocol Buffers Data
Zhijie Wang (University of California, Riverside), Chales Hong (University of California, Riverside), Dhruv Parmar (University of California, Riverside), Shengbo Ma (University of California, Riverside), Zhijia Zhao (University of California, Riverside), Qidong Zhao (Google), Xu Liu (Google)

EARTH: An Efficient MoE Accelerator with Entropy-Aware Speculative Prefetch and Result Reuse
Fangxin Liu (Shanghai Jiao Tong University), Ning Yang (Shanghai Jiao Tong University), Jingkui Yang (National University of Defense Technology), Zongwu Wang (Shanghai Jiao Tong University), Chenyang Guan (Shanghai Jiao Tong University), Yu Feng (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiaotong University), Haibing Guan (Shanghai Jiao Tong University)

2C: GPU Systems & Scheduling

gShare: Efficient GPU Sharing with Aggressive Scheduling in Multi-tenant FaaS platform
Yanan Yang (China Telecom Cloud Computing Research Institute), Zhengxiong Jiang (China Telecom Cloud Technology Co. Ltd.), Meiqi Zhu (China Telecom Cloud Technology Co. Ltd.), Hongqiang Xu (China Telecom Cloud Technology Co. Ltd.), Yujun Wang (China Telecom Cloud Technology Co. Ltd.), Liang Li (China Telecom Cloud Computing Research Institute), Jiansong Zhang (China Telecom Cloud Computing Research Institute), Jie Wu (China Telecom Cloud Computing Research Institute)

GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management
Jiaang Duan (Shanghai Jiao Tong University), Shenglin Xu (Shanghai Jiao Tong University), Shiyou Qian (Shanghai Jiao Tong University), Dingyu Yang (Zhejiang University), Kangjin Wang (Alibaba Group), Chenzhi Liao (Alibaba Group), Yinghao Yu (Alibaba Group), Qin Hua (Shanghai Jiao Tong University), Hanwen Hu (Shanghai Jiao Tong University), Qi Wang (Alibaba Group), Wenchao Wu (Alibaba Group), Dongqing Bao (Alibaba Group), Tianyu Lu (Alibaba Group), Jian Cao (Shanghai Jiao Tong University), Guangtao Xue (Shanghai Jiao Tong University), Guodong Yang (Alibaba Group), Liping Zhang (Alibaba Group), Gang Chen (Zhejiang University)

Asynchrony and GPUs: Bridging this Dichotomy for I/O with AGIO
Jihoon Han (Pennsylvania State University), Anand Sivasubramaniam (Pennsylvania State University), Chia-Hao Chang (Nvidia), Vikram Sharma Mailthody (Nvidia Research), Zaid Qureshi (Nvidia Research), Wen-mei W. Hwu (Nvidia Research)

MSCCL++: Rethinking GPU Communication Abstractions for AI Inference
Changho Hwang (Microsoft Research), Peng Cheng (Microsoft Research), Roshan Dathathri (Microsoft Research), Abhinav Jangda (Microsoft Research), Saeed Maleki (Microsoft Research), Madan Musuvathi (Microsoft Research), Olli Saarikivi (Microsoft Research), Aashaka Shah (Microsoft Research), Ziyue Yang (Microsoft Research), Binyang Li (Microsoft Azure), Caio Rocha (Microsoft Azure), Qinghua Zhou (Microsoft Azure), Mahdieh Ghazimirsaeed (Microsoft Azure), Sreevatsa Anantharamu (Microsoft Azure), Jithin Jose (Microsoft Azure)

Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums
Jaeyeon Won (MIT), Willow Ahrens (Georgia Tech), Joel Emer (MIT/NVIDIA), Saman Amarasinghe (MIT)

2D: DRAM Reliability & Security

RowArmor: Efficient and Comprehensive Protection Against DRAM Disturbance Errors
Minbok Wi (Seoul National University), Yoonyul Yoo (Samsung Electronics), Yoojin Kim (Sungkyunkwan University), Jaeho Shin (Sungkyunkwan University), Jumin Kim (Seoul National University), Yesin Ryu (Sungkyunkwan University, Samsung Electronics), Saeid Gorgin (Sungkyunkwan University), Jung Ho Ahn (Seoul National University), Jungrae Kim (Sungkyunkwan University)

APT: Securing Against DRAM Read Disturbance via Adaptive Probabilistic In-DRAM Trackers
Runjin Wu (School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, China), Meng Zhang (School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, China), You Zhou (School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, China), Changsheng Xie (Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China), Fei Wu (Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China)

STRAW: Stress-Aware WL-Based Read Disturbance Management for High-Density NAND Flash Memory
Myoungjun Chun (Soongsil University), Jaeyong Lee (Seoul National University), Inhyuk Choi (Seoul National University), Jisung Park (POSTECH (Pohang University of Science and Technology)), Myungsuk Kim (Kyungpook National University), Jihong Kim (Seoul National University)

Trust-V: Toward Secure and Reliable Storage for Trusted Execution Environments
SeungKyun Han (Chungnam National University), Jiyeon Yang (Chungnam National University), Jinsoo Jang (Chungnam National University)

Optimizer-Friendly Instrumentation for Event Quantification with PRUE Algorithm
Hao Ling (The Hong Kong University of Science and Technology), Yiyuan Guo (The Hong Kong University of Science and Technology), Charles Zhang (The Hong Kong University of Science and Technology)

3:35 – 3:55 PM EDT: Coffee Break

3:55 – 6:00 PM EDT

3A: LLM Attention & KV Cache

I/O Analysis is All You Need: An I/O Analysis for Long-Sequence Attention
Xiaoyang Lu (Illinois Institute of Technology), Boyu Long (Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences), Xiaoming Chen (Institute of Computing Technology, Chinese Academy of Sciences), Yinhe Han (Institute of Computing Technology, Chinese Academy of Sciences), Xian-He Sun (Illinois Institute of Technology)

REPA: Reconfigurable PIM for the Joint Acceleration of KV Cache Offloading and Processing
Yang Hong (Shanghai Jiao Tong University), Junlong Yang (Shanghai Jiao Tong University), Bo Peng (Shanghai Jiao Tong University), Jianguo Yao (Shanghai Jiao Tong University)

STARC: Selective Token Access with Remapping and Clustering for Efficient LLM Decoding on PIM Systems
Zehao Fan (Rensselaer Polytechnic Institute), Yunzhen Liu (University of Massachusetts, Amherst), Garrett Gagnon (Rensselaer Polytechnic Institute), Zhenyu Liu (Rensselaer Polytechnic Institute), Yayue Hou (Rensselaer Polytechnic Institute), Hadjer Benmeziane (IBM Research – Ruschlikon, Switzerland), Kaoutar El Maghraoui (IBM T. J. Watson Research Center), Liu Liu (Rensselaer Polytechnic Institute)

Mugi: Value Level Parallelism For Efficient LLMs
Daniel Price (University of Central Florida), Prabhu Vellaisamy (Carnegie Mellon University), John Paul Shen (Carnegie Mellon University), Di Wu (University of Central Florida)

TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill & Decode Inference
Xiaojuan Tang (Peking University), Fanxu Meng (Peking University), Pingzhi Tang (Peking University), Yuxuan Wang (Peking University), Di Yin (Tencent YouTu Lab), Xing Sun (Tencent YouTu Lab), Muhan Zhang (Peking University)

3B: Mixture-of-Experts & Efficient Inference

LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training
Xinyi Liu (Peking University), Yujie Wang (Peking University), Fangcheng Fu (Shanghai Jiao Tong University), Xuefeng Xiao (Bytedance Seed), Huixia Li (Bytedance Seed), Jiashi Li (Bytedance Seed), Bin Cui (Peking University)

oFFN: Outlier and Neuron-aware Structured FFN for Fast yet Accurate LLM Inference
Geunsoo Song (Sogang University), Hoeseok Yang (Santa Clara University), Youngmin Yi (Sogang University)

MoDM: Efficient Serving for Image Generation via Mixture-of-Diffusion Models
Yuchen Xia (University of Michigan), Divyam Sharma (University of Michigan), Yichao Yuan (University of Michigan), Souvik Kundu (Intel Labs), Nishil Talati (University of Michigan)

Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
Qinghao Hu (MIT), Shang Yang (MIT), Junxian Guo (MIT), Xiaozhe Yao (ETH Zurich), Yujun Lin (NVIDIA), Yuxian Gu (NVIDIA), Han Cai (NVIDIA), Chuang Gan (MIT-IBM Watson AI Lab, UMass Amherst), Ana Klimovic (ETH Zurich), Song Han (MIT)

FastTTS: Accelerating Test-Time Scaling for Edge LLM Reasoning
Hao (Mark) Chen (Imperial College London), Zhiwen Mo (Imperial College London), Guanxi Lu (Imperial College London), Shuang Liang (Imperial College London), Lingxiao Ma (Microsoft Research), Wayne Luk (Imperial College London), Hongxiang Fan (Imperial College London)

3C: 3D Gaussian Splatting & Rendering

GS-Scale: Unlocking Large-Scale 3D Gaussian Splatting Training via Host Offloading
Donghyun Lee (Seoul National University), Dawoon Jeong (Seoul National University), Jae W. Lee (Seoul National University), Hongil Yoon (Google)

Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration
Changhun Oh (KAIST), Seongryong Oh (KAIST), Jinwoo Hwang (KAIST), Yoonsung Kim (KAIST), Hardik Sharma (Meta), Jongse Park (KAIST)

CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting
Hexu Zhao (NYU), Xiwen Min (NYU), Xiaoteng Liu (NYU), Moonjun Gong (NYU), Yiming Li (NYU), Ang Li (PNNL and UW), Saining Xie (NYU), Jinyang Li (NYU), Aurojit Panda (NYU)

Nebula: Infinite-Scale 3D Gaussian Splatting in VR via Collaborative Rendering and Accelerated Stereo Rasterization
He Zhu (Shanghai Jiao Tong University), Zheng Liu (Shanghai Jiao Tong University), Xingyang Li (Shanghai Jiao Tong University), Anbang Wu (Shanghai Jiao Tong University), Jieru Zhao (Department of Computer Science and Engineering, Shanghai Jiao Tong University), Fangxin Liu (Shanghai Jiao Tong University), Yiming Gan (ICT, Chinese Academy of Sciences), Jingwen Leng (Shanghai Jiao Tong University), Yu Feng (Shanghai Jiao Tong University)

AGS: Accelerating 3D Gaussian Splatting SLAM via CODEC-Assisted Frame Covisibility Detection
Houshu He (Shanghai Jiao Tong University), Naifeng Jing (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiao Tong University), Xiaoyao Liang (Shanghai Jiao Tong University), Zhuoran Song (Shanghai Jiao Tong University)

3D: Trusted Execution Environments

Detecting Inconsistencies in ARM CCA’s Formally Verified Specification
Changho Choi (Samsung Research), Xiang Cheng (Georgia Institute of Technology), Bokdeuk Jeong (Samsung Research), Taesoo Kim (Georgia Institute of Technology, Samsung Research)

WorksetEnclave: Towards Optimizing Cold Starts in Confidential Serverless with Workset-Based Enclave Restore
Xiaolong Yan (Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences), Qihang Zhou (Institute of Information Engineering, Chinese Academy of Sciences), Zisen Wan (Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences), Feifan Qian (Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences), Wentao Yao (Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences), Weijuan Zhang (Institute of Information Engineering, Chinese Academy of Sciences), Xiaoqi Jia (Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences)

TEEM³: Core-Independent and Cooperating Trusted Execution Environments
Nils Asmussen (Barkhausen Institut), Sebastian Haas (Barkhausen Institut), Carsten Weinhold (Barkhausen Institut), Nicholas Gordon (Barkhausen Institut), Stephan Gerhold (TU Dresden), Friedrich Pauls (Barkhausen Institut), Nilanjana Das (Barkhausen Institut), Michael Roitzsch (Barkhausen Institut)

WAVE: Leveraging Architecture Observation for Privacy-Preserving Model Oversight
Haoxuan Xu (USC), Chen Gong (USC), Beijie Liu (USC), Haizhong Zheng (Carnegie Mellon University), Beidi Chen (Carnegie Mellon University), Mengyuan Li (USC)

Compass: Navigating the Design Space of Taint Schemes for RTL Security Verification
Yuheng Yang (MIT), Qinhan Tan (Princeton University), Thomas Bourgeat (EPFL), Sharad Malik (Princeton University), Mengjia Yan (MIT)

6:30 – 7:30 PM EDT: WACI Session

6:30 – 8:30 PM EDT: Business Meeting



Day 2: Wednesday, March 25

8:30 – 10:35 AM EDT

4A: ML Training & Monitoring

T-Control: An Efficient Dynamic Tensor Rematerialization System for DNN Training
Zehua Wang (Institute of Computing Technology, Chinese Academy of Sciences), Junmin Xiao (Institute of Computing Technology, Chinese Academy of Sciences), Xiaochuan Deng (Institute of Computing Technology, Chinese Academy of Sciences), Huibing Wang (Institute of Computing Technology, Chinese Academy of Sciences), Hui Ma (Institute of Computing Technology, Chinese Academy of Sciences), Mingyi Li (Institute of Computing Technology, Chinese Academy of Sciences), Yunfei Pang (Institute of Computing Technology, Chinese Academy of Sciences), Guangming Tan (Institute of Computing Technology, Chinese Academy of Sciences)

NotebookOS: A Replicated Notebook Platform for Interactive Training with On-Demand GPUs
Benjamin Carver (George Mason University), Jingyuan Zhang (George Mason University), Haoliang Wang (Adobe Research), Kanak Mahadik (Adobe Research), Yue Cheng (University of Virginia)

DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads
Qidong Zhao (North Carolina State University), Hao Wu (George Mason University), Yueming Hao (North Carolina State University), Zilingfeng Ye, Jiajia Li (North Carolina State University), Xu Liu (North Carolina State University), Keren Zhou (George Mason University)

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context
Hao Wu (George Mason University), Qidong Zhao (Google), Songqing Chen (George Mason University), Yang Chen (Meta), Yueming Hao (Meta), Tony CW Liu (Anthropic), Sijia Chen (Meta), Adnan Aziz (Meta), Keren Zhou (George Mason University)

LAIKA: Machine Learning-Assisted In-Kernel APU Acceleration
Haoming Zhuo (South China Normal University), Dingding Li (South China Normal University), Ronghua Lin (South China Normal University), Yong Tang (South China Normal University)

4B: ML Compilers & Tensor Programs

FuseFlow: A Fusion-Centric Compilation Framework for Sparse Deep Learning on Streaming Dataflow
Rubens Lacouture (Stanford University), Olivia Hsu (Stanford University), Nathan Zhang (Stanford University), Ritvik Sharma (Stanford University), Marco Siracusa (Stanford University), Fredrik Kjolstad (Stanford University), Kunle Olukotun (Stanford University)

Trinity: Three-Dimensional Tensor Program Optimization via Tile-level Equality Saturation
Jaehyeong Park (KAIST), Youngchan Kim (KAIST), Haechan An (KAIST), Gieun Jeong (KAIST), Jeehoon Kang (FuriosaAI), Dongsu Han (KAIST)

RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators
Xinsheng Tang (Alibaba Cloud Computing), Yangcheng Li (Alibaba Cloud Computing), Nan Wang (Alibaba Cloud Computing), Zhiyi Shu (Alibaba Cloud Computing), Xingyu Ling (Alibaba Cloud Computing), Junna Xing (Alibaba Cloud Computing), Peng Zhou (Alibaba Cloud Computing), Qiang Liu (Alibaba Cloud Computing)

Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using F2
Keren Zhou (George Mason University and OpenAI), Mario Lezcano (OpenAI), Adam Goucher (OpenAI), Akhmed Rakhmati (OpenAI), Jeff Niu (OpenAI), Justin Lebar (OpenAI), Pawel Szczerbuk (OpenAI), Peter Bell (OpenAI), Phil Tillet (OpenAI), Thomas Raoux (OpenAI), Zahi Moudallal (OpenAI)

Streaming Tensor Program: A streaming abstraction for dynamic parallelism
Gina Sohn (Stanford University), Genghan Zhang (Stanford University), Konstantin Hossfeld (Stanford University), Jungwoo Kim (Stanford University), Nathan Sobotka (Stanford University), Nathan Zhang (SambaNova Systems), Olivia Hsu (Stanford University), Kunle Olukotun (Stanford University)

4C: Quantum Error Correction

AlphaSyndrome: Tackling the Syndrome Measurement Circuit Scheduling Problem for QEC Codes
Yuhao Liu (University of Pennsylvania), Shuohao Ping (University of Pennsylvania), Junyu Zhou (University of Pennsylvania), Ethan Decker (University of Pennsylvania), Justin Kalloor (University of California, Berkeley), Mathias Weiden (University of California, Berkeley), Kean Chen (University of Pennsylvania), Yunong Shi (Amazon Quantum Technologies), Ali Javadi-Abhari (IBM Research), Costin Iancu (Lawrence Berkeley National Laboratory), Gushu Li (University of Pennsylvania)

PropHunt: Automated Optimization of Quantum Syndrome Measurement Circuits
Joshua Viszlai (University of Chicago), Satvik Maurya (University of Wisconsin-Madison), Swamit Tannu (University of Wisconsin-Madison), Margaret R. Martonosi (Princeton University), Fred Chong (University of Chicago)

iSwitch: QEC on Demand via In-Situ Encoding of Bare Qubits for Ion Trap Architectures
Keyi Yin (University of California, San Diego), Xiang Fang (University of California, San Diego), Zhuo Chen (University of California, San Diego), David Hayes (Quantinuum), Eneet Kaur (Cisco Quantum Lab), Reza Nejabati (Cisco Quantum Lab), Hartmut Haeffner (University of California, Berkeley), Wes Campbell (University of California, Los Angeles), Eric Hudson (University of California, Los Angeles), Jens Palsberg (University of California, Los Angeles), Travis Humble (Oak Ridge National Laboratory), Yufei Ding (University of California, San Diego)

Architecting Scalable Trapped Ion Quantum Computers using Surface Codes
Scott Jones (University of Cambridge), Prakash Murali (University of Cambridge)

Accelerating Computation in Quantum LDPC Code
Jungmin Cho (Seoul National University), Hyeonseong Jeong (Seoul National University), Junpyo Kim (Seoul National University), Junhyuk Choi (Seoul National University), Juwon Hong (Seoul National University), Jangwoo Kim (Seoul National University / MangoBoost)

4D: Processing-in-Memory

DARTH-PUM: A Hybrid Processing-Using-Memory Architecture
Ryan Wong (University of Illinois Urbana-Champaign), Ben Feinberg (Sandia National Laboratories), Saugata Ghose (University of Illinois Urbana-Champaign)

PUSHtap: PIM-based In-Memory HTAP with Unified Data Storage Format
Yilong Zhao (Shanghai Jiao Tong University), Mingyu Gao (Tsinghua University), Huanchen Zhang (Tsinghua University), Fangxin Liu (Shanghai Jiao Tong University), Gongye Chen (Shanghai Jiao Tong University), He Xian (Shanghai Qi Zhi Institute), Li Jiang (Shanghai Jiao Tong University)

CoGraf: Fully Accelerating Graph Applications with Fine-Grained PIM
Mehmetali Semi Yenimol (Uppsala University), Anirban Nag (Huawei), Chang Hyun Park (Uppsala University), David Black-Schaffer (Uppsala University)

Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference
Yiqi Liu (University of Chinese Academy of Sciences), Yudong Pan (University of Chinese Academy of Sciences), Mengdi Wang (Institute of Computing Technology, Chinese Academy of Sciences), Shixin Zhao (University of Chinese Academy of Sciences), Haonan Zhu (University of Chinese Academy of Sciences), Yinhe Han (Institute of Computing Technology, Chinese Academy of Sciences), Lei Zhang (Institute of Computing Technology, Chinese Academy of Sciences), Ying Wang (Institute of Computing Technology, Chinese Academy of Sciences)

A Cost-Effective Near-Storage Processing Solution for Offline Inference of Long-Context LLMs
Hongsun Jang (Seoul National University), Jaeyong Song (Seoul National University), Changmin Shin (Seoul National University), Si Ung Noh (Seoul National University), Jaewon Jung (Seoul National University), Jisung Park (POSTECH (Pohang University of Science and Technology)), Jinho Lee (Seoul National University)

10:35 – 10:55 AM EDT: Coffee Break

10:55 – 11:55 AM EDT: Keynote 2 by Ion Stoica (Univ. of California, Berkeley)

Abstract TBA

12:00 – 1:30 PM EDT: Lunch

12:00 – 9:00 PM EDT: Poster Session

1:30 – 3:35 PM EDT

TetriServe: Efficiently Serving Mixed DiT Workloads
Runyu Lu (University of Michigan), Shiqi He (University of Michigan), Wenxuan Tan (UW-Madison), Shenggui Li (Nanyang Technological University), Ruofan Wu (University of Michigan), Jeff J. Ma (University of Michigan), Ang Chen (University of Michigan), Mosharaf Chowdhury (University of Michigan)

Segment Only Where You Look: Leveraging Human Gaze Behavior for Efficient Computer Vision Applications in Augmented Reality
Tianhua Xia (New York University), Haiyu Wang (New York University), Sai Qian Zhang (New York University)

Compositional AI Beyond LLMs: System Implications of Neuro-Symbolic-Probabilistic Architectures
Zishen Wan (Georgia Tech), Hanchen Yang (Georgia Tech), Jiayi Qian (Georgia Tech), Ritik Raj (Georgia Tech), Joongun Park (Georgia Tech), Chenyu Wang (Princeton University), Arijit Raychowdhury (Georgia Tech), Tushar Krishna (Georgia Tech)

It Takes Two to Entangle
Zhanghan Wang (New York University), Ding Ding (New York University), Hang Zhu (ByteDance Inc.), Haibin Lin (ByteDance Inc.), Aurojit Panda (New York University)

A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving
Yaoyao Ding (University of Toronto / CentML), Bohan Hou (Carnegie Mellon University), Xiao Zhang (University of Toronto), Allan Lin (University of Waterloo), Tianqi Chen (Carnegie Mellon University), Cody Yu Hao (Anyscale), Yida Wang (Amazon), Gennady Pekhimenko (University of Toronto / CentML)

Neuralink: Fast on-Device LLM Inference with Neuron Co-Activation Linking
Tuowei Wang (Tsinghua University), Ruwen Fan (Tsinghua University), Minxing Huang (Tianjin University), Zixu Hao (Tsinghua University), Kun Li (Microsoft Research), Ting Cao (Microsoft Research), Youyou Lu (Tsinghua University), Yaoxue Zhang (Tsinghua University), Ju Ren (Tsinghua University)

Lifetime-Aware Design of Item-Level Intelligence
Shvetank Prakash (Harvard University), Andrew Cheng (Harvard University), Olof Kindgren (Qamcom), Ashiq Ahamed (Pragmatic Semiconductor), Graham Knight (Pragmatic Semiconductor), Jed Kufel (Pragmatic Semiconductor), Francisco Rodriguez (Pragmatic Semiconductor), Arya Tschand (Harvard University), David Kong (Harvard University), Mariam Elgamal (Harvard University), Jerry Huang (Harvard University), Emma Chen (Harvard University), Gage Hills (Harvard University), Richard Price (Pragmatic Semiconductor), Emre Ozer (Pragmatic Semiconductor), Vijay Janapa Reddi (Harvard University)

FlashMem: Supporting Modern DNN Workloads on Mobile with GPU Memory Hierarchy Optimizations
Zhihao Shu (University of Georgia), Md Musfiqur Rahman Sanim (University of Georgia), Hangyu Zheng (University of Georgia), Kunxiong Zhu (University of Georgia), Miao Yin (University of Texas at Arlington), Gagan Agrawal (University of Georgia), Wei Niu (University of Georgia)

ASDR: Exploiting Adaptive Sampling and Data Reuse for CIM-based Instant Neural Rendering
Fangxin Liu (Shanghai Jiao Tong University), Haomin Li (Shanghai Jiao Tong University), Bowen Zhu (Shanghai Jiao Tong University), Zongwu Wang (Shanghai Jiao Tong University), Zhuoran Song (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiao Tong University)

BitRed: Taming Non-Uniform Bit-Level Sparsity with a Programmable RISC-V ISA for DNN Acceleration
Yanhuan Liu (ICT), Wenming Li (ICT), Kunming Zhang (ICT), Yuqun Liu (ICT), Siao Wen (SmarCo Co., Ltd), Lexin Wang (ICT), Tianyu Liu (ICT), Haibin Wu (ICT), Zhihua Fan (ICT), Xiaochun Ye (ICT), Dongrui Fan (SmarCo Co., Ltd), Xuejun An (ICT)

Graphiti: Formally Verified Out-of-Order Execution in Dataflow Circuits
Yann Herklotz (EPFL), Ayatallah Elakhras (EPFL), Martina Camaioni (EPFL), Paolo Ienne (EPFL), Lana Josipovic (ETH Zurich), Thomas Bourgeat (EPFL)

Highly Automated Verification of Security Properties for Unmodified System Software
Ganxiang Yang (Columbia University), Wei Qiang (Columbia University), Yi Rong (Columbia University), Xuheng Li (Columbia University), Fanqi Yu (Columbia University), Jason Nieh (Columbia University), Ronghui Gu (Columbia University)

SylQ-SV: Scaling Symbolic Execution of Hardware Designs with Query Caching
Kaki Ryan (University of North Carolina at Chapel Hill), Cynthia Sturton (University of North Carolina at Chapel Hill)

Once-for-All: Skeleton-Guided SMT Solver Fuzzing with LLM-Synthesized Generators
Maolin Sun (Nanjing University), Yibiao Yang (Nanjing University), Yuming Zhou (Nanjing University)

LPO: Discovering Missed Peephole Optimizations with Large Language Models
Zhenyang Xu (University of Waterloo), Hongxu Xu (University of Waterloo), Yongqiang Tian (Monash University), Xintong Zhou (University of Waterloo), Chengnian Sun (University of Waterloo)

Nemo: A Low-Write-Amplification Cache for Tiny Objects on Log-Structured Flash Devices
Xufeng Yang (Xiamen University), Tingting Tan (Xiamen University), Jingxin Hu (Chongqing University of Posts and Telecommunications), Congming Gao (Xiamen University), Mingyang Liu (Xiamen University), Tianyang Jiang (Huawei Technologies Co., Ltd.), Jian Chen (Tsinghua University), Linbo Long (Chongqing University of Posts and Telecommunications), Yina Lv (Xiamen University), Jiwu Shu (Xiamen University)

ICARUS: Criticality and Reuse based Instruction Caching for Datacenter Applications
Vedant Kalbande (Indian Institute of Technology Bombay), Hrishikesh Jedhe Deshmukh (Indian Institute of Technology Bombay), Alberto Ros (University of Murcia), Biswabandan Panda (Indian Institute of Technology Bombay)

CacheMind: From Miss Rates to Why – Natural-Language, Trace-Grounded Reasoning for Cache Replacement
Kaushal Mhapsekar (NC State University), Azam Ghanbari (Department of ECE, NC State University), Bita Aslrousta (North Carolina State University), Samira Mirbagher (NC State University)

Toasty: Speeding up network I/O with cache-warm buffers
Preeti Preeti (IIT Bombay), Nitish Bhat (IIT Bombay), Ashwin Kumar (IIT Bombay), Mythili Vutukuru (Indian Institute of Technology, Bombay)

Hitchhike: Efficient Request Submission via Deferred Enforcement of Address Contiguity
Xuda Zheng (Huazhong University of Science and Technology), Jian Zhou (Huazhong University of Science and Technology), Shuhan Bai (Huazhong University of Science and Technology), Runjin Wu (Huazhong University of Science and Technology), Xianlin Tang (Huazhong University of Science and Technology), Zhiyuan Li (Huazhong University of Science and Technology), Hong Jiang (UT Arlington), Fei Wu (Huazhong University of Science and Technology)

3:35 – 3:55 PM EDT: Coffee Break

3:55 – 6:00 PM EDT

History Doesn’t Repeat Itself but Rollouts Rhyme: Accelerating Reinforcement Learning with HistoRL
Jingkai He (Shanghai Jiao Tong University), Tianjian Li (ByteDance), Erhu Feng (Shanghai Jiao Tong University), Dong Du (Shanghai Jiao Tong University), Qian Liu (ByteDance), Tao Liu (ByteDance), Yubin Xia (Shanghai Jiao Tong University), Haibo Chen (Shanghai Jiao Tong University)

Hardwired-Neuron Language Processing Units as General-Purpose Cognitive Substrates
Yang Liu (Institute of Computing Technology, Chinese Academy of Sciences), Yi Chen (University of Science and Technology of China), Yongwei Zhao (Institute of Computing Technology, Chinese Academy of Sciences), Yifan Hao (Institute of Computing Technology, Chinese Academy of Sciences), Zifu Zheng (Institute of Computing Technology, Chinese Academy of Sciences), Weihao Kong (Institute of Computing Technology, Chinese Academy of Sciences), Zhangmai Li (University of Science and Technology of China), Dongchen Jiang (Institute of Computing Technology, Chinese Academy of Sciences), Ruiyang Xia (Institute of Computing Technology, Chinese Academy of Sciences), Zhihong Ma (Institute of Computing Technology, Chinese Academy of Sciences), Zisheng Liu (Institute of Computing Technology, Chinese Academy of Sciences), Zhaoyong Wan (Institute of Computing Technology, Chinese Academy of Sciences), Yunqi Lu (Institute of Computing Technology, Chinese Academy of Sciences), Ximing Liu (Institute of Computing Technology, Chinese Academy of Sciences), Hongrui Guo (Institute of Computing Technology, Chinese Academy of Sciences), Zhihao Yang (Institute of Software, Chinese Academy of Sciences), Zhe Wang (Institute of Computing Technology, Chinese Academy of Sciences), Tianrui Ma (Institute of Computing Technology, Chinese Academy of Sciences), Mo Zou (Institute of Computing Technology, Chinese Academy of Sciences), Rui Zhang (Institute of Computing Technology, Chinese Academy of Sciences), Ling Li (Institute of Software, Chinese Academy of Sciences), Xing Hu (Institute of Computing Technology, Chinese Academy of Sciences), Zidong Du (Institute of Computing Technology, Chinese Academy of Sciences), Zhiwei Xu (Institute of Computing Technology, Chinese Academy of Sciences), Qi Guo (Institute of Computing Technology, Chinese Academy of Sciences), Tiashi Chen (Cambricon Technologies), Yunji Chen (Institute of Computing Technology, Chinese Academy of Sciences)

Voyager: Input-Adaptive Algebraic Transformations for High-Performance Graph Neural Networks
Yangjie Zhou (Shanghai Jiao Tong University), Wenting Shen (Alibaba Group), Jingwen Leng (Shanghai Jiao Tong University), Shuwen Lu (Shanghai Jiao Tong University), Zihan Liu (Shanghai Jiao Tong University), Weihao Cui (Shanghai Jiao Tong University), Zhendong Zhang (Shanghai Qi Zhi Institute), Wencong Xiao (Alibaba Group), Baole Ai (Alibaba Group), Wei Lin (Alibaba Group), Deze Zeng (China University of Geosciences), Yun Liang (Peking University), Quan Chen (Shanghai Jiao Tong University), Ning Liu (Shanghai Jiao Tong University), Minyi Guo (Shanghai Jiao Tong University)

CREATE: Cross-Layer Resilience Characterization and Optimization for Efficient yet Reliable Embodied AI Systems
Tong Xie (Peking University), Yijiahao Qi (Peking University), Jinqi Wen (Peking University), Zishen Wan (Georgia Tech), Yanchi Dong (Peking University), Zihao Wang (Peking University), Shaofei Cai (Peking University), Yitao Liang (Peking University), Tianyu Jia (Peking University), Yuan Wang (Peking University), Runsheng Wang (Peking University), Meng Li (Peking University)

Syno: Structured Synthesis for Neural Operators
Yongqi Zhuo (Tsinghua University), Zhengyuan Su (Tsinghua University), Chenggang Zhao (Tsinghua University), Mingyu Gao (Tsinghua University)

Parameterized Hardware Design with Latency-Abstract Interfaces
Rachit Nigam (Massachusetts Institute of Technology), Edmund Lam (Cornell University), Ethan Gabizon (Cornell University), Carolyn Zech (Massachusetts Institute of Technology), Jonathan Balkind (UC Santa Barbara), Adrian Sampson (Cornell University)

Anvil: A General-Purpose Timing-Safe Hardware Description Language
Jason Zhijingcheng Yu (National University of Singapore), Aditya Ranjan Jha (National University of Singapore), Umang Mathur (National University of Singapore), Trevor E. Carlson (National University of Singapore), Prateek Saxena (National University of Singapore)

Rage Against the State Machine: Type-Stated Hardware Peripherals for Increased Driver Correctness
Tyler Potyondy (UC San Diego), Anthony Tarbinian (UC San Diego), Leon Schuermann (Princeton University), Eric Mugnier (UC San Diego), Adin Ackerman (UC San Diego), Amit Levy (Princeton University), Pat Pannuto (UC San Diego)

RTeAAL Sim: Using Tensor Algebra to Represent and Accelerate RTL Simulation
Yan Zhu (University of California, Berkeley), Boru Chen (University of California, Berkeley), Christopher W. Fletcher (University of California, Berkeley), Nandeeka Nayak (University of California, Berkeley)

Sequential Specifications for Precise Hardware Exceptions
Yulun Yao (Cornell University), Drew Zagieboylo (NVIDIA), G. Edward Suh (Cornell University / NVIDIA), Andrew Myers (Cornell University)

TempGraph: An Efficient Chain-driven Temporal Graph Computing Framework on the GPU
Jin Zhao (Huazhong University of Science and Technology), Qian Wang (Huazhong University of Science and Technology), Ligang He (University of Warwick), Yu Zhang (Huazhong University of Science and Technology), Sheng Di (Argonne National Laboratory), Bingsheng He (National University of Singapore), Xinlei Wang (Huazhong University of Science and Technology), Hui Yu (Huazhong University of Science and Technology), Hao Qi (Huazhong University of Science and Technology), Longlong Lin (Southwest University), Linchen Yu (Huazhong University of Science and Technology), Xiaofei Liao (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology)

Leveraging Sparsity to Accelerate Automata Processing
Xingran Du (MIT), Joel S. Emer (MIT/NVIDIA), Daniel Sanchez (MIT)

SLAWS: Spatial Locality Analysis and Workload Orchestration for Sparse Matrix Multiplication
Guoyu Li (Fudan University), Zheng Guan (Fudan University), Beichen Zhang (Fudan University), Jun Yu (Fudan University), Kun Wang (Fudan University)

Efficient Temporal Graph Network Training via Unified Redundancy Elimination
Yiqing Wang (Beihang University), Hailong Yang (Beihang University), Kejie Ma (Beihang University), Enze Yu (Beihang University), Pengbo Wang (Beihang University), Xin You (Beihang University), Qingxiao Sun (Beihang University), Chenhao Xie (Beihang University), Zhongzhi Luan (Beihang University), Yi Liu (Beihang University), Depei Qian (Beihang University)

Understanding Query Optimization Bugs in Graph Database Systems
Zhongxing Yu (Shandong University), Yuyu Chen (Shandong University)

Efficient Remote Memory Ordering for Non-Coherent Systems
Wei Siew Liew (University of Utah), Md Ashfaqur Rahaman (University of Utah), Adarsh Patil (Arm), Ryan Stutsman (University of Utah), Vijay Nagarajan (University of Utah)

CPU-Oblivious Offloading of Failure-Atomic Transactions for Disaggregated Memory
Cheng Chen (Huazhong University of Science and Technology), Chencheng Ye (Huazhong University of Science and Technology), Yuanchao Xu (University of California, Santa Cruz), Xipeng Shen (North Carolina State University), Xiaofei Liao (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology), Wenbin Jiang (Huazhong University of Science and Technology), Yan Solihin (U. of Central Florida)

PIPM: Partial and Incremental Page Migration for Multi-host CXL Disaggregated Shared Memory
Gangqi Huang (University of California, Santa Cruz), Heiner Litz (University of California, Santa Cruz), Yuanchao Xu (University of California, Santa Cruz)

CREST: High-Performance Contention Resolution for Disaggregated Transactions
Qihan Kang (Institute of Computing Technology, Chinese Academy of Sciences), Mi Zhang (Institute of Computing Technology, Chinese Academy of Sciences), Patrick P. C. Lee (The Chinese University of Hong Kong), Yongkang Hu (Institute of Computing Technology, Chinese Academy of Sciences)

Understanding and Optimizing Database Pushdown on Disaggregated Storage
Hua Zhang (University of Wisconsin at Madison/Beihang University), Xiao Li (University of Wisconsin at Madison/Beihang University), Yuebin Bai (Beihang University), Ming Liu (University of Wisconsin–Madison)

6:00 – 9:00 PM EDT: Award Ceremony/Banquet



Day 3: Thursday, March 26

8:30 – 10:10 AM EDT

A Framework for Developing and Optimizing Fully Homomorphic Encryption Programs on GPUs
Zhuoran Ji (Shandong University), Jianyu Zhao (Shandong University), Guang Fan (Ant Group), Mingzhe Zhang (Ant Research), Shoumeng Yan (Ant Group), Xueyu Wu (The University of Hong Kong), Lei Ju (Shandong University)

HEPIC: Private Inference over Homomorphic Encryption with Client Intervention
Kevin Nam (Seoul National University), Youyeon Joo (Seoul National University), Seungjin Ha (Seoul National University), Hyungon Moon (Ulsan National Institute of Science and Technology (UNIST)), Yunheung Paek (Seoul National University)

Falcon: Algorithm-Hardware Co-Design for Efficient Fully Homomorphic Encryption Accelerator
Liang Kong (Ant Group), Shengyu Fan (Chinese Academy of Sciences, CAS), Xianglong Deng (Chinese Academy of Sciences, CAS), Lei Chen (Ant Group), Guang Fan (Ant Group), Yilan Zhu (Ant Group), Geng Yang (Ant Group), Yisong Chang (Ant Group), Shoumeng Yan (Ant Group), Mingzhe Zhang (Ant Group)

Maverick: Rethinking TFHE Bootstrapping on GPUs via Algorithm-Hardware Co-Design
Zhiwei Wang (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS), Haoqi He (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS), Lutan Zhao (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS), Qingyun Niu (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS), Dan Meng (Institute of Information Engineering, CAS), Rui Hou (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS)

COGENT: Adaptable Compiler Toolchain for Tagging RISC-V Binaries
David B Demicco (Binghamton University), Matthew Cole (Binghamton University), Gokturk Yuksek (Binghamton University), RaviTheja Gollapudi (Binghamton University), Aravind Prakash (Binghamton University), Kanad Ghose (Binghamton University), Zerksis D Umrigar (Binghamton University)

Finding Reusable Instructions via E-Graph Anti-Unification
Youwei Xiao (Peking University), Chenyun Yin (Peking University), Yitian Sun (Peking University), Yuyang Zou (Peking University), Yun Liang (Peking University)

LOOPRAG: Enhancing Loop Transformation Optimization with Retrieval-Augmented Large Language Models
Yijie Zhi (Zhejiang University), Yayu Cao (Zhejiang University), Jianhua Dai (Zhejiang Institute of Administration), Xiaoyang Han (Zhejiang University), Jingwen Pu (Zhejiang University), Qingran Wu (Zhejiang University), Sheng Cheng (Beijing ShenZhou Aerospace Software Technology Ltd), Ming Cai (Zhejiang University)

Evaluating Compiler Optimization Impacts on zkVM Performance
Thomas Gassmann (ETH Zurich), Stefanos Chaliasos (Imperial College London & zkSecurity), Thodoris Sotiropoulos (ETH Zurich), Zhendong Su (ETH Zurich)

DejaVuzz: Disclosing Transient Execution Bugs with Dynamic Swappable Memory and Differential Information Flow Tracking assisted Processor Fuzzing
Jinyan Xu (Zhejiang University), Yangye Zhou (Zhejiang University), Xingzhi Zhang (Zhejiang University), Yinshuai Li (Southern University of Science and Technology), Qinhan Tan (Princeton University), Yinqian Zhang (Southern University of Science and Technology (SUSTech)), Yajin Zhou (Zhejiang University & BlockSec), Rui Chang (Zhejiang University), Wenbo Shen (Zhejiang University)

Signal Breaker: Fuzzing Digital Signal Processors
Cameron Garcia (Virginia Tech), Matthew Hicks (Virginia Tech)

Scaling Automated Database System Testing
Suyang Zhong (National University of Singapore), Manuel Rigger (National University of Singapore)

SEVI: Silent Data Corruption of Vector Instructions in Hyper-Scale Datacenters
Yixuan Mei (Carnegie Mellon University), Shreya Varshini (Meta Platforms, Inc.), Harish Dattatraya Dixit (Meta Platforms, Inc.), Sriram Sankar (Meta Platforms, Inc.), Rashmi Vinayak (Carnegie Mellon University)

Co-Exploration of RISC-V Processor Microarchitectures and FreeRTOS Extensions for Lower Context Switch Latency
Markus Scheck (Technische Universität Darmstadt), Tammo Mürmann (Technische Universität Darmstadt), Andreas Koch (Technische Universität Darmstadt)

Chips Need DIP: Time-Proportional Per-Instruction Cycle Stacks at Dispatch
Silvio Campelo de Santana (Norwegian University of Science and Technology (NTNU)), Joseph Rogers (Norwegian University of Science and Technology (NTNU)), Lieven Eeckhout (Ghent University), Magnus Jahre (Norwegian University of Science and Technology (NTNU))

Arm Weak Memory Consistency on Apple Silicon: What Is It Good For?
Yossi Khayet (Tel Aviv University), Adam Morrison (Tel Aviv University)

A Data-Driven Dynamic Execution Orchestration Architecture
Zhenyu Bai (National University of Singapore), Pranav Dangi (National University of Singapore), Rohan Juneja (National University of Singapore), Zhaoying Li (National University of Singapore), Zhanglu Yan (National University of Singapore), Huiying Lan (Lumai Ltd.), Tulika Mitra (National University of Singapore)

10:10 – 10:30 AM EDT: Coffee Break

10:30 – 11:30 AM EDT: Keynote 3 by Hillery Hunter (IBM)

Abstract TBA

12:00 – 1:30 PM EDT: Lunch

1:30 – 3:10 PM EDT

cuJSON: A Highly Parallel JSON Parser for GPUs
Ashkan Vedadi Gargary (University of California, Riverside), Soroosh Safari Loaliyan (University of California, Riverside), Zhijia Zhao (University of California, Riverside)

CHERI-SIMT: Implementing Capability Memory Protection in GPGPUs
Matthew Naylor (University of Cambridge), Alexandre Joannou (University of Cambridge), A. Theodore Markettos (University of Cambridge), Paul Metzger (University of Cambridge), Simon Moore (University of Cambridge), Timothy M. Jones (University of Cambridge)

Lobster: A GPU-Accelerated Framework for Neurosymbolic Programming
Paul Biberstein (University of Pennsylvania), Ziyang Li (University of Pennsylvania), Joseph Devietti (University of Pennsylvania), Mayur Naik (University of Pennsylvania)

ReliaFHE: Resilient Design for Fully Homomorphic Encryption Accelerators
Fan Li (University of Central Florida), Mayank Kumar (University of Central Florida), Ruizhi Zhu (University of Central Florida), Mengxin Zheng (University of Central Florida), Qian Lou (University of Central Florida), Xin Xin (University of Central Florida)

Lambda-trim: Reducing Monetary and Performance Cost of Serverless Cold Starts with Cost-driven Application Debloating
Spyros Pavlatos (University of Pennsylvania), Xuting Liu (University of Pennsylvania), Yuhao Liu (University of Pennsylvania), Vincent Liu (University of Pennsylvania)

Skyler: Static Analysis for Predicting API-Driven Costs in Serverless Applications
Bernardo Ribeiro (INESC-ID, Instituto Superior Técnico, University of Lisbon), Mafalda Ferreira (INESC-ID, Instituto Superior Técnico, University of Lisbon), José Fragoso Santos (INESC-ID, Instituto Superior Técnico, University of Lisbon), Rodrigo Bruno (INESC-ID, Instituto Superior Técnico, University of Lisbon), Nuno Santos (INESC-ID, Instituto Superior Técnico, University of Lisbon)

Enabling fast networking in the public cloud
Alireza Sanaee (University of Cambridge & Huawei), Vahab Jabrayilov (Columbia University), Ilias Marinos (NVIDIA), Farbod Shahinfar (Politecnico di Milano), Divyanshu Saxena (The University of Texas at Austin), Kostis Kaffes (Columbia University), Gianni Antichi (Politecnico di Milano & Queen Mary University of London)

SG-IOV: Socket-Granular I/O Virtualization for SmartNIC-Based Container Networks
Chenxingyu Zhao (University of Washington), Hongtao Zhang (University of Washington), Jaehong Min (University of Washington), Shengkai Lin (Shanghai Jiao Tong University), Wei Zhang (University of Connecticut), Kaiyuan Zhang (University of Washington), Ming Liu (University of Wisconsin–Madison), Arvind Krishnamurthy (University of Washington)

PACT: A Criticality-First Design for Tiered Memory
Hamid Hadian (Virginia Tech), Jinshu Liu (Virginia Tech), Hanchen Xu (Virginia Tech), Hansen Idden (Virginia Tech), Huaicheng Li (Virginia Tech)

CounterPoint: Using Hardware Event Counters to Refute and Refine Microarchitectural Assumptions
Nick Lindsay (Yale University), Caroline Trippel (Stanford University), Anurag Khandelwal (Yale University), Abhishek Bhattacharjee (Yale University)

Performance Predictability in Heterogeneous Memory
Jinshu Liu (Virginia Tech), Hanchen Xu (Virginia Tech), Daniel S. Berger (Microsoft and University of Washington and CMU), Marcos K. Aguilera (NVIDIA), Huaicheng Li (Virginia Tech)

PF-LLM: Large Language Model Hinted Hardware Prefetching
Ceyu Xu (The Hong Kong University of Science and Technology), Xiangfeng Sun (The Hong Kong University of Science and Technology), Weihang Li (Duke University), Chen Bai (The Hong Kong University of Science and Technology), Bangyan Wang (The Hong Kong University of Science and Technology), Mengming Li (The Hong Kong University of Science and Technology), Zhiyao Xie (The Hong Kong University of Science and Technology), Yuan Xie (The Hong Kong University of Science and Technology)

Neura: A Unified Framework for Hierarchical and Adaptive CGRAs
Cheng Tan (Google/ASU), Miaomiao Jiang (Shandong University), Yuqi Sun, Ruihong Yin (University of Minnesota Twin Cities), Yanghui Ou (Cornell University), Qing Zhong (Google), Lei Ju (Shandong University), Jeff Zhang (Arizona State University)

Transforming Torus Fabrics for Efficient Multi-tenant ML
Abhishek Vijaya Kumar (Cornell University), Eric Ding (Cornell University), Arjun Devraj (Cornell University), Darius Bunandar (Lightmatter), Rachee Singh (Cornell)

The Configuration Wall: Characterization and Elimination of Accelerator Configuration Overhead
Josse Van Delm (KU Leuven), Anton Lydike (The University of Edinburgh), Joren Dumoulin (KU Leuven), Jonas Crols (KU Leuven), Xiaoling Yi (KU Leuven), Ryan Antonio (KU Leuven), Jackson Woodruff (The University of Edinburgh), Tobias Grosser (University of Cambridge), Marian Verhelst (KU Leuven)

Static Analysis for Efficient Streaming Tokenization
Angela W. Li (Rice University), Yudi Yang (Rice University), Konstantinos Mamouras (Rice University)

3:10 – 3:30 PM EDT: Coffee Break

3:30 – 5:10 PM EDT

Arancini: A Hybrid Binary Translator for Weak Memory Model Architectures
Sebastian Reimers (Technical University of Munich (TUM)), Dennis Sprokholt (TU Delft), Martin Fink (TU Munich), Theofilos Augoustis (Technical University of Munich (TUM)), Simon Kammermeier (Technical University of Munich), Rodrigo C. O. Rocha (Huawei), Tom Spink (University of St Andrews, United Kingdom), Redha Gouicem (RWTH Aachen University), Soham Chakraborty (TU Delft), Pramod Bhatotia (TU Munich)

Wax: Optimizing Data Center Applications With Stale Profile
Tawhid Bhuiyan (Columbia University in the City of New York), Sumya Hoque (Columbia University in the City of New York), Angelica Moreira (Microsoft Research), Tanvir Ahmed Khan (Columbia University in the City of New York)

M2XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization
Weiming Hu (Shanghai Jiao Tong University), Zihan Zhang (Shanghai Jiao Tong University), Haoyan Zhang (Shanghai Jiao Tong University), Chen Zhang (Shanghai Jiao Tong University), Cong Guo (Shanghai Jiao Tong University), Yu Feng (Shanghai Jiao Tong University), Tianchi Hu (Computing Product Line, Huawei), Guanglin Li (Computing Product Line, Huawei), Guipeng Hu (Computing Product Line, Huawei), Junsong Wang (Computing Product Line, Huawei), Jingwen Leng (Shanghai Jiao Tong University)

Gouda: A Swift Fully Homomorphic Encryption Library Designed for GPU Architectures
Wonseok Choi (Seoul National University), Jongmin Kim (Seoul National University), Jung Ho Ahn (Seoul National University)

COMPAS: A Distributed Multi-Party SWAP Test for Parallel Quantum Algorithms
Brayden Goldstein-Gelb (Brown University), Kun Liu (Yale University), John M. Martyn (Massachusetts Institute of Technology), Hengyun (Harry) Zhou (QuEra Computing Inc.), Yongshan Ding (Yale University), Yuan Liu (North Carolina State University)

TreeVQA: A Tree-Structured Execution Framework for Shot Reduction in Variational Quantum Algorithms
Yuewen Hou (University of Michigan), Dhanvi Bharadwaj (University of Michigan), Gokul Ravi (University of Michigan)

CHEHAB RL: Learning to Optimize Fully Homomorphic Encryption Computations
Bilel Sefsaf (New York University Abu Dhabi and Ecole Superieure d'Informatique), Abderraouf Dandani (New York University Abu Dhabi and Ecole Superieure d'Informatique), Abdessamed Seddiki (New York University Abu Dhabi and Ecole Superieure d'Informatique), Arab Mohammed (New York University Abu Dhabi and Ecole Superieure d'Informatique), Eduardo Chielle (New York University Abu Dhabi), Mihalis Maniatakos (New York University Abu Dhabi), Riyadh Baghdadi (New York University Abu Dhabi)

CEMU: Enabling Full-System Emulation of Computational Storage beyond Hardware Limits
Qiuyang Zhang (Huazhong University of Science and Technology), Jiapin Wang (Huazhong University of Science and Technology), You Zhou (Huazhong University of Science and Technology), Peng Xu (Research Center for High Efficiency Computing Infrastructure, Zhejiang Lab), Kai Lu (Huazhong University of Science and Technology), Jiguang Wan (Huazhong University of Science and Technology), Fei Wu (Huazhong University of Science and Technology), Tao Lu (DapuStor)

Fault Escaping: Improving Robustness of DPU Enhanced Platform with Mutual Assisted VM Recovery
Chao Zhang (Alibaba Group), Tao Xu (Alibaba Group), Junming Liu (Alibaba Group), Pai Liu (Alibaba Group), Zhilang Xu (Alibaba Group), Kang Yan (Alibaba Group), Shuo Shi (Shanghai Jiao Tong University), Jintao Li (Alibaba Group), Jinhu Li (Alibaba Group), Chen Liang (Alibaba Cloud), Wenhui kevin (Intel Corporation), FeiFei Fan (Intel Corporation), Yibin Shen (Alibaba Cloud), Hang Yang (Alibaba Cloud), Jianming Song (Alibaba Group), Xudong Zheng (Alibaba Group), Jiesheng Wu (Alibaba Group), Jian Li (Shanghai Jiao Tong University)

Shields Up! Software Radiation Protection for Commodity Hardware in Space
Haoda Wang (Columbia University), Yonatan Winetraub (Aptos Orbital), Steven Myint (Jet Propulsion Laboratory, California Institute of Technology), Vandi Verma (Jet Propulsion Laboratory, California Institute of Technology), Junfeng Yang (Columbia University), Asaf Cidon (Columbia University)

TierX: A Simulation Framework for Multi-tier BCI System Design Evaluation and Exploration
Seunghyun Song (Seoul National University), Yeongwoo Jang (Seoul National University), Daye Jung (Seoul National University), Kyungsoo Park (Seoul National University), Donghan Kim (Seoul National University), Gwangjin Kim (Seoul National University), Hunjun Lee (Hanyang University), Jerald Yoo (Seoul National University), Jangwoo Kim (Seoul National University / MangoBoost)

PrioriFI: More Informed Fault Injection for Edge Neural Networks
Olivia Weng (UC San Diego), Andres Meza (UC San Diego), Nhan Tran (Fermilab), Ryan Kastner (UC San Diego)

TiNA: Tiered Network Buffer Architecture for Fast Networking in Chiplet-based CPU
Siddharth Agarwal (UIUC), Tianchen Wang (UIUC), Jinghan Huang (UIUC), Saksham Agarwal (UIUC), Nan Sung Kim (UIUC)

An MLIR Lowering Pipeline for Stencils at Wafer-Scale
Nicolai Stawinoga (Technische Universität Berlin), David Katz (University of Edinburgh), Anton Lydike (University of Edinburgh), Justs Zarins (University of Edinburgh), Nick Brown (University of Edinburgh), George Bisbas (Imperial College London), Tobias Grosser (University of Cambridge)

JOSer: Just-In-Time Object Serialization for Heavy Java Serialization Workloads
Chaokun Yang (Ant Group), Pengbo Nie (Shanghai Jiao Tong University), Ziyi Lin (Alibaba Group), Weipeng Wang (Ant Group), Qianwei Yu (Shanghai Jiao Tong University), Chengcheng Wan (East China Normal University), He Jiang (Dalian University of Technology), Yuting Chen (Shanghai Jiao Tong University)

Wave: Offloading Resource Management to SmartNIC Cores
Jack Tigar Humphries (Stellar Development Foundation), Neel Natu (Google, Inc.), Kostis Kaffes (Columbia University), Stanko Novaković (Google, Inc.), Paul Turner (Google, Inc.), Henry M. Levy (Google, Inc. and University of Washington), David Culler (Google, Inc. and University of California, Berkeley), Christos Kozyrakis (Stanford University)

5:15 – 5:30 PM EDT: Closing Remarks



The program page was generated and formatted using Professor Saugata Ghose’s (UIUC) Conference Program Generator.