Ekka: Automated Diagnosis of Silent Errors in LLM Inference. Yile Gu, Zhen Zhang, Shaowei Zhu, Xinwei Fu, Jun Wu, Yida Wang, Baris Kasikci. (To Appear) ICML 2026, Seoul, South Korea, July 2026.
DynaFlow: Transparent and Flexible Intra-Device Parallelism via Programmable Operator Scheduling. Yi Pan, Yile Gu, Jinbin Luo, Yibo Wu, Ziren Wang, Hongtao Zhang, Ziyi Xu, Shengkai Lin, Baris Kasikci, Stephanie Wang. MLSys 2026, Bellevue, WA, USA, May 2026. https://mlsys.org/virtual/2026/poster/3548
TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval. Chien-Yu Lin*, Keisuke Kamahori*, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, Yile Gu, Rulin Shao, Zihao Ye, Kan Zhu, Stephanie Wang, Arvind Krishnamurthy, Rohan Kadekodi, Luis Ceze, Baris Kasikci. MLSys 2026, Bellevue, WA, USA, May 2026. https://mlsys.org/virtual/2026/poster/3573
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs. Kan Zhu*, Tian Tang*, Qinyu Xu*, Yile Gu, Zhichen Zeng, Rohan Kadekodi, Liangyu Zhao, Ang Li, Arvind Krishnamurthy, Baris Kasikci. ICLR 2026, Rio de Janeiro, Brazil, April 2026. https://arxiv.org/abs/2502.12216
Mitigating Application Resource Overload with Targeted Task Cancellation. Yigong Hu, Zeyin Zhang, Yicheng Liu, Yile Gu, Shuangyu Lei, Baris Kasikci, Peng Huang. SOSP 2025, Seoul, Republic of Korea, November 2025. https://dl.acm.org/doi/10.1145/3731569.3764835
Scalable and Accurate Application-level Crash-Consistency Testing via Representative Testing. Yile Gu*, Ian Neal*, Jiexiao Xu, Shaun Christopher Lee, Ayman Said, Musa Haydar, Jacob Van Geffen, Rohan Kadekodi, Andrew Quinn, Baris Kasikci. OOPSLA 2025, Singapore, October 2025. https://arxiv.org/abs/2503.01390
NanoFlow: Towards Optimal Large Language Model Serving Throughput. Kan Zhu, Yufei Gao, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Zihao Ye, Keisuke Kamahori, Chien-Yu Lin, Ziren Wang, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci. OSDI 2025, Boston, MA, USA, July 2025. https://arxiv.org/abs/2408.12757
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models. Keisuke Kamahori*, Tian Tang*, Yile Gu, Kan Zhu, Baris Kasikci. ICLR 2025, Singapore, May 2025. https://arxiv.org/abs/2402.07033
Perseus: Removing Energy Bloat from Large Model Training. Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury. SOSP 2024, Austin, TX, USA, November 2024. https://arxiv.org/abs/2312.06902
Preprints
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices. Yile Gu*, Rohan Kadekodi*, Hoang Nguyen, Keisuke Kamahori, Yiyu Liu, Baris Kasikci. https://arxiv.org/abs/2506.17538
Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models. Yile Gu, Yifan Xiong, Jonathan Mace, Yuting Jiang, Yigong Hu, Baris Kasikci, Peng Cheng. https://arxiv.org/abs/2501.14170
The Streaming Batch Model for Efficient and Fault-Tolerant Heterogeneous Execution. Frank Sifei Luan*, Ron Yifeng Wang*, Yile Gu, Ziming Mao, Charlotte Lin, Amog Kamsetty, Hao Chen, Cheng Su, Balaji Veeramani, Scott Lee, SangBin Cho, Clark Zinzow, Eric Liang, Ion Stoica, Stephanie Wang. https://arxiv.org/abs/2501.12407
Semantic Scheduling for LLM Inference. Wenyue Hua*, Dujian Ding*, Yile Gu, Yujie Ren, Kai Mei, Minghua Ma, William Yang Wang. https://arxiv.org/abs/2506.12204
AgentFlux: Decoupled Fine-Tuning & Inference for On-Device Agentic Systems. Rohan Kadekodi*, Zhan Jin*, Keisuke Kamahori, Yile Gu, Sean Khatiri, Noah H Bayindirli, Sergey Gorbunov, Baris Kasikci. https://arxiv.org/abs/2510.00229