Mitigating Application Resource Overload with Targeted Task Cancellation Yigong Hu, Zeyin Zhang, Yicheng Liu, Yile Gu, Shuangyu Lei, Baris Kasikci, Peng Huang. (To Appear) SOSP 2025, Seoul, Republic of Korea, November 2025.
Scalable and Accurate Application-level Crash-Consistency Testing via Representative Testing Yile Gu*, Ian Neal*, Jiexiao Xu, Shaun Christopher Lee, Ayman Said, Musa Haydar, Jacob Van Geffen, Rohan Kadekodi, Andrew Quinn, Baris Kasikci. (To Appear) OOPSLA 2025, Singapore, October 2025. https://arxiv.org/abs/2503.01390
NanoFlow: Towards Optimal Large Language Model Serving Throughput Kan Zhu, Yufei Gao, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Zihao Ye, Keisuke Kamahori, Chien-Yu Lin, Ziren Wang, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci. OSDI 2025, Boston, MA, USA, July 2025. https://arxiv.org/abs/2408.12757
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models Keisuke Kamahori*, Tian Tang*, Yile Gu, Kan Zhu, Baris Kasikci. ICLR 2025, Singapore, May 2025. https://arxiv.org/abs/2402.07033
Perseus: Removing Energy Bloat from Large Model Training Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury. SOSP 2024, Austin, TX, USA, November 2024. https://arxiv.org/abs/2312.06902
Preprints
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices Yile Gu*, Rohan Kadekodi*, Hoang Nguyen, Keisuke Kamahori, Yiyu Liu, Baris Kasikci. https://arxiv.org/abs/2506.17538
Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models Yile Gu, Yifan Xiong, Jonathan Mace, Yuting Jiang, Yigong Hu, Baris Kasikci, Peng Cheng. https://arxiv.org/abs/2501.14170
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs Kan Zhu*, Tian Tang*, Qinyu Xu*, Yile Gu, Zhichen Zeng, Rohan Kadekodi, Liangyu Zhao, Ang Li, Arvind Krishnamurthy, Baris Kasikci. https://arxiv.org/abs/2502.12216
Semantic Scheduling for LLM Inference Wenyue Hua*, Dujian Ding*, Yile Gu, Yujie Ren, Kai Mei, Minghua Ma, William Yang Wang. https://arxiv.org/abs/2506.12204
TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval Chien-Yu Lin*, Keisuke Kamahori*, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, Yile Gu, Rulin Shao, Zihao Ye, Kan Zhu, Stephanie Wang, Arvind Krishnamurthy, Rohan Kadekodi, Luis Ceze, Baris Kasikci. https://arxiv.org/abs/2502.20969