NEW - The PDL Packet - Fall 2025 Newsletter
AdaServe: Accelerating Multi-SLO LLM Serving with SLO-Customized Speculative Decoding
Zikun Li, Zhuofu Chen, Remi Delacourt, Gabriele Oliaro, Zeyu Wang, Qinghan Chen, Shuhuai Lin, April Yang, Zhihao Zhang, Zhuoming Chen, Yi-Hsiang Lai, Xinhao Cheng, Xupeng Miao, Zhihao Jia
EUROSYS ’26, Edinburgh, Scotland, UK. April 27-30, 2026.
Modern large language model (LLM) applications exhibit diverse service-level objectives (SLOs), from low-latency requirements in interactive coding assistants to more relaxed constraints in data wrangling tasks. Existing LLM serving systems, which rely on uniform batching and scheduling strategies, often fail to meet these heterogeneous SLOs concurrently. We present AdaServe, the first LLM serving system designed to support efficient multi-SLO serving through SLO-customized speculative decoding.[...more]
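In standard speculative decoding, a cheap draft model proposes several tokens and the target model verifies them in a single batched step, accepting the longest agreeing prefix. The sketch below illustrates only that core accept/reject rule in toy form (the `verify_draft` function is a hypothetical name, and AdaServe's actual SLO-customized scheduling of speculation lengths is far more sophisticated; see the paper):

```python
# Toy sketch of the speculative-decoding verification rule (illustrative
# only; not AdaServe's implementation). The draft model proposes k tokens;
# the target model checks them and accepts the longest agreeing prefix,
# substituting its own token at the first disagreement.

def verify_draft(draft_tokens, target_tokens):
    """Accept the longest prefix of draft_tokens the target agrees with;
    on the first mismatch, take the target's token and stop."""
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d == t:
            accepted.append(d)
        else:
            accepted.append(t)  # target's correction ends the accepted run
            return accepted
    return accepted  # all drafted tokens accepted

# Per-request speculation length is the knob a multi-SLO scheduler can
# customize: longer drafts amortize target-model steps when acceptance
# rates are high, shorter drafts bound latency for tight SLOs.
print(verify_draft([5, 7, 9], [5, 7, 2]))  # -> [5, 7, 2]
```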
PIM-zd-tree: A Fast Space-Partitioning Index Leveraging Processing-in-Memory
Yiwei Zhao, Hongbo Kang, Ziyang Men, Yan Gu, Guy E. Blelloch, Laxman Dhulipala, Charles McGuffey, Phillip B. Gibbons
PPoPP '26: Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming. 31 January - 4 February, 2026, Sydney, Australia.
In this paper, we present PIM-zd-tree, the first space-partitioning index specifically designed for real-world PIM systems. PIM-zd-tree employs a tunable multi-layer structure, with each layer adopting distinct data layouts, partitioning schemes, and caching strategies. Its design is theoretically grounded to achieve load balance, minimal memory-channel communication, and low space overhead. [...more]
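The zd-tree family of indexes orders points by Morton (Z-order) codes, interleaving coordinate bits so that spatially close points land near each other in a sorted array. A minimal 2D sketch of that bit-interleaving idea appears below; PIM-zd-tree's layer-specific layouts and PIM-aware partitioning build well beyond this, and the details are in the paper:

```python
# Minimal sketch of 2D Morton (Z-order) encoding, the space-partitioning
# order underlying zd-tree-style indexes (illustrative; not the paper's code).

def morton2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y:
    x-bits go to even positions, y-bits to odd positions."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bit i -> position 2i
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bit i -> position 2i+1
    return code

# Sorting points by Morton code groups spatial neighbors together,
# which is what makes recursive space partitioning a simple array split.
points = [(3, 5), (10, 2), (2, 2), (3, 4)]
points.sort(key=lambda p: morton2d(*p))
```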
Demystifying and Improving Lazy Promotion in Cache Eviction
Qinghan Chen, Muhammad Haekal Muhyidin Al-Araby, Ziyue Qiu, Zhuofan Chen, Rashmi Vinayak, Juncheng Yang
Proceedings of the VLDB Endowment, Vol. 19, No. 4, ISSN 2150-8097, 2026.
Cache eviction algorithms play a critical role in the performance of modern data systems, yet their scalability is often limited by the high computational overhead associated with object promotions. Lazy Promotion techniques have emerged as relaxations of traditional Least-Recently-Used (LRU) methods, designed to alleviate lock contention and increase throughput. This work uses production traces from real-world systems to benchmark five Lazy Promotion strategies: Probabilistic-LRU, Batch-LRU, Delay-LRU, FIFO-reinsertion, and Random-LRU. [...more]
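FIFO-reinsertion, one of the five strategies benchmarked, is a good illustration of lazy promotion: a cache hit only sets a per-object flag instead of reordering the queue, so the hot path needs no lock-protected list manipulation; the flag is consulted lazily at eviction time. A minimal sketch under that standard description (class and method names are ours, not the paper's):

```python
from collections import deque

class FIFOReinsertion:
    """Sketch of FIFO-reinsertion, a lazy-promotion relaxation of LRU.
    A hit merely marks the object as visited; promotion is deferred to
    eviction time, when visited objects get a second chance at the tail."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()   # keys in insertion order (head = oldest)
        self.visited = {}      # key -> accessed-since-(re)insertion flag

    def get(self, key):
        if key in self.visited:
            self.visited[key] = True  # lazy promotion: mark, don't move
            return True
        return False

    def put(self, key):
        if key in self.visited:
            self.visited[key] = True
            return
        while len(self.queue) >= self.capacity:
            old = self.queue.popleft()
            if self.visited[old]:          # used since insertion:
                self.visited[old] = False  # clear flag, reinsert at tail
                self.queue.append(old)
            else:
                del self.visited[old]      # cold object: evict
                break
        self.queue.append(key)
        self.visited[key] = False

cache = FIFOReinsertion(capacity=2)
cache.put("a"); cache.put("b")
cache.get("a")        # marks "a" visited; no queue movement
cache.put("c")        # "a" survives via reinsertion; cold "b" is evicted
```

Because the flag clears on each reinsertion, an eviction pass terminates after at most one sweep of the queue even when every object is marked.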