PARALLEL DATA LAB 

PDL Abstract

PIM-trie: A Skew-resistant Trie for Processing-in-Memory

SPAA '23: Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, June 2023, Orlando, FL.

Hongbo Kang*, Yiwei Zhao, Guy E. Blelloch, Laxman Dhulipala^, Yan Gu†, Charles McGuffey‡, Phillip B. Gibbons

Carnegie Mellon University
*Tsinghua University, China
^University of Maryland
†UC Riverside
‡Reed College

http://www.pdl.cmu.edu/

Memory latency and bandwidth are significant bottlenecks in designing in-memory indexes. Processing-in-memory (PIM), an emerging hardware design approach, alleviates this problem by embedding processors in memory modules, enabling low-latency memory access whose aggregate bandwidth scales linearly with the number of PIM modules. Despite recent work on balanced comparison-based indexes for PIM systems, building efficient tries for PIM remains an open challenge due to tries' inherently unbalanced shape.

This paper presents the PIM-trie, the first batch-parallel radix-based index for PIM systems that provides load balance and low communication under adversary-controlled workloads. We introduce trie matching (matching a query trie of a batch against the compressed data trie) as a key building block for PIM-friendly index operations. Our algorithm combines (i) hash-based comparisons for coarse-grained work distribution/elimination and (ii) bit-by-bit comparisons for fine-grained matching. Combined with other techniques (meta-block decomposition, selective recursive replication, differentiated verification), PIM-trie supports LongestCommonPrefix, Insert, and Delete in O(log P) communication rounds per batch and O(l/w) communication volume per string, where P is the number of PIM modules, l is the string length in bits, and w is the machine word size. Moreover, work and communication are load-balanced among modules with high probability (whp), even under worst-case skew.
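To make the two comparison granularities concrete, the following is a minimal single-machine sketch and not the PIM-trie algorithm itself. It computes the longest common prefix of two bit strings by first comparing hashes of word-aligned prefixes (coarse-grained) and then comparing bit by bit from the last matching word boundary (fine-grained). The names `W`, `block_hashes`, and `lcp_length`, the choice of SHA-256, and the binary search over blocks are all assumptions made for illustration. Hash collisions are ignored here; a real system would need to verify hash matches.

```python
import hashlib

W = 8  # assumed word size in bits; purely illustrative


def block_hashes(bits: str, w: int = W) -> list:
    """Return a hash of every prefix that ends on a w-bit block boundary."""
    return [
        hashlib.sha256(bits[:end].encode()).digest()[:8]
        for end in range(w, len(bits) + 1, w)
    ]


def lcp_length(query: str, key: str, w: int = W) -> int:
    """Length in bits of the longest common prefix of two bit strings."""
    qh, kh = block_hashes(query, w), block_hashes(key, w)

    # Coarse phase: binary search for the number of whole blocks on which
    # the prefix hashes agree. Assumes no hash collisions, so agreement on
    # a prefix of m blocks implies agreement on every shorter prefix.
    lo, hi = 0, min(len(qh), len(kh))
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if qh[mid - 1] == kh[mid - 1]:
            lo = mid
        else:
            hi = mid - 1

    # Fine phase: bit-by-bit comparison starting at the last matching
    # block boundary.
    i = lo * w
    limit = min(len(query), len(key))
    while i < limit and query[i] == key[i]:
        i += 1
    return i
```

For example, `lcp_length("1011001110101", "1011001111111")` skips the first matching 8-bit block using one hash comparison and then scans bits until the mismatch at bit 9.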
