PARALLEL DATA LAB 

PDL Abstract

Is Perfect Hashing Practical for OLAP Systems?

4th Annual Conference on nnovative Data Systems Research (CIDR ’24) January 14-17, 2024, Chaminade, USA.

Kevin P. Gaffney*, Jignesh M. Patel

* University of Wisconsin-Madison
Carnegie Mellon University

http:\\www..pdl.cmu.edu

A perfect hash function (PHF) maps a set of keys to a range of integers with no collisions. Compared to conventional hash methods, PHFs are attractive for their low space overhead and reduced control flow. Despite their advantages, there has been little investigation into the use of PHFs for online analytical processing (OLAP). This paper is an initial guide to practical perfect hashing for OLAP. We identify several promising applications for PHFs in OLAP and survey their current use in systems and research prototypes. We then evaluate existing PHF approaches and quantify their impact on query performance. Our results are encouraging: in a real OLAP system, PHFs achieve end-to-end speedups of 1.7X and 3.1X for join and aggregate queries, respectively. Nevertheless, there is room for improvement. Future approaches that simultaneously achieve low build time and high probe throughput could offer additional performance increases.

FULL PAPER: pdf