Conference on Innovative Data Systems Research (CIDR 2024), Chaminade, CA, USA, January 14-17, 2024.
Kai Franz, Samuel Arch, Denis Hirn*, Torsten Grust*, Todd C. Mowry, Andrew Pavlo
Carnegie Mellon University
* University of Tübingen
SQL’s user-defined functions (UDFs) allow developers to express complex computation using procedural logic. But UDFs have been the bane of database management systems (DBMSs) for decades because they inhibit optimization opportunities, potentially slowing down queries significantly. In response, batching and inlining techniques have been proposed to enable effective query optimization of UDF calls within SQL. Inlining is now available in a major commercial DBMS. But the trade-offs between the two approaches on modern DBMSs remain unclear.
We evaluate and compare UDF batching and inlining on enterprise and open-source DBMSs using a state-of-the-art UDF-centric workload. We observe the surprising result that although inlining is better on simple UDFs, batching outperforms inlining by up to 93.4× for more complex UDFs because it makes it easier for a DBMS’s query optimizer to decorrelate subqueries. We propose a hybrid approach that chooses batching or inlining to achieve the best performance.
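As a rough illustration of the three evaluation styles discussed above, the following toy sketch contrasts per-row UDF invocation, an inlined form (the UDF body becomes a correlated scalar subquery), and a batched form (the body is evaluated once over the set of all arguments and joined back). It is not the implementation evaluated in the paper: the schema, the `total_spent` function, and the helper names are hypothetical, the "UDF" is a Python stand-in rather than a procedural SQL function, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'ada'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")


def total_spent(customer_id):
    """Stand-in for a procedural UDF: runs one embedded query per call."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0]


def per_row():
    """Row-by-row evaluation: the function body is opaque to the optimizer."""
    customers = conn.execute(
        "SELECT id, name FROM customers ORDER BY id"
    ).fetchall()
    return [(name, total_spent(cid)) for cid, name in customers]


def inlined():
    """Inlined form: the UDF body is substituted as a correlated subquery."""
    return conn.execute("""
        SELECT c.name,
               (SELECT COALESCE(SUM(o.amount), 0)
                  FROM orders o
                 WHERE o.customer_id = c.id)
          FROM customers c
         ORDER BY c.id
    """).fetchall()


def batched():
    """Batched form: evaluate the body once for all arguments, then join."""
    return conn.execute("""
        WITH args AS (SELECT DISTINCT id FROM customers),
             results AS (
                 SELECT a.id, COALESCE(SUM(o.amount), 0) AS total
                   FROM args a
                   LEFT JOIN orders o ON o.customer_id = a.id
                  GROUP BY a.id)
        SELECT c.name, r.total
          FROM customers c
          JOIN results r ON r.id = c.id
         ORDER BY c.id
    """).fetchall()
```

All three functions return the same rows. The inlined query still contains a correlated subquery that the optimizer must decorrelate on its own, whereas the batched query is already written as a set-oriented join, which is one way to picture why batching can be easier for an optimizer to handle.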