PARALLEL DATA LAB 

PDL Abstract

Scaling Embedded In-Situ Indexing with DeltaFS

SC18, November 11-16, 2018, Dallas, Texas, USA.

Qing Zheng†, Charles D. Cranor†, Danhao Guo†, Gregory R. Ganger†, George Amvrosiadis†, Garth A. Gibson†, Bradley W. Settlemyer‡, Gary Grider‡, Fan Guo‡

†Carnegie Mellon University
‡Los Alamos National Laboratory

http://www.pdl.cmu.edu/

Analysis of large-scale simulation output is a core element of scientific inquiry, but analysis queries may experience significant I/O overhead when the data is not structured for efficient retrieval. While in-situ processing allows for improved time-to-insight for many applications, scaling in-situ frameworks to hundreds of thousands of cores can be difficult in practice. DeltaFS in-situ indexing is a new approach to in-situ processing of massive amounts of data that enables efficient point and small-range queries. This paper describes the challenges and lessons learned when scaling this in-situ processing function to hundreds of thousands of cores. We propose techniques for memory- and bandwidth-efficient scalable all-to-all communication, concurrent indexing, and specialized LSM-Tree formats. Combining these techniques allows DeltaFS to control the cost of in-situ processing while maintaining a query speedup of three orders of magnitude when scaling alongside the popular VPIC particle-in-cell code to 131,072 cores.
