SC18, November 11-16, 2018, Dallas, Texas, USA.
Qing Zheng†, Charles D. Cranor†, Danhao Guo†, Gregory R. Ganger†, George Amvrosiadis†, Garth A. Gibson†, Bradley W. Settlemyer‡, Gary Grider‡, Fan Guo‡
†Carnegie Mellon University
‡Los Alamos National Laboratory
Analysis of large-scale simulation output is a core element of scientific inquiry, but analysis queries may experience significant I/O overhead when the data is not structured for efficient retrieval. While in-situ processing improves time-to-insight for many applications, scaling in-situ frameworks to hundreds of thousands of cores can be difficult in practice. DeltaFS in-situ indexing is a new approach to in-situ processing of massive amounts of data that enables efficient point and small-range queries. This paper describes the challenges encountered and lessons learned when scaling this in-situ processing function to hundreds of thousands of cores. We propose techniques for memory- and bandwidth-efficient scalable all-to-all communication, concurrent indexing, and specialized LSM-Tree formats. Combining these techniques allows DeltaFS to control the cost of in-situ processing while maintaining a query speedup of three orders of magnitude when scaling alongside the popular VPIC particle-in-cell code to 131,072 cores.