PARALLEL DATA LAB 

PDL Abstract

BEMC: A Searchable, Compressed Representation for Large Seismic Wavefields

22nd Int. Conf. on Scientific and Statistical Database Management (SSDBM'10), Heidelberg, Germany, June 30 - July 2, 2010.

Julio López (1), Leonardo Ramírez-Guzmán (2), Jacobo Bielak (2), and David O’Hallaron (1)

(1) Parallel Data Laboratory, Carnegie Mellon University
(2) Computational Seismology Laboratory, Carnegie Mellon University

http://www.pdl.cmu.edu/

State-of-the-art numerical solvers in the Earth Sciences produce multi-terabyte datasets per execution. Operating on ever-larger datasets becomes challenging because of insufficient data bandwidth, and queries result in difficult-to-handle I/O access patterns. BEMC is a new mechanism that allows querying and processing wavefields in the compressed representation.

This approach combines well-known spatial-indexing techniques with novel compressed representations, thus reducing I/O bandwidth requirements. A new compression approach based on boundary integral representations exploits properties of the simulated domain. A frequency-domain representation further compresses the data by eliminating the temporal redundancy found in wave propagation data. This representation transforms a large I/O workload into a massively parallel, CPU-intensive computation, and queries to it result in largely sequential I/O accesses. Although decompression places heavy demands on the CPU, it exhibits parallelism well suited to many-core processors. We evaluate our approach in the context of data analysis for Earth Sciences datasets.
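
The frequency-domain idea can be illustrated with a toy example. The sketch below is not BEMC's actual encoding: it assumes a single, band-limited 1D time series (a stand-in for the displacement history at one mesh point), stores only its low-frequency Fourier coefficients, and answers a point query by evaluating the inverse transform at one time step. The function names (compress, query), the signal, and the cutoff bin are all hypothetical choices made for illustration.

```python
import cmath
import math


def compress(samples, max_bin):
    """Return DFT bins 0..max_bin of a real-valued sequence.

    For real input the negative-frequency bins are complex conjugates
    of the positive ones, so they do not need to be stored.
    Assumes max_bin < len(samples) / 2.
    """
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(max_bin + 1)
    ]


def query(coeffs, n, t):
    """Reconstruct the sample at time step t from the stored bins only.

    This costs CPU work per query but touches only the small,
    contiguous coefficient array instead of the full time series.
    """
    value = coeffs[0].real
    for k in range(1, len(coeffs)):
        value += 2.0 * (coeffs[k] * cmath.exp(2j * math.pi * k * t / n)).real
    return value / n


# Hypothetical band-limited signal: energy only in bins 3, 7, and 12.
n = 256
signal = [
    math.sin(2 * math.pi * 3 * t / n)
    + 0.5 * math.cos(2 * math.pi * 7 * t / n)
    + 0.25 * math.sin(2 * math.pi * 12 * t / n + 0.3)
    for t in range(n)
]

# Keep 17 complex coefficients (34 floats) instead of 256 samples.
coeffs = compress(signal, max_bin=16)

# Each time step can be reconstructed independently, hence in parallel.
max_error = max(abs(query(coeffs, n, t) - signal[t]) for t in range(n))
print(len(coeffs), max_error)
```

Because the toy signal has no energy above the cutoff bin, the reconstruction is exact up to floating-point rounding. A real wavefield would be only approximately band-limited, so the cutoff would trade accuracy for size. Each call to query is independent of the others, which is the kind of parallelism the abstract describes as well suited to many-core processors.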

FULL PAPER: pdf