22nd Int. Conf. on Scientific and Statistical Database Management (SSDBM'10), Heidelberg, Germany, June 30 - July 2, 2010.
Julio López¹, Leonardo Ramírez-Guzmán², Jacobo Bielak², and David O’Hallaron¹
¹ Parallel Data Laboratory, Carnegie Mellon University
² Computational Seismology Laboratory, Carnegie Mellon University
State-of-the-art numerical solvers in Earth Sciences produce multi-terabyte datasets per execution. Operating on increasingly large datasets becomes challenging due to insufficient data bandwidth, and queries result in difficult-to-handle I/O access patterns. BEMC is a new mechanism that allows querying and processing wavefields in the compressed representation.
This approach combines well-known spatial-indexing techniques with novel compressed representations, thus reducing I/O bandwidth requirements. A new compression
approach based on boundary integral representations exploits properties
of the simulated domain. Frequency domain representation further compresses
the data by eliminating temporal redundancy found in wave propagation data.
This representation enables the transformation of a large I/O workload into a
massively-parallel CPU-intensive computation. Queries to this representation result
in largely sequential I/O accesses. Although decompression places heavy demands
on the CPU, it exhibits parallelism well suited to many-core processors.
We evaluate our approach in the context of data analysis for Earth Sciences
datasets.
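
As a rough illustration of how a frequency-domain representation can remove temporal redundancy, the sketch below stores only the lowest-frequency Fourier coefficients of a single time series and reconstructs the series from them. It is not the BEMC scheme itself: the function names (`compress_trace`, `decompress_trace`), the synthetic band-limited trace, and the choice of 32 retained coefficients are all assumptions made for this example, and NumPy is assumed to be available.

```python
import numpy as np

def compress_trace(trace, keep):
    """Keep only the `keep` lowest-frequency real-FFT coefficients.

    Hypothetical helper for illustration; not taken from the paper.
    """
    spectrum = np.fft.rfft(trace)
    return spectrum[:keep].copy(), len(trace)

def decompress_trace(coeffs, n_samples):
    """Zero-pad the retained coefficients and invert the transform."""
    spectrum = np.zeros(n_samples // 2 + 1, dtype=complex)
    spectrum[:len(coeffs)] = coeffs
    return np.fft.irfft(spectrum, n=n_samples)

# Synthetic band-limited signal standing in for one receiver's time history
# (an assumption; real wavefield data would come from the solver output).
n = 1024
t = np.arange(n) / n
trace = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 17 * t + 0.4)

coeffs, n_samples = compress_trace(trace, keep=32)
restored = decompress_trace(coeffs, n_samples)
max_error = np.max(np.abs(restored - trace))

# 32 complex coefficients (64 floats) stand in for 1024 real samples.
print(len(coeffs), "coefficients retained for", n_samples, "samples")
```

Because the example signal contains no energy above the retained band, the reconstruction is accurate to floating-point round-off; for real data the cutoff would trade accuracy against size. The inverse transform for each series is independent of the others, which is consistent with the abstract's point that decompression is CPU-heavy but parallel.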