PARALLEL DATA LAB

DISC-Index: Indexing Cosmological Simulations

Problem

When cosmologists study the properties and history of the universe, they use computer models as one of their main tools. For instance, they may simulate the formation of galaxies, the development of black holes, or the evolution of the universe as a whole. They compare simulation results with observations of the real universe, which helps them evaluate and improve cosmological theories.

Modern supercomputers enable researchers to run simulations with billions of objects and to store the locations and properties of those objects at different points of the simulated timeline, producing datasets with tens or even hundreds of billions of records. Cosmologists need to query these datasets both for aggregate statistics and for details of specific objects. For instance, they may need to see how the number of black holes in a simulation changes over time, identify all black holes with certain properties, or track the history of a specific black hole. To support such queries, we are building tools for indexing the results of cosmological simulations.

We are building data structures for indexing objects in cosmological models
and answering queries about model properties.


RESULTS

We have developed an initial system for indexing simulated objects by three-dimensional Cartesian coordinates. We use an octree to partition and index the simulated universe; linearize the octree using Morton codes; and store it in HBase, which is an off-the-shelf distributed database. We have used this indexing to support general-purpose spatial queries; perform load-balancing in a distributed procedure for identifying galaxy clusters; and implement an approximate algorithm for computing correlation functions and the fractal dimension of the universe.
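The page does not describe the exact key layout, so the following is only a minimal sketch of the general technique: quantize each coordinate, interleave the bits into a Morton (Z-order) code, and use that code as a row key. It assumes coordinates normalized to the unit cube and 21 bits per axis (63 bits total, so a key fits in 8 bytes); the function names are hypothetical. Because the top 3·k bits of a Morton code identify the level-k octree cell, every object in a cell falls in one contiguous key range, which a database that keeps rows sorted by key can read with a single range scan.

```python
BITS = 21  # assumed bits per axis: 3 * 21 = 63-bit keys, i.e. 8 bytes


def quantize(x: float, bits: int = BITS) -> int:
    """Map a coordinate normalized to [0, 1] onto an integer grid index."""
    n = 1 << bits
    return min(int(x * n), n - 1)  # clamp so x == 1.0 stays in range


def morton_encode(ix: int, iy: int, iz: int, bits: int = BITS) -> int:
    """Interleave bits as ... z1 y1 x1 z0 y0 x0 (bit i of x goes to bit 3i)."""
    code = 0
    for i in range(bits):
        code |= ((ix >> i) & 1) << (3 * i)
        code |= ((iy >> i) & 1) << (3 * i + 1)
        code |= ((iz >> i) & 1) << (3 * i + 2)
    return code


def morton_decode(code: int, bits: int = BITS) -> tuple[int, int, int]:
    """Invert morton_encode by de-interleaving the bits."""
    ix = iy = iz = 0
    for i in range(bits):
        ix |= ((code >> (3 * i)) & 1) << i
        iy |= ((code >> (3 * i + 1)) & 1) << i
        iz |= ((code >> (3 * i + 2)) & 1) << i
    return ix, iy, iz


def cell_key_range(code: int, level: int, bits: int = BITS) -> tuple[int, int]:
    """Half-open key range [lo, hi) of the level-`level` octree cell
    (level 0 = whole volume) that contains `code`: all keys in that cell
    share the top 3 * level bits."""
    shift = 3 * (bits - level)
    lo = (code >> shift) << shift
    return lo, lo + (1 << shift)


if __name__ == "__main__":
    point = (0.30, 0.70, 0.55)                # normalized x, y, z
    key = morton_encode(*(quantize(c) for c in point))
    lo, hi = cell_key_range(key, level=2)     # level 2: each axis split into 4
    # Fixed-width big-endian bytes sort in the same order as the integers,
    # matching the byte-wise ordering HBase uses for row keys.
    print(key.to_bytes(8, "big").hex(), lo <= key < hi)
```

In this layout, a spatial query for an axis-aligned box can be answered by covering the box with octree cells and issuing one key-range scan per cell; points that are close in space tend to receive nearby keys, which also makes contiguous key ranges a natural unit for splitting work across machines.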

More details: Summary of the data structures
 

CHALLENGES

We are investigating other indexing techniques that would allow us to arrange objects not only by location but also by their properties, thus supporting a wider range of queries. We are also working with cosmologists to evaluate how well the developed techniques apply to various simulations. A longer-term goal is to integrate these techniques with other cosmological applications and build a general-purpose distributed architecture for the analysis of astronomical and cosmological data. Another research direction is to apply these techniques to other areas of eScience, such as seismology, bioinformatics, and web mining.

PEOPLE

FACULTY
Eugene Fink
Garth Gibson
Julio López

GRADUATE STUDENTS
Bin Fu

EXTERNAL COLLABORATORS
Tiziana Di Matteo (Physics, Carnegie Mellon University)
Rupert Croft (Physics, Carnegie Mellon University)