We are working on algorithms for computing correlation functions,
which is a cosmological application for analyzing the distribution of
matter in the universe. We are also addressing the related problem of
determining the fractal dimension of the universe. This
project is part of the work on astronomy applications of
data-intensive super computing (Astro-DISC).
The computation of two-point correlation functions involves
representing galaxies as points in three-dimensional space; counting
the pairs of galaxies within different distance ranges from each
other; and building a histogram that shows the dependency of the
number of pairs on the distance. Cosmologists use such histograms as
one of their tools for analyzing large-scale properties of the
universe and the history of its evolution. A straightforward algorithm
for counting galaxy pairs takes quadratic time, which is impractically
slow for massive astronomical surveys and cosmological simulations. We
are working on distributed approximation algorithms for fast accurate
estimation of correlation functions.
We have developed and evaluated several alternative algorithms, including the exact computation with O(n5/3) average running time; an approximate linear-time computation; and another approximate algorithm with sub-linear time, based on sampling of a dataset and finding correlations functions for the sample. We have also designed a top-level control module, which selects an appropriate combination of exact and approximation algorithms for a given dataset size and accuracy requirements. The resulting system can process trillions of galaxies and compute correlation functions with 1% precision.
More details: Summary of
the algorithms and empirical results
We are working on a distributed version, which will allow control of
the tradeoff among the approximation accuracy, running time, and
number of used cores. We are also developing algorithms for three-point correlation functions, which compute the
distribution of galaxy triples by distances among them. A longer-term
challenge is to integrate the developed techniques with other
astronomical applictions and build a general-purpose architecture for
the analysis of massive astronomical data.
FACULTY
Eugene Fink
Garth Gibson
Julio López
Christos Faloutsos
GRADUATE STUDENTS
Bin Fu
EXTERNAL COLLABORATORS
Tiziana Di Matteo (Physics, Carnegie Mellon University)
Rupert Croft (Physics, Carnegie Mellon University)