PARALLEL DATA LAB 

PDL Abstract

Using Data Transformations for Low-latency Time Series Analysis

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-15-106, August, 2015.

Henggang Cui, Kimberly Keeton†, Indrajit Roy†, Krishnamurthy Viswanathan†, Gregory R. Ganger

Carnegie Mellon University
Pittsburgh, PA 15213

† Hewlett-Packard Laboratories

http://www.pdl.cmu.edu/

Time series analysis is commonly used when monitoring data centers, networks, weather, and even human patients. In most cases, the raw time series data is massive, from millions to billions of data points, and yet interactive analyses require low (e.g., sub-second) latency. Aperture transforms raw time series data, during ingest, into compact summarized representations that it can use to efficiently answer queries at runtime. Aperture handles a range of complex queries, from correlating hundreds of lengthy time series to predicting anomalies in the data. Aperture achieves much of its high performance by executing queries on data summaries, while providing a bound on the information lost when transforming data. By doing so, Aperture can reduce query latency as well as the data that needs to be stored and analyzed to answer a query. Our experiments on real data show that Aperture can provide one to four orders of magnitude lower query response time, while incurring only 10% ingest time overhead and less than 20% error in accuracy.
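
The general idea of transforming data at ingest and answering queries from summaries can be illustrated with a minimal Python sketch. This is not Aperture's implementation or API: the names WindowSummary, summarize, and query_mean, the fixed window size, and the choice of count/sum/min/max as the summary are all assumptions made for illustration. Each window of raw points is collapsed into one small record, a range mean is answered from those records alone, and the spread (max minus min) of each window serves as a simple bound on the error introduced by replacing a point with its window mean.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class WindowSummary:
    """Hypothetical compact summary of one window of raw points."""
    count: int
    total: float
    lo: float
    hi: float

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def max_error(self) -> float:
        # Every point and the mean both lie in [lo, hi], so replacing
        # a point by the window mean is off by at most hi - lo.
        return self.hi - self.lo


def summarize(points: Iterable[float], window: int) -> List[WindowSummary]:
    """Ingest-time transformation: one summary per run of `window` points."""
    summaries: List[WindowSummary] = []
    buf: List[float] = []
    for x in points:
        buf.append(x)
        if len(buf) == window:
            summaries.append(WindowSummary(len(buf), sum(buf), min(buf), max(buf)))
            buf = []
    if buf:  # trailing partial window
        summaries.append(WindowSummary(len(buf), sum(buf), min(buf), max(buf)))
    return summaries


def query_mean(summaries: List[WindowSummary], first: int, last: int) -> float:
    """Mean over windows first..last (inclusive), using summaries only."""
    chosen = summaries[first:last + 1]
    n = sum(s.count for s in chosen)
    return sum(s.total for s in chosen) / n


raw = [1.0, 2.0, 3.0, 4.0, 10.0, 12.0, 11.0, 13.0, 5.0]
summaries = summarize(raw, window=4)
print(query_mean(summaries, 0, 1))           # mean of the first eight points
print([s.max_error for s in summaries])      # per-window error bounds
```

In this sketch the query touches one record per window rather than every raw point, which is the source of the latency and storage savings; queries that are not aligned to window boundaries would only be approximate, within the per-window bound.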

FULL PAPER: pdf