PARALLEL DATA LAB 

PDL Abstract

ThermoCast: A Cyber-Physical Forecasting Model for Data Centers

KDD '11: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 21-24, 2011, San Diego, CA.

Lei Li, Chieh-Jan Mike Liang*, Jie Liu*, Suman Nath*, Andreas Terzis**, Christos Faloutsos

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

*Microsoft Research
**Johns Hopkins University

http://www.pdl.cmu.edu/

Efficient thermal management is important in modern data centers, as cooling consumes up to 50% of the total energy. Unlike previous work, we consider proactive thermal management, whereby servers can predict potential overheating events caused by changes in data center configuration and workload, giving operators enough time to react. However, such forecasting is very challenging because of the scale and complexity of data centers. Moreover, such a physical system is influenced by cyber effects, including workload scheduling in servers. We propose ThermoCast, a novel thermal forecasting model that predicts the temperatures surrounding the servers in a data center, based on continuous streams of temperature and airflow measurements. Our approach is (a) capable of capturing cyber-physical interactions and automatically learning them from data; (b) computationally and physically scalable to data center scales; and (c) able to provide online predictions from real-time sensor measurements.

The paper's main contributions are: (i) we provide a systematic approach to integrating physical laws and sensor observations in a data center; (ii) we provide an algorithm that uses sensor data to learn the parameters of a data center's cyber-physical system, which in turn lets us reduce model complexity compared to full-fledged fluid dynamics models while maintaining forecast accuracy; (iii) unlike previous simulation-based studies, we perform experiments in a production data center. Using real data traces, we show that ThermoCast forecasts temperature 2× better than a machine learning approach driven solely by data, and can successfully predict thermal alarms 4.2 minutes ahead of time.
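
The abstract does not give the model equations, so the following is only an illustrative sketch of the general idea (learn a small parametric model from temperature and airflow streams, then roll it forward to flag an alarm early). It is not ThermoCast itself. The linear form temps[t+1] = a*temps[t] + b*flows[t] + c, the function names, and the alarm threshold are all assumptions made for this example.

```python
# Illustrative sketch only: a first-order linear model fitted by least squares.
# The model form and every name below are assumptions, not the paper's method.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no check is performed).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_model(temps, flows):
    """Least-squares fit of temps[t+1] = a*temps[t] + b*flows[t] + c."""
    rows = [[temps[t], flows[t], 1.0] for t in range(len(temps) - 1)]
    targets = temps[1:]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))


def forecast(params, temp_now, flow_now, steps):
    """Roll the model forward, holding airflow fixed at its latest value."""
    a, b, c = params
    predicted = []
    temp = temp_now
    for _ in range(steps):
        temp = a * temp + b * flow_now + c
        predicted.append(temp)
    return predicted


def steps_until_alarm(predicted, threshold):
    """Return the first forecast step (1-based) at or above threshold, else None."""
    for step, temp in enumerate(predicted, start=1):
        if temp >= threshold:
            return step
    return None
```

With sensor readings sampled at a fixed interval, the alarm lead time in this sketch is the value returned by steps_until_alarm multiplied by that interval. For example, fit_model(temps, flows) could be refitted as new readings arrive, and forecast(params, temps[-1], flows[-1], steps) checked against an operator-chosen threshold.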
