PDL Abstract

Ganesha: Black-Box Fault Diagnosis for MapReduce Systems

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-08-112, September 2008.

Xinghao Pan, Jiaqi Tan, Soila Kavulya, Rajeev Gandhi, Priya Narasimhan

Parallel Data Laboratory
School of Computer Science & Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

Ganesha aims to diagnose faults transparently in MapReduce systems by analyzing OS-level metrics alone. Its approach relies on peer-symmetry: under fault-free conditions the nodes of a MapReduce system behave similarly, so Ganesha can diagnose faults that manifest asymmetrically at individual nodes. Although we train on smaller Hadoop clusters and on specific workloads, our approach allows us to diagnose faults in larger Hadoop clusters and for previously unencountered workloads. We also candidly highlight the faults that escape Ganesha’s black-box diagnosis.
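
The peer-symmetry idea can be illustrated with a minimal sketch. This is not Ganesha's algorithm, which the abstract does not spell out; it is a generic peer comparison that flags a node when one of its OS-level metrics lies far from the median of its peers, with distance measured in units of the median absolute deviation. The function name, the metric names, the threshold, and the min_spread floor are all hypothetical choices made for illustration.

```python
from statistics import median


def flag_asymmetric_nodes(metrics, threshold=3.0, min_spread=1.0):
    """Return the sorted names of nodes that deviate from their peers.

    metrics: dict mapping node name -> dict of metric name -> float.
    threshold and min_spread are illustrative assumptions, not values
    taken from the Ganesha report.
    """
    metric_names = sorted({m for per_node in metrics.values() for m in per_node})
    flagged = set()
    for name in metric_names:
        # Collect this metric from every node that reports it.
        values = {node: per_node[name]
                  for node, per_node in metrics.items() if name in per_node}
        if len(values) < 3:
            continue  # too few peers for a meaningful comparison
        peer_median = median(values.values())
        # Median absolute deviation: a robust estimate of the peer spread.
        mad = median(abs(v - peer_median) for v in values.values())
        spread = max(mad, min_spread)  # floor avoids division by zero
        for node, value in values.items():
            if abs(value - peer_median) / spread > threshold:
                flagged.add(node)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical per-node samples; node4 has an unusually busy CPU.
    sample = {
        "node1": {"cpu_user_pct": 41.0, "ctx_switches_per_s": 1200.0},
        "node2": {"cpu_user_pct": 39.0, "ctx_switches_per_s": 1180.0},
        "node3": {"cpu_user_pct": 40.0, "ctx_switches_per_s": 1210.0},
        "node4": {"cpu_user_pct": 95.0, "ctx_switches_per_s": 1190.0},
        "node5": {"cpu_user_pct": 42.0, "ctx_switches_per_s": 1205.0},
    }
    print(flag_asymmetric_nodes(sample))
```

A comparison of this kind only detects faults that make one node look different from the others; a fault that affects every node equally leaves the peers symmetric and goes unnoticed, which is consistent with the abstract's remark that some faults escape black-box diagnosis.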

KEYWORDS: Hadoop, Fault diagnosis, Clustering
