Mochi: Visual Log-Analysis Based Tools for Debugging Hadoop

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-103, May 2009.

Jiaqi Tan, Xinghao Pan, Soila Kavulya, Rajeev Gandhi, Priya Narasimhan

Parallel Data Laboratory
School of Computer Science & Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

Mochi, a new visual, log-analysis based debugging tool, correlates Hadoop’s behavior in space, time, and volume, and extracts a causal, unified control- and data-flow model of Hadoop across the nodes of a cluster. Mochi’s analysis produces visualizations of Hadoop’s behavior that users can use to reason about and debug performance issues. We provide examples, drawn from the Yahoo! M45 Hadoop cluster, of Mochi’s value in revealing a Hadoop job’s structure, in optimizing real-world workloads, and in identifying anomalous Hadoop behavior.

KEYWORDS: Visualization, Log analysis, Performance Debugging, Hadoop, MapReduce

FULL TR: pdf