Workshop on Hot Topics in Cloud Computing (HotCloud '09), San Diego, CA, on June 15, 2009. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-103, May 2009.
Jiaqi Tan, Xinghao Pan, Soila Kavulya, Rajeev Gandhi, Priya Narasimhan
Parallel Data Laboratory
School of Computer Science & Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
Mochi, a new visual, log-analysis-based debugging tool, correlates Hadoop’s behavior in space, time and volume, and extracts a causal, unified control- and data-flow model of Hadoop across the nodes of a cluster. Mochi’s analysis produces visualizations of Hadoop’s behavior with which users can reason about and debug performance issues. Using the Yahoo! M45 Hadoop cluster, we provide examples of Mochi’s value in revealing a Hadoop job’s structure, in optimizing real-world workloads, and in identifying anomalous Hadoop behavior.
KEYWORDS: Visualization, Log analysis, Performance Debugging, Hadoop, MapReduce