PARALLEL DATA LAB 

PDL Abstract

Understanding and Improving the Diagnostic Workflow of MapReduce Users

ACM Symposium on Computer Human Interaction for Management of Information Technology (CHIMIT), Boston, MA, December 2011.

Jason D. Campbell**, Arun B. Ganesan, Ben Gotow, Soila P. Kavulya, James Mulholland, Priya Narasimhan, Sriram Ramasubramanian, Mark Shuster, Jiaqi Tan*

School of Computer Science & Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

*DSO National Laboratories Singapore
**Intel Labs Pittsburgh

http://www.pdl.cmu.edu/

New abstractions are simplifying the programming of large clusters, but diagnosis nonetheless becomes more challenging as cluster sizes grow: debugging information increases linearly with cluster size, and the count of inter-component relationships grows quadratically. Worse, the new abstractions that simplify programming can also obscure the relationships between high-level (application) and low-level (task/process/disk/CPU) information flows. In this paper we analyze the workflow of several users and systems administrators connected with a large academic cluster (based on the popular Hadoop implementation of the MapReduce abstraction) and propose improvements to the diagnosis-relevant information displays. We also offer a preliminary analysis of the efficacy of the changes we propose, which demonstrates a 40% reduction in the time taken to accomplish five representative diagnostic tasks as compared to the current system.
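
The scaling claim above can be illustrated with a small sketch. It is not taken from the paper: it assumes that each node contributes a fixed amount of debugging output and that any unordered pair of components may be related, which gives n(n-1)/2 potential relationships. The function names `log_sources` and `pairwise_relationships` are hypothetical and chosen only for this illustration.

```python
def log_sources(n: int, logs_per_node: int = 1) -> int:
    """Debugging output under the assumption of a fixed amount per node (linear in n)."""
    return n * logs_per_node


def pairwise_relationships(n: int) -> int:
    """Unordered pairs among n components, n * (n - 1) / 2 (quadratic in n).

    Assumption: every pair of components may interact; real clusters
    usually exhibit only a subset of these relationships.
    """
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Compare how the two quantities grow as the cluster gets larger.
    for nodes in (10, 100, 1000):
        print(nodes, log_sources(nodes), pairwise_relationships(nodes))
```

Under these assumptions, a tenfold increase in nodes yields about ten times as much debugging output but roughly a hundred times as many potential relationships to inspect.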

FULL TR: pdf