PARALLEL DATA LAB 

PDL Abstract

Towards Understanding Heterogeneous Clouds at Scale: Google Trace Analysis

Intel Science and Technology Center for Cloud Computing Technical Report ISTC-CC-TR-12-101, April 27, 2012.

Charles Reiss*, Alexey Tumanov†, Gregory R. Ganger†, Randy H. Katz*, Michael A. Kozuch^

* UC Berkeley
† ECE, Carnegie Mellon University
^ Intel Labs

http://www.pdl.cmu.edu/
http://www.istc-cc.cmu.edu/

With the emergence of large, heterogeneous, shared computing clusters, their efficient use by mixed distributed workloads and tenants remains an important challenge. Unfortunately, little data has been available about such workloads and clusters. This paper analyzes a recent Google release of scheduler request and utilization data across a large (12,500+ machine) general-purpose compute cluster over 29 days. We characterize cluster resource requests, their distribution, and the actual resource utilization. Unlike previous scheduler traces we are aware of, this one includes diverse workloads, from large web services to large CPU-intensive batch programs, and permits comparison of actual resource utilization with the user-supplied resource estimates available to the cluster resource scheduler. We observe some under-utilization despite over-commitment of resources, difficulty in scheduling high-priority tasks that specify constraints, and a lack of dynamic adjustments to user allocation requests despite the apparent availability of this feature in the scheduler.
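The request-versus-usage comparison mentioned above, and how a cluster can be over-committed yet under-utilized, can be illustrated with two per-machine ratios. This is a minimal sketch, assuming hypothetical per-task records that carry a user-requested and a measured CPU amount in the same units; the field names, function names, and numbers are illustrative only and are not the trace's schema or data.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical record: user-supplied estimate vs. measured usage (same units).
    cpu_request: float
    cpu_usage: float


def overcommit_ratio(tasks, capacity):
    """Sum of requested CPU divided by machine capacity; > 1.0 means over-committed."""
    return sum(t.cpu_request for t in tasks) / capacity


def utilization(tasks, capacity):
    """Fraction of machine capacity actually consumed by the tasks."""
    return sum(t.cpu_usage for t in tasks) / capacity


def usage_to_request(tasks):
    """Per-task ratio of measured usage to requested amount (skips zero requests)."""
    return [t.cpu_usage / t.cpu_request for t in tasks if t.cpu_request > 0]


# Toy machine with 8 cores and three tasks (illustrative numbers, not from the trace).
tasks = [Task(4, 2), Task(3, 1), Task(3, 1)]
print(overcommit_ratio(tasks, 8))  # requests exceed capacity
print(utilization(tasks, 8))       # yet only part of the machine is used
print(usage_to_request(tasks))
```

In this toy case the requests add up to more than the machine's capacity while measured usage fills only half of it, which is the pattern the abstract describes as under-utilization despite over-commitment.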

KEYWORDS: cloud computing, cluster scheduling, trace characterization
