Parallel Data Laboratory

PDL People

Mor Harchol-Balter

Contact:	www \|
Office: Phone: Fax: Admin:	GHC 7207 (412) 268-7893 (412) 268-5576 Charlotte Yano - (412) 268-7656
Mailing Address:	Computer Science Department School of Computer Science Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213-3891
Position:	Bruce J. Nelson Professor of Computer Science
Projects:	Distributed computing, performance analysis, scheduling and resource allocation, workload characterization.

Research Interests:

I am interested in the performance analysis and design of computer systems, particularly distributed systems. I work on finding analytical models which capture the important characteristics of a computer system and allow me to redesign the system to improve its performance.

I believe that many conventional wisdoms on which we base system designs are not well understood and sometimes false, leading to inferior designs. Many of our existing beliefs stem from queueing research in the 60's and 70's -- the great era for system performance modeling. Unfortunately, back then we did not have all the analytical and computational tools available today. Consequently, some questions were answered only under the approximation of Markovian workloads (exponentially distributed job sizes) which we now know to be non-representative of many real-world workloads (which show much greater variability and often heavy tails). Also, many multi-server system models and scheduling/routing schemes were not analytically tractable at that time.

My research revisits these very classic questions in system design, armed with today's new queueing and computational techniques, as well as a new perspective on real-world workloads, performance metrics, and implementation experience. I work on deriving new fundamental theorems in system design, many of which seem couterintuitive in light of age-old beliefs. I then incorporate these theorems into implementations of Web servers, database servers, and distributed server systems.

Here are a few examples of commonly-held beliefs that my research challenges:

Thousands of "load balancing" heuristics do exactly that -- they aim to balance the load among the existing hosts. But is load balancing necessarily a good thing?
Ever notice that the optimal scheduling policy, Shortest-Remaining-Processing-Time-First (SRPT), is rarely used in practice? There's a fear that the big jobs will "starve", or be treated unfiarly as compared with Processor-Sharing (PS). But is this a reality?
To minimize mean waiting time in a server farm, research suggests that each job should be sent to the server at which it will experience the least wait. That seems good from the job's perspective, but is the greedy strategy best for the system as a whole?
Given a choice between a single machine with speed s, or n identical machines each with speed s/n, which would you choose? Think again ...
Migrating active jobs is generally considered too expensive. Killing jobs midway through execution and restarting them from scratch later is even worse! Says who?
Cycle stealing (using another machine's idle cycles to do your work) is a central theme in distributed systems and has been the topic of thousands of papers. But can we quantify when cycle stealing pays off, as a function of switching costs and threshold parameters?