Datacenters increasingly use a heterogeneous collection of machines with different capabilities (e.g., memory capacity, GPU accelerator) to execute a heterogeneous collection of workloads (e.g., long-running services, batch data analytics, interactive development/test). To maximize resource efficiency and utilization, the machines are often aggregated into a resource pool where workloads are consolidated. A cluster scheduler is then tasked with assigning resources to workloads.
The challenge is to assign the right resources to each. One job might be concerned about completion time and run fastest on a machine with a GPU. Another job might be concerned with long-term availability, which is enhanced by running its constituent processes (tasks) on machines attached to separate power distribution units. When machines are homogeneous, such concerns can often be ignored as all choices are equal. When workloads are homogeneous, understanding of their concerns can be hard-coded into the scheduler. But, neither workloads nor resources are homogeneous in modern datacenters.
To maximize the benefits of resource consolidation, consumers must somehow communicate their specific needs/concerns so that the scheduler can make informed decisions. A scheduler that understands the quantitative tradeoffs involved with the constraints set by each job should be able to make better decisions. By translating each workload's distinct concerns (e.g., availabilities, runtimes, response times) to a common metric called utility, tradeoffs between them can be quantified, where higher utility is better. As should be expected, the more information about significant needs/concerns that can be used by the scheduler, the higher the utility.
The TetriSched project introduces a scheduler that accepts resource requests in the form of utility functions. These utility functions use an algebraic language to describe the utility associated with one or more spatial (which machines) and temporal outcome, quantifying the relative value of each acceptable allocation. TetriSched translates the collection of utility functions into a Mixed Integer Linear Program (MILP) problem and solves it to plan a schedule that optimizes overall utility.
Tetrisched fills the constraint handling gap. Most schedulers take an all-or-nothing approach, either treating all constraints as strict requirements or completely ignoring them. Tetrisched recognizes that tradeoffs are inherent in user preferences, providing a flexible constraint scheme that encodes resource preferences and their relative utility.
FACULTY
Greg Ganger
Mor Harchol-Balter
GRAD STUDENTS
Angela Jiang
Junwoo Park
Alexey Tumanov
Timothy Zhu
INDUSTRY COLLABORATORS
Michael A. Kozuch, Intel
We thank the members and companies of the PDL Consortium: Amazon, Datadog, Google, Honda, Intel Corporation, IBM, Jane Street, Meta, Microsoft Research, Oracle Corporation, Pure Storage, Salesforce, Samsung Semiconductor Inc., Two Sigma, and Western Digital for their interest, insights, feedback, and support.