Cloud service providers (CSPs) offer an effectively infinite (from most customers’ viewpoints) set of VM instances available for rental at fine time granularity. But figuring out which instances to request is difficult, because each CSP offers diverse VM instance "types", differentiated primarily by their constituent hardware resources (e.g., core counts and memory sizes) and leasing contract models. The two primary contract models are reliable and transient (sometimes called "spot" or "preemptible"). Instances leased under a reliable contract are non-preemptible, while instances leased under a transient contract are made available on a best-effort basis, at lower cost and/or lower priority, and may be preempted.
This project explores opportunities to exploit the various CSP VM instance offerings (e.g., best-fit VM sizes, or low-cost but unreliable VMs rented on transient contracts) to reduce the cost of running user applications in the cloud while respecting application performance requirements. The applications explored include task scheduling and resource management for general batch analytics jobs (Stratus), elastic web services (Tributary), and machine learning model training (Proteus).
As an example, AWS's transient VM leasing model is based on a price market that works as follows. The user specifies a *bid price*, the maximum amount they are willing to pay to rent a VM for an hour, but is charged only the market price. If the market price rises above the bid price during the VM's rental period, the rented VM is preempted; under some pricing models, if the VM is preempted within its first hour of rental, the user is not charged. This creates opportunities for strategic bidding based on each application's reliability requirements.
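To make the bid/market-price interaction concrete, here is a minimal, hypothetical simulator that replays a market-price trace against a bid. It is an illustrative sketch, not AWS's exact billing rules: it assumes prices are sampled at fixed steps (`step_hours`, here 15 minutes), that the user pays the market price for each step the VM runs, that the VM is preempted at the first step whose price exceeds the bid, and (per the "some pricing models" case above) that a preemption within the first hour is free when `free_if_preempted_in_first_hour` is set. The names `simulate_spot_rental` and `SpotOutcome` are made up for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SpotOutcome:
    hours_run: float   # rental time before the job finished or the VM was preempted
    cost: float        # total amount charged to the user, in $
    preempted: bool    # True if the market price rose above the bid


def simulate_spot_rental(prices: Sequence[float], bid: float, hours_needed: float,
                         step_hours: float = 0.25,
                         free_if_preempted_in_first_hour: bool = True) -> SpotOutcome:
    """Replay a market-price trace ($/hour, one entry per step) against a bid.

    Hypothetical model: the user pays the market price (never the bid) for
    each step the VM runs, and the VM is preempted at the first step whose
    market price exceeds the bid.
    """
    hours_run = 0.0
    cost = 0.0
    for price in prices:
        if hours_run >= hours_needed:
            break  # the job finished before any preemption
        if price > bid:
            # Market price exceeded the bid: the VM is preempted.
            if free_if_preempted_in_first_hour and hours_run < 1.0:
                cost = 0.0  # assumed waiver for preemption in the first hour
            return SpotOutcome(hours_run, cost, preempted=True)
        cost += price * step_hours  # charged at the market price, not the bid
        hours_run += step_hours
    if hours_run < hours_needed:
        raise ValueError("price trace too short to finish the job")
    return SpotOutcome(hours_run, cost, preempted=False)


if __name__ == "__main__":
    # Made-up market prices in $/hour, sampled every 15 minutes.
    trace = [0.10, 0.12, 0.25, 0.11, 0.15, 0.30, 0.12, 0.10]
    for bid in (0.20, 0.28, 0.35):
        print(bid, simulate_spot_rental(trace, bid=bid, hours_needed=2.0))
```

With this made-up trace, a 0.20 bid is preempted after 30 minutes and charged nothing, a 0.28 bid is preempted after 75 minutes and pays for that time, and a 0.35 bid finishes the 2-hour job while still paying only the market price. A strategic bidder would weigh these outcomes against how costly a preemption is for the particular application.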
FACULTY
STUDENTS
Andrew Chung
Aaron Harlap (alumni)
Jun Woo Park (alumni)
Alexey Tumanov (alumni)
We thank the members and companies of the PDL Consortium: Amazon, Datadog, Google, Honda, Intel Corporation, IBM, Jane Street, Meta, Microsoft Research, Oracle Corporation, Pure Storage, Salesforce, Samsung Semiconductor Inc., Two Sigma, and Western Digital for their interest, insights, feedback, and support.