SOSP ’25, October 13–16, 2025, Seoul, Republic of Korea.
Ziyue Qiu*^, Hojin Park*, Jing Zhao^, Yukai Wang^, Arnav Balyan^, Gurmeet Singh^, Yangjun Zhang^, Suqiang (Jack) Song^, Gregory R. Ganger*, George Amvrosiadis*
*Carnegie Mellon University
^Uber
                    
The deployment of large-scale data analytics across on-premises and cloud sites, i.e., hybrid clouds, requires careful partitioning of both data and computation to avoid massive networking costs. We present Moirai, a cost-optimization framework that analyzes job accesses and data dependencies and optimizes the placement of both in hybrid clouds. Moirai informs the job scheduler of data location and access predictions, so it can determine where jobs should be executed to minimize data transfer costs. Our optimizer achieves scalability and cost efficiency by exploiting recurring jobs to identify data dependencies and job access characteristics, and it reduces the search space by excluding data not accessed recently.
We validate Moirai using 4-month traces that span 66.7M queries accessing 13.3EB from Presto and Spark clusters deployed at Uber, a multinational transportation company leveraging large-scale data analytics for its operations. Moirai reduces hybrid cloud deployment costs by over 97% relative to the state-of-the-art partitioning approach from Alibaba and other public approaches. The savings come from a 95–99.5% reduction in cloud egress, up to a 99% reduction in replication, and an 89–98% reduction in on-premises network infrastructure requirements. We also describe concrete steps being taken towards deploying Moirai in production.
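As a rough illustration of the scheduling idea summarized above (run each job where it pulls the fewest bytes across sites, and ignore data that has not been accessed recently), the following minimal Python sketch may help. It is not Moirai's optimizer: the function names, the site labels `onprem` and `cloud`, the table names, and the byte counts are all hypothetical, and a simple greedy choice stands in for the paper's actual optimization.

```python
from typing import Dict, Iterable, Set


def recently_accessed(last_access_day: Dict[str, int], today: int,
                      window_days: int) -> Set[str]:
    """Return tables accessed within the last `window_days` days.

    Hypothetical pruning step: only these tables are considered for placement.
    """
    return {t for t, day in last_access_day.items() if today - day <= window_days}


def cross_site_bytes(job_reads: Dict[str, int],
                     table_sites: Dict[str, Set[str]],
                     site: str) -> int:
    """Bytes the job would read from a remote site if it ran at `site`."""
    return sum(nbytes for table, nbytes in job_reads.items()
               if site not in table_sites.get(table, set()))


def pick_site(job_reads: Dict[str, int],
              table_sites: Dict[str, Set[str]],
              sites: Iterable[str] = ("onprem", "cloud")) -> str:
    """Greedy stand-in for the scheduler: choose the site with least remote I/O."""
    return min(sites, key=lambda s: cross_site_bytes(job_reads, table_sites, s))


# Hypothetical placement: which sites hold a copy of each table.
table_sites = {
    "trips": {"onprem"},
    "riders": {"onprem", "cloud"},  # replicated on both sites
    "logs": {"cloud"},
}
# Hypothetical job: bytes read per table.
job_reads = {"trips": 500, "riders": 100}

chosen = pick_site(job_reads, table_sites)
```

In this toy setup, running the job in the cloud would pull the 500 bytes of `trips` across sites, while running it on-premises pulls nothing, so the on-premises site is chosen. Replicating a table (as with `riders`) removes it from the cross-site cost at either site, which is the trade-off between replication and egress that the abstract refers to.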
KEYWORDS: Hybrid clouds, Cost-efficiency
FULL PAPER: pdf