Proceedings of the 2019 USENIX Annual Technical Conference, July 10–12, 2019 • Renton, WA.
Jin Kyu Kim1, Abutalib Aghayev1, Garth A. Gibson1,2,3, Eric P. Xing1,4
1 Carnegie Mellon University
2 Vector Institute
3 University of Toronto
4 Petuum, Inc.
It is a daunting task for a data scientist to port sequential code for a Machine Learning (ML) model, published by an ML researcher, to a distributed framework that runs on a cluster and operates on massive datasets. Fitting the sequential code to the programming model and data abstractions dictated by the chosen framework requires significant engineering and cognitive effort. Moreover, inherent constraints of these frameworks sometimes lead to inefficient implementations that deliver suboptimal performance.
We show that it is possible to achieve automatic and efficient distributed parallelization of familiar sequential ML code by making a few mechanical changes to it, while hiding the details of concurrency control, data partitioning, task parallelization, and fault tolerance. To this end, we design and implement a new distributed ML framework, STRADS-Automatic Parallelization (AP), and demonstrate that it simplifies distributed ML programming significantly, while outperforming a popular data-parallel framework with an unfamiliar programming model and achieving performance comparable to an ML-specialized framework.
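To make the idea of "a few mechanical changes" concrete, the sketch below contrasts a sequential stochastic gradient descent loop with a mechanically parallelized rewrite. The names dvector and parallel_for are illustrative stand-ins, not the actual STRADS-AP API; the driver is stubbed with std::thread and a mutex so the example compiles and runs on a single machine, whereas a distributed runtime would instead handle data partitioning, concurrency control, and fault tolerance across a cluster.

// Hypothetical sketch; not the STRADS-AP API. dvector aliases std::vector,
// and parallel_for is stubbed with std::thread so the example is
// self-contained and runnable on one machine. A distributed runtime would
// instead partition data across nodes and manage concurrency control and
// fault tolerance transparently.
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

template <typename T>
using dvector = std::vector<T>;  // stand-in for a distributed vector

// Stub loop driver: splits iterations [0, n) across worker threads.
void parallel_for(std::size_t n, const std::function<void(std::size_t)>& body,
                  unsigned workers = std::thread::hardware_concurrency()) {
  if (workers == 0) workers = 1;
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < workers; ++t) {
    pool.emplace_back([=] {
      for (std::size_t i = t; i < n; i += workers) body(i);
    });
  }
  for (auto& th : pool) th.join();
}

struct Sample {
  std::vector<double> x;  // features (assumed same length as the weights)
  double y;               // label
};

// Sequential SGD pass, as an ML researcher might publish it.
void sgd_sequential(const std::vector<Sample>& data, std::vector<double>& w,
                    double lr) {
  for (std::size_t i = 0; i < data.size(); ++i) {
    double pred = 0.0;
    for (std::size_t j = 0; j < w.size(); ++j) pred += w[j] * data[i].x[j];
    const double err = pred - data[i].y;
    for (std::size_t j = 0; j < w.size(); ++j) w[j] -= lr * err * data[i].x[j];
  }
}

// "Mechanically changed" version: the loop body is unchanged; only the
// container type and the loop driver differ. The mutex stands in for the
// concurrency control a distributed runtime would supply behind the scenes.
void sgd_parallel(const dvector<Sample>& data, std::vector<double>& w,
                  double lr) {
  std::mutex w_lock;
  parallel_for(data.size(), [&](std::size_t i) {
    std::lock_guard<std::mutex> guard(w_lock);
    double pred = 0.0;
    for (std::size_t j = 0; j < w.size(); ++j) pred += w[j] * data[i].x[j];
    const double err = pred - data[i].y;
    for (std::size_t j = 0; j < w.size(); ++j) w[j] -= lr * err * data[i].x[j];
  });
}

The point of the sketch is that the loop body, i.e., the part the ML researcher published, stays untouched; only the container type and the loop driver change.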