STRADS: Model Parameter Scheduling

Contact: Jin Kyu Kim

Machine learning problems with big-data and big-model lead us to parallel computing. Speedup of parallel machine learning depends on how we parallelize ML algorithm and underlying distributed framework performance. In tranditional data-parallel approach, we have used relaxed consistency such as SSP that reduces synchronization cost, mitigates straggler effects, and finally improves iteration throughput. However, it causes inconsisteny on the shared model parameters. The inconsistent shared parameter values might hurt accuracy of computation and increse the number of updates until ML algorithm converges. This inconsistency problem often limits speedup of parallel machine learning in a distributed environment.

In this work, we takes a comprehensive approach that considers machine learning and system at the same time. On the machine learning side, we present SchMP (Scheduled Model Parallelism) scheme that schedules model parameters to avoid concurrent updates of dependent model parameters, which eliminates or reduces inconsistency of shared model parameters. On the system side, STRADS execution engines implement several system optimizations that overlap scheduling, network communication and parameter update computation in order to improves update throughput without hurting progress per update too much. We present two common scheduling schemes: (1) dynamic-scheduling; (2) static-scheduling. Dynamic-scheduling is the by-default scheme that analyzes dependency structure and conducts priority sampling based on changes of parameter values in runtime. On contrary static-scheduling makes a complete schedule plans prior to runtime. In some ML applications, dependency structure shows regularity and it's possible to make a schedule plan before runtime. For these ML applications, static-scheduling could be more beneficial than dynamic-scheduling because it does not incur runtime scheduling overhead. Due to the different characteristics of two scheduling schemes, we implemented two different execution engines: (1) STRADS-Dynamic engine; (2) STRADS-Stratic engine. For performance benchmark, We implement lasso, logistic regression and support vector machine on STRADS-Dynamic, and SGD-MF, CD-MF and LDA on STRADS-Static. We observed that SchMP ML applications running on STRADS engines outperform parallel ML implementations without model parameter scheduling. For instance, SchMP-LDA and SchMP-Lasso achives 10x and 5x faster training speed than well-established baselines.

Convergence time: SchMP-Lasso vs YahooLDA

Convergence time: SchMP Matrix Factorization

Convergence time: SchMP-Lasso vs Shotgun Lasso




Garth Gibson
Eric Xing


Jin Kyu Kim
Xun Zheng
Wei Dai



We thank the members and companies of the PDL Consortium: Broadcom, Ltd., Citadel, Dell EMC, Google, Hewlett-Packard Labs, Hitachi Ltd., Intel Corporation, Microsoft Research, MongoDB, NetApp, Inc., Oracle Corporation, Samsung Information Systems America, Seagate Technology, Tintri, Toshiba, Two Sigma, Uber, Veritas and Western Digital for their interest, insights, feedback, and support.




© 2017. Last updated 12 October, 2016