Jack Kosaian, K.V. Rashmi, Shivaram Venkataraman
Carnegie Mellon University
Recent advances have shown the potential for coded computation to impart resilience against slowdowns and failures that occur in distributed computing systems. However, existing coded computation approaches either cannot support non-linear computations or support only a limited subset of them at high resource overhead. In this work, we propose a learning-based coded computation framework to overcome the challenges of performing coded computation for general non-linear functions. We show that careful use of machine learning within the coded computation framework can extend coded computation to impart resilience to more general non-linear computations. We showcase the applicability of learning-based coded computation to neural network inference, a major workload in production services. Our evaluation results show that learning-based coded computation enables accurate reconstruction of unavailable results from widely deployed neural networks for a variety of inference tasks such as image classification, speech recognition, and object localization. We implement our proposed approach atop an open-source prediction serving system and show its promise in alleviating slowdowns that occur in neural network inference. These results indicate the potential for learning-based approaches to open new doors for the use of coded computation for broader, non-linear computations.
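To illustrate why non-linear functions are the obstacle, the following minimal sketch shows a classic addition-based parity code. It is an illustration under stated assumptions, not the learning-based framework proposed in this work: the functions `linear_f` and `relu_f`, the matrix values, and the inputs are all hypothetical. For a linear function, an unavailable result can be recovered exactly from the result computed on the parity input; once a non-linearity is added, the same decoding no longer returns the right answer.

```python
# Illustrative sketch only: a hand-written addition/subtraction code,
# not the learned encoder/decoder described in the abstract.

def linear_f(x):
    # Hypothetical linear function F(x) = A x with a fixed 2x2 matrix A.
    a = [[2.0, -1.0], [0.5, 3.0]]
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def relu_f(x):
    # The same linear map followed by ReLU, which makes it non-linear.
    return [max(0.0, v) for v in linear_f(x)]

def encode(x1, x2):
    # Parity input: element-wise sum of the two queries.
    return [u + v for u, v in zip(x1, x2)]

def decode(f_parity, f_available):
    # Reconstruct the missing output by subtracting the available output
    # from the output computed on the parity input.
    return [p - a for p, a in zip(f_parity, f_available)]

x1 = [1.0, 2.0]
x2 = [3.0, -4.0]          # pretend the result for x2 is slow or lost
parity = encode(x1, x2)

# Linear case: F(x1 + x2) - F(x1) equals F(x2).
recovered_linear = decode(linear_f(parity), linear_f(x1))
print(recovered_linear == linear_f(x2))

# Non-linear case: the same decoder no longer recovers relu_f(x2).
recovered_relu = decode(relu_f(parity), relu_f(x1))
print(recovered_relu == relu_f(x2))
```

The first comparison holds because linearity lets the sum pass through the function; the second fails because ReLU does not distribute over addition. This is the gap that motivates a learned encoding and decoding scheme.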