2021 IEEE International Symposium on Information Theory (ISIT 2021) 12-20 July 2021 • Melbourne, Victoria, Australia.
Francisco Maturana, K. V. Rashmi
Carnegie Mellon University
Distributed storage systems typically use erasure codes to provide tolerance against node failures. An erasure code encodes a message into a codeword made up of several symbols, which are then distributed among nodes in the system. Maximum distance separable (MDS) [n; k] scalar codes are commonly used in practice, which have the property that any subset of k out of n nodes is enough to decode the message. However, in applications such as geo-distributed storage systems, decodability from many of these subsets is unnecessary. In this paper, we study codes where only certain subsets of nodes, named access sets, are required to satisfy decodability.
Our analysis focuses on two metrics of practical importance: update cost and storage overhead. For minimizing these metrics, we show that it is necessary to employ irregular array codes. We derive a lower bound on update cost as a function of the required access sets and show that it is achievable. Existing work provides an achievable lower bound on storage overhead. While both lower bounds are individually achievable, we show that they are not simultaneously achievable in general. Due to the premium in wide-area network bandwidth cost over storage cost, we focus on codes with minimum update cost (termed MUC). Finally, we derive a lower bound on the storage overhead of MUC codes and show the existence of MUC codes meeting this lower bound via a randomized construction. Our results thus show that it is possible to achieve significant savings in update cost and storage overhead by tailoring the design of codes to the required access sets.
FULL PAPER: pdf