Journal of Parallel and Distributed Computing, vol. 17, January 1993, pp. 4-27.
Garth A. Gibson* and David A. Patterson**
School of Computer Science*
Carnegie Mellon University
Pittsburgh, PA 15213
UC Berkeley**
Redundancy based on a parity encoding has been proposed for ensuring that disk arrays provide highly reliable data. Parity-based redundancy will tolerate many independent and dependent (shared support hardware) disk failures without on-line spare disks, and many more such failures with on-line spare disks. This paper explores the design of reliable, redundant disk arrays. In the context of a 70-disk strawman array, it presents and applies analytic and simulation models for the time until data is lost. It shows how to balance requirements for high data reliability against the overhead cost of redundant data, on-line spares, and on-site repair personnel in terms of an array's architecture, its component reliabilities, and its repair policies.
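To give a feel for what an analytic model of the time until data is lost can look like, the sketch below uses a common first-order approximation for single-parity groups. It is not necessarily the model developed in the paper. It assumes independent, exponentially distributed disk lifetimes, a repair time much shorter than the disk lifetime, and no shared-hardware (dependent) failures or on-line spares. The function name `mttdl_hours` and all numeric values except the 70-disk array size are hypothetical and chosen only for illustration.

```python
def mttdl_hours(total_disks, group_size, disk_mttf_hours, repair_hours):
    """Approximate mean time to data loss (hours) for single-parity groups.

    Assumptions (illustrative, not taken from the paper):
      - disks fail independently with exponentially distributed lifetimes
      - group_size counts the parity disk
      - repair_hours is much smaller than disk_mttf_hours
    Data is lost when a second disk in the same parity group fails
    before the first failed disk has been repaired.
    """
    if group_size < 2 or total_disks % group_size != 0:
        raise ValueError("total_disks must be a multiple of group_size >= 2")
    # Rate of a first failure anywhere in the array: total_disks / MTTF.
    # Chance that one of the other (group_size - 1) disks in that group
    # fails during the repair window: (group_size - 1) * repair / MTTF.
    return disk_mttf_hours ** 2 / (total_disks * (group_size - 1) * repair_hours)


if __name__ == "__main__":
    # 70 disks as in the strawman array; the grouping, disk MTTF, and
    # repair time below are hypothetical values.
    hours = mttdl_hours(total_disks=70, group_size=10,
                        disk_mttf_hours=150_000, repair_hours=24)
    print(f"Approximate MTTDL: {hours:.0f} hours ({hours / 8766:.0f} years)")
```

Under this approximation the time until data loss scales inversely with repair time and with parity group size, which is one way to see the trade-off between reliability and the cost of redundancy, spares, and repair staffing that the abstract describes.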