Carnegie Mellon University Technical Report CMU-CS-94-193, September 1994. Superseded by Proceedings of the 1994 Computer Measurement Group (CMG) Conference, Orlando FL, Vol. 1, December 4-9, 1994, pp. 63-74.
William V. Courtright II and Garth A. Gibson
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Redundant disk arrays are single-fault tolerant, incorporating a layer of error handling not found in non-redundant disk systems. Recovery from these errors is complex, due in part to the large number of erroneous states the system may reach. The established approach to error recovery in disk systems is to transition directly from an erroneous state to completion. This technique, known as forward error recovery, relies upon the context in which an error occurs to determine the steps required to reach completion, which implies that forward error recovery is design-specific. Forward error recovery requires the enumeration of all erroneous states the system may reach and the construction of a forward path from each erroneous state. We propose a method of error recovery which does not rely upon the enumeration of erroneous states or the context in which they occur. When an error is encountered, we advocate mechanized recovery to an error-free state from which an operation may be retried. Using a form of backward error recovery, we are able to manage the complexity of error recovery in redundant disk arrays without sacrificing performance.
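The general idea of backward error recovery described above can be illustrated with a minimal sketch. It is not the mechanism from the paper: the names `StepFailed`, `run_with_rollback`, `make_write`, and `make_flaky` are hypothetical, the "array" is a plain dictionary, and each step is assumed to either apply fully or not at all. An operation is expressed as a list of (do, undo) pairs; when any step fails, the completed steps are undone in reverse order, returning the state to an error-free point from which the whole operation is retried. No per-error forward path is written.

```python
class StepFailed(Exception):
    """Raised by a step that encounters an error (hypothetical name)."""


def run_with_rollback(steps, state, max_retries=1):
    """Run (do, undo) pairs in order; on failure, undo and retry.

    Returns the number of retries that were needed. Assumes each
    'do' either takes full effect or raises without changing state.
    """
    for attempt in range(max_retries + 1):
        undo_log = []
        try:
            for do, undo in steps:
                do(state)
                undo_log.append(undo)
            return attempt
        except StepFailed:
            # Backward recovery: reverse completed steps, newest first,
            # so the state is error-free again before the next attempt.
            for undo in reversed(undo_log):
                undo(state)
    raise RuntimeError("operation failed after all retries")


def make_write(key, value):
    """Build a write step that remembers the old value for undo."""
    saved = {}

    def do(state):
        saved["old"] = state.get(key)
        state[key] = value

    def undo(state):
        state[key] = saved["old"]

    return do, undo


def make_flaky(failures):
    """Build a step that fails a fixed number of times, then succeeds."""
    remaining = [failures]

    def do(state):
        if remaining[0] > 0:
            remaining[0] -= 1
            raise StepFailed("transient fault")

    def undo(state):
        pass  # the step changes nothing, so there is nothing to reverse

    return do, undo


if __name__ == "__main__":
    stripe = {"data": 1, "parity": 1}
    steps = [make_write("data", 7), make_flaky(1), make_write("parity", 7)]
    retries = run_with_rollback(steps, stripe, max_retries=2)
    print(retries, stripe)
```

The recovery loop never inspects which step failed or why, which is the property the abstract contrasts with forward error recovery. In a real array the retry might proceed differently (for example, after marking a disk as failed); that policy is outside this sketch.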
FULL PAPER, TR VERSION: pdf / postscript
FULL PAPER, CONFERENCE VERSION: pdf / postscript