Date: October 20, 1994

Speaker: Bill Courtright

Error Recovery in Redundant Disk Arrays

Abstract:
Redundant disk arrays are single fault tolerant, incorporating a layer of error handling not found in nonredundant disk systems. Recovery from these errors is complex, due in part to the large number of erroneous states the system may reach. The established approach to error recovery in disk systems is to transition directly from an erroneous state to completion. This technique, known as forward error recovery, relies upon the context in which an error occurs to determine the steps required to reach completion, which implies forward error recovery is design specific. Forward error recovery requires the enumeration of all erroneous states the system may reach and the construction of a forward path from each erroneous state. We propose a method of error recovery which does not rely upon the context in which errors occur. When an error is encountered, we advocate mechanized recovery to an error-free state from which an operation may be retried. Using a form of backward error recovery, we are able to manage the complexity of error recovery in redundant disk arrays without sacrificing performance.

SDI / LCS Seminar Questions?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/