Carnegie Mellon University Technical Report CMU-CS-96-137, June 1996. Superseded by Proc. of the International Computer Performance and Dependability Symposium (IPDS), Sept. 4-6, 1996.
William V. Courtright II*, Garth A. Gibson, Mark Holland*, Jim Zelenka
Department of Electrical and Computer Engineering*
School of Computer Science,
Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA
Error recovery in redundant disk arrays is typically performed in an ad hoc fashion, requiring architecture-specific code which limits extensibility and is difficult to verify. In this paper, we describe a technique for automating the execution of redundant disk array operations, including recovery from errors, independent of array architecture. Our approach employs a graphical representation of array operations and a two-phase error-recovery scheme we refer to as roll-away error recovery. We demonstrate the validity of this approach in RAIDframe, a prototyping framework that separates architectural policy from execution mechanism. RAIDframe facilitates rapid prototyping of new RAID architectures by localizing modifications. In addition, RAIDframe-implemented architectures run the same code when configured as an event-driven simulator, a user-level application managing raw disks, and as a Digital Unix device-driver capable of mounting a file system. Evaluation shows that RAIDframe performance is equivalent to less complex array implementations and that case studies of RAID levels 0, 1, 4, 5, 6, and parity declustering achieve expected performance.
FULL PAPER, TR VERSION: pdf / postscript
FULL PAPER, CONFERENCE VERSION: pdf / postscript