PDL Abstract
ANNOTATED BIBLIOGRAPHY:
RAID: High-Performance, Reliable Secondary Storage
- [Amdahl67] Gene M. Amdahl. Validity of the single processor approach
to achieving large scale computing capabilities. In Proceedings AFIPS
1967 Spring Joint Computer Conference, volume 30, pages 483-485, April
1967.
Three page paper that eloquently gives case for traditional computers
by pointing out that performance improvement is limited by portion
of the computation that is not improved.
- [Baccelli85] Francois Baccelli. Two Parallel Queues Created by
Arrivals with Two Demands. Technical Report 426, INRIA-Rocquencourt
France, 1985.
Derives an exact solution for the two-server, M/G/1 fork-join queue.
- [Bhide92] Anupam Bhide and Daniel Dias. Raid Architectures for
OLTP. Technical Report RC 17879 (#78489), IBM, March 1992.
Increases throughput for workloads emphasizing small, random write
accesses in a redundant disk array by logging changes to parity for
efficient application later. Parity changes are logged onto a separate
disk which must be externally sorted before application to the disk
array's parity.
- [Bitton88] Dina Bitton and Jim Gray. Disk Shadowing. In Very Large
Database Conference XIV, pages 331-338, 1988.
Describes disk mirroring and derives an analytical equation for read
and write seek distances as a function of the number of data copies.
- [Burkhard93] W. Burkhard and J. Menon. Disk Array Storage System
Reliability. In 23rd Annual International Symposium on Fault-Tolerant
Computing, pages 432-441, June 1993.
Argues need for multiple-error-correcting disk arrays, discusses how
to use maximal distance separable codes to protect against multiple
errors in a space-efficient manner.
- [Buzen86] Jeffrey P. Buzen and Annie W.C. Shum. I/O Architecture
in MVS/370 and MVS/XA. CMG Transactions, 54:19-26, Fall 1986.
Overview of the MVS/370 and MVS/XA I/O architecture. Describes channel
paths, storage directors, string controllers, rotational position
sensing, static and dynamic reconnect.
- [Cao93] Pei Cao, Swee Boon Lim, Shivakumar Venkataraman, and John
Wilkes. The TickerTAIP parallel RAID architecture. In Proceedings
of the 1993 International Symposium on Computer Architecture, May
1993.
Describes the TickerTAIP architecture, software implementation issues,
and the performance of different methods of distributing parity computation
among multiple processors.
- [Chandy93] John Chandy and A. L. Narasimha Reddy. Failure Evaluation
of Disk Array Organizations. In Proceedings of the International Conference
on Distributed Computing Systems, May 1993.
Contrasts four previously described schemes for minimizing data reconstruction
time in small (7 and 16 disks) redundant disk arrays: RAID 5 with
a single spare disk, RAID 5 with a single spare whose space is distributed
across all disks, a special case of Muntz and Lui's parity clustering
organization, and a method of dynamically converting a redundant data
disk to a spare disk by merging two redundancy groups into one larger
group. The second, distributed sparing, is generally preferred because
of its performance and simplicity, but the Muntz scheme is better
for minimal impact of user performance during recovery.
- [Chen90a] Peter M. Chen, Garth A. Gibson, Randy H. Katz, and David
A. Patterson. An Evaluation of Redundant Arrays of Disks Using an
Amdahl 5890. In Proceedings of the 1990 ACM SIGMETRICS Conference
on Measurement and Modeling of Computer Systems, May 1990.
The first experimental evaluation of RAID. Compares RAID levels 0,
1, and 5.
- [Chen90b] Peter M. Chen and David A. Patterson. Maximizing Performance
in a Striped Disk Array. In Proceedings of the 1990 International
Symposium on Computer Architecture, pages 322-331, May 1990.
Discusses how to choose the striping unit for a RAID level 0 disk
array.
- [Chen91] Shenze Chen and Don Towsley. A Queueing Analysis of RAID
Architectures. Technical Report COINS Tech. Report 91-71, University
of Massachusetts, Amherst, Department of Computer and Information
Science, 1991.
Analytically models RAID level 1 and RAID level 5 disk arrays to compare
their performance on small and large requests. Bounds are used to
model the queueing and fork-join synchronization in RAID level 1 disk
arrays. Small write requests in RAID level 5 disk arrays are handled
by ignoring the fork-join synchronization overhead. Large requests
are modeled by using a single queue for all the disks in the disk
array.
- [Chen93] Peter M. Chen and Edward K. Lee. Striping in a RAID Level
5 Disk Array. Technical Report CSE-TR-181-93, University of Michigan,
November 1993.
Discusses how to choose striping unit for RAID Level 5 disk arrays.
Quantifies effect of writes and varying number of disks.
- [Chen94] Peter M. Chen, Edward K. Lee, Ann L. Drapeau, Ken Lutz,
Ethan L. Miller, Srinivasan Seshan, Ken Shirriff, David A. Patterson,
and Randy H. Katz. Performance and Design Evaluation of the RAID-II
Storage Server. Invited to the Journal of Distributed and Parallel
Databases, to appear, 1994. also appeared in The 1993 International
Parallel Processing Symposium Workshop on I/O in Parallel Computer
Systems.
Summarizes major architectural features of RAID-II and evaluates how
they impacts performance.
- [Chervenak91] Ann L. Chervenak and Randy H. Katz. Performance of
a Disk Array Prototype. In Proceedings of the 1991 ACM SIGMETRICS
Conference on Measurement and Modeling of Computer Systems, volume
19, pages 188-197, May 1991. Performance Evaluation Review.
Evaluates the performance of RAID-I, a U.C. Berkeley disk array prototype.
- [Copeland88] G. Copeland, W. Alexander, E. Boughter, and T. Keller.
Data Placement in Bubba. In Proceedings of the ACM SIGMOD International
Conference on Management of Data, pages 99-108, 1988.
Discusses data allocation in a large database.
- [Drapeau94] Ann L. Drapeau, Ken Shirriff, Edward K. Lee, Peter
M. Chen, Garth A. Gibson, John H. Hartman, Ethan L. Miller, Srinivasan
Seshan, Randy H. Katz, Ken Lutz, and David A. Patterson. RAID-II:
A High-Bandwidth Network File Server. In Proceedings of the 1994 International
Symposium on Computer Architecture, April 1994.
Describes the architecture, file system, and performance of RAID-II,
a disk-array file server prototype.
- [Emlich89] Larry W. Emlich and Herman D. Polich. VAXsimPLUS, A
Fault Manager Implementation. Digital Technical Journal, 8, February
1989.
Describes Digital Equipment Corporation's tool for predicting an avoiding
disk failures.
- [Flatto84] L. Flatto and S. Hahn. Two Parallel Queues Created by
Arrivals with Two Demands I}. SIAM Journal of Computing, pages 1041-1053,
October 1984.
Derives an exact solution for the two server, M/M/1, fork-join queue.
- [Friedman83] Mark B. Friedman. DASD Access Patterns. In 14th International
Conference on Management and Performance Evaluation of Computer Systems,
pages 51-61, 1983. CMG XIV.
Looks at how much disk accesses are skewed towards particular disks
in several transaction processing sites.
- [Gibson91] Garth Alan Gibson. Redundant Disk Arrays: Reliable,
Parallel Secondary Storage. PhD thesis, University of California at
Berkeley, December 1991. also available from MIT Press, 1992.
Award winning dissertation that describes RAIDs in detail, with emphasis
on reliability analysis of several alternatives.
- [Gibson92] Garth A. Gibson, R. Hugo Patterson, and M. Satyanarayanan.
Disk Reads with DRAM Latency. Third Workshop on Workstation Operating
Systems, April 1992.
Proposes that applications give hints about their future file accesses
so that the buffer cache can prefetch needed data and provide low-latency
file access. The hints could also be exploited to improve cache management
and disk scheduling.
Abstract / Postscript
- [Gray90] Jim Gray, Bob Horst, and Mark Walker. Parity Striping
of Disc Arrays: Low-Cost Reliable Storage with Acceptable Throughput.
In Proceedings of the 16th Very Large Database Conference, pages 148-160,
1990. VLDB XVI.
Describes an data and parity layout for disk arrays called parity
striping. Parity striping is essentially RAID level 5 with an infinite
striping unit and manual load balancing.
- [Hall86] M. Hall. Combinatorial Theory (2nd Edition). Wiley-Interscience,
1986.
Textbook on combinatorial theory. The section on balanced, incomplete
block designs are most relevant for readers of this paper.
- [Heidelberger82] Philip Heidelberger and Kishor S. Trivedi. Queueing
Network Models for Parallel Processing with Asynchronous Tasks. IEEE
Transactions on Computers, C-31(11):1099-1109, November 1982.
Derives approximate solutions for queueing systems with forks but
no joins.
- [Hennessy90] John L. Hennessy and David A. Patterson. Computer
Architecture: A Quantitative Approach. Morgan Kaufmann Publishers,
Inc., 1990.
Likely the most popular general book in computer architecture today,
the discussion on technology trends, general I/O issues, and measurements
of seek distances are most relevant to readers of this paper.
- [Holland92] Mark Holland and Garth A. Gibson. Parity Declustering
for Continuous Operation in Redundant Disk Arrays. In Proceedings
of the 5th International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS-V), pages 23-35, October 1992.
Describes parity declustering, a technique for improving the performance
of a redundant disk array in the presence of disk failure. Analyzes
the proposed solution using detailed simulation and finds significant
improvements (20-50%) in both user response time and reconstruction
time. Also analyzes a set of previously-proposed optimizations that
can be applied to the reconstruction algorithm, concluding that they
can actually slow the reconstruction process under certain conditions.
Abstract / Postscript
- [Holland93] Mark Holland, Garth A. Gibson, and Daniel Siewiorek. Fast,
On-Line Failure Recovery in Redundant Disk Arrays. In Proceedings
of the 23rd International Symposium on Fault Tolerant Computing, 1993.
FTCC-23.
Compares and contrasts two data reconstruction algorithms for disk
arrays: "parallel stripe-oriented reconstruction" and "disk-oriented
reconstruction". Presents an implementation of the disk-oriented algorithm
and analyzes reconstruction performance of these algorithms, concluding
that the disk oriented algorithm is superior. Investigates the sensitivity
of the reconstruction process to the size of the reconstruction unit
and the amount of memory available for reconstruction.
Abstract / Postscript
- [Hsiao90] H. Hsiao and D. DeWitt. Chained Declustering: A New Availability
Strategy for Multiprocessor Database Machines. In Proceedings of the
1990 IEEE International Conference on Data Engineering, pages 456-465,
February 1990.
Introduces a variation of mirroring, where the secondary copy of data
is distributed across the disks in a different manner than the primary
copy of data.
- [Katz92] Randy H. Katz. High Performance Network and Channel-Based
Storage. Proceedings of the IEEE, 80(8):1238-1261, August 1992.
Presents overview of network-based storage systems. Reviews hardware
and software trends in storage systems.
- [Katz93] Randy H. Katz, Peter M. Chen, Ann L. Drapeau, Edward K.
Lee, Ken Lutz, Ethan L. Miller, Srinivasan Seshan, and David A. Patterson.
RAID-II: Design and Implementation of a Large Scale Disk Array Controller.
In 1993 Symposium on Integrated Systems, 1993. University of California
at Berkeley UCB/CSD 92/705.
Describes the design decisions and implementation experiences from
RAID-II.
- [Kim86] Michelle Y. Kim. Synchronized Disk Interleaving. IEEE Transactions
on Computers, C-35(11):978-988, November 1986.
Simulates the performance of independent disks versus synchronized
disk striping. Derives an equation for response time by treating the
synchronized disk array as an M/G/1 queueing system.
- [Kim91] Michelle Y. Kim and Asser N. Tantawi. Asynchronous Disk
Interleaving: Approximating Access Delays. IEEE Transactions on Computers,
40(7):801-810, July 1991.
Derives an approximate equation for access time in unsynchronized
disk arrays when seeks times are exponentially distributed and rotational
latency is uniformly distributed.
- [Korner90] K. Korner. Intelligent Caching for Remote File Service.
In Proceedings of the International Conference on Distributed Computing
Systems, pages 220-226, 1990.
Uses traces to generate hints based on the program running and the
directory and name of files accessed. The file server uses the hints
to pick a caching algorithm: LRU, MRU, none. Simulation showed significant
benefits from intelligent caching but not from read-ahead which delayed
demand requests since it was not preemptable.
- [Kotz91] David Kotz and Carla Schlatter Ellis. Practical Prefetching
Techniques for Parallel File Systems. In Proceedings of the First
International Conference on Parallel and Distributed Information Systems,
pages 182-189, December 1991.
File access predictors use past accesses to prefetch data in idle
nodes of a parallel file system. Simulation studies show that practical
predictors can often significantly reduce total execution time while
the penalty for incorrect predictions is modest.
- [Lee91a] Edward K. Lee and Randy H. Katz. An Analytic Performance
Model of Disk Arrays and its Applications. Technical Report UCB/CSD
91/660, University of California at Berkeley, 1991.
Derives an analytic model for non-redundant disk arrays and uses the
model to derive an equation for the optimal size of data striping.
- [Lee91b] Edward K. Lee and Randy H. Katz. Performance Consequences
of Parity Placement in Disk Arrays. In Proceedings of the 4rd International
Conference on Architectural Support for Programming Languages and
Operating Systems (ASPLOS-IV), pages 190-199, April 1991.
Investigates the performance of different methods of distributing
parity in RAID level 5 disk arrays.
- [Lee93] Edward K. Lee and Randy H. Katz. An Analytic Performance
Model of Disk Arrays. In Proceedings of the 1993 ACM SIGMETRICS Conference
on Measurement and Modeling of Computer Systems, pages 98-109, May
1993.
Similar to earlier technical report with similar name except with
better empirical justifications and a more detailed study of the model's
properties.
- [Livny87] Miron Livny, Setrag Khoshafian, and Haran Boral. Multi-Disk
Management Algorithms. In Proceedings of the 1987 ACM SIGMETRICS Conference
on Measurement and Modeling of Computer Systems, pages 69-77, May
1987.
Compares performance of disk arrays with track-sized and infinite
striping units. Concludes that striping can improve performance for
many multi-disk systems.
- [LoVerso93] Susan J. LoVerso, Marshall Isman, Andy Nanopoulos,
et al. sfs: A Parallel File System for the CM-5. In Proceedings USENIX
Summer Conference, June 1993.
A description of the I/O hardware and the file system of the massively
parallel processor from Thinking Machines. Their RAID-3 disk array
has excellent performance for large file accesses.
- [Malhotra93] Manish Malhotra and Kishor S. Trivedi. Reliability
Analysis of Redundant Arrays of Inexpensive Disks. Journal of Parallel
and Distributed Computing, 17:146-151, January 1993.
Uses Markov models to derive exact, closed-form reliability equations
for redundant disk arrays. Analysis accounts for failure prediction
and sparing.
- [Menon91] Jai Menon, Dick Mattson, and Spencer Ng. Distributed
Sparing for Improved Performance of Disk Arrays. Technical Report
RJ 7943, IBM, January 1991.
Explores the use of an on-line spare disk in a redundant disk array
analytically. It examines multiple configurations, but fundamentally
it distributes the spare's space over the whole array so that every
disk is N/(N+2) data, 1/(N+2) parity, and 1/(N+2) spare. This gives
an extra 1/(N+2) performance, but, more significantly, it distributes
the recovery-write load (the reconstructed data) over all disks to
shorten recovery time. The benefits, not surprisingly, are largest
for small arrays.
- [Menon93a] Jai Menon and Jim Cortney. The Architecture of a Fault-Tolerant
Cached RAID Controller. In Proceedings of the 20th International Symposium
on Computer Architecture, pages 76-86, May 1993.
Describes the architecture of Hagar and several algorithms for asynchronous
writes that reduce susceptibility to data loss.
- [Menon93b] Jai Menon, James Roche, and Jim Kasson. Floating Parity
and Data Disk Arrays. Journal of Parallel and Distributed Computing,
17:129-139, 1993.
Introduces floating data and floating parity as an optimization for
RAID level 5 disk arrays. Discusses performance and capacity overheads
of methods.
- [Merchant92] A. Merchant and P. Yu. Design and Modeling of Clustered
RAID. In Proceedings of the International Symposium on Fault Tolerant
Computing, pages 140-149, 1992. FTCC.
Presents an implementation of parity declustering, which the authors
call "clustered RAID", based on random permutations. Its advantage
is that it is able to derive a data mapping for any size disk array
with any size parity stripe, and the corresponding disadvantage is
that the computational requirements of the mapping algorithm are high
compared to the block-design based approaches. Analyzes response time
and reconstruction time using this technique via an analytic model,
and finds substantial benefits in both.
- [Mon91] RAID: A Technology Poised for Explosive Growth. Technical
Report DJIA: 2902, Montgomery Securities, December 1991.
Industry projections of market growth for RAID systems from 1990 to
1995.
- [Muntz90] Richard R. Muntz and John C. S. Lui. Performance Analysis
of Disk Arrays under Failure. In Proceedings of the 16th Conference
on Very Large Data Bases, 1990. VLDB XVI.
Proposes and evaluates the "clustered RAID" technique for improving
the failure-recovery performance in redundant disk arrays. It leaves
open the problem of implementation: no technique for efficiently mapping
data units to physical disks is presented. Analyzes via an analytical
model the technique and two potential "optimizations" to the reconstruction
algorithm, and finds significant benefits to all three.
- [Nelson88] R. Nelson and A.N. Tantawi. Approximate Analysis of
Fork/Join Synchronization in Parallel Queues. IEEE Transactions on
Computers, 37(6):739-743, June 1988.
Approximates response time in fork-join queueing systems with k >=
2 servers where each logical request always forks into k requests.
- [Ng92] Spencer Ng and Dick Mattson. Maintaining Good Performance
in Disk Arrays During Failure via Uniform Parity Group Distribution.
In Proceedings of the First International Symposium on High Performance
Distributed Computing, pages 260-269, 1992.
Uses balanced, incomplete block designs to distribute the extra load
from a failed disk equally among other disks in the array.
- [Ng94] Spencer Ng. Crossbar Disk Array for Improved Reliability
and Performance. In Proceedings of the 1994 International Symposium
on Computer Architecture, April 1994.
Introduces schemes to protect against multiple failures of disk support
hardware such as disk controllers and strings.
- [Orji93] Cyril U. Orji and Jon A. Solworth. Doubly Distorted Mirrors.
In Proceedings of the ACM SIGMOD International Conference on Management
of Data, 1993.
Describes a technique called distorted mirrors that partitions each
of two mirrored disks into two halves, one of which lays out the data
in a standard fashion, one of which "distorts" the data layout. This
accelerates writes to the distorted copy while preserving the ability
to sequentially read large files.
- [Patterson88] David A. Patterson, Garth A. Gibson, and Randy H. Katz.
A Case for Redundant Arrays of Inexpensive Disks (RAID). In International
Conference on Management of Data (SIGMOD), pages 109-116, June 1988.
The first published Berkeley paper on RAIDs, it gives all the RAID
nomenclature.
- [Patterson93] R. Hugo Patterson, Garth A. Gibson, and M. Satyanarayanan.
A Status Report on Research in Transparent Informed Prefetching. ACM
Operating Systems Review, 27(2):21-34, April 1993.
Expands on using application hints for file prefetching in [Gibson92].
Hints should disclose access patterns, not advise caching/prefetching
actions. Greatest potential from converting serial accesses into concurrent
accesses on a disk array. Presents preliminary results of user-level
prefetching tests.
Abstract / Postscript
- [Patterson94] David A. Patterson and John L. Hennessy. Computer
Organization and Design: The Hardware/Software Interface. Morgan Kaufmann
Publishers, 1994.
A popular undergraduate book in computer architecture, the discussion
on technology trends are most relevant to readers of this paper.
- [Peterson72] W. Wesley Peterson and E. J. Weldon. Error-Correcting
Codes, Second Edition. MIT Press, 1972.
A general textbook on the mathematics of error-correcting codes.
- [Rosenblum91] Mendel Rosenblum and John K. Ousterhout. The Design
and Implementation of a Log-Structured File System. In Proceedings
of the 13th ACM Symposium on Operating Systems Principles, October
1991.
Describes a Log-Structured File System that makes all writes to disk
sequential. Discusses efficient ways to clean the disk to prevent
excessive fragmentation.
- [Salem86] Kenneth Salem and Hector Garcia-Molina. Disk Striping.
In Proceedings of the Second International Conference on Data Engineering,
pages 336-342, 1986.
Early paper discussing disk striping.
- [Scheuermann91] Peter Scheuermann, Gerhard Weikum, and Peter Zabback.
Automatic Tuning of Data Placement and Load Balancing in Disk Arrays.
Database Systems for Next-Generation Applications: Principles and
Practice, 1991. DBS-92-91.
Describes heuristics for allocating files to disks to minimize disk
skew.
- [Schulze89] Martin Schulze, Garth A. Gibson, Randy Katz, and David
Patterson. How Reliable is a RAID? In Procedures of the IEEE Computer
Society International Conference (COMPCON), March 1989. Spring COMPCON
89.
Gives a reliability calculation for the electronics as well as the
disks for RAIDs.
- [Seltzer90] Margo I. Seltzer, Peter M. Chen, and John K. Ousterhout.
Disk Scheduling Revisited. In Proceedings of the Winter 1990 USENIX
Technical Conference, pages 313-324, January 1990.
Re-examines the problem of how to efficiently schedule a large number
of disk accesses when accounting for both seek and rotational positioning
delays.
- [Stodolsky93] Daniel Stodolsky and Garth A. Gibson. Parity Logging:
Overcoming the Small Write Problem in Redundant Disk Arrays. In Proceedings
of the 1993 International Symposium on Computer Architecture, May
1993.
Increases throughput for workloads emphasizing small, random write
accesses in a redundant disk array by logging changes to the parity
in a segmented log for efficient application later. Log segmentation
allows log operations that are large enough to be efficient yet small
enough to allow in-memory application of a log segment.
Abstract / Postscript
- [Tait91] C. D. Tait and D. Duchamp. Detection and Exploitation
of File Working Sets. In Proceedings of the International Conference
on Distributed Computing Systems, pages 2-9, May 1991.
Dynamically builds and maintains program and data access trees to
predict future file accesses. The current pattern is matched with
previous trees to prefetch data and manage the local cache in a distributed
file system. Trace-driven simulation shows reduced cache miss rates
over a simple LRU algorithm.
- [Weikum92] Gerhard Weikum and Peter Zabback. Tuning of Striping
Units in Disk-Array-Based File Systems. In Proceedings of the 2nd
International Workshop on Research Issues on Data Engineering: Transaction
and Query Processing, pages 80-87, 1992.
Proposes file-specific striping units instead of a single, global
one for all files.
- [Wilmot89] Richard B. Wilmot. File Usage Patterns from SMF Data:
Highly Skewed Usage. In 20th International Conference on Management
and Performance Evaluation of Computer Systems, pages 668-677, 1989.
CMG 1989.
Reports on how files are accessed on four large data centers and finds
that a small number of files account for most of all disk I/O.