Ph.D. Dissertation, Dept. of ECE, Carnegie Mellon University, 1994
Mark Holland
School of Computer Science, Carnegie Mellon University
Chapter 1: Introduction 1 Chapter 2: Background Information 7 2.1. The need for improved availability in the storage subsystem 8 2.1.1. The widening access gap 8 2.1.2. The downsizing trend in disk drives 9 2.1.3. The advent of new, I/O intensive applications 10 2.1.4. Why these trends necessitate higher availability 10 2.2. Technology background 13 2.2.1. Disk technology 13 2.2.2. Disk array technology 16 2.2.2.1. Disk array architecture 17 2.2.2.2. Defining the RAID levels: data layout and ECC 18 2.2.2.3. Reading and writing data in the different RAID levels 23 2.2.2.3.1. RAID Level 1 24 2.2.2.3.2. RAID Level 3 25 2.2.2.3.3. RAID Level 5 26 2.2.2.4. Comparing the performance of the RAID levels 29 2.2.2.5. On-line reconstruction 29 2.2.2.6. Related work: variations on these organizations 30 2.2.2.6.1. Multiple failure toleration 30 2.2.2.6.2. Addressing the small-write problem 32 2.2.2.6.3. Spare space organizations 34 2.2.2.6.4. Distributing the functionality of the array controller 35 2.2.2.6.5. Striping studies 35 2.2.2.6.6. Disk array performance evaluation 37 2.2.2.6.7. Reliability modeling 37 2.2.2.6.8. Improving the write-performance of RAID Level 1 39 2.2.2.6.9. Network file systems based on RAID 40 2.3. Evaluation methodology 40 2.3.1. Simulation methodology 40 2.3.2. The raidSim disk array simulator 41 2.3.3. Default workload 42 Chapter 3: Disk Array Architectures and Data Layouts 45 3.1. Related work 45 3.1.1. Availability techniques in mirrored arrays 46 3.1.2. Availability techniques for parity-based arrays 47 3.1.2.1. Multiple independent groups 47 3.1.2.2. Distributing the failure-induced workload 49 3.1.3. Summary 51 3.2. Disk array layouts for parity declustering 52 3.2.1. Layout goodness criteria 52 3.2.2. Layouts based on balanced incomplete block designs 55 3.2.2.1. Block designs 55 3.2.2.2. Deriving a layout from a block design 56 3.2.2.3. Evaluating the layout 58 3.2.2.4. Finding block designs for layout 60 3.2.3. A related study: layout via random permutations 61 3.2.4. Summary 63 3.3. Primary evaluations 63 3.3.1. Comparing declustering to RAID Level 5 65 3.3.1.1. No effect on fault-free performance 65 3.3.1.2. Declustering greatly benefits degraded-mode performance 65 3.3.1.3. Declustering benefits persist during reconstruction 66 3.3.1.4. Declustering also benefits data reliability 68 3.3.1.5. Summary 70 3.3.2. Varying the declustering ratio 70 3.3.2.1. Fault-free performance 71 3.3.2.2. Degraded- and reconstruction-mode performance 72 3.3.2.3. High data reliability 74 3.3.2.4. Summary 74 3.3.3. Improving fault-free performance by increasing disk utilization 75 3.3.3.1. A simple model for the load-increase factor 75 3.3.3.2. Verification via simulation 79 3.3.3.3. Summary 80 3.3.4. Performance on non-OLTP workloads 81 3.3.4.1. Performance versus user access size and read-write ratio 81 3.3.4.2. Performance versus locality of reference 85 3.3.4.3. Performance on specific workloads 86 3.3.4.4. Summary of non-OLTP performance evaluations 91 3.3.5. Overall performance evaluation summary 91 3.4. System configuration 92 3.5. Optimizations and improvements 99 3.5.1. Optimizing the reconstruction unit size 99 3.5.1.1. Layout modification 100 3.5.1.2. Evaluating the benefits of larger reconstruction units 102 3.5.1.3. Determining the optimal reconstruction unit 104 3.5.2. Compacting the full block design table 105 3.5.2.1. Balancing parity in the minimum number of block design tables 106 3.5.2.2. Reordering algorithm 107 3.5.2.3. Compaction results and conclusions 109 3.5.3. Improving adherence to criterion six 112 3.5.3.1. The need for optimization 112 3.5.3.2. Optimizing the designs 114 3.5.3.3. Results 116 3.5.3.4. Conclusions 120 3.6. Conclusions 120 Chapter 4: Reconstruction Algorithms 123 4.1. Prior work 123 4.2. Stripe-oriented and disk-oriented reconstruction 124 4.2.1. Stripe-oriented reconstruction and its parallelized version 124 4.2.2. Disk-oriented reconstruction 126 4.2.3. Implementation of disk-oriented reconstruction 127 4.2.3.1. Buffer memory management 127 4.2.3.2. Interaction with writes in the normal workload 129 4.2.4. Summary 130 4.3. Performance evaluations 131 4.3.1. Comparing reconstruction algorithms 131 4.3.2. Sensitivity of disk-oriented algorithm to available buffer memory 133 4.3.3. Comparing memory requirements between algorithms 134 4.4. Optimizations and improvements 135 4.4.1. Work reducing variations to reconstruction algorithms 135 4.4.1.1. Defining the variations 136 4.4.1.2. Evaluating the options on OLTP-like workloads 137 4.4.1.3. Dynamic use of reconstruction options 140 4.4.1.4. Summary 141 4.4.2. Head following 141 4.4.2.1. Basic head-following algorithm, and its shortcomings 142 4.4.2.2. First approach: fetch closest active parity stripe 144 4.4.2.3. Second approach: multiple reconstruction points 148 4.4.2.4. Summarizing: head following is not viable under a random workload 152 4.4.2.5. Evaluating head following on other workloads 153 4.4.2.6. Summary 154 4.5. Conclusions 155 Chapter 5: Distributed Sparing 157 5.1. The benefits of distributed sparing 158 5.2. Distributed sparing and its implications on failure recovery 159 5.3. Implementation in declustered parity arrays 160 5.4. Disk-oriented reconstruction algorithm to support distributed sparing 169 5.5. Performance analysis 171 5.5.1. Reconstruction performance 171 5.5.2. Ultra-fast reconstruction in large arrays 172 5.5.3. Large-access performance in reconfigured mode 175 5.5.4. Re-evaluating the reconstruction options under distributed sparing 177 5.6. A related study 179 5.7. Conclusions 181 Chapter 6: Conclusions 183 References 193 Appendix A: Data Mapping Algorithms 201 A.1. Preliminaries 201 A.2. Mapping code for declustered parity 202 A.2.1. Data structures 202 A.2.2. MapSector 205 A.2.3. MapParity 206 A.2.4. MapPhysicalToStripeID 207 A.3. Mapping code for declustered parity with distributed sparing 208 A.3.1. MapSector 210 A.3.2. MapParity 211 A.3.3. MapPhysicalToStripeID 212 A.3.4. remap_to_spare_space 214 A.4. adjust_params 215 Appendix B: Block Designs 217 B.1. Block designs on v > 43 217 B.2. Designs in the database 217 B.3. Block designs used in the simulations 218 B.3.1. Designs on v = 40 220 B.3.2. Designs on k = 4 221 Appendix C: Simulation and Model Data 225