PARALLEL DATA LAB 

PDL Abstract

On-Line Data Reconstruction In Redundant Disk Arrays

Ph.D. Dissertation, Dept. of ECE, Carnegie Mellon University, 1994

Mark Holland

School of Computer Science, Carnegie Mellon University

Thesis Table of Contents

Chapter 1: Introduction      					 					 1

Chapter 2: Background Information				 					 7
	2.1. The need for improved availability in the storage subsystem  	 8
	2.1.1. The widening access gap					 					 8
	2.1.2. The downsizing trend in disk drives						     9
	2.1.3. The advent of new, I/O intensive applications			     10
	2.1.4. Why these trends necessitate higher availability			     10
	2.2. Technology background											 13
	2.2.1. Disk technology												 13
	2.2.2. Disk array technology					 					 16
	2.2.2.1. Disk array architecture									 17
	2.2.2.2. Defining the RAID levels: data layout and ECC				 18
	2.2.2.3. Reading and writing data in the different RAID levels		 23
	2.2.2.3.1. RAID Level 1												 24
	2.2.2.3.2. RAID Level 3												 25
	2.2.2.3.3. RAID Level 5												 26
	2.2.2.4. Comparing the performance of the RAID levels			     29
	2.2.2.5. On-line reconstruction										 29
	2.2.2.6. Related work: variations on these organizations		 	 30
	2.2.2.6.1. Multiple failure toleration								 30
	2.2.2.6.2. Addressing the small-write problem						 32
	2.2.2.6.3. Spare space organizations				 				 34
	2.2.2.6.4. Distributing the functionality of the array controller	 35
	2.2.2.6.5. Striping studies					 						 35
	2.2.2.6.6. Disk array performance evaluation						 37
	2.2.2.6.7. Reliability modeling										 37
	2.2.2.6.8. Improving the write-performance of RAID Level 1			 39
	2.2.2.6.9. Network file systems based on RAID						 40
	2.3. Evaluation methodology											 40
	2.3.1. Simulation methodology										 40
	2.3.2. The raidSim disk array simulator								 41
	2.3.3. Default workload												 42
	
Chapter 3: Disk Array Architectures and Data Layouts					 45
	3.1. Related work													 45
	3.1.1. Availability techniques in mirrored arrays					 46
	3.1.2. Availability techniques for parity-based arrays				 47
	3.1.2.1. Multiple independent groups								 47
	3.1.2.2. Distributing the failure-induced workload					 49
	3.1.3. Summary														 51
	3.2. Disk array layouts for parity declustering						 52
	3.2.1. Layout goodness criteria										 52
	3.2.2. Layouts based on balanced incomplete block designs			 55
	3.2.2.1. Block designs												 55
	3.2.2.2. Deriving a layout from a block design						 56
	3.2.2.3. Evaluating the layout										 58
	3.2.2.4. Finding block designs for layout							 60
	3.2.3. A related study: layout via random permutations				 61
	3.2.4. Summary														 63
	3.3. Primary evaluations											 63
	3.3.1. Comparing declustering to RAID Level 5						 65
	3.3.1.1. No effect on fault-free performance						 65
	3.3.1.2. Declustering greatly benefits degraded-mode performance 	 65
	3.3.1.3. Declustering benefits persist during reconstruction		 66
	3.3.1.4. Declustering also benefits data reliability		 		 68
	3.3.1.5. Summary													 70
	3.3.2. Varying the declustering ratio								 70
	3.3.2.1. Fault-free performance										 71
	3.3.2.2. Degraded- and reconstruction-mode performance				 72
	3.3.2.3. High data reliability										 74
	3.3.2.4. Summary													 74
	3.3.3. Improving fault-free performance by increasing disk utilization 75
	3.3.3.1. A simple model for the load-increase factor				 75
	3.3.3.2. Verification via simulation								 79
	3.3.3.3. Summary													 80
	3.3.4. Performance on non-OLTP workloads							 81
	3.3.4.1. Performance versus user access size and read-write ratio 	 81
	3.3.4.2. Performance versus locality of reference					 85
	3.3.4.3. Performance on specific workloads							 86
	3.3.4.4. Summary of non-OLTP performance evaluations				 91
	3.3.5. Overall performance evaluation summary						 91
	3.4. System configuration											 92
	3.5. Optimizations and improvements									 99
	3.5.1. Optimizing the reconstruction unit size						 99
	3.5.1.1. Layout modification										100
	3.5.1.2. Evaluating the benefits of larger reconstruction units		102
	3.5.1.3. Determining the optimal reconstruction unit				104
	3.5.2. Compacting the full block design table						105
	3.5.2.1. Balancing parity in the minimum number of block design tables 106
	3.5.2.2. Reordering algorithm										107
	3.5.2.3. Compaction results and conclusions							109
	3.5.3. Improving adherence to criterion six							112
	3.5.3.1. The need for optimization									112
	3.5.3.2. Optimizing the designs										114
	3.5.3.3. Results													116
	3.5.3.4. Conclusions												120
	3.6. Conclusions													120

Chapter 4: Reconstruction Algorithms									123
	4.1. Prior work														123
	4.2. Stripe-oriented and disk-oriented reconstruction				124
	4.2.1. Stripe-oriented reconstruction and its parallelized version  124
	4.2.2. Disk-oriented reconstruction									126
	4.2.3. Implementation of disk-oriented reconstruction				127
	4.2.3.1. Buffer memory management									127
	4.2.3.2. Interaction with writes in the normal workload				129
	4.2.4. Summary														130
	4.3. Performance evaluations										131
	4.3.1. Comparing reconstruction algorithms							131
	4.3.2. Sensitivity of disk-oriented algorithm to available buffer memory 133
	4.3.3. Comparing memory requirements between algorithms				134
	4.4. Optimizations and improvements									135
	4.4.1. Work reducing variations to reconstruction algorithms		135
	4.4.1.1. Defining the variations									136
	4.4.1.2. Evaluating the options on OLTP-like workloads				137
	4.4.1.3. Dynamic use of reconstruction options						140
	4.4.1.4. Summary													141
	4.4.2. Head following												141
	4.4.2.1. Basic head-following algorithm, and its shortcomings		142
	4.4.2.2. First approach: fetch closest active parity stripe			144
	4.4.2.3. Second approach: multiple reconstruction points			148
	4.4.2.4. Summarizing: head following is not viable under a random workload 152
	4.4.2.5. Evaluating head following on other workloads				153
	4.4.2.6. Summary													154
	4.5. Conclusions													155
Chapter 5: Distributed Sparing										157
	5.1. The benefits of distributed sparing							158
	5.2. Distributed sparing and its implications on failure recovery 	159
	5.3. Implementation in declustered parity arrays					160
	5.4. Disk-oriented reconstruction algorithm to support distributed sparing 169
	5.5. Performance analysis											171
	5.5.1. Reconstruction performance									171
	5.5.2. Ultra-fast reconstruction in large arrays					172
	5.5.3. Large-access performance in reconfigured mode				175
	5.5.4. Re-evaluating the reconstruction options under distributed sparing 177
	5.6. A related study												179
	5.7. Conclusions													181
	
Chapter 6: Conclusions												183

References															193

Appendix A: Data Mapping Algorithms									201
	A.1. Preliminaries													201
	A.2. Mapping code for declustered parity							202
	A.2.1. Data structures												202
	A.2.2. MapSector													205
	A.2.3. MapParity													206
	A.2.4. MapPhysicalToStripeID										207
	A.3. Mapping code for declustered parity with distributed sparing	208
	A.3.1. MapSector													210
	A.3.2. MapParity													211
	A.3.3. MapPhysicalToStripeID										212
	A.3.4. remap_to_spare_space											214
	A.4. adjust_params													215

Appendix B: Block Designs											217
	B.1. Block designs on v > 43										217
	B.2. Designs in the database										217
	B.3. Block designs used in the simulations							218
	B.3.1. Designs on v = 40											220
	B.3.2. Designs on k = 4												221

Appendix C: Simulation and Model Data								225