Carnegie Mellon University Ph.D Dissertation CMU-CS-01-172, December 2001.
Fay W. Chang
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
The gap between processing speeds and disk access times is widening. This trend is causing applications that must fetch data from disk to spend an increasing proportion of their execution times stalled on disk I/O. I/O prefetching, a well-known technique for hiding disk latency, has the potential to alleviate this problem, particularly when the data that needs to be fetched is distributed across multiple disks. A major hurdle to benefiting from this technique in practice is the difficulty of generating accurate and timely prefetches. In this dissertation, I put forth a new approach to generating accurate and timely prefetches without programmer involvement.
The key to the proposed approach is its unique method for predicting what data an executing process will access in the future. The approach involves adding an execution of each target process's code that exploits spare processing cycles. These added executions skip some operations, like accesses to uncached data, so that they can run ahead of their target normal executions. This permits differences between the data values used during the added speculative executions and their target normal executions. Despite any such differences, the approach predicts that the data accesses encountered during speculative executions will often be the same as the data accesses that will be encountered during their target normal executions such that, by initiating prefetching of that data, speculative executions could reduce the I/O stall time of their target normal executions.
To investigate the viability of this approach, I developed and evaluated SpecHint, a design and implementation for applying the approach automatically. SpecHint is based on binary modification and requires no operating system support specific to this approach. I evaluated SpecHint using six benchmarks from the TIP benchmark suite. Experiments demonstrate that, with the file system striped across four disks, SpecHint reduces the elapsed times of five of the benchmarks (text search, scientific visualization, object linking, and two database queries) by 24% to 71%, with an average of 53%. Moreover, simulation experiments suggest that, on future systems, the benefit of SpecHint would not decrease, and may even increase for some applications.
FULL THESIS: pdf / postscript