PARALLEL DATA LAB 

PDL Abstract

Informed Prefetching and Caching

Carnegie Mellon University Ph.D. Dissertation CMU-CS-97-204: December 1997.

Russel Hugo Patterson III

Dept. of Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

Disk arrays provide the raw storage throughput needed to keep pace with rapidly increasing processor performance. Unfortunately, many important I/O-intensive applications have serial I/O workloads that do not benefit from array parallelism, so the performance of a single disk remains the bottleneck on their overall performance. In this dissertation, I present aggressive, proactive mechanisms that tailor file-system resource management to the needs of I/O-intensive applications. In particular, I show how to use application-disclosed access patterns (hints) to expose and exploit I/O parallelism, and to dynamically allocate file buffers among three competing demands: prefetching hinted blocks, caching hinted blocks for reuse, and caching recently used data for unhinted accesses. My approach estimates the impact of alternative buffer allocations on application elapsed time and applies run-time cost-benefit analysis to allocate buffers where they will have the greatest impact. I implemented TIP, an informed prefetching and caching manager, in the Digital UNIX operating system and measured its performance running a range of applications on a 175 MHz Digital Alpha workstation equipped with up to 10 disks. On a ten-disk array, informed prefetching reduces the wall-clock elapsed time of computational physics, text search, scientific visualization, relational database queries, speech recognition, and object linking by 10% to 84%, with an average of 63%. On a single disk, where storage parallelism is unavailable and avoiding disk accesses is most beneficial, informed caching reduces the elapsed time of these same applications by up to 36%, with an average of 13%, compared to informed prefetching alone. Moreover, when applied to multiprogrammed, I/O-intensive workloads, TIP increases overall throughput.
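The abstract does not spell out TIP's cost and benefit estimators, so the following is only a minimal sketch of the general idea of allocating buffers "where they will have the greatest impact." It assumes each of the three competing demands can report a list of estimated elapsed-time savings for its 1st, 2nd, 3rd, ... buffer, and it hands out buffers greedily by largest marginal saving. The function name `allocate_buffers`, the consumer names, and the savings figures are all hypothetical and are not TIP's actual interfaces or measurements.

```python
import heapq
from typing import Dict, List


def allocate_buffers(total: int, marginal_savings: Dict[str, List[float]]) -> Dict[str, int]:
    """Hypothetical greedy cost-benefit allocator (not TIP's algorithm).

    marginal_savings[name][k] is an assumed estimate of the elapsed time
    saved by giving consumer `name` its (k+1)-th buffer. Buffers are handed
    out one at a time to whichever consumer's next buffer saves the most.
    If every list is non-increasing (diminishing returns), this greedy
    choice maximizes the total estimated savings.
    """
    alloc = {name: 0 for name in marginal_savings}
    # heapq is a min-heap, so store negated savings to pop the largest first.
    heap = [(-savings[0], name) for name, savings in marginal_savings.items() if savings]
    heapq.heapify(heap)
    for _ in range(total):
        if not heap:
            break  # no consumer has any further estimated benefit
        neg_saving, name = heapq.heappop(heap)
        if -neg_saving <= 0:
            break  # another buffer would not reduce elapsed time
        alloc[name] += 1
        nxt = alloc[name]
        if nxt < len(marginal_savings[name]):
            # Re-queue this consumer keyed on the value of its next buffer.
            heapq.heappush(heap, (-marginal_savings[name][nxt], name))
    return alloc


# Illustrative, made-up savings for the three demands named in the abstract.
demands = {
    "prefetch": [5.0, 3.0, 1.0],
    "hinted_cache": [4.0, 0.5],
    "lru_cache": [2.0, 1.5, 1.0],
}
print(allocate_buffers(4, demands))
# → {'prefetch': 2, 'hinted_cache': 1, 'lru_cache': 1}
```

With four buffers, the sketch grants them in the order prefetch (5.0), hinted cache (4.0), prefetch again (3.0), then LRU cache (2.0). A real manager would also have to take buffers back from one consumer when another's estimated benefit grows, which this one-shot greedy pass does not model.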
