Appears in Proceedings of the Second Symposium on Operating Systems Design and Implementation (OSDI '96), pages 3-17, October 1996. Supersedes Carnegie Mellon University SCS Technical Report CMU-CS-96-174.
Todd C. Mowry, Angela K. Demke and Orran Krieger
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Current operating systems offer poor performance when a numeric application's
working set does not fit in main memory. As a result, programmers who
wish to solve "out-of-core" problems efficiently are typically
faced with the onerous task of rewriting an application to use explicit
I/O operations (e.g., read/write). In this paper, we propose and evaluate
a fully automatic technique that liberates the programmer from this
task, provides high performance, and requires only minimal changes to
current operating systems. In our scheme, the compiler provides the
crucial information on future access patterns without burdening the
programmer, the operating system supports non-binding prefetch and release
hints for managing I/O, and the operating system cooperates with a run-time
layer to accelerate performance by adapting to dynamic behavior and
minimizing prefetch overhead. This approach maintains the abstraction
of unlimited virtual memory for the programmer, gives the compiler the
flexibility to aggressively move prefetches back ahead of references,
and gives the operating system the flexibility to arbitrate between
the competing resource
demands of multiple applications. We have implemented our scheme using
the SUIF compiler and the Hurricane operating system. Our experimental
results demonstrate that our fully automatic scheme effectively hides
the I/O latency in out-of-core versions of the entire NAS Parallel benchmark
suite, thus resulting in speedups of roughly twofold for five of the
eight applications, with one application speeding up by over threefold.
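To make the division of labor described above concrete, the following is a minimal sketch, not the paper's actual SUIF/Hurricane interface, of what compiler-inserted non-binding prefetch and release hints might look like for a blocked out-of-core computation. It uses Linux's madvise() with MADV_WILLNEED and MADV_DONTNEED as stand-ins for the prefetch and release hints; the file name, matrix size, and block size are illustrative assumptions.

    /* Sketch only: madvise() stands in for the paper's non-binding
     * prefetch/release hints; names and sizes are assumptions. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define N     (1 << 14)   /* 16K x 16K doubles: ~2 GB, larger than RAM */
    #define BLOCK (1 << 10)   /* rows computed on (and hinted) per block   */

    int main(void)
    {
        int fd = open("matrix.dat", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        size_t bytes = (size_t)N * N * sizeof(double);
        double *a = mmap(NULL, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
        if (a == MAP_FAILED) { perror("mmap"); return 1; }

        double sum = 0.0;
        for (size_t i = 0; i < N; i += BLOCK) {
            /* Prefetch hint: ask the OS to start fetching the next block
             * of rows while we compute on the current one (non-binding). */
            if (i + BLOCK < N)
                madvise(a + (i + BLOCK) * N,
                        (size_t)BLOCK * N * sizeof(double), MADV_WILLNEED);

            for (size_t r = i; r < i + BLOCK && r < N; r++)
                for (size_t c = 0; c < N; c++)
                    sum += a[r * N + c];

            /* Release hint: these pages will not be reused, so the OS may
             * reclaim them immediately instead of evicting useful pages. */
            madvise(a + i * N, (size_t)BLOCK * N * sizeof(double),
                    MADV_DONTNEED);
        }

        printf("sum = %f\n", sum);
        munmap(a, bytes);
        close(fd);
        return 0;
    }

In the scheme evaluated in the paper, hints like these are inserted automatically by the compiler and filtered by a run-time layer, rather than written by hand as shown here.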