Appears in Proceedings of the Second Symposium on Operating Systems Design and Implementation (OSDI '96), pages 3-17, October 1996. Supersedes Carnegie Mellon University SCS Technical Report CMU-CS-96-174.
Todd C. Mowry, Angela K. Demke and Orran Krieger
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Current operating systems offer poor performance when a numeric application's
working set does not fit in main memory. As a result, programmers who
wish to solve "out-of-core" problems efficiently are typically
faced with the onerous task of rewriting an application to use explicit
I/O operations (e.g., read/write). In this paper, we propose and evaluate
a fully automatic technique that liberates the programmer from this
task, provides high performance, and requires only minimal changes to
current operating systems. In our scheme, the compiler provides the
crucial information on future access patterns without burdening the
programmer, the operating system supports non-binding prefetch and release
hints for managing I/O, and the operating system cooperates with a run-time
layer to accelerate performance by adapting to dynamic behavior and
minimizing prefetch overhead. This approach maintains the abstraction
of unlimited virtual memory for the programmer, gives the compiler the
flexibility to aggressively move prefetches back ahead of references,
and gives the operating system the flexibility to arbitrate between
the competing resource demands of multiple applications. We have
implemented our scheme using the SUIF compiler and the Hurricane
operating system. Our experimental results demonstrate that our fully
automatic scheme effectively hides the I/O latency in out-of-core
versions of the entire NAS Parallel benchmark suite, resulting in
speedups of roughly twofold for five of the eight applications, with
one application speeding up by over threefold.
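
As an illustration of the non-binding prefetch and release hints described above, the following is a minimal sketch in C of the kind of code a compiler could place around a loop over an out-of-core array. It is not the interface implemented in Hurricane, nor actual SUIF output. It assumes the POSIX call posix_madvise as a stand-in, with POSIX_MADV_WILLNEED playing the role of a prefetch hint and POSIX_MADV_DONTNEED the role of a release hint. The names alloc_array, hint, sum_with_hints, block, and dist are hypothetical and chosen only for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS and posix_madvise on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: allocate a page-aligned array of n doubles. */
double *alloc_array(size_t n)
{
    void *p = mmap(NULL, n * sizeof(double), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (double *)p;
}

/* Issue an advisory hint for elements [first, first + count), clipped to n.
 * The hint is non-binding: the return value is ignored on purpose, so a
 * rejected hint (for example, a misaligned address) never affects the result. */
static void hint(double *a, size_t n, size_t first, size_t count, int advice)
{
    if (first >= n || count == 0)
        return;
    if (count > n - first)
        count = n - first;
    (void)posix_madvise(a + first, count * sizeof(double), advice);
}

/* Sum a[0..n) in blocks of `block` elements, prefetching `dist` blocks
 * ahead of the current block and releasing each block once it is consumed.
 * Hints are only honored when each block starts on a page boundary. */
double sum_with_hints(double *a, size_t n, size_t block, size_t dist)
{
    assert(block > 0);
    double sum = 0.0;

    /* Prologue: request the first `dist` blocks before the loop starts. */
    hint(a, n, 0, dist * block, POSIX_MADV_WILLNEED);

    for (size_t b = 0; b < n; b += block) {
        /* Prefetch hint for the block `dist` iterations ahead. */
        hint(a, n, b + dist * block, block, POSIX_MADV_WILLNEED);

        size_t end = (n - b < block) ? n : b + block;
        for (size_t i = b; i < end; i++)
            sum += a[i];

        /* Release hint: this block will not be referenced again in this pass. */
        hint(a, n, b, block, POSIX_MADV_DONTNEED);
    }
    return sum;
}
```

A caller would obtain the array from alloc_array (or from a file-backed mapping), fill it, and call, for example, sum_with_hints(a, n, 512, 2), where 512 doubles correspond to one 4 KB page. Because both hints are advisory, the loop computes the same sum whether or not the operating system acts on them, which reflects the property the abstract relies on: the compiler can schedule prefetches early, and the operating system remains free to ignore them under memory pressure.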
FULL PAPER: pdf / postscript
ORIGINAL TR VERSION OF THIS PAPER: pdf / postscript