
PARALLEL DATA LAB 

PDL Abstract

I/O Acceleration with Pattern Detection

22nd Int. ACM Symposium on High Performance Parallel and Distributed Computing (HPDC'13), New York City, June 17-21, 2013.

Jun He^, John Bent‡, Aaron Torres*, Gary Grider*, Garth A. Gibson#, Carlos Maltzahn**, Xian-He Sun†

Carnegie Mellon University
Pittsburgh, PA 15213

^University of Wisconsin, Madison
‡EMC
*Los Alamos National Laboratory
#Carnegie Mellon University and Panasas
**University of California, Santa Cruz
†Illinois Institute of Technology

http://www.pdl.cmu.edu/

The I/O bottleneck in high-performance computing is becoming worse as application data continues to grow. In this work, we explore how patterns of I/O within these applications can significantly affect the effectiveness of the underlying storage systems and how these same patterns can be utilized to improve many aspects of the I/O stack and mitigate the I/O bottleneck. We offer three main contributions in this paper. First, we develop and evaluate algorithms by which I/O patterns can be efficiently discovered and described. Second, we implement one such algorithm to reduce the metadata quantity in a virtual parallel file system by up to several orders of magnitude, thereby increasing the performance of writes and reads by up to 40 and 480 percent respectively. Third, we build a prototype file system with pattern-aware prefetching and evaluate it to show a 46 percent reduction in I/O latency. Finally, we believe that efficient pattern discovery and description, coupled with the observed predictability of complex patterns within many high-performance applications, offers significant potential to enable many additional I/O optimizations.

Keywords: I/O; pattern; large-scale storage systems; high performance computing; PLFS; prefetching
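As a rough illustration of why describing a pattern can shrink metadata, the sketch below collapses a run of equal-length writes at a constant stride into a single entry. It is not the algorithm from the paper, which this abstract does not specify. The record layouts and the names `compress_strided` and `expand` are hypothetical, chosen only for this example.

```python
from typing import List, Tuple

# Hypothetical record layouts, for illustration only.
Write = Tuple[int, int]              # (logical offset, length)
Pattern = Tuple[int, int, int, int]  # (start offset, stride, length, count)


def compress_strided(writes: List[Write]) -> List[Pattern]:
    """Collapse runs of equal-length, constant-stride writes into one entry each."""
    patterns: List[Pattern] = []
    i = 0
    while i < len(writes):
        start, length = writes[i]
        stride, count = 0, 1
        j = i + 1
        # Extend the run while the length and the offset delta both repeat.
        while j < len(writes) and writes[j][1] == length:
            delta = writes[j][0] - writes[j - 1][0]
            if count == 1:
                stride = delta      # first delta defines the stride
            elif delta != stride:
                break               # stride changed, so the run ends here
            count += 1
            j += 1
        patterns.append((start, stride, length, count))
        i = j
    return patterns


def expand(patterns: List[Pattern]) -> List[Write]:
    """Rebuild the original write list from the compressed entries."""
    return [
        (start + k * stride, length)
        for start, stride, length, count in patterns
        for k in range(count)
    ]


if __name__ == "__main__":
    writes = [(0, 4), (16, 4), (32, 4), (48, 4), (100, 8)]
    patterns = compress_strided(writes)
    print(len(writes), "records ->", len(patterns), "entries")
    assert expand(patterns) == writes  # the description is lossless
```

In this toy case the entry count depends on the number of runs, not the number of writes, so a long, perfectly regular sequence needs only one entry. Real application traces are less regular, and a practical detector would need to handle nested or repeating multi-stride patterns that this sketch ignores.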

FULL PAPER: pdf