Los Alamos National Laboratory Associate Directorate for Theory, Simulation, and Computation (ADTSC) LA-UR 13-20839, 2013.
Gary Grider, HPC-DO;
John Bent, EMC Corporation;
Chuck Cranor, Carnegie Mellon University;
Jun He, New Mexico Consortium;
Aaron Torres, HPC-3;
Meghan McClelland, HPC-5;
Brett Kettering, HPC-5
To improve the checkpoint bandwidth of critical applications at LANL, we developed the Parallel Log Structured File System (PLFS), an I/O middleware layer that sits within our storage stack. PLFS transforms a single shared file written concurrently by many processes into non-shared component pieces. This reorganization of I/O has made write size a non-issue and improved checkpoint performance by orders of magnitude, meeting the project's L2 milestone of demonstrating increased checkpoint performance with LANL codes. LANL is working with EMC under an umbrella Cooperative Research and Development Agreement (CRADA) to further design, build, test, enhance, and deploy PLFS. PLFS has been integrated with multiple types of storage systems, including cloud storage, and has shown improvements in file storage sizes and metadata rates.
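To make the idea of turning one shared file into non-shared pieces more concrete, here is a minimal Python sketch. It is a toy model written for illustration only, not the PLFS implementation or its API: the class name `LogStructuredFile`, the `data.<writer_id>` file naming, and the in-memory index are all assumptions. Each writer appends to its own log no matter which logical offset it targets, and an index records where each logical extent landed so the shared file can be reassembled on read.

```python
import os
import tempfile


class LogStructuredFile:
    """Toy model (hypothetical): one logical file backed by per-writer logs."""

    def __init__(self, container_dir):
        # The "shared file" is represented as a directory of component logs.
        self.container_dir = container_dir
        os.makedirs(container_dir, exist_ok=True)
        # Index entries, kept in write order:
        # (logical_offset, length, writer_id, physical_offset)
        self.index = []

    def _log_path(self, writer_id):
        # Assumed naming scheme for illustration only.
        return os.path.join(self.container_dir, f"data.{writer_id}")

    def write(self, writer_id, logical_offset, data):
        # Always append to the writer's own log, so writers never
        # contend for the same file, whatever the logical offset is.
        with open(self._log_path(writer_id), "ab") as log:
            physical_offset = log.seek(0, os.SEEK_END)
            log.write(data)
        self.index.append((logical_offset, len(data), writer_id, physical_offset))

    def read_all(self):
        # Rebuild the logical file by replaying the index in write order;
        # later entries overwrite earlier ones where extents overlap.
        size = max((off + n for off, n, _, _ in self.index), default=0)
        buf = bytearray(size)
        for off, n, writer_id, phys in self.index:
            with open(self._log_path(writer_id), "rb") as log:
                log.seek(phys)
                buf[off:off + n] = log.read(n)
        return bytes(buf)


def demo():
    """Two writers issue interleaved, strided writes to one logical file."""
    with tempfile.TemporaryDirectory() as tmp:
        f = LogStructuredFile(os.path.join(tmp, "checkpoint"))
        f.write(0, 0, b"AAAA")
        f.write(1, 4, b"BBBB")
        f.write(0, 8, b"CCCC")
        f.write(1, 12, b"DDDD")
        logs = sorted(os.listdir(f.container_dir))
        return f.read_all(), logs


if __name__ == "__main__":
    data, logs = demo()
    print(data, logs)
```

In this sketch the strided pattern that would interleave two writers inside one shared file becomes two purely sequential appends, one per writer, which is the reorganization the abstract credits for the improvement. A real system would also have to persist and merge the index across processes; that part is omitted here.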