PARALLEL DATA LAB 

PDL Abstract

SkyeFS: Distributed Directories using Giga+ and PVFS

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-12-104, May 2012.

Anthony Chivetta, Swapnil Patil & Garth A. Gibson

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

There is a growing set of large-scale, data-intensive applications that require file system directories to store millions to billions of files in each directory and to sustain hundreds of thousands of concurrent directory operations per second. Unfortunately, most cluster file systems are unable to provide this level of scale and parallelism. In this research, we show how the GIGA+ distributed directory algorithm, developed at CMU, can be applied to a real-world cluster file system. We designed and implemented a user-level file system, called SkyeFS, that efficiently layers GIGA+ on top of the PVFS cluster file system. Our experimental evaluation demonstrates how an optimized interposition layer can help PVFS achieve the desired scalability for massive file system directories.

FULL TR: pdf