PARALLEL DATA LAB 

PDL Abstract

Opportunistic Use of Content Addressable Storage for Distributed File Systems

Proceedings of the USENIX Annual Technical Conference, General Track 2003: 127-140, San Antonio, TX.

Niraj Tolia†*, Michael Kozuch*, Mahadev Satyanarayanan†*, Brad Karp*, Thomas Bressoud*^, Adrian Perrig†

*Intel Research Pittsburgh
†Carnegie Mellon University
^Denison University

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

Motivated by the prospect of readily available Content Addressable Storage (CAS), we introduce the concept of file recipes. A file's recipe is a first-class file system object listing content hashes that describe the data blocks composing the file. File recipes provide applications with instructions for reconstructing the original file from available CAS data blocks. We describe one such application of recipes, the CASPER distributed file system. A CASPER client opportunistically fetches blocks from nearby CAS providers to improve its performance when the connection to a file server traverses a low-bandwidth path. We use measurements of our prototype to evaluate its performance under varying network conditions. Our results demonstrate significant improvements in execution times of applications that use a network file system. We conclude by describing fuzzy block matching, a promising technique for using approximately matching blocks on CAS providers to reconstitute the exact desired contents of a file at a client.
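
The recipe idea can be illustrated with a minimal sketch. This is not the CASPER implementation. The abstract does not specify the hash function, block size, or recipe format, so the use of SHA-256, fixed 4096-byte blocks, and the names make_recipe, reconstruct, cas_lookup, and server_fetch are all assumptions made for illustration. The sketch builds a recipe as an ordered list of block hashes, then rebuilds the file by taking each block from a nearby CAS provider when one is available and falling back to the file server otherwise.

```python
import hashlib

# Assumed block size; the abstract does not state one.
BLOCK_SIZE = 4096


def make_recipe(data, block_size=BLOCK_SIZE):
    """Return the ordered list of content hashes for the file's blocks.

    SHA-256 is an assumption here, chosen only because it is in the
    standard library.
    """
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def reconstruct(recipe, cas_lookup, server_fetch):
    """Rebuild a file from its recipe.

    cas_lookup(digest)  -> block bytes, or None if the (hypothetical)
                           nearby CAS provider does not hold the block.
    server_fetch(index) -> block bytes from the file server; this stands
                           in for the slow, low-bandwidth path.
    """
    blocks = []
    for index, digest in enumerate(recipe):
        block = cas_lookup(digest)
        # Trust a CAS block only if its hash matches the recipe entry.
        if block is None or hashlib.sha256(block).hexdigest() != digest:
            block = server_fetch(index)
        blocks.append(block)
    return b"".join(blocks)
```

Because every block taken from a provider is checked against the hash in the recipe, a stale or corrupted provider costs only an extra fetch from the server and cannot change the reconstructed contents.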

FULL PAPER: pdf / postscript