PARALLEL DATA LAB 


Improving Small File Performance in Object-based Storage

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-06-104, May 2006.

James Hendricks, Raja R. Sambasivan, Shafeeq Sinnamohideen, Gregory R. Ganger

Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

This paper proposes two architectural refinements for improving the efficiency of small-file workloads in object-based storage systems: server-driven metadata prefetching and namespace flattening. Server-driven metadata prefetching has the metadata server return information and capabilities for multiple objects, rather than just one, in response to each lookup. This lets clients access the contents of many small files per metadata server interaction, reducing both access latency and metadata server load. Namespace flattening encodes the directory hierarchy into object IDs so that namespace locality translates into object ID similarity. This exposes namespace relationships among objects (e.g., as hints to storage devices), improves locality in metadata indices, and enables the use of object ID ranges to exploit them. Trace-driven simulations and experiments with a prototype implementation show significant performance benefits for small-file workloads.
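To make the idea of namespace flattening concrete, the following is a minimal sketch, not the report's actual ID assignment algorithm. It assumes a hypothetical layout in which a 64-bit object ID is split into eight 8-bit fields, one per directory level, and each new entry takes the next free slot in its parent's field. Under that assumption, siblings get numerically adjacent IDs, and every object below a directory falls inside one contiguous ID range. The names `FlatNamespace`, `BITS_PER_LEVEL`, `MAX_DEPTH`, and `subtree_range` are illustrative only.

```python
from dataclasses import dataclass, field

# Hypothetical ID layout (assumption, not from the report):
# 8 levels x 8 bits = 64-bit object IDs, most significant field = top level.
BITS_PER_LEVEL = 8
MAX_DEPTH = 8
ID_BITS = BITS_PER_LEVEL * MAX_DEPTH


@dataclass
class Dir:
    oid: int
    depth: int
    next_child: int = 1  # slot 0 reserved so no child shares its parent's ID
    children: dict = field(default_factory=dict)  # name -> Dir or file oid


class FlatNamespace:
    """Assigns object IDs so that a directory's subtree is one ID range."""

    def __init__(self):
        self.root = Dir(oid=0, depth=0)

    def _child_oid(self, parent: Dir) -> int:
        if parent.depth >= MAX_DEPTH:
            raise ValueError("namespace too deep for this ID layout")
        if parent.next_child >= (1 << BITS_PER_LEVEL):
            raise ValueError("directory fan-out exceeds its ID field")
        # Place the child's slot number in the field for the next level down.
        shift = ID_BITS - BITS_PER_LEVEL * (parent.depth + 1)
        oid = parent.oid | (parent.next_child << shift)
        parent.next_child += 1
        return oid

    def create(self, path: str, is_dir: bool = False) -> int:
        *parents, name = path.strip("/").split("/")
        d = self.root
        for p in parents:
            d = d.children[p]  # parent directories must already exist
        oid = self._child_oid(d)
        d.children[name] = Dir(oid, d.depth + 1) if is_dir else oid
        return oid

    def subtree_range(self, path: str) -> tuple[int, int]:
        """Inclusive object ID range covering a directory and all descendants."""
        d = self.root
        for p in filter(None, path.strip("/").split("/")):
            d = d.children[p]
        free_bits = ID_BITS - BITS_PER_LEVEL * d.depth
        return d.oid, d.oid | ((1 << free_bits) - 1)


if __name__ == "__main__":
    ns = FlatNamespace()
    ns.create("/home", is_dir=True)
    a = ns.create("/home/a.txt")
    b = ns.create("/home/b.txt")
    lo, hi = ns.subtree_range("/home")
    print(hex(a), hex(b), lo <= a <= hi and lo <= b <= hi)
```

Because all of `/home` maps to one `(lo, hi)` pair, a storage device or metadata index could, in principle, treat "everything in this directory" as a single range query or a single placement hint. The fixed-width fields are the simplest possible choice; they cap both depth and fan-out, which a practical scheme would need to handle differently.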
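Server-driven metadata prefetching can be illustrated with a small sketch that counts metadata server round trips. It assumes a hypothetical server that, on each lookup, returns capabilities for the requested file plus up to `batch - 1` of its siblings in name order, and a client that caches them. The HMAC-based capability, the shared key `DEVICE_KEY`, and the classes `MetadataServer` and `Client` are assumptions made for illustration, not the report's protocol or the OSD standard's exact credential format.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical secret shared by metadata server and storage device (assumption).
DEVICE_KEY = b"hypothetical-shared-secret"


@dataclass(frozen=True)
class Capability:
    oid: int
    rights: str
    tag: bytes  # HMAC over (oid, rights); lets the device check authenticity


def make_cap(oid: int, rights: str) -> Capability:
    msg = f"{oid}:{rights}".encode()
    return Capability(oid, rights, hmac.new(DEVICE_KEY, msg, hashlib.sha256).digest())


def device_verifies(cap: Capability) -> bool:
    msg = f"{cap.oid}:{cap.rights}".encode()
    expected = hmac.new(DEVICE_KEY, msg, hashlib.sha256).digest()
    return hmac.compare_digest(cap.tag, expected)


class MetadataServer:
    def __init__(self, dirs: dict[str, dict[str, int]], batch: int = 16):
        self.dirs = dirs    # directory path -> {file name: object ID}
        self.batch = batch  # entries returned per lookup (hypothetical knob)
        self.calls = 0      # number of metadata server interactions

    def lookup(self, dirpath: str, name: str) -> dict[str, Capability]:
        """Return a capability for `name` plus up to batch-1 following siblings."""
        self.calls += 1
        entries = self.dirs[dirpath]
        names = sorted(entries)
        start = names.index(name)
        chosen = names[start:start + self.batch]
        return {f"{dirpath}/{n}": make_cap(entries[n], "read") for n in chosen}


class Client:
    def __init__(self, mds: MetadataServer):
        self.mds = mds
        self.cache: dict[str, Capability] = {}

    def open(self, path: str) -> Capability:
        # Only contact the metadata server on a cache miss.
        if path not in self.cache:
            dirpath, _, name = path.rpartition("/")
            self.cache.update(self.mds.lookup(dirpath, name))
        return self.cache[path]


if __name__ == "__main__":
    files = {f"f{i:02d}.c": 1000 + i for i in range(40)}
    for batch in (1, 16):
        mds = MetadataServer({"/src": files}, batch=batch)
        client = Client(mds)
        for n in sorted(files):
            assert device_verifies(client.open(f"/src/{n}"))
        print(f"batch={batch}: {mds.calls} metadata server calls")
```

Reading 40 files in name order takes 40 lookups with `batch=1` but only 3 with `batch=16`, which is the kind of reduction in metadata server interactions the abstract describes. This sketch issues one capability per object; an alternative consistent with the idea of multi-object capabilities would be a single capability covering a range of object IDs, which a flattened namespace would make natural to express.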

KEYWORDS: Object-based storage, object ID assignment algorithms, multi-object capabilities, namespace flattening, OSD, range operations, server-driven metadata prefetching
