PARALLEL DATA LAB 

PDL Abstract

Reusing Migration to Simply and Efficiently Implement Multi-server Operations in Transparently Scalable Storage Systems

Carnegie Mellon University School of Computer Science Ph.D. Dissertation CMU-CS-10-141. May 2010.

Shafeeq Sinnamohideen

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu

Distributed file systems that scale by partitioning files and directories among a collection of servers inevitably encounter multi-server operations. A common example is a RENAME that moves a file from a directory managed by one server to a directory managed by another. Transparently scalable systems (those that provide the same semantics for multi-server operations as they do for single-server operations) traditionally implement dedicated protocols for these rare operations. This thesis explores an alternate approach, with simplicity as a goal, that exploits the existence of migration functionality normally used for load balancing. When a client request would involve files on multiple servers, the system can migrate responsibility for those files onto one server and have it service the request. Although migration may be more expensive than a dedicated cross-server protocol, trace analysis of deployed file systems indicates that such operations are extremely rare in file system workloads. A prototype system that uses this approach to supporting multi-server operations scales linearly and performs well even when multi-server operations are 100X more common than the worst-case trace. Thus, when migration functionality exists in the system, multi-server operations can be efficiently handled with very little additional implementation complexity.

KEYWORDS: Object-based storage, object ID assignment algorithms, namespace flattening, OSD, metadata scalability, multi-server operations, cross-server operations, cross-directory operations, transparent scalabilily

FULL THESIS: pdf