Carnegie Mellon University ECE Ph.D. Dissertation, CMU-PDL-09-113, August 17, 2009.
Brandon Watts Salmon
Dept. Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
Distributed storage is coming home. An increasing number of home and personal electronic devices create, use, and display digitized forms of music, images, videos, as well as more conventional files (e.g., financial records and contact lists). In-home networks enable these devices to communicate, and a variety of device-specific and datatype-specific tools are emerging. The transition to digital homes gives exciting new capabilities to users, but it also makes them responsible for administration tasks usually handled by dedicated professionals in other settings. It is unclear that traditional data management practices will work for “normal people” reluctant to put time into administration.
This dissertation presents a number of studies of the way home users deal with their storage. One intriguing finding of these studies is that home users rarely organize and access their data via traditional folder-based naming. Usually, they do so based on data attributes. Computing researchers have long talked about attribute-based data navigation, while continuing to use folder-based approaches. However, users of home and personal storage live it. Popular interfaces (e.g., iTunes, iPhoto, and even drop-down lists of recently-opened Word documents) allow users to navigate file collections via attributes like publisher-provided metadata, extracted keywords, and date/time. In contrast, the abstractions provided by filesystems and associated tools for managing files have remained tightly tied to namespaces built on folders.
To correct the disconnect between semantic data access and folder-based replica management, this dissertation presents a new primitive that I call a “view”, as a replacement for the traditional volume abstraction. A view is a compact description of a set of files, expressed much like a search query, and a device on which that data should be stored. For example, one view might be “all files with type=music and artist=Beatles stored on Liz’s iPod” and another “all files with owner=Liz stored on Liz’s laptop”. Each device participating in a view-based filesystem maintains and publishes one or more views to describe the files that it stores. A view-based filesystem ensures that any file that matches a view will eventually be stored on the device named in the view. Since views describe sets of files using the same attribute-based style as users’ other tools, view-based replica management should be easier than folder-based file management.
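As a rough illustration of the idea (not Perspective's actual data structures or API), a view can be modeled as a device name paired with an attribute query. In the minimal Python sketch below, the `View` class, the `devices_for` helper, and the representation of a query as exact-match attribute/value pairs are all assumptions made for the example; the two views are the ones from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Hypothetical model of a view: a device plus an attribute query."""
    device: str
    query: tuple  # (attribute, value) pairs; all must match (an assumption)

    def matches(self, file_attrs: dict) -> bool:
        # A file matches when every attribute in the query has the given value.
        return all(file_attrs.get(attr) == value for attr, value in self.query)


def devices_for(file_attrs: dict, views: list) -> list:
    """Return the devices whose views match the file, i.e. where it should
    eventually be stored."""
    return sorted({v.device for v in views if v.matches(file_attrs)})


# The two example views from the text.
views = [
    View("Liz's iPod", (("type", "music"), ("artist", "Beatles"))),
    View("Liz's laptop", (("owner", "Liz"),)),
]

# Hypothetical files described only by their attributes.
song = {"type": "music", "artist": "Beatles", "owner": "Liz"}
memo = {"type": "document", "owner": "Liz"}

print(devices_for(song, views))  # the song matches both views
print(devices_for(memo, views))  # the memo matches only the laptop view
```

Under these assumptions, placement is a matter of evaluating each published view against a file's attributes, with no reference to the folder the file lives in.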
In this dissertation I present the design of Perspective, a view-based filesystem, and Insight, a set of view-based management tools. User studies, deployments and benchmarks using these prototypes show that view-based management simplifies some important tasks for non-technical users and can be supported efficiently by a distributed filesystem.