PARALLEL DATA LAB 

PDL Abstract

SmartScan: Efficient Metadata Crawl for Storage Management Metadata Querying in Large File Systems

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-10-112. Oct. 2010.

Likun Liu*, Lianghong Xu†, Yongwei Wu*, Guangwen Yang*, Gregory R. Ganger†

Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

*Tsinghua University
†Carnegie Mellon University

http://www.pdl.cmu.edu/

SmartScan is a metadata crawl tool that exploits patterns in metadata changes to significantly improve the efficiency of support for file-system-wide metadata querying, an important tool for administrators. In most environments, support for metadata queries is provided by databases that are populated and refreshed by calling stat() on every file in the file system. For large file systems, where such storage management tools are most needed, each scan can take many hours to complete, even if only a small percentage of the files have changed. To address this issue, we identify patterns in metadata changes that can be exploited to restrict scanning to the small subset of directories that have recently had modified files or that have high variation in file change times. Experiments using SmartScan on production file systems show that exploiting metadata change patterns can reduce the time needed to refresh the metadata database by one to two orders of magnitude with minimal loss of freshness.
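The directory-selection idea above can be illustrated with a minimal sketch. This is not SmartScan's implementation. It is a hypothetical Python example that assumes the crawler keeps, for each directory, the file change times (POSIX st_ctime) seen on the previous crawl. A directory is rescanned if it has no history, if its newest change falls within a recent window, or if its change times vary widely. The names DirStats, collect_change_times, should_rescan, select_directories, recent_window and variation_threshold are illustrative, and using the population standard deviation as the measure of "variation" is also an assumption.

```python
from __future__ import annotations

import os
import statistics
from dataclasses import dataclass, field


@dataclass
class DirStats:
    # Hypothetical per-directory summary kept from the previous crawl:
    # the change times (seconds since the epoch) of its regular files.
    change_times: list[float] = field(default_factory=list)


def collect_change_times(path: str) -> list[float]:
    """Stat every regular file directly inside `path` (a full, unselective scan step)."""
    times = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                times.append(entry.stat(follow_symlinks=False).st_ctime)
    return times


def should_rescan(stats: DirStats, now: float,
                  recent_window: float, variation_threshold: float) -> bool:
    """Assumed selection rule: scan if unknown, recently changed, or highly variable."""
    if not stats.change_times:
        return True  # no history for this directory, so it must be scanned
    if now - max(stats.change_times) <= recent_window:
        return True  # some file changed recently
    if (len(stats.change_times) > 1
            and statistics.pstdev(stats.change_times) >= variation_threshold):
        return True  # change times are widely spread, so more changes are plausible
    return False  # quiet, uniform directory: skip it this round


def select_directories(history: dict[str, DirStats], now: float,
                       recent_window: float, variation_threshold: float) -> list[str]:
    """Return the (sorted) directories worth re-crawling under the assumed rule."""
    return sorted(d for d, s in history.items()
                  if should_rescan(s, now, recent_window, variation_threshold))
```

With a one-hour window and a 1000-second variation threshold, a directory whose files all last changed at the same time several hours ago is skipped. Directories with a recent change, widely spread change times, or no history are rescanned with collect_change_times, and the refreshed times become their new DirStats.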

KEYWORDS: Metadata Crawl, Storage Management, File Systems
