Appears in Proceedings of the First International Conference on Autonomic Computing (ICAC-04). New York, NY. May 2004. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-04-101, January 2004.
Michael Mesnier, Eno Thereska, Daniel Ellard*, Gregory R. Ganger, Margo Seltzer*
Parallel Data Laboratory, Carnegie Mellon University.
Intel Corporation and Parallel Data Laboratory, Carnegie Mellon University.
Carnegie Mellon University
Pittsburgh, PA 15213
* Harvard University Division of Engineering and Applied Sciences.
Computer Science Group
Harvard University
Cambridge, MA
                    
To tune and manage themselves, file and storage systems must understand key properties (e.g., access pattern, lifetime, size) of their various files. This paper describes how systems can automatically learn to classify the properties of files (e.g., read-only access pattern, short-lived, small in size) and predict the properties of new files, as they are created, by exploiting the strong associations between a file's properties and the names and attributes assigned to it. These associations exist, strongly but differently, in each of the four real NFS environments studied. Decision tree classifiers can automatically identify and model such associations, providing prediction accuracies that often exceed 90%. Such predictions can be used to select storage policies (e.g., disk allocation schemes and replication factors) for individual files. Further, changes in associations can expose information about applications, helping autonomic system components distinguish growth from fundamental change.
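As a rough illustration of the idea described above, the following Python sketch trains a tiny ID3-style decision tree that predicts a file's lifetime class from attributes known at creation time. It is not the authors' implementation: the attribute names (`ext`, `owner`, `mode`), the labels, and the six training records are all hypothetical and invented for this example, and a real system would learn from traces of an actual file server.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def group_by(rows, labels, feature):
    """Partition (rows, labels) by the value each row has for `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = groups.setdefault(row[feature], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return groups

def best_feature(rows, labels, features):
    """Pick the feature whose split leaves the lowest weighted entropy."""
    def split_entropy(feature):
        groups = group_by(rows, labels, feature)
        return sum(len(sub_labels) / len(labels) * entropy(sub_labels)
                   for _, sub_labels in groups.values())
    return min(features, key=split_entropy)

def build_tree(rows, labels, features):
    """Recursively build a tree; leaves are labels, inner nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    feature = best_feature(rows, labels, features)
    remaining = [f for f in features if f != feature]
    children = {
        value: build_tree(sub_rows, sub_labels, remaining)
        for value, (sub_rows, sub_labels)
        in group_by(rows, labels, feature).items()
    }
    return {"feature": feature, "default": majority, "children": children}

def predict(tree, row):
    """Walk the tree; fall back to the node's majority label for unseen values."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree

# Hypothetical training data: attributes visible at file creation, plus the
# lifetime class observed later. Invented for illustration only.
TRAIN = [
    ({"ext": ".lock", "owner": "alice", "mode": "rw"}, "short-lived"),
    ({"ext": ".tmp",  "owner": "bob",   "mode": "rw"}, "short-lived"),
    ({"ext": ".lock", "owner": "bob",   "mode": "rw"}, "short-lived"),
    ({"ext": ".pdf",  "owner": "alice", "mode": "r"},  "long-lived"),
    ({"ext": ".c",    "owner": "bob",   "mode": "rw"}, "long-lived"),
    ({"ext": ".pdf",  "owner": "bob",   "mode": "r"},  "long-lived"),
]

rows = [attrs for attrs, _ in TRAIN]
labels = [label for _, label in TRAIN]
tree = build_tree(rows, labels, ["ext", "owner", "mode"])

# Predict the lifetime class of a newly created file from its attributes.
new_file = {"ext": ".lock", "owner": "carol", "mode": "rw"}
print(predict(tree, new_file))
```

In this toy data set the file name extension alone separates the two classes, so the tree splits on `ext` first. A storage system could, under these assumptions, use such a prediction at creation time to choose a policy for the file, for example a different allocation scheme for files expected to be short-lived.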
KEYWORDS: autonomic, self-managing, decision trees, storage, machine learning, attributes