Harvard Computer Science Group Technical Report TR-14-03, December 2003.
Daniel Ellard*, Michael Mesnier, Eno Thereska, Gregory R. Ganger, Margo Seltzer*
* Harvard University Division of Engineering and Applied Sciences
Computer Science Group
Harvard University
Cambridge, MA

Parallel Data Laboratory, Carnegie Mellon University.
Intel Corporation and Parallel Data Laboratory, Carnegie Mellon University.
Carnegie Mellon University
Pittsburgh, PA 15213
                    
We present evidence that attributes known to the file system when a file is created, such as its name, permission mode, and owner, are often strongly related to future properties of the file, such as its ultimate size, lifespan, and access pattern. More importantly, we show that these relationships can be exploited to automatically generate predictive models for these properties, and that the resulting predictions are accurate enough to enable optimizations.
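To make the idea concrete, the following is a minimal sketch of predicting a file property from create-time attributes. It is not the modeling technique used in the report: it is a simple majority-vote lookup, chosen only for illustration. The class name MajorityPredictor, the helper create_time_key, the choice of file extension as the name feature, and every record in HISTORY are hypothetical.

```python
import os
from collections import Counter, defaultdict


def create_time_key(name, mode, owner):
    """Reduce the attributes known at creation time to a hashable key.

    Assumption: only the file extension is kept from the name.
    """
    ext = os.path.splitext(name)[1].lower()
    return (ext, mode, owner)


class MajorityPredictor:
    """Predict a property as the most common label seen for the same key."""

    def __init__(self, default):
        self.default = default              # returned for unseen keys
        self.counts = defaultdict(Counter)  # key -> label frequencies

    def train(self, name, mode, owner, label):
        self.counts[create_time_key(name, mode, owner)][label] += 1

    def predict(self, name, mode, owner):
        seen = self.counts.get(create_time_key(name, mode, owner))
        if not seen:
            return self.default
        return seen.most_common(1)[0][0]


# Hypothetical history: (name, mode, owner, observed lifespan class).
HISTORY = [
    ("build.lock", 0o600, "alice", "short"),
    ("mail.lock", 0o600, "alice", "short"),
    ("thesis.tex", 0o644, "alice", "long"),
    ("notes.tex", 0o644, "alice", "long"),
    ("cache.tmp", 0o600, "bob", "short"),
]

model = MajorityPredictor(default="long")
for name, mode, owner, label in HISTORY:
    model.train(name, mode, owner, label)

# Predictions are made before any data is written to the new file.
print(model.predict("inbox.lock", 0o600, "alice"))
print(model.predict("paper.tex", 0o644, "alice"))
```

A file system could, under these assumptions, consult such a model at creation time to choose a placement or caching policy for the new file. A real model would need to generalize across attribute combinations it has never seen, which this lookup table does not attempt.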