PARALLEL DATA LAB 

PDL Abstract

Shingled Magnetic Recording for Big Data Applications

Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-12-105. May 2012.

Anand Suresh, Garth A. Gibson, Gregory R. Ganger

School of Computer Science
Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213

http://www.pdl.cmu.edu/

Modern Hard Disk Drives (HDDs) are fast approaching the superparamagnetic limit, forcing the storage industry to look for innovative ways to transition from traditional magnetic recording to Heat-Assisted Magnetic Recording or Bit-Patterned Magnetic Recording. Shingled Magnetic Recording (SMR) is a step in this direction, as it delivers high storage capacity with minimal changes to current production infrastructure. However, because it sacrifices the random-write capability of the device, SMR cannot be used as a drop-in replacement for traditional HDDs.

We identify two techniques to implement SMR. The first involves the insertion of a shim layer between the SMR device and the host, similar to the Flash Translation Layer found in Solid-State Drives (SSDs). The second technique, which we believe is the right direction for SMR, is to push enough intelligence up into the file system to effectively mask the sequential-write nature of the underlying SMR device. We present a custom-built SMR Device Emulator and ShingledFS, a FUSE-based SMR-aware file system that operates in tandem with the SMR Device Emulator. Our evaluation studies SMR for Big Data applications and also examines the overheads introduced by the emulation. We show that Big Data workloads can be run effectively on SMR devices with an overhead as low as 2.2% after eliminating the overheads of emulation. Finally, we present insights on garbage collection mechanisms and policies that will aid future SMR research.
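
To make the first technique concrete, the following is a minimal sketch of a translation shim over append-only bands, loosely in the spirit of a Flash Translation Layer. It is an illustration under stated assumptions, not the SMR Device Emulator or ShingledFS described in the report: the names ShingledBand and ShimLayer, the band sizes, and the in-memory block lists are all hypothetical. It shows how a random overwrite can be turned into a sequential append plus a mapping update, and why stale blocks accumulate and eventually call for garbage collection.

```python
class ShingledBand:
    """Hypothetical append-only region: data is only written at the write pointer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = []  # list index doubles as the physical offset in the band

    def is_full(self):
        return len(self.blocks) >= self.capacity

    def append(self, data):
        if self.is_full():
            raise IOError("band full")
        self.blocks.append(data)
        return len(self.blocks) - 1


class ShimLayer:
    """Hypothetical shim: maps logical block addresses to (band, offset) pairs."""

    def __init__(self, num_bands, band_capacity):
        self.bands = [ShingledBand(band_capacity) for _ in range(num_bands)]
        self.current = 0   # index of the band currently being appended to
        self.mapping = {}  # logical block address -> (band index, offset)

    def write(self, lba, data):
        band = self.bands[self.current]
        if band.is_full():
            # Assumption: no garbage collector here, so we simply run out of space.
            if self.current + 1 >= len(self.bands):
                raise IOError("no free band; garbage collection needed")
            self.current += 1
            band = self.bands[self.current]
        offset = band.append(data)
        # An overwrite only redirects the mapping; the old copy becomes stale.
        self.mapping[lba] = (self.current, offset)

    def read(self, lba):
        band_idx, offset = self.mapping[lba]
        return self.bands[band_idx].blocks[offset]

    def stale_blocks(self):
        """Count physical blocks that no logical address points to any more."""
        written = sum(len(b.blocks) for b in self.bands)
        return written - len(self.mapping)


shim = ShimLayer(num_bands=2, band_capacity=2)
shim.write(7, b"old")
shim.write(7, b"new")   # logical overwrite, physically a second append
shim.write(3, b"other") # first band is full, so this lands in the second band
```

In this sketch the stale copy of block 7 stays on the device until something reclaims its band, which is the kind of cost that garbage collection mechanisms and policies have to manage.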

KEYWORDS: Shingled Magnetic Recording, SMR, Big Data, Hadoop

FULL TR: pdf