47th International Symposium on Computer Architecture, May 30 – June 3, 2020, Virtual Valencia, Spain.
Rajat Kateja, Nathan Beckmann, Gregory R. Ganger
Carnegie Mellon University
Production storage systems complement device-level ECC (which covers media errors) with system-checksums and cross-device parity. This system-level redundancy enables systems to detect and recover from data corruption due to device firmware bugs (e.g., reading data from the wrong physical location). Direct access to NVM penalizes software-only implementations of system-level redundancy, forcing a choice between forgoing data protection and accepting significant performance penalties. We propose to offload the update and verification of system-level redundancy to TVARAK, a new hardware controller colocated with the last-level cache. TVARAK enables efficient protection of data from such bugs in memory controller and NVM DIMM firmware. Simulation-based evaluation with seven data-intensive applications shows that TVARAK is efficient. For example, TVARAK reduces Redis set-only performance by only 3%, compared to a 50% reduction for a state-of-the-art software-only approach.
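To make "system-level redundancy" concrete, here is a minimal software sketch under simplifying assumptions. It is not TVARAK's hardware design. Each block gets a checksum stored apart from the data, and one XOR parity block covers a stripe of blocks on different devices. A read whose checksum does not match, such as a misdirected read that returns another block, is rebuilt from parity. The block size, the class and method names (`RedundantStripe`, `write`, `read`, `device_returned`) and the use of CRC-32 are all hypothetical choices for illustration. Every write must also update a checksum and the parity, and every read must verify a checksum. The abstract proposes offloading exactly this update and verification work to TVARAK.

```python
import zlib
from functools import reduce

BLOCK = 64  # hypothetical block size in bytes, for illustration only


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class RedundantStripe:
    """Toy model (assumption): one data block per device plus one parity block.

    Checksums live outside the data blocks, so a device that returns the
    wrong (but internally ECC-clean) block is still caught.
    """

    def __init__(self, blocks):
        self.data = [bytes(b) for b in blocks]
        self.csum = [zlib.crc32(b) for b in self.data]  # system-level checksums
        self.parity = reduce(xor_bytes, self.data)       # cross-device parity

    def write(self, i, new):
        """Update data, its checksum, and parity via the delta P' = P ^ old ^ new."""
        old = self.data[i]
        self.parity = xor_bytes(self.parity, xor_bytes(old, new))
        self.data[i] = bytes(new)
        self.csum[i] = zlib.crc32(new)

    def read(self, i, device_returned=None):
        """Verify the checksum on read; rebuild from parity on mismatch.

        `device_returned` (hypothetical) simulates a firmware bug that hands
        back the wrong contents instead of block i.
        """
        got = self.data[i] if device_returned is None else device_returned
        if zlib.crc32(got) == self.csum[i]:
            return got
        # XOR of the parity with every other block reconstructs block i.
        others = [self.data[j] for j in range(len(self.data)) if j != i]
        rebuilt = reduce(xor_bytes, others, self.parity)
        if zlib.crc32(rebuilt) != self.csum[i]:
            raise IOError("unrecoverable corruption in block %d" % i)
        return rebuilt


if __name__ == "__main__":
    stripe = RedundantStripe([bytes([k]) * BLOCK for k in range(4)])
    stripe.write(2, b"\xaa" * BLOCK)
    # Simulate a misdirected read: the device returns block 0's contents for block 1.
    fixed = stripe.read(1, device_returned=stripe.data[0])
    print(fixed == bytes([1]) * BLOCK)
```

The delta form of the parity update means a write needs only the old data and the old parity, not every block in the stripe. That extra read-modify-write on each update is the kind of cost that makes a software-only approach expensive under direct access.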
KEYWORDS: Non-volatile memory, Direct access, Redundancy