PARALLEL DATA LAB

Landslide: Systematic Dynamic Race Detection in Kernel Space

Race conditions are notoriously difficult to debug. Because of their nondeterministic nature, they frequently do not manifest at all during testing, and when they do manifest, it is difficult to reproduce them reliably enough to collect enough information to help debugging.

In kernel space, race condition debugging becomes even more difficult. Many aspects of the concurrency implementation itself are part of the system being tested. While user-space testing frameworks rely on the underlying kernel to provide common concurrency abstractions which the framework can use to drive the test, systematic exploration in the kernel itself necessarily exists underneath such abstractions.

Landslide is a tool for performing systematic exploration in kernel space with a focus on Pebbles kernels written for 15-410. In the past, we have met with students of 15-410 during the final weeks of the kernel project and guided them in using Landslide on their own kernels, resulting in several concurrency bugs of varying severity and subtlety being found and fixed.

With Landslide, we see testing a kernel as a process of manipulating test parameters in two ways: first, in the choice of test case, and second, in the user's configuration of Landslide to express which parts of the kernel are "interesting" and which are irrelevant.

In future work, we are considering modifying Landslide's architecture to be applicable on production kernels such as Linux, with potential focuses on analysing device drivers and on replacing simulation with virtualisation for performance. We are also considering focusing on Landslide's applicability as an educational tool.

Anatomy of Landslide

When Landslide finds a bug, it outputs a "decision trace" - a detailed report of the thread
interleaving meant to help the programmer understand and debug it.

When exploring the state space, Landslide uses the Dynamic Partial Order Reduction
algorithm to identify redundant subtrees and avoid exploring them.


People

FACULTY

Garth Gibson

GRAD STUDENTS

Ben Blum

Publications

  • Landslide: Systematic Dynamic Race Detection in Kernel Space. Ben Blum. Carnegie Mellon University School of Computer Science MS Thesis CMU-CS-12-118. May 2012.
    Abstract / PDF [1.7M]


CODE

Acknowledgements

This research was sponsored by the U.S. Army Research Office under grant number W911NF0910273.

We thank the members and companies of the PDL Consortium: Amazon, Bloomberg, Datadog, Google, Honda, Intel Corporation, IBM, Jane Street, Meta, Microsoft Research, Oracle Corporation, Pure Storage, Salesforce, Samsung Semiconductor Inc., Two Sigma, and Western Digital for their interest, insights, feedback, and support.