NOTE SPECIAL DAY - TUESDAY
Tuesday, April 25, 2017
SPEAKER: Haryadi Gunawi, University of Chicago
TITLE: What (New) Bugs Live in the Cloud?
I will describe three new classes of bugs that often appear in large-scale datacenter distributed systems: (1) distributed concurrency bugs, caused by the non-deterministic timing of distributed events, such as message arrivals and multiple crashes and reboots; (2) limpware-induced performance bugs, design bugs that surface in the presence of "limping" hardware and cause cascades of performance failures; and (3) scalability bugs, latent scale-dependent bugs that typically surface only in large-scale deployments (100+ nodes) and not necessarily in small- or medium-scale deployments.
The findings above are based on our long, large-scale cloud bug study (3000+ bugs) and cloud outage study (500+ outages). I will present some of our work on understanding and combating these three classes of bugs, including semantic-aware model checking (SAMC), a taxonomy of distributed concurrency bugs (TaxDC), the impacts of limpware ("Limplock"), path-based speculative execution (PBSE), and scalability checks (SCk).
His research focuses on improving the dependability of storage and cloud computing systems in the context of (1) performance stability, wherein he is interested in building storage and distributed systems that are robust to "limping" hardware; (2) reliability, wherein he is interested in combating non-deterministic concurrency bugs in cloud-scale distributed systems; and (3) scalability, wherein he is interested in developing approaches to find latent scalability bugs that appear only in large-scale deployments.
SEMINAR HOST: Garth Gibson