PARALLEL DATA LAB 

PDL Talk Series

June 8, 2023


TIME: 12:00 noon to approximately 2:00 pm EDT
PLACE: Virtual - a Zoom link will be emailed closer to the seminar


SPEAKER: Saurabh Kadekodi
Research Scientist, Google

Improving Data Reliability in Exascale Storage Clusters
Fundamental physical limitations have slowed down hardware scaling, thus ending the “free” scaling benefits of processing power and storage capacity. At the same time, data is growing at an unprecedented rate. This data juggernaut is highly disruptive. It morphs benign assumptions into critical bottlenecks and forces radical system (re-)designs. My work replaces design decisions of distributed systems that are disrupted by scale with new, data-driven solutions that are efficient, scalable, nimble, and robust. As an example, I will describe disk-adaptive redundancy (DARE): a novel redesign of data reliability in exascale storage clusters driven by insights gleaned from studying over 5.3 million disks from the production environments of Google, NetApp, and Backblaze. I will also describe three new DARE systems that reduce conservative over-protection of data by up to 20%, amounting to millions of dollars of cost savings along with a significant carbon footprint reduction, while always meeting desired data reliability targets. Additionally, I will briefly describe some past and current research efforts to improve the availability and performance of local and distributed storage systems, including new erasure codes that reduce observed unavailability events at Google by up to 33%.
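To give a flavor of the disk-adaptive redundancy idea, the sketch below picks the most storage-efficient Reed-Solomon scheme whose estimated data-loss probability stays under a target, given a disk group's annualized failure rate (AFR). This is purely an illustration, not the DARE systems presented in the talk: the reliability model, candidate schemes, repair window, target, and AFR values are simplified assumptions chosen for this example.

# Illustrative sketch only (not the actual DARE systems): given a disk group's
# annualized failure rate (AFR), pick the most storage-efficient Reed-Solomon
# scheme (k data + m parity chunks) whose estimated chance of losing a stripe
# stays under a target. The model, candidate schemes, repair window, and AFR
# values below are hypothetical simplifications.
from math import comb

def stripe_loss_probability(k, m, afr, repair_hours=24.0):
    """Probability that a (k+m)-chunk stripe loses data, i.e. more than m of
    its disks fail within one repair window (independent-failure model)."""
    n = k + m
    p = afr * repair_hours / (365 * 24)  # per-disk failure prob. in the window
    return sum(comb(n, f) * p**f * (1 - p)**(n - f) for f in range(m + 1, n + 1))

def pick_scheme(afr, target=1e-12,
                candidates=((6, 3), (9, 3), (12, 3), (10, 4))):
    """Return the candidate (k, m) with the lowest storage overhead that still
    meets the loss-probability target, or None if no candidate qualifies."""
    feasible = [(k, m) for k, m in candidates
                if stripe_loss_probability(k, m, afr) <= target]
    return min(feasible, key=lambda km: (km[0] + km[1]) / km[0]) if feasible else None

# Young, low-AFR disk groups can use a wider, cheaper scheme than ageing,
# failure-prone ones; re-running this choice as observed AFRs change is the
# "adaptive" part of disk-adaptive redundancy.
print(pick_scheme(afr=0.005))  # low AFR  -> a wide, low-overhead scheme
print(pick_scheme(afr=0.08))   # high AFR -> a more protective scheme

The actual systems discussed in the talk operate at fleet scale and go well beyond this toy calculation; it is included only to make the trade-off between storage overhead and reliability concrete.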

BIO: Saurabh Kadekodi obtained his PhD from the Computer Science Department at Carnegie Mellon University (CMU) in 2020 as part of the Parallel Data Laboratory (PDL), under the guidance of Prof. Gregory Ganger and Prof. Rashmi Vinayak. After graduation, Saurabh joined Google as a Visiting Faculty Researcher and is currently a Research Scientist on the Storage Analytics team. Saurabh is broadly interested in designing distributed systems, with a special focus on the performance and reliability of storage systems.


SPEAKER: Andy Pavlo
Associate Professor, Computer Science Department, CMU


Why Machine Learning for Automatically Optimizing Databases Doesn't Work
Database management systems (DBMSs) are complex software systems that require sophisticated tuning to work efficiently for a given workload and operating environment. Such tuning requires considerable effort from experienced administrators, which is not scalable for large DBMS fleets. This problem has led to research on using machine learning (ML) to devise strategies to optimize DBMS configurations for any application, including automatic physical database design, knob configuration, and query tuning. Despite the many academic papers that tout the benefits of using ML to optimize databases, there have been only a few major success stories in industry in the last decade. In this talk, I discuss the challenges of using ML-enhanced tuning methods to optimize databases. I will address specific, incorrect assumptions that researchers make about production database environments and identify why ML is not always the best solution to real-world database problems. As part of this, I will discuss state-of-the-art academic research and real-world tuning implementations.

BIO: Andy Pavlo is an Associate Professor with Indefinite Tenure in the Computer Science Department at Carnegie Mellon University. He is also the co-founder of the OtterTune automated database optimization start-up (https://ottertune.com).



CONTACTS


Director, Parallel Data Lab
VOICE: (412) 268-1297


Executive Director, Parallel Data Lab
VOICE: (412) 268-5485


PDL Administrative Manager
VOICE: (412) 268-6716