TIME: 12:00 noon - to approximately 1:00 pm EDT
PLACE: Virtual - a zoom link will be emailed closer to the seminar
SPEAKERS: Pat Helland and Daniel May, Salesforce
Yours, Mine, and Ours: Efficient Set Reconciliation in O(n log n) of the SET DIFFERENCE
We will discuss and explain a new algorithm that empowers efficient reconciliation of sets. Extremely large sets can be reconciled in O(n log n) of the SET DIFFERENCE, not the underlying size of the sets. The algorithm is a variant of erasure codes (familiar to the database community) and fountain codes (familiar to the data communications community). This opens new avenues for solutions based on repairing sets that do not even yet exist! When distributed systems agree in advance what items belong in a set, different participants can add items to the set effectively performing replica repair over future content.
We will explain the set reconciliation algorithm presented at SIGCOMM 2024 in August within a paper titled "Practical Rateless Set Reconciliation" by Yang et al and how it can accomplish such efficiency. Many disparate research opportunities are opened by this algorithm including replica repair (faster than Merkle Trees), improved gossip protocols, scientific computations including detecting small differences in large genomes, management of cloud based control planes, and possibly even improvements to multi-phase protocols used for distributed systems. We hope to conclude with the audience brainstorming about even more possible applications.
BIOS: Pat Helland has been building distributed systems and databases since 1978 at companies including Tandem, Microsoft, and Amazon. He is constantly curious about emerging trends in technology and their implications on systems. Pat has been working on database technology at Salesforce since 2012.
Daniel May works at Salesforce and has been analyzing and improving the performance of complex systems for more than 7 years. He spends much of his spare time voraciously consuming technical papers, largely about distributed systems.
Director, Parallel Data Lab
VOICE: (412) 268-1297
Executive Director, Parallel Data Lab
VOICE: (412) 268-5485
PDL Administrative Manager
VOICE: (412) 268-6716