TIME: 12:00 noon to approximately 1:00 pm EDT
PLACE: Virtual; a Zoom link will be emailed closer to the seminar
SPEAKER: Vaibhav Arora
Principal Member of Technical Staff, Salesforce
LSM Management and Using LSM Immutability for Data Virtualization
LSM (Log-Structured Merge) trees are now the bedrock of many storage engines and datastores such as RocksDB, HBase, and Cassandra. They avoid random writes and provide immutability. Data is organized in multiple levels that increase exponentially in size. Each data mutation writes a new version of an object, and background processes called merge or compaction continuously remove unused versions while moving data across the levels of the LSM tree and maintaining its shape.
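As a rough illustration of the write path and compaction described above, here is a minimal sketch in Python. It is a toy, not the design of any of the engines named: the class `TinyLSM`, its method names, and the `memtable_limit` parameter are all hypothetical, and it keeps a flat list of immutable runs rather than exponentially sized levels.

```python
from typing import Dict, List, Optional


class TinyLSM:
    """Toy LSM: an in-memory buffer plus a list of immutable sorted runs."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[str, str] = {}
        # Newest run is last. A run is never modified after it is appended.
        self.runs: List[Dict[str, str]] = []
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        # A mutation only touches the in-memory buffer; older versions
        # that already sit in flushed runs are left untouched.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the buffer out as a new sorted, immutable run.
        if self.memtable:
            self.runs.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        # Search newest data first so the latest version wins.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            if key in run:
                return run[key]
        return None

    def compact(self) -> None:
        # Merge all runs oldest-first so newer versions overwrite older
        # ones, then replace the runs with a single new run.
        merged: Dict[str, str] = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [dict(sorted(merged.items()))]
```

In this sketch, writing key "a" twice leaves two versions in two separate runs until `compact()` is called, after which only the newer version remains. Note that compaction builds a new run instead of editing the old ones, which is the immutability property the talk builds on.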
This talk will describe how the immutability of LSMs can be used to provide data virtualization. Since the underlying persisted data in an LSM never changes, it provides an opportunity to build a virtualization layer over it. We will describe a mechanism that uses metadata in the form of many-to-one references over data files in the LSM. This metadata can then be used to create constant-time clones of datasets without physically copying the data. These clones can be used for testing and experimentation, and also for taking backups. Cloning can also be used for fast data migration between multiple datastores with LSM-based storage running on a common distributed storage layer.
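The idea of cloning through many-to-one references can be sketched as follows. This is only an illustration under stated assumptions, not the mechanism presented in the talk: `FileStore`, `Dataset`, and the reference-counting scheme are hypothetical names and choices made for the example. Here a clone copies only a list of file names, so its cost depends on the number of references and not on the size of the data.

```python
from typing import Dict, List


class FileStore:
    """Hypothetical shared store of immutable data files with reference counts."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.refcount: Dict[str, int] = {}

    def add(self, name: str, data: bytes) -> None:
        # Files are written once and never modified afterwards.
        self.files[name] = data
        self.refcount[name] = 0

    def ref(self, name: str) -> None:
        self.refcount[name] += 1

    def unref(self, name: str) -> None:
        # A file is deleted only when no dataset references it any more.
        self.refcount[name] -= 1
        if self.refcount[name] == 0:
            del self.files[name]
            del self.refcount[name]


class Dataset:
    """A dataset is just metadata: a list of references to shared files."""

    def __init__(self, store: FileStore, file_names: List[str]) -> None:
        self.store = store
        self.file_names = list(file_names)
        for name in self.file_names:
            store.ref(name)

    def clone(self) -> "Dataset":
        # Many datasets may point at one file; no file bytes are copied.
        return Dataset(self.store, self.file_names)

    def drop(self) -> None:
        for name in self.file_names:
            self.store.unref(name)
        self.file_names = []
```

Because the files are immutable, a clone and its source can safely share them; each side diverges only by adding references to new files. Dropping the source leaves the shared files in place for as long as the clone still refers to them.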
BIO: Vaibhav works at Salesforce on various large-scale data management problems, including LSM, multi-tenancy, transaction processing, indexing, and other access-layer challenges in the database engine. Prior to joining Salesforce, he earned his PhD in Computer Science from the University of California, Santa Barbara, where he worked on problems related to data variety, both in structure and access. He has also worked on various distributed database problems during his time at Amazon, Microsoft Research, HP Labs, and Yahoo!.
Director, Parallel Data Lab
VOICE: (412) 268-1297
Executive Director, Parallel Data Lab
VOICE: (412) 268-5485
PDL Administrative Manager
VOICE: (412) 268-6716