PDL CONSORTIUM SPEAKER SERIES
A ONE-AFTERNOON SERIES OF SPECIAL SDI TALKS BY
PDL CONSORTIUM VISITORS
DATE: Tuesday, May 3, 2016
TIME: 12:00 pm to 5:00 pm
PLACE: ISTC Panther Hollow Room - RMCIC 4th Floor
SPEAKERS:
12:00 - 12:45 pm Siying Dong, Facebook
12:45 - 1:30 pm Yang Seok Ki, Samsung
1:30 - 1:45 pm break
1:45 - 2:30 pm Thomas Baby, Oracle
2:30 - 3:15 pm Shasank Chavan, Oracle
3:15 - 3:30 pm break
3:30 - 4:15 pm David Palaitis, Two Sigma
4:15 - 5:00 pm Pankaj Mehra, SanDisk
All talks located in RMCIC Panther Hollow Conference Room, 4th Floor.
SPEAKER: Siying Dong, Facebook
RocksDB: Key-Value Store Optimized for Flash
RocksDB is an embedded persistent key-value store for low-latency and high-throughput workloads. It has been adapted to a wide range of workloads, both as an embedded DBMS and as the storage engine of other DBMSs.
Space amplification is a common optimization goal. RocksDB uses a log-structured merge-tree, and we designed a compaction algorithm to achieve optimized space amplification. We faced several technical challenges when we put RocksDB into production, and we'll discuss how we deal with them. Finally, we'll present some open issues that may be interesting to researchers.
BIO: Siying Dong is a software engineer working in the Database Engineering team at Facebook, focusing on RocksDB. He also worked on Hive, HDFS, and some other data warehouse infrastructures. Before joining Facebook, Siying worked in the SQL Azure Team at Microsoft. He received a bachelor's degree from Tsinghua University and a master's degree from Brandeis University.
SPEAKER: Yang Seok Ki, Samsung
How Can Storage Device Innovation Change the Datacenter S/W?
Much of the software infrastructure used in today's data centers relies on legacy software that was designed mainly for HDDs. The abstraction for such a device is slow and unreliable storage with a block interface. However, the wide deployment of fast storage devices such as SSDs challenges these assumptions with respect to performance, reliability, and interface. For example, data center managers frequently use a conversion layer such as memcached, LevelDB, or RocksDB to bridge the gap between the block interface and a high-level key-value abstraction when accessing the underlying storage and memory systems. However, the relatively high software overhead of these conversion layers can cause performance and reliability degradation in the storage device, due to their high read, write, and space amplification. In this talk, Dr. Ki introduces a variety of SSD innovations under way at Samsung for conventional devices and beyond.
BIO: Yang-Suk Kee (Yang Seok Ki) is a director and architect of Memory Solutions Lab, Samsung Semiconductor Inc. America. He leads the Advanced Datacenter Solutions group, whose main focus is to innovate SSD and its ecosystem across datacenter hardware and software infrastructure, and drives a data-centric computing effort called Smart SSD. Before joining Samsung, he worked for Oracle's server technology group, which builds a distributed database server system, and contributed to the Oracle 12c release. Prior to his industrial experience, he worked on HPDC (High Performance Distributed Computing), Grid, and Cloud research at the Information Sciences Institute of the University of Southern California and the Center for Networked Systems, University of California, San Diego. He received his Ph.D. degree in Electrical Engineering and Computer Science in parallel processing, his M.S. degree in Computer Engineering, and his B.S. degree in Computer Engineering from Seoul National University, Korea.
SPEAKER: Thomas Baby, Oracle
Oracle Database Architecture for the Cloud
The IT industry today is undergoing a revolutionary change in how customers deploy and configure their compute resources. Driven by the demand to reduce costs, in both capital and operating expenses, these customers are turning to CLOUD or HYBRID-CLOUD solutions. These customers span the spectrum from very small startup businesses to Fortune 500 companies across regions and industries. Oracle Corporation leverages innovative engineering to respond to that demand as both provider and consumer of cloud technology by providing highly secure, highly scalable cloud solutions. In this talk, we will describe innovations in database architecture that ease cloud provisioning of database resources at scale, achieve consolidation density while preserving compute isolation, and provide a management interface for coarse-grained ‘manage many as one’ database operations.
BIO: Thomas Baby is a Consulting Member of Technical Staff at Oracle. He graduated from the University of Maryland, College Park with a master's degree in Computer Science and joined Oracle in 2004. He is now part of the Oracle Database Multitenant Development team. He is the Technical Lead for various Oracle Database Multitenant features that form the basis for Oracle Database's Cloud solution.
SPEAKER: Shasank Chavan, Oracle
Oracle Database In-Memory: The Next Generation
The Database In-Memory (DBIM) Option by Oracle is an industry-first dual-format in-memory database that maintains transactionally consistent data in both row and columnar formats. This unique architecture enables analytic and OLTP workloads to coexist, bringing together the best of both worlds. DBIM is the fastest growing database option since its release in 2014, achieving great success with customer adoption. The next release of DBIM is slated for the summer of 2016 and contains significant innovations in the areas of join processing, aggregations, scan performance, and availability. Algorithms across the database stack have been redesigned to directly process encoded and compressed columnar data at memory bandwidth speeds using SIMD vector processing. Furthermore, existing Oracle functionality continues to be integrated seamlessly with DBIM, such as Automatic Data Optimization (ADO), Oracle's solution for storage tiering based on heat map statistics, and Active Data Guard (ADG), Oracle's solution for recovery, availability, and read-only data offload processing. The combined efforts of these latest features improve the SSB benchmark by 32X and the JSON no-bench benchmark by 8X.
BIO: Shasank Chavan is an architect and director of the In-memory Data Technologies group at Oracle Corporation. He is primarily responsible for driving and delivering core performance-critical and customer-facing data layer features in Oracle's Database In-Memory option. His team designs and develops CPU-specific "software-in-silicon" libraries for columnar data evaluation, optimized data formats and compression technology for efficient in-memory storage, algorithms and techniques for fast in-memory join and aggregation processing, multi-threaded scan execution engine with push-down technology, and optimized in-memory data access solutions in general. Shasank has over 17 years of experience working on systems software technology.
SPEAKER: David Palaitis, Two Sigma
Machine Learning at Two Sigma Scale
This talk will present some of the hard problems Two Sigma has faced in performing Time Series Analysis and Machine Learning at large scale across heterogeneous clusters of commodity hardware. We will present a set of canonical learning techniques and dive into the design and architecture of our solution.
BIO: David Palaitis is a technology manager and architect at Two Sigma. He’s been in the finance industry for over 15 years, where he’s focused almost exclusively on building systems for information retrieval, time series analysis, and machine learning at scale. Before that, he was in the punk music industry, a less academically interesting pursuit but equally valuable in forming a career.
SPEAKER: Pankaj Mehra, SanDisk
From Our Crystal Ball, the 5-year Technology Outlook
We at SanDisk recently completed our first assessment of technologies beyond those reflected in the company's product roadmaps. The assessment, led by our CTO Office, is intended to spur thinking about new technologies post-2020 and usage scenarios of future products.
In preparing this outlook, we examined a variety of factors, including current trends, market forces, competitors, data from internal teams related to memory, and conversations with experts inside and outside the company. We also looked at what's possible given the laws of physics.
The CTO assessment covered four key areas:
- Memory technology
- Enabling technologies, including controller chips, their packaging and interfaces
- Mobile and Client products
- Enterprise/Data Center
In this talk, I plan to focus on the enterprise portion of the outlook, summarizing the rest, and will be looking to reopen our dialog with PDL faculty and students in two key areas:
- Persistent memory in the memory hierarchy and its impact
- Data centric architecture
BIO: Pankaj has over 20 years of technical experience in developing and architecting scalable, intelligent information systems and services. At SanDisk, he is a Senior Fellow working closely with our customers to build accelerated solutions for data centers and applications, and is continuing to shape and evangelize SanDisk technologies. Before joining SanDisk, Pankaj worked as Chief Technology Officer at Fusion-io and was named a top 50 CTO by ExecRank. He has also worked at Hewlett Packard, Compaq, and IBM, and as a professor teaching courses in Supercomputing at IIT Delhi. Pankaj received his Ph.D. in Computer Science from the University of Illinois.
SDI / ISTC SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/