PDL CONSORTIUM SPEAKER SERIES

A ONE-AFTERNOON SERIES OF SPECIAL SDI TALKS BY
PDL CONSORTIUM VISITORS

DATE: Monday, May 11, 2015
TIME: 1:00 pm to 5:00 pm
PLACE: ISTC Panther Hollow Room - RMCIC 4th Floor


SPEAKERS:
1:00 - 1:45 pm Tirthankar Lahiri, Oracle
1:45 - 2:30 pm Sorin Faibish, EMC
2:30 - 2:45 pm break
2:45 - 3:30 pm Roger MacNicol, Oracle
3:30 - 4:15 pm Tanj Bennett, Microsoft Research
4:15 - 5:00 pm Hideaki Kimura, HP Labs

SPEAKER: Tirthankar Lahiri, Oracle
Oracle Database In-Memory: A Dual Format In-Memory Database
The Oracle Database In-Memory Option allows Oracle to function as the industry's first dual-format in-memory database. Row formats are ideal for OLTP workloads, which typically use indexes to limit their data access to a small set of rows, while column formats are better suited for analytic operations, which typically examine a small number of columns from a large number of rows. Since no single data format is ideal for all types of workloads, our approach was to allow data to be maintained simultaneously in both formats, with strict transactional consistency between them. The new columnar format is a pure in-memory format with no impact on the on-disk representation. Tables required for fast analytics can be populated into the In-Memory column store. In-memory columnar formats allow a variety of optimizations, including various levels of compression, SIMD vector processing, and in-memory storage indexes. As a result, the Oracle in-memory column format achieves scan speeds exceeding many billions of rows per second per CPU core. Furthermore, the greatly accelerated scan speed enables a variety of query optimizations such as In-Memory Aggregation (Vector Group By), a real-time computation of cube aggregations that converts costly joins and aggregations into a series of filtered scans. The in-memory column store can be scaled out on a RAC cluster, with additional high availability provided by in-memory duplication. OLTP updates and highly selective lookups on the same tables can be performed via the existing in-memory row store, i.e. the buffer cache. This allows the DBA to drop indexes required purely for analytics and use the column store instead. Dropping analytic indexes may substantially speed up OLTP DML operations by eliminating costly index maintenance.
The in-memory column store is seamlessly built into the Oracle Database engine, so all of the rich functionality and High Availability mechanisms of the Oracle Database work transparently with Database In-Memory.
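The abstract's central design point, keeping one copy of the data in a row format for index-driven OLTP lookups and another in a column format for scans that touch only a few columns, can be sketched as a toy Python class. This is a hypothetical illustration, not Oracle's implementation: the class and method names (`DualFormatTable`, `lookup`, `sum_where`, `update`) are invented for the example, and "consistency" here only means that every write updates both formats in the same call, with none of the transactional machinery, compression, SIMD processing, or storage indexes the talk describes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DualFormatTable:
    """Hypothetical toy table kept in both a row format and a column format."""
    columns: list[str]
    rows: list[tuple] = field(default_factory=list)           # row format: one tuple per record
    col_store: dict[str, list] = field(default_factory=dict)  # column format: one list per column
    index: dict = field(default_factory=dict)                 # primary key (first field) -> row position

    def __post_init__(self) -> None:
        for name in self.columns:
            self.col_store.setdefault(name, [])

    def insert(self, record: tuple) -> None:
        # Write the record into both formats so they never diverge.
        self.index[record[0]] = len(self.rows)
        self.rows.append(record)
        for name, value in zip(self.columns, record):
            self.col_store[name].append(value)

    def update(self, key, col: str, value) -> None:
        # Apply the same change to the row copy and the column copy.
        pos = self.index[key]
        j = self.columns.index(col)
        row = list(self.rows[pos])
        row[j] = value
        self.rows[pos] = tuple(row)
        self.col_store[col][pos] = value

    def lookup(self, key):
        # OLTP-style access: one index probe returns one whole row (or None).
        pos = self.index.get(key)
        return None if pos is None else self.rows[pos]

    def sum_where(self, agg_col: str, filter_col: str, predicate) -> float:
        # Analytic access: scan only the two columns involved, never whole rows.
        values = self.col_store[agg_col]
        keys = self.col_store[filter_col]
        return sum(v for v, k in zip(values, keys) if predicate(k))


if __name__ == "__main__":
    t = DualFormatTable(["id", "region", "amount"])
    t.insert((1, "east", 10.0))
    t.insert((2, "west", 5.0))
    t.insert((3, "east", 2.5))
    print(t.lookup(2))                                             # (2, 'west', 5.0)
    print(t.sum_where("amount", "region", lambda r: r == "east"))  # 12.5
```

The point of the sketch is the access pattern: `lookup` reads one complete row through the index, while `sum_where` reads only the two columns it needs across every row, which is why the column copy suits analytic scans and the row copy suits selective lookups and updates.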

BIO: Tirthankar Lahiri is Vice President of Development at Oracle, and is responsible for the Data Technologies area for the Oracle Database (this area covers Data, Space, and Transaction management) as well as the Oracle TimesTen In-Memory Database. Tirthankar has 18 years of experience in the Database industry. He has worked extensively on a variety of core Database Systems areas, for which he holds multiple patents: Manageability, Performance, Scalability, High Availability, Caching, Distributed Concurrency Control, In-Memory Data Management, etc. Tirthankar has a B.Tech in Computer Science from IIT Kharagpur and an MS in Electrical Engineering from Stanford University. He was in the PhD program at Stanford, where his research areas included Multiprocessor Operating Systems and Semi-Structured Data.


SPEAKER: Sorin Faibish, EMC
Redefine Storage: "Two-Tier Architecture", a New Storage Architecture with Auto-Tiering Between a Fast Tier on Flash and a Capacity Tier on Cloud
EMC's Fast Data Group has developed powerful technologies to bridge multi-tiered storage systems for the Fast Forward project together with the DOE. Using that technology, we are now combining flash and cloud into a single converged infrastructure, the "Two-Tier Storage Architecture". By combining the performance and agility of flash memory with the capacity and reliability of cloud storage, the Two-Tier Storage Architecture allows performance and capacity to be provisioned flexibly and independently. We will share the new SDS implementation of the two-tier architecture, which is capable of running at any layer of an HPC cluster: compute nodes, I/O nodes, and storage nodes.

BIO: Sorin Faibish, DE, Fast Data Group, Office of the CTO, built innovative shared storage solutions as performance architect of EMC NAS products. His AD work included the architecture design of NFS clusters, grid computing support, and architecting the performance strategy of NAS and the Celerra file system. As performance feature lead for the NAS product, he developed the multi-threaded implementation of the NAS server. Sorin heads the OCTO patent committee. He is also a technology consultant for pNFS, a member of the IETF and contributor to the pNFS protocol, and an evangelist for pNFS in research forums. Sorin's wider expertise includes clustered file systems, storage systems, high performance computing, robotic architectures, complex systems design, and artificial intelligence in storage systems. Sorin holds a master's degree in EE from the Technion, Israel, is a member of IEEE, ACM, USENIX and SMPTE, and has more than 50 papers and 60 patents.

SLIDES: pdf, ppsx


SPEAKER: Roger MacNicol, Oracle
Query Franchising: a High Performance Solution for Heterogeneous Data Environments
With the end of the civil war between Hadoop and traditional databases, customers have data in both, using the most appropriate tool for each kind of data. The natural result is a need for a unified query infrastructure that provides a simple interface for requesting reports that may draw on data in, for example, Oracle, MongoDB, and Cloudera, and that returns those results in a timely manner. We proposed and implemented an architecture based on Oracle's SmartScan technology that provides unified metadata and computation close to the disk, an approach we have termed "Query Franchising": the dispatch of query processing to self-similar compute agents on disparate systems without loss of operational fidelity.

BIO: Roger MacNicol spent eight years at the University of Oxford, followed by five years helping to found a computer startup at Oxford working on 4th Generation Languages. He moved to the US in 1991 to work on SQL extensions for multi-dimensional databases, then spent ten years at Sybase as Manager and Architect of the IQ Columnar Database engine and two years at Oracle working on replication for HA/DR. For the past eight years he has been in the Data Storage Technology team, working on Hybrid Columnar and In-Memory Columnar technology; he is part of the Big Data SQL team and Technical Lead for Oracle's SmartScan technology.


SPEAKER: Tanj Bennett, Microsoft Research
Anatomy of Bing Cloud Apps
Cloud applications are not like desktop applications, and they are not like web farms or enterprise VMs either. This talk will discuss what kinds of resources are available to cloud services today and how that is changing program architecture.

BIO: Tanj Bennett is a software architect in Microsoft's Applications and Services Group, primarily working on search engine algorithms and infrastructure. He has been flipping bits with microscopic tools since 1973 and joined Microsoft in 1994. He has worked on many different areas of system and application software, since he is easily distracted. The first computer he worked on controlled warehouses full of refrigeration. Now he works on computers which are refrigerated warehouses. The amount of accessible memory between then and now has increased by about 10^13. This is roughly the difference between bacteria and humans. Interesting times.

SLIDES: pdf


SPEAKER: Hideaki Kimura, HP Labs
The Machine: What HP and HP Labs Are Up To
The Machine is HP's next-generation server, with an intriguing and disruptively different design. This talk provides a high-level summary of Hewlett-Packard's recent efforts to change the history of computers and briefly introduces a few research projects in our group, including the speaker's own project to develop a new DBMS for 1,000 cores and NVRAM.

BIO: Hideaki Kimura is a systems researcher at HP Labs. He is currently working on FOEDUS, a new database kernel designed to exploit next-generation servers with thousands of CPU cores and NVRAM. He earned his PhD in database research at Brown University under Stan Zdonik's supervision, along with Andy Pavlo's training in refuting common sense.

SLIDES: not available for this presentation



SDI / ISTC SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/