DATE: Thursday, October 1, 2015
TIME: 12:00 pm - 1:00 pm
PLACE: RMCIC 4th Floor Panther Hollow Room

SPEAKER: Haoyuan Li, Tachyon Nexus

TITLE: Tachyon: A Memory-centric Distributed Storage System

ABSTRACT:
Memory is the key to fast big data processing. As data sets continue to grow, storage is increasingly becoming a critical bottleneck in many workloads. To address this bottleneck, we have developed Tachyon, a memory-centric, fault-tolerant distributed storage system that enables reliable data sharing at memory speed across cluster frameworks such as Apache Spark, MapReduce, and Apache Flink. The result of over three years of research and development, Tachyon achieves both memory speed and fault tolerance. Tachyon is Hadoop compatible: existing Spark, MapReduce, and Flink programs can run on top of it without any code changes. Tachyon is the default off-heap option in Spark. The project is open source and is already deployed in production at many companies. In addition, Tachyon has more than 100 contributors from over 40 institutions, including Yahoo, Tachyon Nexus, Red Hat, Baidu, Intel, and IBM. The project is the storage layer of the Berkeley Data Analytics Stack (BDAS) and is also part of the Fedora distribution. In this talk, we give an overview of Tachyon, as well as several use cases we have seen in the real world.

BIO:
Haoyuan Li is the founder and CEO of Tachyon Nexus. He is also a computer science Ph.D. candidate in the AMPLab at UC Berkeley, where he co-created Tachyon, an open source memory-centric distributed storage system. He is a founding committer of Apache Spark. Before Berkeley, he worked at Conviva and Google. Haoyuan has an M.S. from Cornell University and a B.S. from Peking University.

VISITOR HOST: Greg Ganger

SDI / ISTC SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/