DATE: Thursday, April 17, 2014
TIME: Noon - 1:00 pm
PLACE: TBA
SPEAKER: Lee Sheng, WibiData
TITLE: Exploring Enron Email Dataset with Kiji and Hive
ABSTRACT:
Apache Hive is a data warehousing system for large volumes of data stored in Hadoop that provides SQL-based access for exploring datasets. KijiSchema provides evolvable schemas of primitive and compound types on top of HBase. Integrating the two offers the best of both worlds: ad hoc SQL-based querying over datasets that use evolvable schemas containing complex objects. This talk will present example queries that use this integration for exploratory analysis of the Enron email corpus. Delving into topics such as email responder pairs and sentiment analysis can expose many of the interesting points in the rise and fall of Enron.
BIO:
Lee is an engineer at WibiData who builds tools for developing Big Data applications. He holds a BS in Computer Science from Carnegie Mellon University. Previous stints include developing systems for making strategic buying decisions at Amazon.com as well as distributed simulation frameworks for the Department of Defense.
VISITOR HOST: Andy Pavlo
VISITOR COORDINATOR:
Jenn Landefeld jennsbl@cs.cmu.edu
SDI / ISTC SEMINAR QUESTIONS?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/
*partially funded by