DATE: September 24, 2001 (Monday)
TIME: 3:30 - 5:00 pm
PLACE: Wean Hall 8220

SPEAKER:
Howard Gobioff
Google

TITLE:
Google - A Systems Overview

ABSTRACT:
Google currently processes over 100 million queries per day for google.com and its licensees. In this talk, I'll give a technical overview describing how we regularly crawl over a billion pages and enable the world to search this corpus of knowledge.

Google's software architecture aims to harness the power of thousands of cheap Linux PCs and organize them into a scalable, reliable, high-performance computing system. At the same time, we aim to keep the architecture as simple as possible. Our solution structures the system as a collection of TCP- and UDP-based servers, and guarantees reliability via replication of servers as well as timeouts/failover on the connections between servers.
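As an illustration only, the timeout/failover idea above can be sketched in a few lines of Python. This is not Google's code: the function name query_with_failover, the half-second default timeout, and the "send request, read until the peer closes" exchange are all assumptions made for the example. It tries each replica of a TCP service in turn and moves on when a connection is refused or times out.

```python
import socket


def query_with_failover(replicas, request, timeout=0.5):
    """Send `request` to the first replica that answers within `timeout` seconds.

    `replicas` is a list of (host, port) pairs assumed to run identical
    copies of one service. Returns (address, reply_bytes). This is a
    hypothetical helper for illustration, not an API from the talk.
    """
    last_error = None
    for address in replicas:
        try:
            # The timeout applies to connecting and to every later send/recv.
            with socket.create_connection(address, timeout=timeout) as conn:
                conn.sendall(request)
                conn.shutdown(socket.SHUT_WR)  # signal end of request
                chunks = []
                while True:
                    data = conn.recv(4096)
                    if not data:  # peer closed: reply is complete
                        break
                    chunks.append(data)
                return address, b"".join(chunks)
        except OSError as exc:
            # Covers refused connections and socket timeouts alike;
            # fail over to the next replica.
            last_error = exc
    raise ConnectionError(f"all {len(replicas)} replicas failed") from last_error
```

A caller would pass the full replica list for a service and treat the final ConnectionError as "service unavailable"; a real system would presumably also spread load across replicas rather than always starting with the first one.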

This basic structure is used in both our crawling and serving systems. For crawling, the basic structure also uses replicated writes and checkpoints, and attempts to be as asynchronous as possible. In addition to the high-level systems overview, I will discuss some of the real-time problems that must be handled to build a large-scale crawling system.

On the hardware side, the main goals are performance and cost; reliability explicitly isn't a goal, since it is provided by software. Thus we prefer custom-built rackmount systems assembled from standard PC components, which can be bought from many different suppliers and distributors, ensuring availability and competitive pricing. A compact rack design minimizes colocation space costs.

BIO:
Howard Gobioff is a Software Engineer at Google and has seen the company grow from a 40-person upstart into the current 250-person company. At Google, Howard has been involved in a variety of projects, including the advertising system and the main crawling/indexing system. He received his Ph.D. and M.S. from Carnegie Mellon University. At CMU, he was advised by Garth Gibson and Doug Tygar while working in the Parallel Data Lab. He is visiting CMU on his yearly pilgrimage to encourage other CMU folks to join him at Google.

SDI / LCS Seminar Questions?
Karen Lindenfelser, 86716, or visit www.pdl.cmu.edu/SDI/