PARALLEL DATA LAB

PRObE: Parallel Reconfigurable Observational Environment

PRObE is a one-of-a-kind computer facility dedicated to large-scale systems research. The facility allows hands-on operation of very large compute resources. Researchers have complete remote control of the hardware and software while running experiments, and can inject both hardware and software failures while monitoring how the system reacts. Any operating system can be deployed on the machines through Emulab, the system used to coordinate them. We expect this resource to support research in many systems-related fields such as storage, networking, resiliency, big data, and other data-intensive applications. As far as we know, no other system of this scale in the world provides this ability.

The Parallel Reconfigurable Observational Environment (PRObE) is a collaboration between the National Science Foundation (NSF), New Mexico Consortium (NMC), Los Alamos National Laboratory (LANL), Carnegie Mellon University (CMU), and the University of Utah (Utah). Beginning in October 2010, computer facilities were constructed at NMC and computers were built to make a unique systems research facility available. We are proud to offer this facility to researchers both remotely and on site, for visiting researchers who come to the NMC in Los Alamos, NM.

Objectives

PRObE aims to provide the community with a unique large-scale, low-level, and highly instrumentable systems research facility. This is accomplished by repurposing supercomputers that Los Alamos National Laboratory (LANL) decommissions and making them available to researchers. The goal of the PRObE facility is to further research in operating systems, networking, storage, resiliency, and other relevant systems research topics. PRObE is governed by committees made up of members of the research community.

Projects

PRObE is accepting applications for use of the large-scale clusters (~1000 nodes). The project selection committee will review these applications and process requests for time on the resource. In the meantime, two smaller (128-node) staging machines of the same hardware configuration, Marmot and Denali, are available for immediate use. PIs can log on to Marmot and Denali to request that new projects be created. Success on the smaller staging clusters is required before applications for time on the large machine will be approved.

People

FACULTY

Garth Gibson
Gary Grider, LANL
Andree Jacobson, NMC


Publications

  • PRObE: A Thousand-Node Experimental Cluster for Computer Systems Research. Garth Gibson, Gary Grider, Andree Jacobson, Wyatt Lloyd. USENIX ;login:, v 38, n 3, June 2013.


Acknowledgements

PRObE is a collaboration between the National Science Foundation (NSF), Los Alamos National Laboratory (LANL), Carnegie Mellon University (CMU), and the University of Utah.

We thank the members and companies of the PDL Consortium: Amazon, Datadog, Google, Honda, Intel Corporation, IBM, Jane Street, Meta, Microsoft Research, Oracle Corporation, Pure Storage, Salesforce, Samsung Semiconductor Inc., Two Sigma, and Western Digital for their interest, insights, feedback, and support.