José A. Joao*, M. Aater Suleman^, Onur Mutlu‡, Yale N. Patt†
*ECE Department, The University of Texas at Austin
^Calxeda Inc.
‡Computer Architecture Lab, Carnegie Mellon University
†ECE Department, The University of Texas at Austin
The performance of multithreaded applications is limited by a variety of bottlenecks, e.g., critical sections, barriers, and slow pipeline stages. These bottlenecks serialize execution, waste valuable execution cycles, and limit the scalability of applications. This paper proposes Bottleneck Identification and Scheduling (BIS), a cooperative software-hardware mechanism to identify and accelerate the most critical bottlenecks. BIS identifies which bottlenecks are likely to reduce performance by measuring the number of cycles threads have to wait for each bottleneck, and accelerates those bottlenecks using one or more fast cores on an Asymmetric Chip Multi-Processor (ACMP). Unlike previous work that targets specific bottlenecks, BIS can identify and accelerate bottlenecks regardless of their type. We compare BIS to four previous approaches and show that it outperforms the best of them by 15% on average. BIS' performance improvement increases as the number of cores and the number of fast cores in the system increase.
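The identification step described above can be illustrated with a small software sketch. This is not the paper's hardware mechanism: the class name `BottleneckTable`, its methods, and the choice to accumulate waiting threads times cycles are all assumptions made for illustration. It only shows the idea of ranking bottlenecks by how many cycles threads spend waiting on them, then picking as many as there are fast cores.

```python
from collections import defaultdict


class BottleneckTable:
    """Hypothetical model: accumulates thread waiting cycles per bottleneck."""

    def __init__(self):
        # Maps a bottleneck identifier (e.g. a lock or barrier name)
        # to the total number of cycles threads have waited on it.
        self.waiting_cycles = defaultdict(int)

    def record_wait(self, bottleneck_id, waiting_threads, cycles=1):
        # Assumption: each waiting thread contributes one unit per cycle,
        # so a bottleneck that stalls many threads ranks higher.
        self.waiting_cycles[bottleneck_id] += waiting_threads * cycles

    def most_critical(self, num_fast_cores):
        # Rank bottlenecks by accumulated waiting cycles, highest first,
        # and return one candidate per available fast core.
        ranked = sorted(
            self.waiting_cycles.items(), key=lambda item: item[1], reverse=True
        )
        return [bottleneck_id for bottleneck_id, _ in ranked[:num_fast_cores]]


table = BottleneckTable()
table.record_wait("lock_A", waiting_threads=3, cycles=100)    # 300 waiting cycles
table.record_wait("barrier_1", waiting_threads=7, cycles=20)  # 140 waiting cycles
table.record_wait("lock_B", waiting_threads=1, cycles=50)     # 50 waiting cycles

print(table.most_critical(num_fast_cores=1))
print(table.most_critical(num_fast_cores=2))
```

With one fast core, only the bottleneck with the most accumulated waiting cycles is selected; with more fast cores, the next entries in the ranking become candidates as well. The type of each bottleneck (lock, barrier, pipeline stage) plays no role in the ranking, which mirrors the claim that the approach is independent of bottleneck type.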
KEYWORDS: Critical sections, barriers, pipeline parallelism, multicore, asymmetric CMPs, heterogeneous CMPs
FULL PAPER: pdf