2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), November 2-6, 2024, Austin, TX.
Nikhil Agarwal, Mitchell Fream, Souradip Ghosh, Brian C. Schwedock*, Nathan Beckmann
Carnegie Mellon University
* Samsung
Architectures should aim to maximize parallelism within a machine’s finite memories, but prior designs tend to extremes, either maximizing parallelism or minimizing state. In particular, prior unordered dataflow architectures suffer from a parallelism explosion that creates unbounded state, requires prohibitively large associative memories, and risks deadlock. The few architectures that successfully navigate the parallelism-state tradeoff are limited to embarrassingly parallel programs.
TYR is a new, general-purpose unordered dataflow architecture that achieves high parallelism with bounded state. The key insight is that prior unordered dataflow architectures are overly conservative, unnecessarily allocating tags from a single, global tag space. TYR exploits program structure to break up tags into local tag spaces that operate independently. Local tag spaces eliminate tag competition between co-dependent parts of the program, provably guaranteeing forward progress with only two tags per local tag space. TYR thus opens the door to an efficient, scalable implementation of unordered dataflow. Simulation of parallel programs demonstrates that TYR achieves parallelism nearly identical to a naïve unordered dataflow architecture with orders-of-magnitude less state.
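To make the tag-competition problem concrete, the following is a minimal toy model, not TYR's actual mechanism. It assumes a two-level loop nest in which each outer iteration holds a tag until all of its inner iterations finish, and in which outer iterations are started eagerly. The names `TagPool` and `run` are hypothetical and chosen only for illustration. When both loops draw from one shared pool, eager outer iterations can consume every tag and starve the inner work they depend on; with separate pools, two tags each suffice in this toy setting.

```python
class TagPool:
    """A bounded pool of tags (hypothetical model, not TYR's hardware)."""

    def __init__(self, size):
        self.free = size

    def acquire(self):
        # Take one tag if any is free; report whether it succeeded.
        if self.free == 0:
            return False
        self.free -= 1
        return True

    def release(self):
        self.free += 1


def run(outer_iters, inner_iters, outer_pool, inner_pool):
    """Return True if all iterations complete, False if the model deadlocks.

    Pass the same TagPool object twice to model a single global tag space.
    """
    pending = outer_iters   # outer iterations not yet started
    active = []             # remaining inner iterations per started outer iteration
    while pending or active:
        progress = False
        # Eagerly start outer iterations while tags are available.
        while pending and outer_pool.acquire():
            pending -= 1
            active.append(inner_iters)
            progress = True
        # Each active outer iteration tries to run one inner iteration,
        # which briefly holds a tag from the inner pool.
        for i, remaining in enumerate(active):
            if remaining and inner_pool.acquire():
                active[i] = remaining - 1
                inner_pool.release()
                progress = True
        # Retire outer iterations whose inner work is done; free their tags.
        finished = active.count(0)
        if finished:
            active = [r for r in active if r]
            for _ in range(finished):
                outer_pool.release()
            progress = True
        if not progress:
            return False    # work remains but nothing can advance
    return True


shared = TagPool(4)
global_ok = run(8, 3, shared, shared)            # one global tag space
local_ok = run(8, 3, TagPool(2), TagPool(2))     # separate local tag spaces
print(global_ok, local_ok)
```

In this sketch the shared pool stalls because the four eagerly started outer iterations hold every tag, while the split pools finish with two tags each. The model illustrates only the intuition behind separating tag spaces; it says nothing about how TYR derives local tag spaces from program structure or how its forward-progress guarantee is proved.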
KEYWORDS: Dataflow, parallelism, locality
FULL PAPER: pdf