BOOM: Broadcast Optimizations for On-chip Meshes

Krishna, Tushar; Beckmann, Bradford M.; Peh, Li-Shiuan; Reinhardt, Steven K.

Download

MIT-CSAIL-TR-2011-013.pdf (834.2 KB)

Other Contributors

Computer Architecture

Advisor

Li-Shiuan Peh

Terms of use

Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported http://creativecommons.org/licenses/by-nc-nd/3.0/

Abstract

Future many-core chips will require an on-chip network that supports broadcasts and multicasts with good power and performance. A vanilla on-chip network sends multiple unicast packets for each broadcast packet, incurring latency, throughput, and power overheads. Recent research on on-chip multicast support has proposed forking broadcast/multicast packets within the network at the router buffers, but these techniques are far from ideal: they increase buffer occupancy, which lowers throughput, and packets incur delay and power penalties at each router. In this work, we analyze an ideal broadcast mesh, show the substantial gaps between state-of-the-art multicast NoCs and the ideal, and then propose BOOM, which comprises WHIRL, a routing protocol that ideally load-balances broadcast traffic; mXbar, a multicast crossbar circuit that enables multicast traversal at an energy-delay similar to that of unicasts; and speculative bypassing of buffering for multicast flits. Together, these enable broadcast packets to approach the delay, energy, and throughput of the ideal fabric. Our simulations show BOOM realizing an average network latency that is 5% off ideal, attaining 96% of ideal throughput, with energy consumption that is 9% above ideal. Evaluations using synthetic traffic show BOOM achieving a latency reduction of 61%, a throughput improvement of 63%, and a buffer power reduction of 80% compared to a baseline broadcast. Simulations with PARSEC benchmarks show BOOM reducing average request latency by 40% and average network latency by 15%.
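To make the unicast overhead mentioned in the abstract concrete, the following minimal sketch counts link traversals for one broadcast on a k x k mesh in two ways: as N-1 separate unicasts along minimal (XY) paths, and as a single packet forked in the network along a row-then-column tree. This is an illustration only and is not the WHIRL protocol or any mechanism from the report; the function names, the coordinate convention, and the choice of a fixed XY tree are all assumptions made for this example.

```python
# Illustrative sketch (not from the report): link traversals needed to
# deliver one broadcast on a k x k mesh. All names here are hypothetical.

def unicast_link_traversals(k, src):
    """Broadcast as N-1 unicasts: each copy travels a minimal path, so the
    total is the sum of Manhattan distances from src to every node."""
    sx, sy = src
    return sum(abs(x - sx) + abs(y - sy) for x in range(k) for y in range(k))


def xy_tree_links(k, src):
    """Broadcast forked in the network along a fixed XY tree: the packet
    first spreads along the source row, then every node in that row forks
    it up and down its column. Returns the set of directed links used."""
    sx, sy = src
    links = set()
    # Row phase: east and west from the source along y == sy.
    for x in range(sx, k - 1):
        links.add(((x, sy), (x + 1, sy)))
    for x in range(sx, 0, -1):
        links.add(((x, sy), (x - 1, sy)))
    # Column phase: each node in the source row forwards north and south.
    for x in range(k):
        for y in range(sy, k - 1):
            links.add(((x, y), (x, y + 1)))
        for y in range(sy, 0, -1):
            links.add(((x, y), (x, y - 1)))
    return links


if __name__ == "__main__":
    for k, src in [(4, (0, 0)), (8, (0, 0)), (8, (3, 3))]:
        unicast = unicast_link_traversals(k, src)
        tree = len(xy_tree_links(k, src))
        print(f"{k}x{k} mesh, source {src}: unicasts={unicast}, tree={tree}")
```

Any spanning tree over N nodes uses exactly N-1 links, so the tree count is k*k - 1 regardless of the source, while the unicast count grows with the distance to every destination. For an 8 x 8 mesh with a corner source, the sketch gives 448 link traversals for unicasts against 63 for the tree. A fixed XY tree like this one always uses the same links for a given source; the report's WHIRL protocol is described as load-balancing broadcast traffic, which this sketch does not attempt to model.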

Date issued

2011-03-14

URI

http://hdl.handle.net/1721.1/61695

Series/Report no.

MIT-CSAIL-TR-2011-013

Keywords

multicore

Collections

CSAIL Technical Reports (July 1, 2003 - present)
