Cache-conscious scheduling of streaming applications

Agrawal, Kunal; Fineman, Jeremy T.; Krage, Jordan; Leiserson, Charles E.; Toledo, Sivan

dc.contributor.author	Agrawal, Kunal
dc.contributor.author	Fineman, Jeremy T.
dc.contributor.author	Krage, Jordan
dc.contributor.author	Leiserson, Charles E.
dc.contributor.author	Toledo, Sivan
dc.date.accessioned	2014-09-22T17:09:20Z
dc.date.available	2014-09-22T17:09:20Z
dc.date.issued	2012-06
dc.identifier.isbn	9781450312134
dc.identifier.uri	http://hdl.handle.net/1721.1/90261
dc.description.abstract	This paper considers the problem of scheduling streaming applications on uniprocessors in order to minimize the number of cache misses. Streaming applications are represented as a directed graph (or multigraph), where nodes are computation modules and edges are channels. When a module fires, it consumes some data items from its input channels and produces some items on its output channels. In addition, each module may have some state (either code or data) which represents the memory locations that must be loaded into cache in order to execute the module. We consider synchronous dataflow graphs where the input and output rates of modules are known in advance and do not change during execution. We also assume that the state size of modules is known in advance. Our main contribution is to show that for a large and important class of streaming computations, cache-efficient scheduling is essentially equivalent to solving a constrained graph partitioning problem. A streaming computation from this class has a cache-efficient schedule if and only if its graph has a low-bandwidth partition of the modules into components (subgraphs) whose total state fits within the cache, where the bandwidth of the partition is the number of data items that cross intercomponent channels per data item that enters the graph. Given a good partition, we describe a runtime strategy for scheduling two classes of streaming graphs: pipelines, where the graph consists of a single directed chain, and a fairly general class of directed acyclic graphs (dags) with some additional restrictions. The runtime scheduling strategy consists of adding large external buffers at the input and output edges of each component, allowing each component to be executed many times. Partitioning enables a reduction in cache misses in two ways. First, any items that are generated on edges internal to subgraphs are never written out to memory, but remain in cache. Second, each subgraph is executed many times, allowing the state to be reused. We prove the optimality of this runtime scheduling for all pipelines and for dags that meet certain conditions on buffer-size requirements. Specifically, we show that with constant-factor memory augmentation, partitioning on these graphs guarantees the optimal number of cache misses to within a constant factor. For the pipeline case, we also prove that such a partition can be found in polynomial time. For the dags we prove optimality if a good partition is provided; the partitioning problem itself is NP-complete.	en_US
dc.description.sponsorship	National Science Foundation (U.S.) (Grant CCF-1150036)	en_US
dc.description.sponsorship	National Science Foundation (U.S.) (Grant CNS-1017058)	en_US
dc.description.sponsorship	National Science Foundation (U.S.) (Grant CCF-0937860)	en_US
dc.description.sponsorship	United States-Israel Binational Science Foundation (Grant 2010231)	en_US
dc.language.iso	en_US
dc.publisher	Association for Computing Machinery (ACM)	en_US
dc.relation.isversionof	http://dx.doi.org/10.1145/2312005.2312049	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	Other univ. web domain	en_US
dc.title	Cache-conscious scheduling of streaming applications	en_US
dc.type	Article	en_US
dc.identifier.citation	Kunal Agrawal, Jeremy T. Fineman, Jordan Krage, Charles E. Leiserson, and Sivan Toledo. 2012. Cache-conscious scheduling of streaming applications. In Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures (SPAA '12). ACM, New York, NY, USA, 236-245.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.mitauthor	Leiserson, Charles E.	en_US
dc.relation.journal	Proceedings of the 24th ACM symposium on Parallelism in algorithms and architectures (SPAA '12)	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.orderedauthors	Agrawal, Kunal; Fineman, Jeremy T.; Krage, Jordan; Leiserson, Charles E.; Toledo, Sivan	en_US
mit.license	OPEN_ACCESS_POLICY	en_US
mit.metadata.status	Complete
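
The partition bandwidth defined in the abstract can be illustrated for the pipeline case with a small sketch. This is not the authors' algorithm; it is a plain dynamic program written under stated assumptions: each module has a fixed state size and a fixed gain (items produced per item consumed), a component is a contiguous run of modules whose total state fits in the cache, and only channels between components count toward bandwidth. The names `edge_rates`, `min_bandwidth_partition`, `states`, `gains`, and `cache_size` are hypothetical.

```python
from fractions import Fraction


def edge_rates(gains):
    """rates[i] = items crossing the channel after module i,
    per item entering the pipeline (product of gains 0..i)."""
    rates = []
    rate = Fraction(1)
    for gain in gains:
        rate *= Fraction(gain)
        rates.append(rate)
    return rates


def min_bandwidth_partition(states, gains, cache_size):
    """Split a pipeline into contiguous components whose total state
    fits in cache_size, minimizing the summed rate of cut channels.

    Returns (bandwidth, components), where each component is a
    half-open index range (start, end) over the modules.
    Illustrative O(n^2) dynamic program, not the paper's method.
    """
    n = len(states)
    rates = edge_rates(gains)
    best = [None] * (n + 1)   # best[j]: min bandwidth for modules 0..j-1
    prev = [None] * (n + 1)   # prev[j]: start index of the last component
    best[0] = Fraction(0)
    for j in range(1, n + 1):
        total_state = 0
        for i in range(j - 1, -1, -1):
            total_state += states[i]
            if total_state > cache_size:
                break  # component i..j-1 no longer fits in cache
            if best[i] is None:
                continue
            # Cutting before module i costs the rate of channel i-1.
            cut_cost = rates[i - 1] if i > 0 else Fraction(0)
            cost = best[i] + cut_cost
            if best[j] is None or cost < best[j]:
                best[j] = cost
                prev[j] = i
    if best[n] is None:
        raise ValueError("some module's state exceeds the cache size")
    components = []
    j = n
    while j > 0:
        components.append((prev[j], j))
        j = prev[j]
    components.reverse()
    return best[n], components
```

For example, with `states=[4, 4, 4, 4]`, `gains=[1, Fraction(1, 2), 2, 1]`, and `cache_size=8`, the cheapest cut under these assumptions is the channel after the second module, where only half an item crosses per input item.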

Files in this item

Name: Leiserson_Cache-conscious.pdf
Size: 182.4 KB
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
