Show simple item record

dc.contributor.author: Denniston, Tyler
dc.contributor.author: Kamil, Shoaib
dc.contributor.author: Amarasinghe, Saman P
dc.date.accessioned: 2017-07-18T16:00:20Z
dc.date.available: 2017-07-18T16:00:20Z
dc.date.issued: 2016-08
dc.identifier.isbn: 9781450340922
dc.identifier.uri: http://hdl.handle.net/1721.1/110762
dc.description.abstract: Many image processing tasks are naturally expressed as a pipeline of small computational kernels known as stencils. Halide is a popular domain-specific language and compiler designed to implement image processing algorithms. Halide uses simple language constructs to express what to compute and a separate scheduling co-language for expressing when and where to perform the computation. This approach has demonstrated performance comparable to or better than hand-optimized code. Until now, however, Halide has been restricted to parallel shared-memory execution, limiting its performance for memory-bandwidth-bound pipelines or large-scale image processing tasks. We present an extension to Halide to support distributed-memory parallel execution of complex stencil pipelines. These extensions compose with the existing scheduling constructs in Halide, allowing expression of complex computation and communication strategies. Existing Halide applications can be distributed with minimal changes, allowing programmers to explore the tradeoff between recomputation and communication with little effort. Approximately 10 new lines of code are needed even for a 200-line, 99-stage application. On nine image processing benchmarks, our extensions give up to a 1.4× speedup on a single node over regular multithreaded execution with the same number of cores, by mitigating the effects of non-uniform memory access. The distributed benchmarks achieve up to 18× speedup on a 16-node testing machine and up to 57× speedup on 64 nodes of the NERSC Cori supercomputer. [en_US]
dc.description.sponsorship: United States. Department of Energy (award DE-SC0005288) [en_US]
dc.description.sponsorship: United States. Department of Energy (award DE-SC0008923) [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (XPS-1533753) [en_US]
dc.language.iso: en_US
dc.publisher: Association for Computing Machinery [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1145/2851141.2851157 [en_US]
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivs License [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [en_US]
dc.source: ACM [en_US]
dc.title: Distributed Halide [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Denniston, Tyler, Shoaib Kamil, and Saman Amarasinghe. “Distributed Halide.” Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP ’16 (2016). [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.mitauthor: Denniston, Tyler
dc.contributor.mitauthor: Kamil, Shoaib
dc.contributor.mitauthor: Amarasinghe, Saman P
dc.relation.journal: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '16 [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dspace.orderedauthors: Denniston, Tyler; Kamil, Shoaib; Amarasinghe, Saman [en_US]
dspace.embargo.terms: N [en_US]
dc.identifier.orcid: https://orcid.org/0000-0003-4400-8947
dc.identifier.orcid: https://orcid.org/0000-0002-7231-7643
mit.license: PUBLISHER_CC [en_US]
mit.metadata.status: Complete

