dc.contributor.author | Tang, Yuan | |
dc.contributor.author | Chowdhury, Rezaul | |
dc.contributor.author | Kuszmaul, Bradley C. | |
dc.contributor.author | Luk, Chi-Keung | |
dc.contributor.author | Leiserson, Charles E. | |
dc.date.accessioned | 2012-08-14T18:18:35Z | |
dc.date.available | 2012-08-14T18:18:35Z | |
dc.date.issued | 2011-06 | |
dc.identifier.isbn | 978-1-4503-0743-7 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/72122 | |
dc.description.abstract | A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. Parallel cache-efficient stencil algorithms based on "trapezoidal decompositions" are known, but most programmers find them difficult to write. The Pochoir stencil compiler allows a programmer to write a simple specification of a stencil in a domain-specific stencil language embedded in C++ which the Pochoir compiler then translates into high-performing Cilk code that employs an efficient parallel cache-oblivious algorithm. Pochoir supports general d-dimensional stencils and handles both periodic and aperiodic boundary conditions in one unified algorithm. The Pochoir system provides a C++ template library that allows the user's stencil specification to be executed directly in C++ without the Pochoir compiler (albeit more slowly), which simplifies user debugging and greatly simplified the implementation of the Pochoir compiler itself. A host of stencil benchmarks run on a modern multicore machine demonstrates that Pochoir outperforms standard parallel-loop implementations, typically running 2-10 times faster. The algorithm behind Pochoir improves on prior cache-efficient algorithms on multidimensional grids by making "hyperspace" cuts, which yield asymptotically more parallelism for the same cache efficiency. | en_US |
dc.language.iso | en_US | |
dc.publisher | Association for Computing Machinery (ACM) | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1145/1989493.1989508 | en_US |
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike 3.0 | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/ | en_US |
dc.source | MIT web domain | en_US |
dc.title | The Pochoir Stencil Compiler | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Tang, Yuan et al. “The Pochoir Stencil Compiler.” SPAA '11 Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures 2011. 117. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US |
dc.contributor.approver | Leiserson, Charles E. | |
dc.contributor.mitauthor | Tang, Yuan | |
dc.contributor.mitauthor | Chowdhury, Rezaul | |
dc.contributor.mitauthor | Kuszmaul, Bradley C. | |
dc.contributor.mitauthor | Luk, Chi-Keung | |
dc.contributor.mitauthor | Leiserson, Charles E. | |
dc.relation.journal | SPAA '11 Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures | en_US |
dc.eprint.version | Author's final manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
dspace.orderedauthors | Tang, Yuan; Chowdhury, Rezaul Alam; Kuszmaul, Bradley C.; Luk, Chi-Keung; Leiserson, Charles E. | en |
dc.identifier.orcid | https://orcid.org/0000-0003-4059-765X | |
mit.license | OPEN_ACCESS_POLICY | en_US |
mit.metadata.status | Complete | |