The Pochoir Stencil Compiler
Author(s)
Tang, Yuan; Chowdhury, Rezaul; Kuszmaul, Bradley C.; Luk, Chi-Keung; Leiserson, Charles E.
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Terms of use
Abstract
A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. Parallel cache-efficient stencil algorithms based on "trapezoidal decompositions" are known, but most programmers find them difficult to write. The Pochoir stencil compiler allows a programmer to write a simple specification of a stencil in a domain-specific stencil language embedded in C++, which the Pochoir compiler then translates into high-performing Cilk code that employs an efficient parallel cache-oblivious algorithm. Pochoir supports general d-dimensional stencils and handles both periodic and aperiodic boundary conditions in one unified algorithm. The Pochoir system provides a C++ template library that allows the user's stencil specification to be executed directly in C++ without the Pochoir compiler (albeit more slowly), which simplifies user debugging and greatly simplified the implementation of the Pochoir compiler itself. A host of stencil benchmarks run on a modern multicore machine demonstrates that Pochoir outperforms standard parallel-loop implementations, typically running 2-10 times faster. The algorithm behind Pochoir improves on prior cache-efficient algorithms on multidimensional grids by making "hyperspace" cuts, which yield asymptotically more parallelism for the same cache efficiency.
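To make the term "stencil computation" concrete, the following is a minimal sketch of a 1D three-point heat stencil written as plain C++ loops. It does not use Pochoir's specification language or its trapezoidal algorithm; it only illustrates the kind of straightforward loop nest that the abstract takes as the baseline. The names `heat_step` and `heat_run`, the coefficient `alpha`, and the constant-value treatment of the aperiodic boundary are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One time step of a 1D three-point heat stencil (hypothetical example):
//   next[i] = cur[i] + alpha * (cur[i-1] - 2*cur[i] + cur[i+1])
// periodic == true  : neighbor indices wrap around the grid.
// periodic == false : off-grid neighbors read as boundary_value
//                     (one simple aperiodic choice, assumed here).
std::vector<double> heat_step(const std::vector<double>& cur, double alpha,
                              bool periodic, double boundary_value = 0.0) {
    assert(!cur.empty());
    const std::size_t n = cur.size();
    std::vector<double> next(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double left =
            (i > 0) ? cur[i - 1] : (periodic ? cur[n - 1] : boundary_value);
        const double right =
            (i + 1 < n) ? cur[i + 1] : (periodic ? cur[0] : boundary_value);
        next[i] = cur[i] + alpha * (left - 2.0 * cur[i] + right);
    }
    return next;
}

// Apply the stencil for a number of time steps, one full grid sweep per step.
std::vector<double> heat_run(std::vector<double> grid, double alpha,
                             int steps, bool periodic) {
    for (int t = 0; t < steps; ++t) {
        grid = heat_step(grid, alpha, periodic);
    }
    return grid;
}
```

Each call to `heat_step` sweeps the whole grid once per time step, so a grid larger than cache is re-read from memory on every step. Cache-efficient algorithms such as the trapezoidal decompositions mentioned in the abstract reorder the space-time iteration to avoid that cost.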
Date issued
2011-06
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Journal
SPAA '11 Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures
Publisher
Association for Computing Machinery (ACM)
Citation
Tang, Yuan et al. “The Pochoir Stencil Compiler.” SPAA '11 Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures 2011. 117.
Version: Author's final manuscript
ISBN
978-1-4503-0743-7