The Pochoir Stencil Compiler
Author(s): Tang, Yuan; Chowdhury, Rezaul; Kuszmaul, Bradley C.; Luk, Chi-Keung; Leiserson, Charles E.
A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. Parallel cache-efficient stencil algorithms based on "trapezoidal decompositions" are known, but most programmers find them difficult to write. The Pochoir stencil compiler allows a programmer to write a simple specification of a stencil in a domain-specific stencil language embedded in C++, which the Pochoir compiler then translates into high-performing Cilk code that employs an efficient parallel cache-oblivious algorithm. Pochoir supports general d-dimensional stencils and handles both periodic and aperiodic boundary conditions in one unified algorithm. The Pochoir system provides a C++ template library that allows the user's stencil specification to be executed directly in C++ without the Pochoir compiler (albeit more slowly), which simplifies user debugging and greatly simplified the implementation of the Pochoir compiler itself. A host of stencil benchmarks run on a modern multicore machine demonstrates that Pochoir outperforms standard parallel-loop implementations, typically running 2-10 times faster. The algorithm behind Pochoir improves on prior cache-efficient algorithms on multidimensional grids by making "hyperspace" cuts, which yield asymptotically more parallelism for the same cache efficiency.
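To make the first sentence of the abstract concrete, here is a minimal sketch of a stencil computation in plain C++: a 3-point heat-style update on a 1D grid with periodic boundaries. It is not written in Pochoir's specification language and does not use the trapezoidal or hyperspace-cut algorithm. It is only the kind of straightforward loop nest (here serial) that such a compiler would aim to replace. The names `heat_step`, `heat_run`, and `alpha` are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One time step of a 3-point stencil on a 1D grid (illustrative only).
// Each point is updated as a function of itself and its two nearest
// neighbors; indices wrap around, i.e. the boundary is periodic.
std::vector<double> heat_step(const std::vector<double>& u, double alpha) {
    const std::size_t n = u.size();
    std::vector<double> next(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double left  = u[(i + n - 1) % n];  // wraps to the last point at i == 0
        const double right = u[(i + 1) % n];      // wraps to the first point at i == n - 1
        next[i] = u[i] + alpha * (left - 2.0 * u[i] + right);
    }
    return next;
}

// Apply the stencil repeatedly for a given number of time steps.
std::vector<double> heat_run(std::vector<double> u, double alpha, int steps) {
    assert(steps >= 0);
    for (int t = 0; t < steps; ++t) {
        u = heat_step(u, alpha);
    }
    return u;
}
```

Calling `heat_run(grid, 0.25, 100)` sweeps the whole grid once per time step. Every sweep streams the entire array through the cache, which is the inefficiency that cache-oblivious decompositions of the space-time domain are designed to avoid.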
Department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
SPAA '11 Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures
Association for Computing Machinery (ACM)
Tang, Yuan et al. “The Pochoir Stencil Compiler.” SPAA '11 Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures 2011. 117.
Author's final manuscript