Improving the Programmability of A Distributed Hardware Accelerator
Author(s)
Shwatal, Nathan A.
DownloadThesis PDF (602.5Kb)
Advisor
Sanchez, Daniel
Terms of use
Metadata
Show full item recordAbstract
Sparse iterative matrix algorithms are critical to many scientific and engineering workloads, yet they perform poorly on conventional hardware. (Ōmeteōtl, a new hardware accelerator with a distributed-memory and task-based execution model, aims to address these performance bottlenecks. However, programming for (Ōmeteōtl is low-level, error-prone, and far removed from the simplicity of typical iterative formulations. This thesis presents Lapis, a domain-specific language and compiler that allows users to express sparse matrix algorithms in high-level Python code and automatically generates efficient C++ code for (Ōmeteōtl. Lapis abstracts away data partitioning and task orchestration, reducing implementation complexity: for example, it lowers lines of code by 30× for conjugate gradients and 46× for power iteration. Despite this abstraction, generated code achieves 75.7% to 92.6% of the performance of manually written implementations across several benchmarks.
Date issued
2025-05Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology