| dc.description.abstract | Modern high-performance software from a variety of domains relies on hand-written and hand-optimized libraries to obtain the best performance. Besides general fine-grained operators that can be composed to write entire applications, these libraries also provide coarser-grained, fused, hand-optimized operators that are much faster because they are optimized for a specific sequence of operations. However, as application needs keep growing, library writers cannot keep up and must sacrifice either performance or generality. Domain-specific languages (DSLs) break this tradeoff by automatically generating the best implementation for any sequence of operations specified by the end user. However, DSL compilers pose a bigger challenge: implementing parsers, IRs, analyses, transformations, and code generation requires extensive compiler knowledge, which is outside the expertise of a typical domain expert. To make compiler technology and the benefits of code generation more accessible to domain experts, I propose the use of multi-stage programming, which allows writers to write library-like code that is then combined to generate the most efficient implementation for any whole program. In this thesis, I discuss the design of different multi-stage programming systems and their benefits and drawbacks. Next, I propose Re-Execution Based Multi-Staging (REMS), which addresses a critical flaw in many imperative multi-staging systems: the side-effect leak problem. I introduce BuildIt, a type-based, lightweight implementation of REMS in C++, one of the most popular languages for writing high-performance applications, that requires no changes to the compiler. I describe the internals of BuildIt and how it implements the key features of REMS. Furthermore, I describe a set of extensions implemented on top of BuildIt that facilitate the development of high-performance DSLs. 
I show the application of BuildIt to create three DSLs: EasyGraphit, NetBlocks, and BREeze, which target graph analytics, ad-hoc network protocol generation, and regex matching, respectively. All these case studies show a 10-100x reduction in the effort required to implement these DSLs that perform on par with or better than state-of-the-art compiler frameworks while targeting diverse architectures such as CPUs and GPUs. Finally, I introduce D2X, a system designed to add extensible and contextual debugging support to DSL implementations without requiring changes to off-the-shelf debuggers or direct manipulation of complex debugging formats. I then show how applying D2X to the BuildIt system greatly improves the debugging experience for all DSLs written with BuildIt. | |