| dc.contributor.author | Kiriansky, Vladimir | |
| dc.contributor.author | Xu, Haoran | |
| dc.contributor.author | Rinard, Martin | |
| dc.contributor.author | Amarasinghe, Saman | |
| dc.date.accessioned | 2020-05-06T20:05:53Z | |
| dc.date.available | 2020-05-06T20:05:53Z | |
| dc.date.issued | 2018-11 | |
| dc.identifier.isbn | 9781450359863 | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/125080 | |
| dc.description.abstract | Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for inflight memory requests. These resources, however, often exhibit poor utilization rates on workloads with large working sets, e.g., in-memory databases, key-value stores, and graph analytics, as compilers and hardware struggle to expose ILP and MLP from the instruction stream automatically. In this paper, we introduce the IMLP (Instruction and Memory Level Parallelism) task programming model. IMLP tasks execute as coroutines that yield execution at annotated long-latency operations, e.g., memory accesses, divisions, or unpredictable branches. IMLP tasks are interleaved on a single thread, and integrate well with thread parallelism and vectorization. Our DSL embedded in C++, Cimple, allows exploration of task scheduling and transformations, such as buffering, vectorization, pipelining, and prefetching. We demonstrate state-of-the-art performance on core algorithms used in in-memory databases that operate on arrays, hash tables, trees, and skip lists. Cimple applications reach 2.5× throughput gains over hardware multithreading on a multi-core, and 6.4× single thread speedup. | en_US |
| dc.description.sponsorship | DOE (Grant DE-SC0014204) | en_US |
| dc.description.sponsorship | Toyota Research Institute (Grant LP-C000765-SR) | en_US |
| dc.language.iso | en | |
| dc.publisher | Association for Computing Machinery | en_US |
| dc.relation.isversionof | http://dx.doi.org/10.1145/3243176.3243185 | en_US |
| dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
| dc.source | arXiv | en_US |
| dc.title | Cimple: instruction and memory level parallelism: a DSL for uncovering ILP and MLP | en_US |
| dc.type | Article | en_US |
| dc.identifier.citation | Kiriansky, Vladimir et al. "Cimple: instruction and memory level parallelism: a DSL for uncovering ILP and MLP." Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (November 2018): 30 © 2018 Association for Computing Machinery | en_US |
| dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US |
| dc.relation.journal | Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques | en_US |
| dc.eprint.version | Original manuscript | en_US |
| dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
| eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
| dc.date.updated | 2019-07-02T16:34:47Z | |
| dspace.date.submission | 2019-07-02T16:34:48Z | |
| mit.metadata.status | Complete | |