Show simple item record

dc.contributor.author: Kiriansky, Vladimir
dc.contributor.author: Xu, Haoran
dc.contributor.author: Rinard, Martin
dc.contributor.author: Amarasinghe, Saman
dc.date.accessioned: 2020-05-06T20:05:53Z
dc.date.available: 2020-05-06T20:05:53Z
dc.date.issued: 2018-11
dc.identifier.isbn: 9781450359863
dc.identifier.uri: https://hdl.handle.net/1721.1/125080
dc.description.abstract: Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for inflight memory requests. These resources, however, often exhibit poor utilization rates on workloads with large working sets, e.g., in-memory databases, key-value stores, and graph analytics, as compilers and hardware struggle to expose ILP and MLP from the instruction stream automatically. In this paper, we introduce the IMLP (Instruction and Memory Level Parallelism) task programming model. IMLP tasks execute as coroutines that yield execution at annotated long-latency operations, e.g., memory accesses, divisions, or unpredictable branches. IMLP tasks are interleaved on a single thread, and integrate well with thread parallelism and vectorization. Our DSL embedded in C++, Cimple, allows exploration of task scheduling and transformations, such as buffering, vectorization, pipelining, and prefetching. We demonstrate state-of-the-art performance on core algorithms used in in-memory databases that operate on arrays, hash tables, trees, and skip lists. Cimple applications reach 2.5× throughput gains over hardware multithreading on a multi-core, and 6.4× single thread speedup. [en_US]
dc.description.sponsorship: DOE (Grant DE-SC0014204) [en_US]
dc.description.sponsorship: Toyota Research Institute (Grant LP-C000765-SR) [en_US]
dc.language.iso: en
dc.publisher: Association for Computing Machinery [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1145/3243176.3243185 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: arXiv [en_US]
dc.title: Cimple: instruction and memory level parallelism: a DSL for uncovering ILP and MLP [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Kiriansky, Vladimir et al. "Cimple: instruction and memory level parallelism: a DSL for uncovering ILP and MLP." Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (November 2018): 30 © 2018 Association for Computing Machinery [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.relation.journal: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques [en_US]
dc.eprint.version: Original manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2019-07-02T16:34:47Z
dspace.date.submission: 2019-07-02T16:34:48Z
mit.metadata.status: Complete

