
dc.contributor.author: Li, Tianyu
dc.contributor.author: Chandramouli, Badrish
dc.contributor.author: Burckhardt, Sebastian
dc.contributor.author: Madden, Samuel
dc.date.accessioned: 2023-07-11T17:36:59Z
dc.date.available: 2023-07-11T17:36:59Z
dc.date.issued: 2023-06-20
dc.identifier.issn: 2836-6573
dc.identifier.uri: https://hdl.handle.net/1721.1/151085
dc.description.abstract: Providing strong fault-tolerant guarantees for the modern cloud is difficult, as application developers must coordinate between independent stateful services and ephemeral compute, and handle various failure-induced anomalies. We propose Composable Resilient Steps (CReSt), a new abstraction for resilient cloud applications. CReSt uses fault-tolerant steps as its core building block, which allow participants to receive, process, and send messages as a single uninterruptible atomic unit. Composability and reliability are orthogonally achieved by reusable CReSt implementations, for example, leveraging reliable message queues. Thus, CReSt application builders focus solely on translating application logic into steps, and infrastructure builders focus on efficient CReSt implementations. We propose one such implementation, called DARQ (for Deduplicated Asynchronously Recoverable Queues). At its core, DARQ is a storage service that encapsulates CReSt participant state and enforces CReSt semantics; developers attach ephemeral compute nodes to DARQ instances to implement stateful distributed components. Services built with DARQ are resilient by construction, and CReSt-compatible services naturally compose without loss of resilience. For performance, we propose a novel speculative execution scheme to execute CReSt steps without waiting for message persistence in DARQ, effectively eliding cloud persistence overheads; our scheme maintains CReSt’s fault-tolerance guarantees and automatically restores consistent system state upon failure. We showcase the generality of CReSt and DARQ using two applications: cloud streaming and workflow processing. Experiments show that DARQ is able to achieve extremely low latency and high throughput across these use cases, often beating state-of-the-art customized solutions. [en_US]
dc.publisher: ACM [en_US]
dc.relation.isversionof: https://doi.org/10.1145/3589262 [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: Association for Computing Machinery [en_US]
dc.title: DARQ Matter Binds Everything: Performant and Composable Cloud Programming via Resilient Steps [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Li, Tianyu, Chandramouli, Badrish, Burckhardt, Sebastian and Madden, Samuel. 2023. "DARQ Matter Binds Everything: Performant and Composable Cloud Programming via Resilient Steps." Proceedings of the ACM on Management of Data, 1 (2).
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.relation.journal: Proceedings of the ACM on Management of Data [en_US]
dc.identifier.mitlicense: PUBLISHER_POLICY
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2023-07-01T08:00:03Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2023-07-01T08:00:04Z
mit.journal.volume: 1 [en_US]
mit.journal.issue: 2 [en_US]
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

