dc.contributor.author | Li, Tianyu | |
dc.contributor.author | Chandramouli, Badrish | |
dc.contributor.author | Burckhardt, Sebastian | |
dc.contributor.author | Madden, Samuel | |
dc.date.accessioned | 2023-07-11T17:36:59Z | |
dc.date.available | 2023-07-11T17:36:59Z | |
dc.date.issued | 2023-06-20 | |
dc.identifier.issn | 2836-6573 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/151085 | |
dc.description.abstract | Providing strong fault-tolerant guarantees for the modern cloud is difficult, as application developers must
coordinate between independent stateful services and ephemeral compute, and handle various failure-induced
anomalies. We propose Composable Resilient Steps (CReSt), a new abstraction for resilient cloud applications.
CReSt uses fault-tolerant steps as its core building block, which allows participants to receive, process, and send
messages as a single uninterruptible atomic unit. Composability and reliability are orthogonally achieved by
reusable CReSt implementations, for example, leveraging reliable message queues. Thus, CReSt application
builders focus solely on translating application logic into steps, and infrastructure builders focus on efficient
CReSt implementations. We propose one such implementation, called DARQ (for Deduplicated Asynchronously
Recoverable Queues). At its core, DARQ is a storage service that encapsulates CReSt participant state and
enforces CReSt semantics; developers attach ephemeral compute nodes to DARQ instances to implement
stateful distributed components. Services built with DARQ are resilient by construction, and CReSt-compatible
services naturally compose without loss of resilience. For performance, we propose a novel speculative
execution scheme to execute CReSt steps without waiting for message persistence in DARQ, effectively eliding
cloud persistence overheads; our scheme maintains CReSt’s fault-tolerance guarantees and automatically
restores consistent system state upon failure. We showcase the generality of CReSt and DARQ using two
applications: cloud streaming and workflow processing. Experiments show that DARQ is able to achieve
extremely low latency and high throughput across these use cases, often beating state-of-the-art customized
solutions. | en_US |
dc.publisher | ACM | en_US |
dc.relation.isversionof | https://doi.org/10.1145/3589262 | en_US |
dc.rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. | en_US |
dc.source | Association for Computing Machinery | en_US |
dc.title | DARQ Matter Binds Everything: Performant and Composable Cloud Programming via Resilient Steps | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Li, Tianyu, Chandramouli, Badrish, Burckhardt, Sebastian and Madden, Samuel. 2023. "DARQ Matter Binds Everything: Performant and Composable Cloud Programming via Resilient Steps." Proceedings of the ACM on Management of Data, 1 (2). | |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | |
dc.relation.journal | Proceedings of the ACM on Management of Data | en_US |
dc.identifier.mitlicense | PUBLISHER_POLICY | |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
dc.date.updated | 2023-07-01T08:00:03Z | |
dc.language.rfc3066 | en | |
dc.rights.holder | The author(s) | |
dspace.date.submission | 2023-07-01T08:00:04Z | |
mit.journal.volume | 1 | en_US |
mit.journal.issue | 2 | en_US |
mit.license | PUBLISHER_POLICY | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |