DARQ Matter Binds Everything: Performant and Composable Cloud Programming via Resilient Steps
Author(s)
Li, Tianyu; Chandramouli, Badrish; Burckhardt, Sebastian; Madden, Samuel
Publisher Policy
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
Providing strong fault-tolerant guarantees for the modern cloud is difficult, as application developers must
coordinate between independent stateful services and ephemeral compute, and handle various failure-induced
anomalies. We propose Composable Resilient Steps (CReSt), a new abstraction for resilient cloud applications.
CReSt uses fault-tolerant steps as its core building block, which allows participants to receive, process, and send
messages as a single uninterruptible atomic unit. Composability and reliability are orthogonally achieved by
reusable CReSt implementations, for example, leveraging reliable message queues. Thus, CReSt application
builders focus solely on translating application logic into steps, and infrastructure builders focus on efficient
CReSt implementations. We propose one such implementation, called DARQ (for Deduplicated Asynchronously
Recoverable Queues). At its core, DARQ is a storage service that encapsulates CReSt participant state and
enforces CReSt semantics; developers attach ephemeral compute nodes to DARQ instances to implement
stateful distributed components. Services built with DARQ are resilient by construction, and CReSt-compatible
services naturally compose without loss of resilience. For performance, we propose a novel speculative
execution scheme to execute CReSt steps without waiting for message persistence in DARQ, effectively eliding
cloud persistence overheads; our scheme maintains CReSt’s fault-tolerance guarantees and automatically
restores consistent system state upon failure. We showcase the generality of CReSt and DARQ using two
applications: cloud streaming and workflow processing. Experiments show that DARQ is able to achieve
extremely low latency and high throughput across these use cases, often beating state-of-the-art customized
solutions.
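
The step abstraction described above can be illustrated with a minimal in-memory sketch. This is not DARQ's API or the authors' implementation: the `Message`, `Participant`, `deliver`, and `step` names are hypothetical, persistence and speculative execution are left out entirely, and atomicity is only modeled by applying all effects of a step at a single commit point. The sketch shows two ideas from the abstract: a step consumes a message, updates local state, and emits outgoing messages as one all-or-nothing unit, and re-delivered messages are deduplicated.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    sender: str
    seq: int   # per-sender sequence number, used here for deduplication
    body: str


@dataclass
class Participant:
    """Toy in-memory participant. All names are hypothetical, not DARQ's API."""
    name: str
    state: dict = field(default_factory=dict)
    inbox: list = field(default_factory=list)
    outbox: list = field(default_factory=list)
    seen: set = field(default_factory=set)
    next_seq: int = 0

    def deliver(self, msg):
        """Enqueue msg unless the same (sender, seq) was delivered before."""
        key = (msg.sender, msg.seq)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.inbox.append(msg)
        return True

    def step(self, handler):
        """Consume one message, update state, and emit outputs all-or-nothing."""
        if not self.inbox:
            return False
        msg = self.inbox[0]
        # Work on a shallow copy (enough for flat state) so a failing
        # handler leaves the committed state untouched.
        new_state = dict(self.state)
        out_bodies = handler(new_state, msg)  # may raise; nothing is committed
        # Commit point: consumption, state update, and sends apply together.
        self.inbox.pop(0)
        self.state = new_state
        for body in out_bodies:
            self.outbox.append(Message(self.name, self.next_seq, body))
            self.next_seq += 1
        return True


def count_words(state, msg):
    """Example handler: keep a running word count and report it downstream."""
    state["words"] = state.get("words", 0) + len(msg.body.split())
    return [f"total={state['words']}"]
```

In this sketch, a handler that raises partway through leaves the inbox, state, and outbox exactly as they were, so the step can simply be retried. A real implementation would additionally have to make the commit point durable, which is the part the abstract attributes to DARQ.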
Date issued
2023-06-20
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Journal
Proceedings of the ACM on Management of Data
Publisher
ACM
Citation
Li, Tianyu, Chandramouli, Badrish, Burckhardt, Sebastian and Madden, Samuel. 2023. "DARQ Matter Binds Everything: Performant and Composable Cloud Programming via Resilient Steps." Proceedings of the ACM on Management of Data, 1 (2).
Version: Final published version
ISSN
2836-6573