DPR Cluster: An Automated Framework for Deploying Resilient Stateful Cloud Microservices
Author(s)
Raicevic, Nikola
Advisor
Madden, Samuel
Abstract
Recent advances in distributed recovery protocols enable application builders to achieve strong prefix recovery guarantees with low overhead in distributed systems of cache-stores (pairs of a fast cache and persistent backing storage that answer storage requests). Specifically, Distributed Prefix Recovery (DPR) is a general-purpose protocol that implements the prefix recovery guarantee for an arbitrary cluster of cache-stores with the help of a centralized management node. However, deploying such a cluster is still challenging, as it involves timely detection and restart of failed nodes, incremental roll-out of new cache-store implementations and deployments, and routing of requests in a dynamic cluster with failures. Cluster administrators must manually configure DPR with this information and program cache-stores with the necessary capabilities in a fault-tolerant manner. In this thesis, we introduce DPR Cluster, an automated framework for quickly and easily deploying clusters of DPR-enhanced cache-stores. DPR Cluster uses Kubernetes as its cluster manager and features a declarative Python management API for scripting. Cluster administrators merely specify the desired cluster, and Kubernetes automatically deploys and manages the relevant components and restarts them on failure. Clients can dynamically discover a cluster and its components and communicate with them through DPR Cluster’s dynamic, fault-tolerant networking layer based on DNS. Additionally, DPR Cluster implements a suite of fault-tolerance features beyond cache-store consistency, such as automatic reconnects. Our evaluation shows that DPR Cluster is highly resilient and functional with a simple API, and significantly lowers the barrier to entry for DPR deployments.
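The abstract mentions DNS-based discovery and automatic reconnects but does not show how a client would combine them. The following is a minimal sketch of that general pattern, not DPR Cluster's actual client code: a client re-resolves the service name on every attempt, so that an address belonging to a restarted node is picked up, and backs off between attempts. All names here (`connect_with_retry`, `dns_resolver`, `tcp_connect`) and the retry parameters are hypothetical and chosen for illustration only.

```python
import socket
import time
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def dns_resolver(host: str, port: int) -> Callable[[], list]:
    """Build a resolver that looks the name up afresh on every call."""
    def resolve() -> list:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        # sockaddr is (ip, port) for IPv4 and (ip, port, flow, scope) for IPv6.
        return [info[4][:2] for info in infos]
    return resolve


def tcp_connect(address: Address) -> socket.socket:
    """Open a TCP connection with a short timeout."""
    return socket.create_connection(address, timeout=2.0)


def connect_with_retry(
    resolve: Callable[[], Iterable[Address]],
    connect: Callable[[Address], object],
    attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
):
    """Try every resolved address; re-resolve and back off between rounds."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            addresses = list(resolve())
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            last_error = exc
            addresses = []
        for address in addresses:
            try:
                return connect(address)
            except OSError as exc:
                last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise ConnectionError(
        f"no reachable address after {attempts} attempts"
    ) from last_error
```

In a real deployment the call might look like `connect_with_retry(dns_resolver(name, port), tcp_connect)`, where `name` and `port` stand in for whatever service name and port the cluster exposes. Passing the resolver and connector as arguments keeps the retry logic testable without a network.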
Date issued
2022-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology