DPR Cluster: An Automated Framework for Deploying Resilient Stateful Cloud Microservices

Author(s)
Raicevic, Nikola
Download: Thesis PDF (740.7 KB)
Advisor
Madden, Samuel
Terms of use
In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Recent advances in distributed recovery protocols enable application builders to achieve strong prefix recovery guarantees in distributed systems of cache-stores (pairs of a fast cache backed by persistent storage that answer storage requests) with low overhead. Specifically, Distributed Prefix Recovery (DPR) is a general-purpose protocol that implements the prefix recovery guarantee for an arbitrary cluster of cache-stores with the help of a centralized management node. However, deploying such a cluster is still challenging, as it involves timely detection and restart of failed nodes, incremental roll-out of new cache-store implementations and deployments, and routing requests in a dynamic cluster with failures. Cluster administrators must manually configure DPR with this information and program cache-stores with the necessary capabilities in a fault-tolerant manner. In this thesis, we introduce DPR Cluster, an automated framework for quickly and easily deploying clusters of DPR-enhanced cache-stores. DPR Cluster uses Kubernetes as its cluster manager and features a declarative Python management API for scripting. Cluster administrators merely specify the desired cluster, and Kubernetes automatically deploys and manages the relevant components and restarts them on failure. Clients can dynamically discover a cluster and its components and communicate with them through DPR Cluster’s dynamic, fault-tolerant, DNS-based networking layer. Additionally, DPR Cluster implements a suite of fault-tolerance features beyond cache-store consistency, such as automatic reconnects. Our evaluation shows that DPR Cluster is highly resilient and functional with a simple API, and that it significantly lowers the barrier to entry for DPR deployments.
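
The declarative Python management API mentioned in the abstract is not documented on this page. As a purely illustrative sketch, the snippet below shows one way a declarative cluster specification could be expressed in plain Python; every name in it (CacheStoreSpec, DPRClusterSpec, the field names, and the example images) is a hypothetical stand-in, not the thesis's actual API.

# Purely illustrative sketch: these class and field names are hypothetical
# stand-ins and are NOT taken from the thesis or its actual API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CacheStoreSpec:
    """One cache-store: a fast cache backed by persistent storage."""
    name: str
    cache_image: str     # container image for the cache tier (hypothetical field)
    storage_image: str   # container image for the persistent store (hypothetical field)
    replicas: int = 1

@dataclass
class DPRClusterSpec:
    """Desired state of a DPR cluster of cache-stores.

    A framework like the one described in the abstract would translate a
    declarative spec of this kind into Kubernetes objects (deployments,
    services, DNS entries) and restart failed components automatically.
    """
    name: str
    cache_stores: List[CacheStoreSpec] = field(default_factory=list)
    dpr_manager_replicas: int = 1  # centralized DPR management node(s)

if __name__ == "__main__":
    spec = DPRClusterSpec(
        name="demo-dpr-cluster",
        cache_stores=[
            CacheStoreSpec(
                name="orders",
                cache_image="example/cache:latest",
                storage_image="example/store:latest",
                replicas=3,
            ),
        ],
    )
    print(spec)  # a real framework would submit this spec to the cluster manager

The actual API may differ substantially; the thesis PDF linked above is the authoritative reference.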
Date issued
2022-09
URI
https://hdl.handle.net/1721.1/147510
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
