MIT Libraries DSpace@MIT

Storing and managing data in a distributed hash table

Author(s)
Sit, Emil, 1977-
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
M. Frans Kaashoek and Robert T. Morris.
Terms of use
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
Distributed hash tables (DHTs) have been proposed as a generic, robust storage infrastructure for simplifying the construction of large-scale, wide-area applications. For example, UsenetDHT is a new design for Usenet News developed in this thesis that uses a DHT to cooperatively deliver Usenet articles: the DHT allows a set of N hosts to share storage of Usenet articles, reducing their combined storage requirements by a factor of O(N). Usenet generates a continuous stream of writes that exceeds 1 Tbyte/day in volume, comprising over ten million writes. Supporting this and the associated read workload requires a DHT engineered for durability and efficiency. Recovering from network and machine failures efficiently poses a challenge for DHT replication maintenance algorithms that provide durability. To avoid losing the last replica, replica maintenance must create additional replicas when failures are detected. However, creating replicas after every failure stresses network and storage resources unnecessarily. Tracking the location of every replica of every object would allow a replica maintenance algorithm to create replicas only when necessary, but when storing terabytes of data, such tracking is difficult to perform accurately and efficiently. This thesis describes a new algorithm, Passing Tone, that maintains durability efficiently, in a completely decentralized manner, despite transient and permanent failures. Passing Tone nodes make replication decisions with just basic DHT routing state, without maintaining state about the number or location of extant replicas and without responding to every transient failure with a new replica. Passing Tone is implemented in a revised version of DHash, optimized for both disk and network performance.
 
A sample 12-node deployment of Passing Tone and UsenetDHT supports a partial Usenet feed of 2.5 Mbyte/s (processing over 80 Tbyte of data per year), while providing 30 Mbyte/s of read throughput, limited currently by disk seeks. This deployment is the first public DHT to store terabytes of data. These results indicate that DHT-based designs can successfully simplify the construction of large-scale, wide-area systems.
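The abstract states that nodes make replication decisions from basic DHT routing state alone, without tracking where each replica lives. The sketch below is not the Passing Tone algorithm from the thesis. It only illustrates, under assumptions, how such a purely local decision could look on a consistent-hashing ring where each object is placed on the first `replicas` successors of its key. The ring layout, the function names, and the small integer identifiers are all hypothetical and chosen for illustration.

```python
def in_half_open(x: int, lo: int, hi: int) -> bool:
    """Return True if x lies in the ring interval (lo, hi], wrapping past zero."""
    if lo < hi:
        return lo < x <= hi
    # The interval wraps around the ring; lo == hi covers the whole ring.
    return x > lo or x <= hi


def is_responsible(key: int, self_id: int, predecessors: list[int], replicas: int) -> bool:
    """Decide, from local routing state only, whether this node should hold `key`.

    Assumption: objects are stored on the `replicas` nodes that follow the key
    on the ring. A node is then responsible for keys in (p_r, self_id], where
    p_r is its r-th predecessor. `predecessors` is ordered nearest first.
    """
    if len(predecessors) < replicas:
        # Fewer nodes are known than the replication level: hold everything.
        return True
    return in_half_open(key, predecessors[replicas - 1], self_id)


# Hypothetical four-node ring with identifiers 10, 20, 30, 40.
ring = [10, 20, 30, 40]


def predecessors_of(node: int) -> list[int]:
    """Predecessors of `node` on the example ring, nearest first."""
    i = ring.index(node)
    return [ring[(i - k) % len(ring)] for k in range(1, len(ring))]


# With two replicas, key 15 belongs on its two successors, nodes 20 and 30.
holders = [n for n in ring if is_responsible(15, n, predecessors_of(n), replicas=2)]
```

In this toy setup each node answers the question "should I hold this key?" using only its own identifier and its predecessor list, so no node needs a global table of replica locations. How the thesis handles transient failures and repair is not shown here.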
 
Description
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.
 
Includes bibliographical references (p. 83-90).
 
Date issued
2008
URI
http://hdl.handle.net/1721.1/44711
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Doctoral Theses
