A distributed Hash table
Author(s)
Dabek, Frank (Frank Edward), 1977-
DownloadFull printable version (15.22Mb)
Alternative title
DHash table
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Robert T. Morris.
Terms of use
Metadata
Show full item recordAbstract
DHash is a new system that harnesses the storage and network resources of computers distributed across the Internet by providing a wide-area storage service, DHash. DHash frees applications from re-implementing mechanisms common to any system that stores data on a collection of machines: it maintains a mapping of objects to servers, replicates data for durability, and balances load across participating servers. Applications access data stored in DHash through a familiar hash-table interface: put stores data in the system under a key; get retrieves the data. DHash has proven useful to a number of application builders and has been used to build a content-distribution system [31], a Usenet replacement [115], and new Internet naming architectures [130, 129]. These applications demand low-latency, high-throughput access to durable data. Meeting this demand is challenging in the wide-area environment. The geographic distribution of nodes means that latencies between nodes are likely to be high: to provide a low-latency get operation the system must locate a nearby copy of the data without traversing high-latency links. (cont.) Also, wide-area network links are likely to be less reliable and have lower capacities than local-area network links: to provide durability efficiently the system must minimize the number of copies of data items it sends over these limited capacity links in response to node failure. This thesis describes the design and implementation of the DHash distributed hash table and presents algorithms and techniques that address these challenges. DHash provides low-latency operations by using a synthetic network coordinate system (Vivaldi) to find nearby copies of data without sending messages over high-latency links. A network transport (STP), designed for applications that contact a large number of nodes, lets DHash provide high throughput by striping a download across many servers without causing high packet loss or exhausting local resources. Sostenuto, a data maintenance algorithm, lets DHash maintain data durability while minimizing the number of copies of data that the system sends over limited-capacity links.
Description
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2006. Includes bibliographical references (p. 123-132) and index.
Date issued
2006Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.