Efficient replication of large data objects
Author(s)
Fan, Rui, 1977-
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Nancy A. Lynch.
Abstract
Replication is an important technique for improving the reliability and scalability of data services. The primary problem encountered in replication is the trade-off between amount of replication, performance, and consistency. A rule of thumb states that any replication algorithm must sacrifice at least one of these criteria. In this thesis, we investigate replicating large data objects, such as files, whose size is large compared to metadata used by the replication algorithm. With this assumption, we present a distributed replication algorithm which simultaneously achieves a high replication factor, nearly optimal performance, and strong data consistency. Furthermore, our algorithm makes only basic assumptions about its environment. Our algorithm works in any asynchronous, reliable message-passing network, without relying on higher level functions such as distributed locking or group communication. Our algorithm is suitable for implementation in both LAN and WAN settings. This thesis is divided into two parts. In the first part, we formally state the assumptions and guarantees of our replication algorithm in terms of its trace properties. We then formally implement our algorithm in the IOA modeling language. We also give rigorous proofs of the algorithm's correctness and its performance analysis. The main idea of our algorithm is to separately maintain copies of the data, and information about the locations of the up-to-date copies. Our algorithm then mostly performs cheap operations on the location information, and avoids expensive operations on the actual data. The second part of this thesis presents two lower bounds on the costs of data replication. The first lower bound gives the minimum number of writes that must occur during a read operation. The second lower bound states that for a certain class of efficient replication algorithms, the replicas must use storage proportional to the maximum number of concurrent writers. 
The motivation for these lower bounds was certain algorithmic techniques we used in our replication algorithm. The lower bounds suggest that these techniques are necessary. The lower bounds are also of independent interest.
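The main idea stated in the abstract, keeping the data copies separate from the information about where the up-to-date copies live, can be illustrated with a toy, single-process sketch. This is not the thesis's algorithm: it ignores asynchrony, concurrency, and failures. The names `Replica`, `Directory`, `write`, and `read` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica: holds full copies of the large object, keyed by tag."""
    store: dict = field(default_factory=dict)


@dataclass
class Directory:
    """Hypothetical directory: holds only small metadata (latest tag, holders)."""
    tag: int = 0
    holders: frozenset = frozenset()


def write(directory, replicas, targets, data):
    """Ship the data to a few replicas, then update only the metadata."""
    new_tag = directory.tag + 1
    for rid in targets:                      # expensive step: moves the data
        replicas[rid].store[new_tag] = data
    directory.tag = new_tag                  # cheap step: a tag and a set of ids
    directory.holders = frozenset(targets)
    return new_tag


def read(directory, replicas):
    """Look up the location cheaply, then fetch the data from one holder."""
    rid = min(directory.holders)             # any up-to-date holder would do
    return replicas[rid].store[directory.tag]


replicas = {i: Replica() for i in range(5)}
directory = Directory()
write(directory, replicas, targets=[0, 1], data=b"v1" * 1024)
write(directory, replicas, targets=[3, 4], data=b"v2" * 1024)
latest = read(directory, replicas)
```

In this sketch the second write never touches replicas 0 and 1. Only the small directory record changes, which is the sense in which most operations act on location information rather than on the data itself.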
Description
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (p. 77-78).
Date issued
2003
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.