Efficient replication of large data objects

Fan, Rui, 1977-

dc.contributor.advisor	Nancy A. Lynch.	en_US
dc.contributor.author	Fan, Rui, 1977-	en_US
dc.contributor.other	Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2006-03-24T16:04:29Z
dc.date.available	2006-03-24T16:04:29Z
dc.date.copyright	2003	en_US
dc.date.issued	2003	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/29581
dc.description	Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003.	en_US
dc.description	Includes bibliographical references (p. 77-78).	en_US
dc.description.abstract	Replication is an important technique for improving the reliability and scalability of data services. The primary problem encountered in replication is the trade-off between amount of replication, performance, and consistency. A rule of thumb states that any replication algorithm must sacrifice at least one of these criteria. In this thesis, we investigate replicating large data objects, such as files, whose size is large compared to metadata used by the replication algorithm. With this assumption, we present a distributed replication algorithm which simultaneously achieves a high replication factor, nearly optimal performance, and strong data consistency. Furthermore, our algorithm makes only basic assumptions about its environment. Our algorithm works in any asynchronous, reliable message-passing network, without relying on higher level functions such as distributed locking or group communication. Our algorithm is suitable for implementation in both LAN and WAN settings. This thesis is divided into two parts. In the first part, we formally state the assumptions and guarantees of our replication algorithm in terms of its trace properties. We then formally implement our algorithm in the IOA modeling language. We also give rigorous proofs of the algorithm's correctness and its performance analysis. The main idea of our algorithm is to separately maintain copies of the data, and information about the locations of the up-to-date copies. Our algorithm then mostly performs cheap operations on the location information, and avoids expensive operations on the actual data. The second part of this thesis presents two lower bounds on the costs of data replication. The first lower bound gives the minimum number of writes that must occur during a read operation. The second lower bound states that for a certain class of efficient replication algorithms, the replicas must use storage proportional to the maximum number of concurrent writers. The motivation for these lower bounds was certain algorithmic techniques we used in our replication algorithm. The lower bounds suggest that these techniques are necessary. The lower bounds are also of independent interest.	en_US
dc.description.statementofresponsibility	by Rui Fan.	en_US
dc.format.extent	78 p.	en_US
dc.format.extent	3363563 bytes
dc.format.extent	3363369 bytes
dc.format.mimetype	application/pdf
dc.format.mimetype	application/pdf
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Efficient replication of large data objects	en_US
dc.type	Thesis	en_US
dc.description.degree	S.M.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	52758605	en_US

Files in this item

Name:: 52758605-MIT.pdf
Size:: 3.207Mb
Format:: PDF
Description:: Full printable version

View/Open

This item appears in the following Collection(s)

Graduate Theses

Show simple item record