dc.contributor.advisor: Robert T. Morris. [en_US]
dc.contributor.author: Burkard, Timo, 1979- [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2005-08-23T20:40:22Z
dc.date.available: 2005-08-23T20:40:22Z
dc.date.copyright: 2002 [en_US]
dc.date.issued: 2002 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/8503
dc.description: Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, June 2002. [en_US]
dc.description: "May 2002." [en_US]
dc.description: Includes bibliographical references (p. 63-64). [en_US]
dc.description.abstract: In this thesis, we present the design and implementation of Herodotus, a peer-to-peer web archival system. Like the Wayback Machine, a website that currently offers a web archive, Herodotus periodically crawls the world wide web and stores copies of all downloaded web content. Unlike the Wayback Machine, Herodotus does not rely on a centralized server farm. Instead, many individual nodes spread out across the Internet collaboratively perform the task of crawling and storing the content. This allows a large group of people to contribute idle computer resources to jointly achieve the goal of creating an Internet archive. Herodotus uses replication to ensure the persistence of data as nodes join and leave. Herodotus is implemented on top of Chord, a distributed peer-to-peer lookup service. It is written in C++ on FreeBSD. Our analysis based on an estimated size of the World Wide Web shows that a set of 20,000 nodes would be required to archive the entire web, assuming that each node has a typical home broadband Internet connection and contributes 100 GB of storage. [en_US]
dc.description.statementofresponsibility: by Timo Burkard. [en_US]
dc.format.extent: 64 p. [en_US]
dc.format.extent: 4382101 bytes
dc.format.extent: 4381860 bytes
dc.format.mimetype: application/pdf
dc.format.mimetype: application/pdf
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.subject.lcc: ZA4235 .B8 2002 [en_US]
dc.subject.lcsh: World Wide Web Archival resources Data processing [en_US]
dc.title: Herodotus : a peer-to-peer Web archival system [en_US]
dc.type: Thesis [en_US]
dc.description.degree: M.Eng. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 50764052 [en_US]
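
The abstract in this record says that Herodotus is built on Chord and uses replication so that data persists as nodes join and leave. The record does not give the thesis's actual partitioning or replication scheme, so the following is only a minimal sketch of the general Chord-style idea, written in Python for brevity rather than the C++ of the thesis. It assumes that URLs and node names are hashed with SHA-1 onto one identifier ring, that a URL belongs to the first node whose identifier is equal to or follows the URL's identifier, and that copies are kept on the next few nodes along the ring. The names `chord_id`, `Ring`, `holders`, and the default of three replicas are hypothetical.

```python
import hashlib
from bisect import bisect_left


def chord_id(key: str) -> int:
    """Map a string onto the identifier ring using SHA-1 (160-bit identifiers)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical in-memory view of the ring; a real system looks nodes up over the network."""

    def __init__(self, node_names):
        # Sort nodes by their position on the ring.
        self.nodes = sorted((chord_id(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.nodes]

    def holders(self, url: str, replicas: int = 3):
        """Return the successor node for `url` plus the nodes that follow it.

        `replicas` is an assumed parameter, not a value from the thesis.
        """
        # First node whose identifier is >= the key's identifier, wrapping at the end.
        start = bisect_left(self.ids, chord_id(url)) % len(self.nodes)
        count = min(replicas, len(self.nodes))
        return [self.nodes[(start + k) % len(self.nodes)][1] for k in range(count)]


if __name__ == "__main__":
    ring = Ring([f"node-{i}" for i in range(8)])
    print(ring.holders("http://example.com/"))
```

In this sketch, when the first holder of a URL leaves the ring, the second holder becomes the URL's successor, which is why keeping copies on consecutive nodes helps data survive departures.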

