
dc.contributor.advisor: Samuel R. Madden. [en_US]
dc.contributor.author: Mei, Yuan, Ph. D. Massachusetts Institute of Technology [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2015-07-17T19:48:45Z
dc.date.available: 2015-07-17T19:48:45Z
dc.date.copyright: 2015 [en_US]
dc.date.issued: 2015 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/97808
dc.description: Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. [en_US]
dc.description: Cataloged from PDF version of thesis. [en_US]
dc.description: Includes bibliographical references (pages 125-130). [en_US]
dc.description.abstract: Many large financial, news, and social media companies process and stream large quantities of data to customers, either through the public Internet or on their own internal networks. These customers often depend on that data being delivered in a timely and resource-efficient manner. In addition, many customers subscribe to the same or similar data products (e.g., particular types of financial feeds, or feeds of specific social media users). A naive implementation of a data dissemination network like this will cause redundant data to be processed and delivered repeatedly, wasting CPU and bandwidth, increasing network delays, and driving up costs. In this dissertation, we present SPRAWL, a distributed stream processing layer to address the wide-area data processing and dissemination problem. SPRAWL provides two key functions. First, it is able to generate a shared and distributed multi-query plan that transmits records through the network just once, and shares the computation of streaming operators that operate on the same subset of data. Second, it is able to compute an in-network placement of complex queries (each with dozens of operators) in wide-area networks (consisting of thousands of nodes). This placement is optimal within polynomial time and memory complexity when there are no resource (CPU, bandwidth) or query (latency) constraints. In addition, we develop several heuristics to guarantee the placement is near optimal when constraints are violated, and experimentally evaluate the performance of our algorithms versus an exhaustive algorithm. We also design and implement a distributed version of the SPRAWL placement algorithm in order to support wide-area networks consisting of thousands of nodes, which centralized algorithms cannot handle. Finally, we show that SPRAWL can make complex query placement decisions on wide-area networks within seconds, and the placement can increase throughput by up to a factor of 5 and reduce dollar costs by a factor of 6 on a financial data stream processing task. [en_US]
dc.description.statementofresponsibility: by Yuan Mei. [en_US]
dc.format.extent: 130 pages [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: The SPRAWL distributed stream dissemination system [en_US]
dc.type: Thesis [en_US]
dc.description.degree: Ph. D. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 912308238 [en_US]

