Distributed query execution on a replicated and partitioned database

Narula, Neha

dc.contributor.advisor	Robert T. Morris.	en_US
dc.contributor.author	Narula, Neha	en_US
dc.contributor.other	Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2011-04-25T15:58:28Z
dc.date.available	2011-04-25T15:58:28Z
dc.date.copyright	2010	en_US
dc.date.issued	2010	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/62436
dc.description	Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.	en_US
dc.description	Cataloged from PDF version of thesis.	en_US
dc.description	Includes bibliographical references (p. 63-64).	en_US
dc.description.abstract	Web application developers partition and replicate their data amongst a set of SQL databases to achieve higher throughput. Given multiple copies of tables partioned different ways, developers must manually select different replicas in their application code. This work presents Dixie, a query planner and executor which automatically executes queries over replicas of partitioned data stored in a set of relational databases, and optimizes for high throughput. The challenge in choosing a good query plan lies in predicting query cost, which Dixie does by balancing row retrieval costs with the overhead of contacting many servers to execute a query. For web workloads, per-query overhead in the servers is a large part of the overall cost of execution. Dixie's cost calculation tends to minimize the number of servers used to satisfy a query, which is essential for minimizing this query overhead and obtaining high throughput; this is in direct contrast to optimizers over large data sets that try to maximize parallelism by parallelizing the execution of a query over all the servers. Dixie automatically takes advantage of the addition or removal of replicas without requiring changes in the application code. We show that Dixie sometimes chooses plans that existing parallel database query optimizers might not consider. For certain queries, Dixie chooses a plan that gives a 2.3x improvement in overall system throughput over a plan which does not take into account perserver query overhead costs. Using table replicas, Dixie provides a throughput improvement of 35% over a naive execution without replicas on an artificial workload generated by Pinax, an open source social web site.	en_US
dc.description.statementofresponsibility	by Neha Narula.	en_US
dc.format.extent	64 p.	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Distributed query execution on a replicated and partitioned database	en_US
dc.type	Thesis	en_US
dc.description.degree	S.M.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	711000749	en_US

Files in this item

Name:: 711000749-MIT.pdf
Size:: 3.135Mb
Format:: PDF
Description:: Full printable version

View/Open

This item appears in the following Collection(s)

Graduate Theses

Show simple item record