dc.contributor.advisor | Samuel R. Madden. | en_US |
dc.contributor.author | Long, Qian, M. Eng. Massachusetts Institute of Technology | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2016-01-04T20:00:43Z | |
dc.date.available | 2016-01-04T20:00:43Z | |
dc.date.copyright | 2015 | en_US |
dc.date.issued | 2015 | en_US |
dc.identifier.uri | http://hdl.handle.net/1721.1/100634 | |
dc.description | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. | en_US |
dc.description | This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. | en_US |
dc.description | Cataloged from student-submitted PDF version of thesis. | en_US |
dc.description | Includes bibliographical references (pages 63-64). | en_US |
dc.description.abstract | Scientists across many research domains collect large amounts of multi-dimensional data in their day-to-day work. They require high-performance, scalable systems to manage and process their data. Oftentimes, the underlying distribution of these types of data is skewed and sparse, rather than dense and uniform. As input data sizes continue to grow at a rapid rate, main memory and storage capacity become bottlenecks on single machines. Thus, we look to distributed array databases as a long-term solution for managing and querying this type of data. This thesis presents Multinode-TileDB, a distributed framework that extends TileDB, a new array database management system designed, from the ground up, to handle skewed and sparse arrays. We design the overall distributed architecture and propose and implement parallel algorithms for load, join, subarray, and filter while focusing on load balance and performance. Our experiments show speedup gains as cluster size increases and show how different data partitioning schemes benefit different parallel queries. | en_US |
dc.description.statementofresponsibility | by Qian Long. | en_US |
dc.format.extent | 64 pages | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | Parallel load and query processing in a distributed array database | en_US |
dc.type | Thesis | en_US |
dc.description.degree | M. Eng. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.identifier.oclc | 933231045 | en_US |