dc.contributor.advisor | Samuel Madden. | en_US |
dc.contributor.author | Rawlani, Praynaa | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2016-01-04T20:51:56Z | |
dc.date.available | 2016-01-04T20:51:56Z | |
dc.date.copyright | 2014 | en_US |
dc.date.issued | 2014 | en_US |
dc.identifier.uri | http://hdl.handle.net/1721.1/100670 | |
dc.description | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. | en_US |
dc.description | Cataloged from PDF version of thesis. | en_US |
dc.description | Includes bibliographical references (pages 99-100). | en_US |
dc.description.abstract | Graph analytics has become increasing popular in the recent years. Conventionally, data is stored in relational databases that have been refined over decades, resulting in highly optimized data processing engines. However, the awkwardness of expressing iterative queries in SQL makes the relational query-processing model inadequate for graph analytics, leading to many alternative solutions. Our research explores the possibility of combining a more natural query model with relational databases for graph analytics. In particular, we bring together a graph-natural vertex-centric query interface to highly optimized column-oriented relational databases, thus providing the efficiency of relational engines and ease-of-use of new graph systems. Throughout the thesis, we used stochastic gradient descent, a loss-minimization algorithm applied in many machine learning and graph analytics queries, as the example iterative algorithm. We implemented two different approaches for emulating a vertex-centric interface on a leading column-oriented database, Vertica: disk-based and main-memory based. The disk-based solution stores data for each iteration in relational tables and allows for interleaving SQL queries with graph algorithms. The main-memory approach stores data in memory, allowing faster updates. We applied optimizations to both implementations, which included refining logical and physical query plans, applying algorithm-level improvements and performing system-specific optimizations. The experiments and results show that the two implementations provide reasonable performance in comparison with popular graph processing systems. We present a detailed cost analysis of the two implementations and study the effect of each individual optimization on the query performance. | en_US |
dc.description.statementofresponsibility | by Praynaa Rawlani. | en_US |
dc.format.extent | 100 pages | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | Graph analytics on relational databases | en_US |
dc.type | Thesis | en_US |
dc.description.degree | M. Eng. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.identifier.oclc | 932127708 | en_US |