Graph analytics on relational databases

Rawlani, Praynaa

dc.contributor.advisor	Samuel Madden.	en_US
dc.contributor.author	Rawlani, Praynaa	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2016-01-04T20:51:56Z
dc.date.available	2016-01-04T20:51:56Z
dc.date.copyright	2014	en_US
dc.date.issued	2014	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/100670
dc.description	Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.	en_US
dc.description	Cataloged from PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 99-100).	en_US
dc.description.abstract	Graph analytics has become increasingly popular in recent years. Conventionally, data is stored in relational databases that have been refined over decades, resulting in highly optimized data processing engines. However, the awkwardness of expressing iterative queries in SQL makes the relational query-processing model inadequate for graph analytics, leading to many alternative solutions. Our research explores the possibility of combining a more natural query model with relational databases for graph analytics. In particular, we bring a graph-natural vertex-centric query interface to highly optimized column-oriented relational databases, thus providing the efficiency of relational engines and the ease of use of new graph systems. Throughout the thesis, we used stochastic gradient descent, a loss-minimization algorithm applied in many machine learning and graph analytics queries, as the example iterative algorithm. We implemented two different approaches for emulating a vertex-centric interface on a leading column-oriented database, Vertica: disk-based and main-memory-based. The disk-based solution stores data for each iteration in relational tables and allows for interleaving SQL queries with graph algorithms. The main-memory approach stores data in memory, allowing faster updates. We applied optimizations to both implementations, which included refining logical and physical query plans, applying algorithm-level improvements and performing system-specific optimizations. The experiments and results show that the two implementations provide reasonable performance in comparison with popular graph processing systems. We present a detailed cost analysis of the two implementations and study the effect of each individual optimization on the query performance.	en_US
dc.description.statementofresponsibility	by Praynaa Rawlani.	en_US
dc.format.extent	100 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Graph analytics on relational databases	en_US
dc.type	Thesis	en_US
dc.description.degree	M. Eng.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	932127708	en_US

Files in this item

Name: 932127708-MIT.pdf
Size: 9.969Mb
Format: PDF
Description: Full printable version


This item appears in the following Collection(s)

Graduate Theses
