Graph analytics on relational databases
Author(s)
Rawlani, Praynaa
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Samuel Madden.
Abstract
Graph analytics has become increasingly popular in recent years. Conventionally, data is stored in relational databases that have been refined over decades, resulting in highly optimized data processing engines. However, the awkwardness of expressing iterative queries in SQL makes the relational query-processing model inadequate for graph analytics, leading to many alternative solutions. Our research explores the possibility of combining a more natural query model with relational databases for graph analytics. In particular, we bring a graph-natural, vertex-centric query interface to highly optimized column-oriented relational databases, thus providing both the efficiency of relational engines and the ease of use of newer graph systems. Throughout the thesis, we used stochastic gradient descent, a loss-minimization algorithm applied in many machine learning and graph analytics queries, as the example iterative algorithm. We implemented two different approaches for emulating a vertex-centric interface on a leading column-oriented database, Vertica: disk-based and main-memory-based. The disk-based solution stores data for each iteration in relational tables and allows SQL queries to be interleaved with graph algorithms. The main-memory approach stores data in memory, allowing faster updates. We applied optimizations to both implementations, which included refining logical and physical query plans, applying algorithm-level improvements, and performing system-specific optimizations. The experiments and results show that the two implementations provide reasonable performance in comparison with popular graph processing systems. We present a detailed cost analysis of the two implementations and study the effect of each individual optimization on query performance.
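As an illustration of the disk-based idea described in the abstract (per-iteration state kept in relational tables, with each vertex-centric step expressed as SQL), the following is a minimal sketch and not the thesis's implementation. It makes several assumptions: it uses SQLite through Python's standard `sqlite3` module rather than Vertica, it swaps stochastic gradient descent for a much simpler vertex program (minimum-label propagation for connected components), and the table names `edge`, `vertex`, `next_vertex` and the function `connected_components` are hypothetical.

```python
import sqlite3


def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component.

    Illustrative only: a vertex-centric superstep emulated with relational
    joins. Table and function names are assumptions, not from the thesis.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    cur.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    # Store both directions so labels can flow along undirected edges.
    cur.executemany(
        "INSERT INTO edge VALUES (?, ?)",
        [(a, b) for a, b in edges] + [(b, a) for a, b in edges],
    )

    # Every vertex starts with its own id as its label.
    cur.execute("CREATE TABLE vertex (id INTEGER PRIMARY KEY, label INTEGER)")
    cur.execute("INSERT INTO vertex SELECT DISTINCT src, src FROM edge")

    while True:
        # One superstep: the joins deliver neighbour labels to each vertex
        # ("messages"), and the aggregate combines them with the vertex's
        # own label. The result is written to a fresh table per iteration.
        cur.execute(
            """
            CREATE TABLE next_vertex AS
            SELECT v.id AS id,
                   CASE WHEN MIN(n.label) < v.label
                        THEN MIN(n.label) ELSE v.label END AS label
            FROM vertex v
            JOIN edge e ON e.dst = v.id
            JOIN vertex n ON n.id = e.src
            GROUP BY v.id, v.label
            """
        )
        # Count vertices whose label changed in this superstep.
        changed = cur.execute(
            """
            SELECT COUNT(*)
            FROM vertex v JOIN next_vertex n ON n.id = v.id
            WHERE n.label <> v.label
            """
        ).fetchall()[0][0]

        # The new table becomes the current vertex state.
        cur.execute("DROP TABLE vertex")
        cur.execute("ALTER TABLE next_vertex RENAME TO vertex")
        if changed == 0:
            break  # fixed point reached: no vertex updated its label

    result = dict(cur.execute("SELECT id, label FROM vertex").fetchall())
    conn.close()
    return result
```

Because the vertex state lives in an ordinary table between supersteps, an ordinary SQL query (a filter or an aggregate over `vertex`, for example) could in principle run between iterations, which is the kind of interleaving the abstract attributes to the disk-based approach.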
Description
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. Cataloged from PDF version of thesis. Includes bibliographical references (pages 99-100).
Date issued
2014
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.