Systems and Techniques for Efficient Real-World Graph Analytics
Author(s)
M. F. da Trindade, Joana
DownloadThesis PDF (1.533Mb)
Advisor
Madden, Samuel R.
Terms of use
Metadata
Show full item recordAbstract
Graphs are a natural way to model real-world entities and relationships between them, ranging from social networks and biological datasets to cloud computing infrastructure data lineage graphs. Queries over these large graphs often involve expensive subgraph traversals and complex analytical computations. Furthermore, real-world graphs often encode relationships that evolve through time, which are modeled as a temporal graph, i.e., one in which edges are associated to time attributes (such as start time and duration). These real-world graphs are often substantially more structured than a generic vertex-and-edge model would suggest, but this insight has remained mostly unexplored by existing graph engines for graph query optimization purposes. In addition, most temporal graph processing systems remain inefficient as they rely on distributed processing even for graphs that fit well within a commodity server’s available storage. To this end, we present two distinct systems tailored for optimizing large-scale real-world graph processing. The first system, Kaskade, targets the challenges of efficient query evaluation by leveraging structural properties of graphs and queries to infer materialized graph views, speeding up query evaluation by rewriting queries in terms of views it deems beneficial based on input graph and
query characteristics. The second system, Kairos, introduces selective indexing, a technique that chooses a subset of target vertices to index based on characteristics of the underlying temporal graphs and input queries. This system further employs a highly-specialized parallel data structure aimed at in-memory storage and fast retrieval of temporal edges. Finally, Kairos is built upon Ligra, the de facto benchmark system in shared-memory parallel graph processing, offering similar advantages and a familiar API to application programmers. Both systems offer speedups of up to 50-60x when compared with alternative baselines, and introduce novel classes of query optimization techniques aimed at efficient real-world graph analytics.
Date issued
2024-02Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology