Parallel algorithms for scheduling data-graph computations
Author(s)
Hasenplaugh, William Cleaburn
DownloadFull printable version (7.388Mb)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Charles E. Leiserson.
Terms of use
Metadata
Show full item recordAbstract
A data-graph computation - popularized by such programming systems as Pregel, GraphLab, Galois, Ligra, PowerGraph, and GraphChi - is an algorithm that iteratively performs local updates on the vertices of a graph. During each round of a data-graph computation, a user-supplied update function atomically modifies the data associated with a vertex as a function of the vertex's prior data and that of adjacent vertices. A dynamic data-graph computation updates only an active subset of the vertices during a round, and those updates determine the set of active vertices for the next round. In this thesis, I explore two ways of scheduling deterministic parallel data-graph computations that provide performance guarantees culminating in theoretical contributions to graph theory and practical, high-performance systems. In particular, I describe a system called Prism which processes dynamic and static data-graph computations on arbitrary graphs using a technique called chromatic scheduling. Using a vertex-coloring to identify independent sets of vertices, which may be safely processed in parallel, Prism serializes through the colors and processes the independent sets in parallel, thus executing data-graph computations deterministically and without the use of costly atomic instructions (e.g., Compare-And-Swap). Prism supports dynamic data-graph computations deterministically and work-efficiently through the introduction of multibag and multivector data structures. Prism requires a vertex-coloring, and since graphs are generally not supplied with one, it is necessary to find one as a preprocessing step. Furthermore, the runtime of Prism is linear in the number of colors and thus motivates a study in this thesis of fast parallel coloring algorithms that provide vertex-colorings with few colors in practice. At the core of the analysis of these coloring algorithms lies a new result about the maximum depth of a random priority dag, the dag that results from randomly ordering vertices and directing edges from lower to higher numbered vertices in the order. In particular, when the largest degree [delta] in the graph G = (V,E) is less than ln !V !, I show a tight bound on the longest path: [theta](ln V / ln (e ln V / [delta])) with high probability. When [delta] is greater than ln !V!, the longest path in the dag is simply [theta] (min {[delta], [square root sign]E}) , also with high probability. I also present a system called Laika which processes data-graph computations for the special, but important, case of graphs representing physical simulations. Such graphs typically have vertices with coordinates in 3D space and are connected to other "nearby" vertices. We take advantage of these two properties to execute physical simulations, cast as data-graph computations, that make efficient use of cache resources. I analyze a contrived graph construction - a random cube graph - as a proxy for the mesh graphs that arise in physical simulations: n vertices are uniformly randomly assigned positions in the unit cube and have edges connecting them to any other vertices that are within a distance r = O (V -¹/³) . For such a graph and given a cache sufficiently large to hold M vertices, I improve on previous theory to show that a fraction O(M-¹/³) of edges will connect to vertices not in the cache, whereas previous theory held that this "miss rate" is O(M-¹/⁴). Laika also guarantees linear speedup for any random cube graph G = (V,E) with constant average degree for any number of workers P = O (V= lg² V).
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 169-182).
Date issued
2016Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.