Executing Dynamic Data-Graph Computations Deterministically Using Chromatic Scheduling
Author(s)
Kaler, Timothy; Hasenplaugh, William Cleaburn; Schardl, Tao Benjamin; Leiserson Jr, Charles
DownloadMain article (394.6Kb)
OPEN_ACCESS_POLICY
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Terms of use
Metadata
Show full item recordAbstract
A data-graph computation—popularized by such programming systems as Galois, Pregel, GraphLab, PowerGraph, and GraphChi—is an algorithm that performs local updates on the vertices of a graph. During each round of a data-graph computation, an update function atomically modifies the data associated with a vertex as a function of the vertex’s prior data and that of adjacent vertices. A dynamic data-graph computation updates only an active subset of the vertices during a round, and those updates determine the set of active vertices for the next round.
This article introduces Prism, a chromatic-scheduling algorithm for executing dynamic data-graph computations. Prism uses a vertex coloring of the graph to coordinate updates performed in a round, precluding the need for mutual-exclusion locks or other nondeterministic data synchronization. A multibag data structure is used by Prism to maintain a dynamic set of active vertices as an unordered set partitioned by color. We analyze Prism using work-span analysis. Let G = (V, E) be a degree-Δ graph colored with χ colors, and suppose that Q⊆V is the set of active vertices in a round. Define size(Q)= |Q| + ∑v∈ Q deg(v), which is proportional to the space required to store the vertices of Q using a sparse-graph layout. We show that a P-processor execution of Prism performs updates in Q using O(χ (lg ( Q/χ ) + lg Δ ) + lg P span and Θ(size(Q) + P) work.
These theoretical guarantees are matched by good empirical performance. To isolate the effect of the scheduling algorithm on performance, we modified GraphLab to incorporate Prism and studied seven application benchmarks on a 12-core multicore machine. Prism executes the benchmarks 1.2 to 2.1 times faster than GraphLab’s nondeterministic lock-based scheduler while providing deterministic behavior.
This article also presents Prism-R, a variation of Prism that executes dynamic data-graph computations deterministically even when updates modify global variables with associative operations. Prism-R satisfies the same theoretical bounds as Prism, but its implementation is more involved, incorporating a multivector data structure to maintain a deterministically ordered set of vertices partitioned by color. Despite its additional complexity, Prism-R is only marginally slower than Prism. On the seven application benchmarks studied, Prism-R incurs a 7% geometric mean overhead relative to Prism.
Date issued
2016-06Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence LaboratoryJournal
ACM Transactions on Parallel Computing
Publisher
Association for Computing Machinery (ACM)
Citation
Kaler, Tim et al. “Executing Dynamic Data-Graph Computations Deterministically Using Chromatic Scheduling.” ACM Transactions on Parallel Computing 3, 1 (July 2016): 1–31 © Association for Computing Machinery
Version: Author's final manuscript
ISSN
2329-4949