DSpace@MIT (MIT Libraries)

Deterministic Coresets for Stochastic Matrices with Applications to Scalable Sparse PageRank

Author(s)
Lang, Harry; Baykal, Cenk; Samra, Najib Abu; Tannous, Tony; Feldman, Dan; Rus, Daniela
Terms of use
Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract
© Springer Nature Switzerland AG 2019. The PageRank algorithm is used by search engines to rank websites in their search results. The algorithm outputs a probability distribution that a person randomly clicking on links will arrive at any particular page. Intuitively, a node in the center of the network should be visited with high probability even if it has few edges, and an isolated node that has many (local) neighbours will be visited with low probability. The idea of PageRank is to rank nodes according to a stable state and not according to the previous local measurement of inner/outer edges from a node, which may be manipulated more easily than the corresponding entry in the stable state. In this paper we present a deterministic and completely parallelizable algorithm for computing an ε-approximation to the PageRank of a graph of n nodes. Typical inputs consist of millions of pages, but the average number of links per page is less than ten. Our algorithm takes advantage of this sparsity, assuming the out-degree of each node is at most s, and terminates in O(ns/ε²) time. Beyond the input graph, which may be stored in read-only storage, our algorithm uses only O(n) memory. This is the first algorithm whose complexity takes advantage of sparsity. Real data exhibits an average out-degree of 7 while n is in the millions, so the advantage is immense. Moreover, our algorithm is simple and robust to floating-point precision issues. Our sparse solution (core-set) is based on reducing the PageRank problem to an ℓ₂ approximation of the Carathéodory problem, which independently has many applications such as in machine learning and game theory. We hope that our approach will be useful for many other applications for learning sparse data and graphs. Algorithm, analysis, and open code with experimental results are provided.
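
For orientation, the "stable state" the abstract refers to can be illustrated with the textbook power-iteration baseline. This is a minimal sketch and not the deterministic coreset algorithm of the paper. The function name `pagerank`, the damping factor of 0.85, the uniform handling of dangling nodes, and the four-page example graph are all assumptions made for illustration and do not come from the abstract. The sketch works directly on adjacency lists, so one iteration touches each link once (at most ns operations for out-degree at most s) and keeps only O(n) extra memory.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Textbook power iteration; out_links[u] lists the pages that u links to.

    Illustrative baseline only (assumed damping factor and dangling-node rule),
    not the coreset method described in the abstract.
    """
    n = len(out_links)
    rank = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Probability mass sitting on pages without out-links is spread uniformly.
        dangling = sum(rank[u] for u in range(n) if not out_links[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = [base] * n
        # Each page passes its rank in equal shares along its out-links.
        for u, targets in enumerate(out_links):
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        # Stop once the distribution has stabilised in the l1 sense.
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical graph: page 2 is linked to by pages 0, 1 and 3; nothing links to page 3.
graph = [[1, 2], [2], [0], [2]]
ranks = pagerank(graph)
```

In this toy graph, page 3 has an out-link but no in-links, so it receives only the uniform teleportation mass, while page 2 collects rank from three other pages and ends up ranked highest.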
Date issued
2019
URI
https://hdl.handle.net/1721.1/137195
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Publisher
Springer International Publishing
Citation
Lang, Harry, Baykal, Cenk, Samra, Najib Abu, Tannous, Tony, Feldman, Dan et al. 2019. "Deterministic Coresets for Stochastic Matrices with Applications to Scalable Sparse PageRank."
Version: Author's final manuscript
ISSN
0302-9743
1611-3349

Collections
  • MIT Open Access Articles
