Show simple item record

dc.contributor.author  Lang, Harry
dc.contributor.author  Baykal, Cenk
dc.contributor.author  Samra, Najib Abu
dc.contributor.author  Tannous, Tony
dc.contributor.author  Feldman, Dan
dc.contributor.author  Rus, Daniela
dc.date.accessioned  2021-11-03T14:32:09Z
dc.date.available  2021-11-03T14:32:09Z
dc.date.issued  2019
dc.identifier.issn  0302-9743
dc.identifier.issn  1611-3349
dc.identifier.uri  https://hdl.handle.net/1721.1/137195
dc.description.abstract  © Springer Nature Switzerland AG 2019. The PageRank algorithm is used by search engines to rank websites in their search results. The algorithm outputs a probability distribution giving the likelihood that a person randomly clicking on links will arrive at any particular page. Intuitively, a node in the center of the network should be visited with high probability even if it has few edges, while an isolated node with many (local) neighbours will be visited with low probability. The idea of PageRank is to rank nodes according to a stable state rather than according to local counts of incoming/outgoing edges of a node, which may be manipulated more easily than the corresponding entry in the stable state. In this paper we present a deterministic and completely parallelizable algorithm for computing an ε-approximation to the PageRank of a graph of n nodes. Typical inputs consist of millions of pages, but the average number of links per page is less than ten. Our algorithm takes advantage of this sparsity: assuming the out-degree of each node is at most s, it terminates in O(ns/ε²) time. Beyond the input graph, which may be stored in read-only storage, our algorithm uses only O(n) memory. This is the first algorithm whose complexity takes advantage of sparsity. Real data exhibits an average out-degree of 7 while n is in the millions, so the advantage is immense. Moreover, our algorithm is simple and robust to floating-point precision issues. Our sparse solution (core-set) is based on reducing the PageRank problem to an ℓ2-approximation of the Carathéodory problem, which independently has many applications, such as in machine learning and game theory. We hope that our approach will be useful for many other applications involving learning from sparse data and graphs. Algorithm, analysis, and open code with experimental results are provided.  en_US
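For context on the problem the abstract describes, the following is a minimal sketch of the classical power-iteration approach to PageRank on a sparse out-adjacency list. Note this is NOT the paper's deterministic coreset-based algorithm; it only illustrates the stable state being approximated, and the `damping`, `tol`, and `max_iter` parameters are conventional defaults assumed for illustration.

```python
# Illustrative sketch of standard power-iteration PageRank (not the
# paper's coreset algorithm). out_links maps each node to its list of
# out-neighbours; on real web graphs the average list length is ~7.
def pagerank(out_links, n, damping=0.85, tol=1e-10, max_iter=200):
    rank = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n  # teleportation mass
        for u in range(n):
            targets = out_links.get(u, [])
            if targets:
                # spread u's current rank evenly over its out-links
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:
                # dangling node: spread its rank uniformly
                for v in range(n):
                    new[v] += damping * rank[u] / n
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new  # converged to (near) the stable state
        rank = new
    return rank

# Small example: node 2 receives links from both 1 and 3,
# so it ends up with the largest PageRank entry.
ranks = pagerank({0: [1], 1: [2], 2: [0], 3: [2]}, 4)
```

Each iteration touches every edge once, so a pass costs time proportional to the number of edges; the paper's contribution is a deterministic algorithm whose total running time O(ns/ε²) and O(n) working memory exploit this sparsity directly.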
dc.language.iso  en
dc.publisher  Springer International Publishing  en_US
dc.relation.isversionof  10.1007/978-3-030-14812-6_25  en_US
dc.rights  Creative Commons Attribution-Noncommercial-Share Alike  en_US
dc.rights.uri  http://creativecommons.org/licenses/by-nc-sa/4.0/  en_US
dc.source  MIT web domain  en_US
dc.title  Deterministic Coresets for Stochastic Matrices with Applications to Scalable Sparse PageRank  en_US
dc.type  Book  en_US
dc.identifier.citation  Lang, Harry, Baykal, Cenk, Samra, Najib Abu, Tannous, Tony, Feldman, Dan et al. 2019. "Deterministic Coresets for Stochastic Matrices with Applications to Scalable Sparse PageRank."
dc.contributor.department  Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.eprint.version  Author's final manuscript  en_US
dc.type.uri  http://purl.org/eprint/type/ConferencePaper  en_US
eprint.status  http://purl.org/eprint/status/NonPeerReviewed  en_US
dc.date.updated  2019-07-17T15:24:08Z
dspace.date.submission  2019-07-17T15:24:09Z
mit.license  OPEN_ACCESS_POLICY
mit.metadata.status  Authority Work and Publication Information Needed  en_US

