Information extraction with network centralities : finding rumor sources, measuring influence, and learning community structure
Author(s)
Zaman, Tauhid R
DownloadFull printable version (20.99Mb)
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Devavrat Shah.
Terms of use
Metadata
Show full item recordAbstract
Network centrality is a function that takes a network graph as input and assigns a score to each node. In this thesis, we investigate the potential of network centralities for addressing inference questions arising in the context of large-scale networked data. These questions are particularly challenging because they require algorithms which are extremely fast and simple so as to be scalable, while at the same time they must perform well. It is this tension between scalability and performance that this thesis aims to resolve by using appropriate network centralities. Specifically, we solve three important network inference problems using network centrality: finding rumor sources, measuring influence, and learning community structure. We develop a new network centrality called rumor centrality to find rumor sources in networks. We give a linear time algorithm for calculating rumor centrality, demonstrating its practicality for large networks. Rumor centrality is proven to be an exact maximum likelihood rumor source estimator for random regular graphs (under an appropriate probabilistic rumor spreading model). For a wide class of networks and rumor spreading models, we prove that it is an accurate estimator. To establish the universality of rumor centrality as a source estimator, we utilize techniques from the classical theory of generalized Polya's urns and branching processes. Next we use rumor centrality to measure influence in Twitter. We develop an influence score based on rumor centrality which can be calculated in linear time. To justify the use of rumor centrality as the influence score, we use it to develop a new network growth model called topological network growth. We find that this model accurately reproduces two important features observed empirically in Twitter retweet networks: a power-law degree distribution and a superstar node with very high degree. Using these results, we argue that rumor centrality is correctly quantifying the influence of users on Twitter. These scores form the basis of a dynamic influence tracking engine called Trumor which allows one to measure the influence of users in Twitter or more generally in any networked data. Finally we investigate learning the community structure of a network. Using arguments based on social interactions, we determine that the network centrality known as degree centrality can be used to detect communities. We use this to develop the leader-follower algorithm (LFA) which can learn the overlapping community structure in networks. The LFA runtime is linear in the network size. It is also non-parametric, in the sense that it can learn both the number and size of communities naturally from the network structure without requiring any input parameters. We prove that it is very robust and learns accurate community structure for a broad class of networks. We find that the LFA does a better job of learning community structure on real social and biological networks than more common algorithms such as spectral clustering.
Description
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 193-197).
Date issued
2011Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.