Leaders, followers, and community detection

Parthasarathy, Dhruuv

dc.contributor.advisor	Devavrat Shah.	en_US
dc.contributor.author	Parthasarathy, Dhruuv	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2014-11-24T18:40:28Z
dc.date.available	2014-11-24T18:40:28Z
dc.date.copyright	2014	en_US
dc.date.issued	2014	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/91858
dc.description	Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.	en_US
dc.description	Cataloged from PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 53-55).	en_US
dc.description.abstract	Communities in social interaction networks or graphs are sets of well-connected, and very often overlapping, vertices. Formally, we view any maximal clique of the social network graph as a community. The problem of finding maximal cliques is known to be computationally hard. The goal of this work is to identify structural conditions in social network graphs that lead to efficient identification of maximal cliques, i.e. overlapping communities. We propose an evolutionary model called sequential community graphs for community formation in social networks. In a sequential community graph, each node enters the graph by either joining an existing community or creating its own. To discover communities, i.e. maximal cliques, in such graphs, we present the non-parametric Iterative Leader-Follower Algorithm (ILFA). We establish that the ILFA finds all the communities/maximal cliques correctly in the sequential community graph model in time polynomial in the number of vertices in the graph. To scale to very large data sets, we propose a minor simplification of the ILFA, called the fast leader-follower algorithm (FLFA), which effectively runs in time linear in the input data size and finds all communities correctly for sequential community graphs with an additional constraint. Empirically, the FLFA and ILFA perform nearly the same in terms of accuracy, but the FLFA runs nearly three orders of magnitude faster. We find that the sequential community graph model is a good fit for a wide variety of social networks where users can be modeled as entering the graph by joining existing communities or creating their own. In such social networks, we demonstrate that the FLFA and ILFA outperform other state-of-the-art algorithms in terms of both speed and accuracy.
For example, in the Internet Movie Database (IMDB) graph, where communities naturally correspond to actors in the same movie, our algorithms find nearly all ground-truth communities correctly, while all other known community detection algorithms do very poorly. Similar empirical results are found for various other social data sets. This supports our hypothesis that we can model many social graphs as sequential community graphs and accurately detect their communities using the ILFA or FLFA.	en_US
dc.description.statementofresponsibility	by Dhruuv Parthasarathy.	en_US
dc.format.extent	55 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Leaders, followers, and community detection	en_US
dc.title.alternative	Fast, non-parametric detection of overlapping communities : the Leader-Follower algorithm	en_US
dc.title.alternative	Leader-Follower algorithm	en_US
dc.type	Thesis	en_US
dc.description.degree	M. Eng.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	894353059	en_US

Files in this item

Name: 894353059-MIT.pdf
Size: 2.215Mb
Format: PDF
Description: Full printable version


This item appears in the following Collection(s)

Graduate Theses
