Statistical limits of graphical channel models and a semidefinite programming approach

Kim, Chiheon

dc.contributor.advisor	Michel X. Goemans.	en_US
dc.contributor.author	Kim, Chiheon	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Mathematics.	en_US
dc.date.accessioned	2019-03-01T19:55:55Z
dc.date.available	2019-03-01T19:55:55Z
dc.date.copyright	2018	en_US
dc.date.issued	2018	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/120659
dc.description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Mathematics, 2018.	en_US
dc.description	Cataloged from PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 205-213).	en_US
dc.description.abstract	Community recovery is a major challenge in data science and computer science. The goal in community recovery is to find the hidden clusters from given relational data, which is often represented as a labeled hypergraph where nodes correspond to items needing to be labeled and edges correspond to observed relations between the items. We investigate the problem of exact recovery in the class of statistical models which can be expressed in terms of graphical channels. In a graphical channel model, we observe noisy measurements of the relations between k nodes while the true labeling is unknown to us, and the goal is to recover the labels correctly. This generalizes both the stochastic block models and the spiked tensor models for principal component analysis, which have gained much interest over the last decade. We focus on two aspects of exact recovery: statistical limits and efficient algorithms achieving the statistical limit. For the statistical limits, we show that the achievability of exact recovery is essentially determined by whether we can recover the label of one node, given the other nodes' labels, with fairly high probability. This phenomenon was observed by Abbe et al. for generic stochastic block models, and called "local-to-global amplification". We confirm that local-to-global amplification indeed holds for generic graphical channel models, under some regularity assumptions. As a corollary, the threshold for exact recovery is explicitly determined. For algorithmic concerns, we consider two examples of graphical channel models: (i) the spiked tensor model with additive Gaussian noise, and (ii) the generalization of the stochastic block model for k-uniform hypergraphs. We propose a strategy which we call "truncate-and-relax", based on a standard semidefinite relaxation technique. We show that in these two models, the algorithm based on this strategy achieves exact recovery up to a threshold which orderwise matches the statistical threshold. We complement this by showing the limitation of the algorithm.	en_US
dc.description.statementofresponsibility	by Chiheon Kim.	en_US
dc.format.extent	213 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Mathematics.	en_US
dc.title	Statistical limits of graphical channel models and a semidefinite programming approach	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph. D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Mathematics
dc.identifier.oclc	1088419852	en_US

Files in this item

Name: 1088419852-MIT.pdf
Size: 9.773Mb
Format: PDF
Description: Full printable version

This item appears in the following Collection(s)

Doctoral Theses