Computational and Statistical Detection of High-Dimensional Latent Space Structure in Random Networks

Bangachev, Kiril

Author(s)

Bangachev, Kiril

Download Thesis PDF (4.348Mb)

Advisor

Bresler, Guy

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

A probabilistic latent space graph $\mathrm{PLSG}(n, \Omega, D, \sigma)$ is parametrized by its number of vertices $n$, a probability distribution $D$ over some latent space $\Omega$, and a connection function $\sigma : \Omega \times \Omega \to [0, 1]$ such that $\sigma(x, y) = \sigma(y, x)$ almost surely with respect to $D$. To sample from $\mathrm{PLSG}(n, \Omega, D, \sigma)$, first for each node $i \in [n]$ an independent latent (feature) vector $x_i$ is drawn from $\Omega$ according to $D$. Then, for each pair of vertices $i$ and $j$, an edge is drawn independently with probability $\sigma(x_i, x_j)$.

Interest in settings of high-dimensional latent spaces $\Omega$ has surged in recent years due to the rise of high-dimensional data and powerful compute. The features $x_1, x_2, \ldots, x_n$ are oftentimes hidden due to privacy considerations or absence of measurement. This gives rise to many challenging statistical tasks. A prerequisite for nearly any more sophisticated inference and estimation task is the following simple hypothesis testing question. When can we even test for the presence of high-dimensional latent space structure? When is there a computationally efficient test and what could this computationally efficient test be? We address the following aspects of these questions in the thesis.

Chapter 2: We focus on the canonical geometric setting when latent vectors are distributed uniformly over the sphere $\mathbb{S}^{d-1}$ and $\sigma(x, y) = \mathbb{1}[\langle x, y \rangle \ge \tau_p]$, where $\tau_p$ is such that the expected graph density is $p$. A conjecture that has witnessed continuous interest and progress in the past 15 years is that the information-theoretically optimal test for detecting the spherical random geometric graph is the signed triangle count. We contribute to the existing literature by confirming that the signed triangle count is computationally optimal among low-degree polynomial tests. 
Our main technical ingredient is a strategy for bounding Fourier coefficients of random geometric graphs based on a representation of spherical random geometric graphs as Erdős-Rényi graphs with few planted edges. This part of the thesis is based on [BB24b].

Chapter 3: The conjectured optimality of the signed triangle count and the relevance of triangle-based statistics to the axiomatic triangle inequality of metric spaces have led to the conventional wisdom that triangle-based statistics are optimal in monotone random geometric graphs. We break this intuition by showing that in the case of a sup-norm geometry over the torus, the signed 4-cycle count is strictly stronger than the signed triangle count and is, furthermore, optimal among low-degree tests. Our main technical contribution is a novel strategy for bounding Fourier coefficients of random geometric graphs mimicking the cluster-expansion formula from statistical physics. This part of the thesis is based on [BB24a].

Chapter 4: While random geometric graphs over the sphere with Euclidean geometry and the torus with sup-norm geometry are interesting mathematically, they are perhaps too simplistic to describe real-world networks. Hence, one should ask to what extent the results and techniques used for these models generalize to other probabilistic latent space graphs. We introduce a new family of probabilistic latent space graphs which we call random algebraic graphs. In random algebraic graphs, $\Omega$ is an algebraic group and $\sigma$ is compatible with the group structure. This family captures the aforementioned random geometric graphs as well as instances of the stochastic block model and random subgraphs of Cayley graphs. We have two sets of results. First, we develop a general criterion based solely on the magnitudes of Fourier coefficients of $\sigma$ for the statistical hardness of detecting a random algebraic graph when the underlying group is the Boolean hypercube. 
We use this result to provide a uniform approach to many previously known results in the literature, and also highlight that certain structural properties of the connection function, such as non-trivial symmetries and non-monotonicity, yield novel behavior. Second, we exhibit a universal behavior for the impossibility of detecting a random algebraic graph that depends solely on the group size and not on the group structure. The result can be equivalently phrased in terms of the local structure of typical Cayley graphs. This part of the thesis is based on [BB23].
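
The two-step sampling procedure described in the abstract, and the signed triangle count discussed under Chapter 2, can be illustrated with a minimal Python sketch. This is not code from the thesis. The function names (`sample_plsg`, `uniform_sphere`, `threshold_connection`, `signed_triangle_count`) are hypothetical, NumPy is an assumed dependency, and the signed triangle count is taken in its standard form, the sum over triples i < j < k of (A_ij - p)(A_jk - p)(A_ik - p), where A is the adjacency matrix. The example uses density p = 1/2, for which the threshold is 0 by symmetry of the inner product of two uniform points on the sphere.

```python
import numpy as np


def sample_plsg(n, sample_latent, sigma, rng):
    """Sample one graph from PLSG(n, Omega, D, sigma).

    sample_latent(n, rng) returns n i.i.d. latent vectors from D as rows.
    sigma(x) returns the n x n matrix with entries sigma(x_i, x_j).
    """
    x = sample_latent(n, rng)
    probs = sigma(x)
    # One independent coin per unordered pair {i, j}: keep the strict upper
    # triangle, then mirror it so the adjacency matrix is symmetric.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(int)
    return adj, x


def uniform_sphere(d):
    """Latent sampler: uniform distribution on the unit sphere in R^d."""
    def draw(n, rng):
        g = rng.standard_normal((n, d))
        # Normalizing a standard Gaussian vector gives a uniform direction.
        return g / np.linalg.norm(g, axis=1, keepdims=True)
    return draw


def threshold_connection(tau):
    """Connection function sigma(x, y) = 1 if <x, y> >= tau, else 0."""
    def sigma(x):
        return (x @ x.T >= tau).astype(float)
    return sigma


def signed_triangle_count(adj, p):
    """Sum over i < j < k of (A_ij - p)(A_jk - p)(A_ik - p)."""
    centered = adj - p
    np.fill_diagonal(centered, 0.0)
    # With a zero diagonal, trace(M^3) counts each triple of distinct
    # vertices 6 times (3! orderings).
    return np.trace(centered @ centered @ centered) / 6.0


# Illustrative parameters (assumptions, not values from the thesis).
rng = np.random.default_rng(0)
n, d, p = 200, 3, 0.5

# Spherical random geometric graph at density 1/2 (threshold 0).
geo, _ = sample_plsg(n, uniform_sphere(d), threshold_connection(0.0), rng)

# Erdos-Renyi baseline: a constant connection function ignores the latents.
er, _ = sample_plsg(
    n, uniform_sphere(d), lambda x: np.full((len(x), len(x)), p), rng
)

print(signed_triangle_count(geo, p), signed_triangle_count(er, p))
```

In low dimension the geometric sample typically has a much larger signed triangle count than the Erdős-Rényi baseline. The detection question in the abstract concerns how large $d$ can be, relative to $n$ and $p$, before this gap disappears.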

Date issued

2024-09

URI

https://hdl.handle.net/1721.1/158510

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses