Building generative models over discrete structures : from graphical models to deep learning
Author(s): Gane, Georgiana Andreea.
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor: Tommi S. Jaakkola.
The goal of this thesis is to investigate generative models over discrete structures, such as binary grids, alignments, or arbitrary graphs. We focused on developing models that are easy to sample from, and we approached the task from two broad perspectives: defining models via structured potential functions, and via neural-network-based decoders. In the first case, we investigated Perturbation Models, a family of implicit distributions where samples emerge through optimization of randomized potential functions. Designed explicitly for efficient sampling, Perturbation Models are strong candidates for building generative models over structures, and the leading open questions pertain to understanding the properties of the induced models and developing practical learning algorithms. In this thesis, we present theoretical results showing that, in contrast to the more established Gibbs models, low-order potential functions, after undergoing randomization and maximization, lead to high-order dependencies in the induced distributions. Furthermore, while conditioning in Gibbs distributions is straightforward, conditioning in Perturbation Models is typically not, but we theoretically characterize cases where the straightforward approach produces the correct results. Finally, we introduce a new learning algorithm for Perturbation Models based on Inverse Combinatorial Optimization. We illustrate empirically both the induced dependencies and the inverse optimization approach, in learning tasks inspired by computer vision problems.
In the second case, we sequentialize the structures, converting structure generation into a sequence of discrete decisions, to enable the use of sequential models. We explore maximum likelihood training with step-wise supervision and continuous relaxations of the intermediate decisions. With respect to intermediate discrete representations, the main directions consist of using gradient estimators or designing continuous relaxations.
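To make the idea of a continuous relaxation of a discrete decision concrete, the following is a minimal sketch using the Gumbel-softmax relaxation, one well-known construction of this kind. The abstract does not state which relaxation the thesis uses, so the choice of technique, the function names, and the temperature parameter are all assumptions made for illustration.

```python
import math
import random


def sample_gumbel(rng):
    """Draw standard Gumbel(0, 1) noise by inverse transform sampling."""
    # Clamp away from 0 and 1 so that both logarithms stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def hard_decision(logits, rng):
    """Discrete decision: argmax of randomly perturbed scores.

    Assumption: Gumbel noise is used, so the returned index is an exact
    sample from softmax(logits) (the Gumbel-max trick).
    """
    perturbed = [score + sample_gumbel(rng) for score in logits]
    return max(range(len(logits)), key=lambda i: perturbed[i])


def relaxed_decision(logits, temperature, rng):
    """Continuous relaxation: a softmax replaces the argmax.

    Returns a point on the probability simplex. Lower temperatures push
    the output toward a one-hot vector; higher ones toward uniform.
    """
    scaled = [(score + sample_gumbel(rng)) / temperature for score in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - top) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [1.0, 0.5, -1.0]  # hypothetical scores for three choices
    print(hard_decision(logits, rng))
    print(relaxed_decision(logits, temperature=0.5, rng=rng))
```

The relaxed output is differentiable with respect to the logits, which is what makes it usable inside gradient-based training. The hard decision is not differentiable and would instead call for a gradient estimator.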
We discuss these solutions in the context of unsupervised scene understanding with generative models. In particular, we asked whether a continuous relaxation of the counting problem also discovers the objects in an unsupervised fashion (given the increased training stability that continuous relaxations provide) and we proposed an approach based on Adaptive Computation Time (ACT) which achieves the desired result. Finally, we investigated the task of iterative graph generation. We proposed a variational lower bound on the maximum likelihood objective, where the approximate posterior distribution renormalizes the prior distribution over local predictions which are plausible for the target graph. For instance, the local predictions may be binary values indicating the presence or absence of an edge indexed by the given time step, for a canonical edge indexing chosen a priori. The plausibility of each local prediction is assessed by solving a combinatorial optimization problem, and we discuss relevant approaches, including an induced subgraph isomorphism-based algorithm for the generic graph generation case, and a polynomial-time algorithm for the special case of graph generation resulting from solving graph clustering tasks. In this thesis, we focused on the generic case, and we investigated the approximate posterior's relevance on synthetic graph datasets.
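The edge-by-edge view of graph generation described above can be sketched as follows. This is an illustrative reading of the abstract, not the thesis implementation: the lexicographic ordering of node pairs, all function names, and the plausibility mask passed to `renormalize_over_plausible` are assumptions. In the thesis the plausibility of each local prediction is decided by a combinatorial optimization problem, which this sketch does not attempt to solve.

```python
from itertools import combinations


def canonical_edge_order(num_nodes):
    """Assumed canonical indexing: node pairs (i, j), i < j, in lexicographic order."""
    return list(combinations(range(num_nodes), 2))


def graph_to_decisions(num_nodes, edges):
    """Turn an undirected graph into one binary decision per time step."""
    edge_set = {tuple(sorted(edge)) for edge in edges}
    return [1 if pair in edge_set else 0 for pair in canonical_edge_order(num_nodes)]


def decisions_to_graph(num_nodes, decisions):
    """Inverse map: rebuild the edge set from the sequence of binary decisions."""
    order = canonical_edge_order(num_nodes)
    if len(decisions) != len(order):
        raise ValueError("expected one decision per candidate edge")
    return {pair for pair, bit in zip(order, decisions) if bit}


def renormalize_over_plausible(prior, plausible):
    """Restrict a prior over local predictions to the plausible ones and renormalize.

    `plausible` is a boolean mask supplied by the caller (hypothetical input).
    """
    mass = sum(p for p, ok in zip(prior, plausible) if ok)
    if mass <= 0.0:
        raise ValueError("no plausible prediction has positive prior mass")
    return [p / mass if ok else 0.0 for p, ok in zip(prior, plausible)]


if __name__ == "__main__":
    path_graph = [(0, 1), (2, 1), (2, 3)]  # a path on four nodes
    decisions = graph_to_decisions(4, path_graph)
    print(decisions)
    print(sorted(decisions_to_graph(4, decisions)))
    # Prior over (edge absent, edge present), with only "present" plausible.
    print(renormalize_over_plausible([0.3, 0.7], [False, True]))
```

With a fixed node labeling, exactly one value is plausible at each step, so the renormalized distribution is deterministic. The generic case is harder because several node orderings can produce the same target graph, which is where the subgraph isomorphism check mentioned above comes in.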
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from PDF version of thesis. Page 173 blank. Includes bibliographical references (pages 159-172).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.