Model selection in compositional spaces
Author(s)
Grosse, Roger Baker
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
William T. Freeman.
Abstract
We often build complex probabilistic models by composing simpler models: using one model to generate parameters or latent variables for another model. This allows us to express complex distributions over the observed data and to share statistical structure between different parts of a model. In this thesis, we present a space of matrix decomposition models defined by the composition of a small number of motifs of probabilistic modeling, including clustering, low-rank factorizations, and binary latent factor models. This compositional structure can be represented by a context-free grammar whose production rules correspond to these motifs. By exploiting the structure of this grammar, we can generically and efficiently infer latent components and estimate predictive likelihood for nearly 2500 model structures using a small toolbox of reusable algorithms. Using a greedy search over this grammar, we automatically choose the decomposition structure from raw data by evaluating only a small fraction of all models. The proposed method typically finds the correct structure for synthetic data and backs off gracefully to simpler models under heavy noise. It learns sensible structures for datasets as diverse as image patches, motion capture, 20 Questions, and U.S. Senate votes, all using exactly the same code.

We then consider several improvements to compositional structure search. We present compositional importance sampling (CIS), a novel procedure for marginal likelihood estimation which requires only posterior inference and marginal likelihood estimation algorithms corresponding to the production rules of the grammar. We analyze the performance of CIS in the case of identifying additional structure within a low-rank decomposition. This analysis yields insights into how one should design a space of models to be recursively searchable. We next consider the problem of marginal likelihood estimation for the production rules. We present a novel method for obtaining ground truth marginal likelihood values on synthetic data, which enables the rigorous quantitative comparison of marginal likelihood estimators. Using this method, we compare a wide variety of marginal likelihood estimators for the production rules of our grammar. Finally, we present a framework for analyzing the sequences of distributions used in annealed importance sampling, a state-of-the-art marginal likelihood estimator, and present a novel sequence of intermediate distributions based on averaging moments of the initial and target distributions.
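To make the grammar-and-greedy-search idea concrete, here is a minimal Python sketch of expanding a context-free grammar of matrix decomposition structures and greedily selecting among the expansions. The production rules, symbol names, and the placeholder scoring function are illustrative assumptions for this sketch, not the thesis's exact rule set or implementation; a real scorer would fit each candidate model and estimate its predictive likelihood, as the abstract describes.

```python
# Illustrative sketch of grammar-based structure enumeration and greedy search.
# The production rules below are a simplified stand-in for a grammar over matrix
# decompositions; they and score_structure are assumptions for demonstration only.

# Component symbols (assumed for this sketch): G = Gaussian matrix,
# M = multinomial (cluster assignments), B = binary latent features.
# Each production rewrites a G as a product-plus-noise expression,
# mirroring motifs such as clustering and low-rank factorization.
PRODUCTIONS = {
    "G": [
        "(GG + G)",   # low-rank factorization
        "(MG + G)",   # clustering of rows
        "(GM' + G)",  # clustering of columns
        "(BG + G)",   # binary latent features on rows
        "(GB' + G)",  # binary latent features on columns
    ]
}

def expand_once(structure):
    """Yield every structure obtained by applying one production to one G."""
    for i, sym in enumerate(structure):
        if sym == "G":
            for rhs in PRODUCTIONS["G"]:
                yield structure[:i] + rhs + structure[i + 1:]

def greedy_search(data, score_structure, max_depth=3):
    """Greedily expand the best structure found so far, keeping the child with
    the highest score (e.g. an estimate of predictive likelihood), and backing
    off to the simpler structure when no expansion improves the score."""
    best, best_score = "G", score_structure("G", data)
    for _ in range(max_depth):
        candidates = list(expand_once(best))
        if not candidates:
            break
        top_score, top = max((score_structure(s, data), s) for s in candidates)
        if top_score <= best_score:
            break  # no expansion helps; keep the simpler model
        best, best_score = top, top_score
    return best, best_score

if __name__ == "__main__":
    # Placeholder scorer that simply prefers shorter structures; a real scorer
    # would perform posterior inference for each candidate decomposition.
    dummy_score = lambda s, data: -len(s)
    print(greedy_search(data=None, score_structure=dummy_score))
```

Because each level of the search only expands the current best structure, the number of models actually evaluated grows linearly with depth rather than with the size of the full space, which is what allows a space of thousands of structures to be searched by scoring only a small fraction of them.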
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 172-181).
Date issued
2014
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.