Convex relaxation methods for graphical models : Lagrangian and maximum entropy approaches
Author(s)Johnson, Jason K. (Jason Kyle)
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Alan S. Willsky.
MetadataShow full item record
Graphical models provide compact representations of complex probability distributions of many random variables through a collection of potential functions defined on small subsets of these variables. This representation is defined with respect to a graph in which nodes represent random variables and edges represent the interactions among those random variables. Graphical models provide a powerful and flexible approach to many problems in science and engineering, but also present serious challenges owing to the intractability of optimal inference and estimation over general graphs. In this thesis, we consider convex optimization methods to address two central problems that commonly arise for graphical models. First, we consider the problem of determining the most probable configuration-also known as the maximum a posteriori (MAP) estimate-of all variables in a graphical model, conditioned on (possibly noisy) measurements of some variables. This general problem is intractable, so we consider a Lagrangian relaxation (LR) approach to obtain a tractable dual problem. This involves using the Lagrangian decomposition technique to break up an intractable graph into tractable subgraphs, such as small "blocks" of nodes, embedded trees or thin subgraphs. We develop a distributed, iterative algorithm that minimizes the Lagrangian dual function by block coordinate descent. This results in an iterative marginal-matching procedure that enforces consistency among the subgraphs using an adaptation of the well-known iterative scaling algorithm. This approach is developed both for discrete variable and Gaussian graphical models. In discrete models, we also introduce a deterministic annealing procedure, which introduces a temperature parameter to define a smoothed dual function and then gradually reduces the temperature to recover the (non-differentiable) Lagrangian dual. When strong duality holds, we recover the optimal MAP estimate. We show that this occurs for a broad class of "convex decomposable" Gaussian graphical models, which generalizes the "pairwise normalizable" condition known to be important for iterative estimation in Gaussian models.(cont.) In certain "frustrated" discrete models a duality gap can occur using simple versions of our approach. We consider methods that adaptively enhance the dual formulation, by including more complex subgraphs, so as to reduce the duality gap. In many cases we are able to eliminate the duality gap and obtain the optimal MAP estimate in a tractable manner. We also propose a heuristic method to obtain approximate solutions in cases where there is a duality gap. Second, we consider the problem of learning a graphical model (both the graph and its potential functions) from sample data. We propose the maximum entropy relaxation (MER) method, which is the convex optimization problem of selecting the least informative (maximum entropy) model over an exponential family of graphical models subject to constraints that small subsets of variables should have marginal distributions that are close to the distribution of sample data. We use relative entropy to measure the divergence between marginal probability distributions. We find that MER leads naturally to selection of sparse graphical models. To identify this sparse graph efficiently, we use a "bootstrap" method that constructs the MER solution by solving a sequence of tractable subproblems defined over thin graphs, including new edges at each step to correct for large marginal divergences that violate the MER constraint. The MER problem on each of these subgraphs is efficiently solved using the primaldual interior point method (implemented so as to take advantage of efficient inference methods for thin graphical models). We also consider a dual formulation of MER that minimizes a convex function of the potentials of the graphical model. This MER dual problem can be interpreted as a robust version of maximum-likelihood parameter estimation, where the MER constraints specify the uncertainty in the sufficient statistics of the model. This also corresponds to a regularized maximum-likelihood approach, in which an information-geometric regularization term favors selection of sparse potential representations. We develop a relaxed version of the iterative scaling method to solve this MER dual problem.
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.Includes bibliographical references (p. 241-257).
DepartmentMassachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.