dc.contributor.advisor: John Fisher. [en_US]
dc.contributor.author: Lin, Dahua, Ph. D. Massachusetts Institute of Technology [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2013-04-12T19:25:20Z
dc.date.available: 2013-04-12T19:25:20Z
dc.date.copyright: 2012 [en_US]
dc.date.issued: 2012 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/78453
dc.description: Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. [en_US]
dc.description: Cataloged from PDF version of thesis. [en_US]
dc.description: Includes bibliographical references (p. 301-312). [en_US]
dc.description.abstract: Modeling visual scenes is one of the fundamental tasks of computer vision. Whereas tremendous efforts have been devoted to video analysis in past decades, most prior work focuses on specific tasks, leading to dedicated methods to solve them. This PhD thesis instead aims to derive a probabilistic generative model that coherently integrates different aspects, notably appearance, motion, and the interaction between them. Specifically, this model considers each video as a composite of dynamic layers, each associated with a covering domain, an appearance template, and a flow describing its motion. These layers change dynamically following the associated flows, and are combined into video frames according to a Z-order that specifies their relative depth order. To describe these layers and their dynamic changes, three major components are incorporated: (1) An appearance model describes the generative process of the pixel values of a video layer. This model, via the combination of a probabilistic patch manifold and a conditional Markov random field, is able to express rich local details while maintaining global coherence. (2) A motion model captures the motion pattern of a layer through a new concept called geometric flow that originates from differential geometric analysis. A geometric flow unifies the trajectory-based representation and the notion of geometric transformation to represent the collective dynamic behaviors persisting over time. (3) A partial Z-order specifies the relative depth order between layers. Here, through the unique correspondence between equivalence classes of partial orders and consistent choice functions, a distribution over the spaces of partial orders is established, and inference can thus be performed thereon. The development of these models leads to significant challenges in probabilistic modeling and inference that require new techniques to address.
We studied two important problems: (1) Both the appearance model and the motion model rely on mixture modeling to capture complex distributions. In a dynamic setting, the component parameters and the number of components in a mixture model can change over time. While the use of Dirichlet processes (DPs) as priors allows an indefinite number of components, incorporating temporal dependencies between DPs remains a nontrivial issue, theoretically and practically. Our research on this problem leads to a new construction of dependent DPs, enabling various forms of dynamic variations for nonparametric mixture models by harnessing the connections between Poisson and Dirichlet processes. (2) Inferring the partial Z-order from a video requires a method to sample from the posterior distribution of partial orders. A key challenge here is that the underlying space of partial orders is disconnected, meaning that one may not be able to make local updates without violating the combinatorial constraints for partial orders. We developed a novel sampling method to tackle this problem, which dynamically introduces virtual states as bridges to connect different parts of the space, implicitly resulting in an ergodic Markov chain over an augmented space. With this generative model of visual scenes, many vision problems can be readily solved through inference performed on the model. Empirical experiments demonstrate that this framework yields promising results on a series of practical tasks, including video denoising and inpainting, collective motion analysis, and semantic scene understanding. [en_US]
dc.description.statementofresponsibility: by Dahua Lin. [en_US]
dc.format.extent: 312 p. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Generative modeling of dynamic visual scenes [en_US]
dc.type: Thesis [en_US]
dc.description.degree: Ph.D. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. [en_US]
dc.identifier.oclc: 832618174 [en_US]

