Analysis of biological and chemical systems using information theoretic approximations
Author(s)
King, Bracken Matheny
Other Contributors
Massachusetts Institute of Technology. Dept. of Biological Engineering.
Advisor
Bruce Tidor.
Abstract
The identification and quantification of high-dimensional relationships is a major challenge in the analysis of both biological and chemical systems. To address this challenge, a variety of experimental and computational tools have been developed to generate multivariate samples from these systems. Information theory provides a general framework for the analysis of such data, but for many applications, the large sample sizes needed to reliably compute high-dimensional information theoretic statistics are not available. In this thesis we develop, validate, and apply a novel framework for approximating high-dimensional information theoretic statistics using associated terms of arbitrarily low order. For a variety of synthetic, biological, and chemical systems, we find that these low-order approximations provide good estimates of higher-order multivariate relationships, while dramatically reducing the number of samples needed to reach convergence. We apply the framework to the analysis of multiple biological systems, including a phospho-proteomic data set in which we identify a subset of phospho-peptides that is maximally informative of cellular response (migration and proliferation) across multiple conditions (varying EGF or heregulin stimulation, and HER2 expression). This subset is shown to produce statistical models with superior performance to those built with subsets of similar size. We also employ the framework to extract configurational entropies from molecular dynamics simulations of a series of small molecules, demonstrating improved convergence relative to existing methods. As these disparate applications highlight, our framework enables the use of general information theoretic phrasings even in systems where data quantities preclude direct estimation of the high-order statistics. 
Furthermore, because the framework provides a hierarchy of approximations of increasing order, as data collection and analysis techniques improve, the method extends to generate more accurate results, while maintaining the same underlying theory.
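To make the idea of a low-order approximation concrete, the following is a minimal illustrative sketch and not the thesis's own algorithm. It assumes discrete samples and approximates a joint entropy using only one- and two-variable terms, by conditioning each variable on the single most informative earlier variable. The function names `entropy` and `second_order_entropy` are hypothetical. Because conditioning on fewer variables can only increase entropy, this quantity is an upper bound on the full joint entropy.

```python
import math
from collections import Counter


def entropy(samples, cols):
    """Plug-in (empirical) entropy in bits of the selected columns."""
    counts = Counter(tuple(row[c] for c in cols) for row in samples)
    n = len(samples)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def second_order_entropy(samples):
    """Approximate H(X_0, ..., X_{d-1}) from 1- and 2-variable terms only.

    Assumed scheme (illustrative): H(X_0) + sum_i min_{j<i} H(X_i | X_j),
    where H(X_i | X_j) = H(X_i, X_j) - H(X_j).
    """
    d = len(samples[0])
    total = entropy(samples, [0])
    for i in range(1, d):
        # Condition X_i on the single earlier variable that explains it best.
        total += min(
            entropy(samples, [i, j]) - entropy(samples, [j]) for j in range(i)
        )
    return total


# Three perfectly correlated binary variables: one bit of joint entropy.
copies = [(0, 0, 0), (1, 1, 1)] * 50
print(second_order_entropy(copies), entropy(copies, [0, 1, 2]))
```

Each term in the sum depends on at most two variables, so it can be estimated from far fewer samples than the full joint distribution requires. That is the trade-off the abstract describes.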
Description
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Biological Engineering, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 115-123).
Date issued
2010
Department
Massachusetts Institute of Technology. Department of Biological Engineering
Publisher
Massachusetts Institute of Technology
Keywords
Biological Engineering.