Statistical physics and biological information : hydrophobicity patterns in protein design and differential motif finding in DNA
Author(s)
Yahyanejad, Mehdi, 1975-
Alternative title
Hydrophobicity patterns in protein design and differential motif finding in DNA
Other Contributors
Massachusetts Institute of Technology. Dept. of Physics.
Advisor
Christopher B. Burge and Mehran Kardar.
Abstract
In the past decade, a large amount of biological data has been generated, enabling new quantitative approaches in biology. In this thesis, we use techniques from statistical physics to address two biological questions: the hydrophobicity patterns of proteins and their impact on the designability of protein structures, and the finding of regulatory motifs in DNA sequences. Proteins fold into specific structures to perform their functions. Hydrophobicity is the main driving force of folding: protein sequences tend to lower the ground-state energy of the folded structure by burying hydrophobic monomers in the core. This results in patterns, or correlations, in the hydrophobicity profiles of proteins.

We study the designability phenomenon: the vast majority of proteins adopt only a small number of distinct folded structures. In Chapter 2, we use principal component analysis to characterize the distribution of solvent accessibility profiles in an appropriate high-dimensional vector space and show that the distribution can be approximated by a Gaussian form. We also show that structures whose solvent accessibility profiles are dissimilar to the rest are more likely to be highly designable, offering an alternative to existing, computationally intensive methods for identifying highly designable structures. In Chapter 3, we extend our method to natural proteins. We use Fourier analysis to study the solvent accessibility and hydrophobicity profiles of natural proteins and show that their distribution can be approximated by a multivariate Gaussian. The method allows us to separate the intrinsic tendencies of sequence and structure profiles from the interactions that correlate them; we conclude that the alpha-helix periodicity in sequence hydrophobicity is dictated by the solvent accessibility of structures. The distinct intrinsic tendencies of sequence and structure profiles are most pronounced at long periods, where sequence hydrophobicity fluctuates less, and solvent accessibility fluctuates more, than average. Correlations between the two profiles can be interpreted as the Boltzmann weight of the solvation energy at room temperature. Chapter 4 shows that correlations in solvent accessibility along protein structures play a key role in the designability phenomenon, for both lattice and natural proteins. Without such correlations, all structures have nearly equal designability, as predicted by the Random Energy Model (REM). Using a toy Ising-based model, we show that tuning the correlations moves the system between a regime with no designability and a regime exhibiting the designability phenomenon, in which a few highly designable structures emerge.

Understanding how gene expression is regulated is one of the main goals of molecular cell biology. To reach this goal, the recognition and identification of DNA motifs (short patterns in biological sequences) is essential. Common examples of motifs include transcription factor binding sites in the promoter regions of co-regulated genes, and exonic and intronic splicing enhancers ...
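The notion of designability in the abstract can be made concrete with a small enumeration. The following is a minimal sketch, not the thesis's own code or model: it assumes hypothetical binary "buried/exposed" profiles drawn at random in place of real lattice or protein structures, binary hydrophobic/polar sequences, and a simple burial energy in which each hydrophobic residue at a buried site contributes -1. The function names (`random_profiles`, `energy`, `designability`, `isolation`) are illustrative only. A structure's designability is counted as the number of sequences for which it is the unique lowest-energy structure, and `isolation` gives each profile's mean Hamming distance to the others as a crude measure of how dissimilar it is from the rest.

```python
import itertools
import random
from collections import Counter


def random_profiles(n_structures, length, n_buried, seed=0):
    """Hypothetical stand-in for solvent-accessibility profiles: distinct binary
    vectors with exactly n_buried core sites (1 = buried, 0 = exposed)."""
    rng = random.Random(seed)
    profiles = set()
    while len(profiles) < n_structures:
        core = set(rng.sample(range(length), n_buried))
        profiles.add(tuple(1 if i in core else 0 for i in range(length)))
    return sorted(profiles)


def energy(seq, profile):
    # Assumed burial energy: each hydrophobic (1) residue on a buried (1) site adds -1.
    return -sum(h * b for h, b in zip(seq, profile))


def designability(profiles, length):
    """For each structure, count the binary sequences whose unique ground state it is."""
    counts = Counter()
    for seq in itertools.product((0, 1), repeat=length):
        energies = [energy(seq, p) for p in profiles]
        best = min(energies)
        winners = [i for i, e in enumerate(energies) if e == best]
        if len(winners) == 1:  # degenerate ground states design no structure
            counts[winners[0]] += 1
    return [counts[i] for i in range(len(profiles))]


def isolation(profiles):
    """Mean Hamming distance from each profile to all other profiles."""
    n = len(profiles)
    result = []
    for i, p in enumerate(profiles):
        total = sum(
            sum(a != b for a, b in zip(p, q))
            for j, q in enumerate(profiles)
            if j != i
        )
        result.append(total / (n - 1))
    return result


if __name__ == "__main__":
    L = 10
    profs = random_profiles(n_structures=15, length=L, n_buried=5, seed=1)
    des = designability(profs, L)
    iso = isolation(profs)
    # Print structures from most to least designable alongside their isolation.
    for d, s in sorted(zip(des, iso), reverse=True):
        print(f"designability={d:4d}  mean distance to others={s:.2f}")
```

Because every profile has the same number of buried sites, the lowest-energy structure for a sequence is also the profile closest to it in Hamming distance, which is the geometric picture behind relating designability to how isolated a profile is. Ranking structures by both columns lets one probe that relation on this toy ensemble; no particular strength of association is guaranteed for random profiles.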
Description
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Physics, 2004. Includes bibliographical references (p. 115-124).
Date issued
2004
Department
Massachusetts Institute of Technology. Department of Physics
Publisher
Massachusetts Institute of Technology
Keywords
Physics.