Integrating genomic conservation data with motif discovery
Author(s)
Danford, Timothy W. (Timothy William), 1979-
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
David K. Gifford.
Abstract
We formalize a probabilistic model of inter-species sequence conservation for motif discovery, and demonstrate that adding large-scale genomic conservation data to an existing motif discovery procedure improves the quality of that procedure's results. Existing motif discovery algorithms reveal binding motifs that are statistically over-represented in small sets of promoter regions. To the extent that binding motifs form a reliable part of a cell's regulatory apparatus, and that apparatus is preserved across closely related species, these binding motifs should also be conserved in the corresponding genomes. Previous studies have tried to assess levels of conservation in genomic fragments of several yeast species. Our approach computes the conditional probability of inter-species sequences, and uses this probability measure to maximize the likelihood of the data from different species with a motif model.
Description
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (leaves 94-99).
Date issued
2004
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.