A general framework for genome interpretation using evolutionary signatures
Author(s)
Fujiwara, Guilherme Issao Camarinha
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Manolis Kellis.
Abstract
In the post-genomic era, characterized by the availability of genome sequence data for many species, one of the biggest challenges is to identify the functional elements in our genome: the small subsequences containing units of biological function. Work has been done to computationally identify specific functional elements such as protein-coding genes [11], RNA genes [17], microRNA genes [16], regulatory motifs, and individual binding sites for transcription factors and microRNAs [10]. This work has benefited from the use of evolutionary signatures obtained by observing genomic changes across the sequence data of related species. In this work we propose a general framework for functional element identification using evolutionary signatures. We first design several metrics of evolutionary signatures, meant to capture the different patterns of evolution expected from elements with different biological functions, as well as novel patterns capturing diverse properties of evolutionary change. We then compute these metrics for each element in the human genome that is conserved across mammals and other vertebrate species, in order to identify classes of functional elements. Based on these metrics, we first classify specific known types of functional elements, such as protein-coding sequences, RNA-coding sequences, and CpG-rich promoters. Building on the success of this step, we establish an unsupervised clustering framework for conserved elements based on the same metrics. With this approach, we obtain clusters of known and unknown classes of functional elements. Some of these clusters correspond to known functional elements, while others are depleted for known functions yet show strong evidence of transcription and epigenetic modification, suggesting that they may correspond to novel classes of functional elements. This illustrates the power of the method both in identifying elements of known functional classes and in discovering elements of novel ones.
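The unsupervised step described in the abstract, grouping conserved elements by their vectors of evolutionary-signature metrics, can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the individual metrics, so everything specific here is an assumption made for illustration: plain k-means stands in for the thesis's clustering framework, and the element names, the two metric columns, and their values are hypothetical.

```python
import math

# Hypothetical data: each conserved element is described by a tuple of
# evolutionary-signature metrics. Names and values are illustrative only
# and are not taken from the thesis.
elements = {
    "elem_a": (0.9, 0.1),
    "elem_b": (0.8, 0.2),
    "elem_c": (0.1, 0.9),
    "elem_d": (0.2, 0.8),
}


def kmeans(points, k, max_iter=100):
    """Cluster equal-length feature tuples with plain k-means.

    k-means is an assumed stand-in; the source does not say which
    clustering method was used.
    """
    # Deterministic farthest-point initialisation: start from the first
    # point, then repeatedly add the point farthest from the centroids
    # chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


names = list(elements)
points = [elements[n] for n in names]
labels, centroids = kmeans(points, k=2)

for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

In a setting like the one the abstract describes, the resulting clusters would then be compared against annotations of known functional elements to see which clusters are enriched or depleted for known functions. That comparison is not shown here.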
Description
Includes bibliographical references (p. 55-57). Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.
Date issued
2008
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.