Comparative gene identification in mammalian, fly, and fungal genomes
Author(s): Lin, Michael F. (Michael Fong-Jay)
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
An important step in genome interpretation is the accurate identification of protein-coding genes. One approach to gene identification is comparative analysis of the genomes of several related species, to find genes that have been conserved by natural selection over millions of years of evolution. I develop general computational methods that combine statistical analysis of genome sequence alignments with classification algorithms in order to detect the distinctive signatures of protein-coding DNA sequence evolution. I implement these methods as a software system, which I then use to identify previously unknown genes, and cast doubt on some existing gene annotations, in the genomes of the fungi Saccharomyces cerevisiae and Candida albicans, the fruit fly Drosophila melanogaster, and the human. These methods perform competitively with the best existing de novo gene identification systems, and are practically applicable to the goal of improving existing gene annotations through comparative genomics.
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (leaves 55-56).
Department: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Publisher: Massachusetts Institute of Technology
Keywords: Electrical Engineering and Computer Science.