Probabilistic framework for genome-wide phylogeny and ortholog determination
Author(s): Rasmussen, Matthew D. (Matthew David)
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Comparative genomics of multiple related species has emerged as a powerful tool for genome signal discovery. To that end, dozens of mammalian, fly, and fungal genomes have been fully sequenced. Making use of these genomes requires rigorous computational methods for determining the evolutionary history of every gene and region. In particular, comparative analysis requires the ability to distinguish between orthologous and paralogous regions. Current approaches to ortholog identification work adequately for pairs of species but are ineffective for multiple complete genomes. This thesis presents a new phylogenetic reconstruction method, SINDIR, that is designed specifically for genome-wide orthology determination. Unlike any other method, SINDIR exploits the known evolutionary history of a set of species to infer the history of their genes. This is done by learning a probabilistic model of evolution from a trusted set of unambiguous orthologs. Given this model, SINDIR can find the maximum likelihood phylogenetic tree for any set of the genes. In a novel technique, synteny maps are used to train and evaluate the evolutionary model on both simulated and real sequence data. SINDIR avoids errors commonly committed by current methods and achieves a significantly improved accuracy of orthology determination.
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (leaves 63-65).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.