
DC Field | Value | Language
dc.contributor.advisor | James Galagan and David DeCaprio. | en_US
dc.contributor.author | Doherty, Matthew K | en_US
dc.contributor.other | Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. | en_US
dc.date.accessioned | 2008-05-19T16:04:55Z |
dc.date.available | 2008-05-19T16:04:55Z |
dc.date.copyright | 2007 | en_US
dc.date.issued | 2007 | en_US
dc.identifier.uri | http://hdl.handle.net/1721.1/41646 |
dc.description | Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. | en_US
dc.description | Includes bibliographical references (p. 75-77). | en_US
dc.description.abstract | The accurate annotation of an organism's protein-coding genes is crucial for subsequent genomic analysis. The rapid advance of sequencing technology has created a gap between genomic sequences and their annotations. Automated annotation methods are needed to bridge this gap, but existing solutions based on hidden Markov models cannot easily incorporate diverse evidence to make more accurate predictions. In this thesis, I built upon the semi-Markov conditional random field framework created by DeCaprio et al. to predict protein-coding genes in DNA sequences. Several novel extensions were designed and implemented, including a 29-state model with both semi-Markov and Markov states, an N-best Viterbi inference algorithm, several classes of discriminative feature functions that incorporate diverse evidence, and parallelization of the training and inference algorithms. The extensions were tested on the genomes of Phytophthora infestans, Culex pipiens, and Homo sapiens. The gene predictions were analyzed and the benefits of discriminative methods were explored. | en_US
dc.description.statementofresponsibility | by Matthew K. Doherty. | en_US
dc.format.extent | 77 p. | en_US
dc.language.iso | eng | en_US
dc.publisher | Massachusetts Institute of Technology | en_US
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US
dc.subject | Electrical Engineering and Computer Science. | en_US
dc.title | Gene prediction with conditional random fields | en_US
dc.title.alternative | Applications of conditional random fields in bioinformatics | en_US
dc.type | Thesis | en_US
dc.description.degree | M.Eng. | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
dc.identifier.oclc | 219708684 | en_US
