DSpace@MIT

Use of machine learning techniques for SNP based prediction of ancestry

dc.contributor.advisor Isaac Kohane. en_US
dc.contributor.author Allocco, Dominic en_US
dc.contributor.other Harvard University--MIT Division of Health Sciences and Technology. en_US
dc.date.accessioned 2007-01-10T16:37:08Z
dc.date.available 2007-01-10T16:37:08Z
dc.date.copyright 2006 en_US
dc.date.issued 2006 en_US
dc.identifier.uri http://hdl.handle.net/1721.1/35550
dc.description Thesis (S.M.)--Harvard-MIT Division of Health Sciences and Technology, 2006. en_US
dc.description Includes bibliographical references (leaves 29-30). en_US
dc.description.abstract Some have argued that the genetic differences between continentally defined groups are relatively small and unlikely to have biomedical significance. In this study, the extent of variation between continentally defined groups was evaluated. Small numbers of randomly selected single nucleotide polymorphisms from the International HapMap Project were used to train classifiers for prediction of ancestral continent of origin. Predictive accuracy was then tested on independent data sets. A high degree of genetic similarity implies that groups will be difficult to distinguish, especially when only a limited amount of genetic information is used. It is shown that the genetic differences between continentally defined groups are sufficiently large that one can accurately predict ancestral continent of origin using only a minute, randomly selected fraction of the genetic variation present in the human genome. Genotype data from only 50 random single nucleotide polymorphisms can be used to predict ancestral continent of origin in the primary test data set with an average accuracy of 95%. Single nucleotide polymorphisms were also characterized as being in introns, coding exons, regulatory regions, or regions coding for untranslated mRNA, and classifiers were constructed using only single nucleotide polymorphisms from a specific category. Predictive accuracy was similar across all of the classifiers created in this manner. Single nucleotide polymorphisms useful for prediction of ancestral continent of origin are common and distributed relatively evenly throughout the genome. These findings demonstrate the extent of variation between continentally defined groups and argue strongly against the contention that genetic differences between groups are too small to have biomedical significance. en_US
dc.description.statementofresponsibility by Dominic J. Allocco. en_US
dc.format.extent 31 leaves en_US
dc.format.extent 1346044 bytes
dc.format.extent 1344923 bytes
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.language.iso eng en_US
dc.publisher Massachusetts Institute of Technology en_US
dc.rights M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. en_US
dc.rights.uri http://dspace.mit.edu/handle/1721.1/7582
dc.subject Harvard University--MIT Division of Health Sciences and Technology. en_US
dc.title Use of machine learning techniques for SNP based prediction of ancestry en_US
dc.type Thesis en_US
dc.description.degree S.M. en_US
dc.contributor.department Harvard University--MIT Division of Health Sciences and Technology. en_US
dc.identifier.oclc 73726748 en_US


Files in this item

Name              Size     Format  Description
73726748.pdf      1.283Mb  PDF     Preview, non-printable (open to all)
73726748-MIT.pdf  1.282Mb  PDF     Full printable version (MIT only)
