Field | Value | Language
dc.contributor.advisor | Yaniv Erlich and Manolis Kellis. | en_US
dc.contributor.author | Willems, Thomas F. (Thomas Frederick) | en_US
dc.contributor.other | Massachusetts Institute of Technology. Computational and Systems Biology Program. | en_US
dc.date.accessioned | 2016-09-30T18:24:56Z |
dc.date.available | 2016-09-30T18:24:56Z |
dc.date.copyright | 2016 | en_US
dc.date.issued | 2016 | en_US
dc.identifier.uri | http://hdl.handle.net/1721.1/104465 |
dc.description | Thesis: Ph. D., Massachusetts Institute of Technology, Computational and Systems Biology Program, 2016. | en_US
dc.description | This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. | en_US
dc.description | Cataloged from student-submitted PDF version of thesis. | en_US
dc.description | Includes bibliographical references (pages 163-186). | en_US
dc.description.abstract | Over the past decade, the advent of next-generation DNA sequencing technologies has ushered in an exciting era of biological research. Through large-scale sequencing projects, scientists have begun to unveil the variability and function of millions of DNA mutations called single nucleotide polymorphisms. Despite this rapid growth in understanding, short tandem repeats (STRs), genomic elements consisting of a repeating pattern of 2-6 bases, have remained poorly understood. Mutating orders of magnitude more rapidly than most of the human genome, STRs have been identified as the causal variants in diseases such as Fragile X syndrome and Huntington's disease. However, in spite of their potentially profound biological consequences, STRs remain systematically understudied due to difficulties associated with obtaining accurate genotypes. To address this issue, we developed a series of bioinformatics approaches and applied them to population-scale whole-genome sequencing data sets. Using data from the 1000 Genomes Project, we performed the first genome-wide characterization of STR variability by analyzing over 700,000 loci in more than 1000 individuals. Next, we integrated these genotypes with expression data to assess the contribution of STRs to gene expression in humans, uncovering their substantial regulatory role. We then developed a state-of-the-art algorithm to genotype STRs, resulting in vastly improved accuracy and uncovering hundreds of replicable de novo mutations in a deeply sequenced trio. Lastly, we developed a novel approach to estimate mutation rates for STRs on the Y-chromosome (Y-STR), resulting in rates for hundreds of previously uncharacterized markers. Collectively, these analyses highlight the extreme variability of STRs and provide a framework for incorporating them into future studies. | en_US
dc.description.statementofresponsibility | by Thomas F. Willems. | en_US
dc.format.extent | 186 pages | en_US
dc.language.iso | eng | en_US
dc.publisher | Massachusetts Institute of Technology | en_US
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US
dc.subject | Computational and Systems Biology Program. | en_US
dc.title | Uncovering the variability, regulatory roles and mutation rates of short tandem repeats | en_US
dc.type | Thesis | en_US
dc.description.degree | Ph. D. | en_US
dc.contributor.department | Massachusetts Institute of Technology. Computational and Systems Biology Program |
dc.identifier.oclc | 958686259 | en_US