Uncovering the variability, regulatory roles and mutation rates of short tandem repeats

Author(s)
Willems, Thomas F. (Thomas Frederick)
Other Contributors
Massachusetts Institute of Technology. Computational and Systems Biology Program.
Advisor
Yaniv Erlich and Manolis Kellis.
Terms of use
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
Over the past decade, the advent of next-generation DNA sequencing technologies has ushered in an exciting era of biological research. Through large-scale sequencing projects, scientists have begun to unveil the variability and function of millions of DNA mutations called single nucleotide polymorphisms. Despite this rapid growth in understanding, short tandem repeats (STRs), genomic elements consisting of a repeating pattern of 2-6 bases, have remained poorly understood. Mutating orders of magnitude more rapidly than most of the human genome, STRs have been identified as the causal variants in diseases such as Fragile X syndrome and Huntington's disease. However, in spite of their potentially profound biological consequences, STRs remain systematically understudied due to difficulties associated with obtaining accurate genotypes. To address this issue, we developed a series of bioinformatics approaches and applied them to population-scale whole-genome sequencing data sets. Using data from the 1000 Genomes Project, we performed the first genome-wide characterization of STR variability by analyzing over 700,000 loci in more than 1000 individuals. Next, we integrated these genotypes with expression data to assess the contribution of STRs to gene expression in humans, uncovering their substantial regulatory role. We then developed a state-of-the-art algorithm to genotype STRs, resulting in vastly improved accuracy and uncovering hundreds of replicable de novo mutations in a deeply sequenced trio. Lastly, we developed a novel approach to estimate mutation rates for STRs on the Y-chromosome (Y-STR), resulting in rates for hundreds of previously uncharacterized markers. Collectively, these analyses highlight the extreme variability of STRs and provide a framework for incorporating them into future studies.
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Computational and Systems Biology Program, 2016.
 
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
 
Cataloged from student-submitted PDF version of thesis.
 
Includes bibliographical references (pages 163-186).
 
Date issued
2016
URI
http://hdl.handle.net/1721.1/104465
Department
Massachusetts Institute of Technology. Computational and Systems Biology Program
Publisher
Massachusetts Institute of Technology
Keywords
Computational and Systems Biology Program.

Collections
  • Doctoral Theses
