Uncovering the variability, regulatory roles and mutation rates of short tandem repeats
Author(s)
Willems, Thomas F. (Thomas Frederick)
Other Contributors
Massachusetts Institute of Technology. Computational and Systems Biology Program.
Advisor
Yaniv Erlich and Manolis Kellis.
Abstract
Over the past decade, the advent of next-generation DNA sequencing technologies has ushered in an exciting era of biological research. Through large-scale sequencing projects, scientists have begun to unveil the variability and function of millions of DNA mutations called single nucleotide polymorphisms. Despite this rapid growth in understanding, short tandem repeats (STRs), genomic elements consisting of a repeating pattern of 2-6 bases, have remained poorly understood. Mutating orders of magnitude more rapidly than most of the human genome, STRs have been identified as the causal variants in diseases such as Fragile X syndrome and Huntington's disease. However, in spite of their potentially profound biological consequences, STRs remain systematically understudied due to difficulties associated with obtaining accurate genotypes. To address this issue, we developed a series of bioinformatics approaches and applied them to population-scale whole-genome sequencing data sets. Using data from the 1000 Genomes Project, we performed the first genome-wide characterization of STR variability by analyzing over 700,000 loci in more than 1000 individuals. Next, we integrated these genotypes with expression data to assess the contribution of STRs to gene expression in humans, uncovering their substantial regulatory role. We then developed a state-of-the-art algorithm to genotype STRs, resulting in vastly improved accuracy and uncovering hundreds of replicable de novo mutations in a deeply sequenced trio. Lastly, we developed a novel approach to estimate mutation rates for STRs on the Y-chromosome (Y-STR), resulting in rates for hundreds of previously uncharacterized markers. Collectively, these analyses highlight the extreme variability of STRs and provide a framework for incorporating them into future studies.
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Computational and Systems Biology Program, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 163-186).
Date issued
2016
Department
Massachusetts Institute of Technology. Computational and Systems Biology Program
Publisher
Massachusetts Institute of Technology
Keywords
Computational and Systems Biology Program.