Interpreting the role of non-coding genetic variation in human disease

Author(s)
Sarkar, Abhishek Kulshreshtha
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Manolis Kellis.
Terms of use
MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
One of the fundamental goals of human genetics is to identify the genetic causes of human disease to ultimately design novel therapeutics. However, two challenges have become readily apparent. First, the majority of genomic regions associated with disease do not implicate protein-altering variants but might instead alter gene regulation, making interpretation and validation more difficult. Second, the genomic regions associated with disease explain a fraction of the variance of associated phenotypes, suggesting human diseases are highly polygenic and that many additional regions remain to be discovered and characterized. Here, we address these challenges by using functional annotation of the human genome spanning diverse data types: epigenomic profiles, gene regulatory circuitry, and biological pathways. We first develop a method to simultaneously select relevant genomic regions not yet associated with disease as well as select relevant functional annotations enriched in those regions. We show that both tissue-specific and shared regulatory regions are enriched for disease associations across eight common diseases. We then characterize specific genetic variants in the selected regions, the gene regulatory elements they reside in, the cellular contexts in which those elements are active, their upstream regulators, their downstream target genes, and the biological pathways they disrupt across eight common diseases. We show that disease associations are additionally enriched in regulatory motifs of relevant transcription factors and in relevant biological pathways. We finally investigate why predicted regulatory elements are enriched in disease-associated variants by framing the problem as Bayesian inference of hyperparameters in a structured sparse regression model. We propose an active sampling method to efficiently explore the hyperparameter space and avoid exponential scaling in the dimension of the hyperparameters. We show in simulation that our method can distinguish between possible explanations of the observed enrichments, and we characterize potential biases in the estimates. Together, our results can help guide the development of new models of disease and gene regulation and discovery of biologically meaningful, but currently undetectable regulatory loci underlying a number of common diseases.
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
 
Cataloged from PDF version of thesis.
 
Includes bibliographical references (pages 101-107).
 
Date issued
2017
URI
http://hdl.handle.net/1721.1/112026
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Doctoral Theses
