Interpreting the role of non-coding genetic variation in human disease
Author(s): Sarkar, Abhishek Kulshreshtha
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
One of the fundamental goals of human genetics is to identify the genetic causes of human disease to ultimately design novel therapeutics. However, two challenges have become readily apparent. First, the majority of genomic regions associated with disease do not implicate protein-altering variants but might instead alter gene regulation, making interpretation and validation more difficult. Second, the genomic regions associated with disease explain only a fraction of the variance of associated phenotypes, suggesting human diseases are highly polygenic and that many additional regions remain to be discovered and characterized. Here, we address these challenges by using functional annotation of the human genome spanning diverse data types: epigenomic profiles, gene regulatory circuitry, and biological pathways. We first develop a method to simultaneously select relevant genomic regions not yet associated with disease and relevant functional annotations enriched in those regions. We show that both tissue-specific and shared regulatory regions are enriched for disease associations across eight common diseases. We then characterize specific genetic variants in the selected regions, the gene regulatory elements they reside in, the cellular contexts in which those elements are active, their upstream regulators, their downstream target genes, and the biological pathways they disrupt across eight common diseases. We show that disease associations are additionally enriched in regulatory motifs of relevant transcription factors and in relevant biological pathways. We finally investigate why predicted regulatory elements are enriched in disease-associated variants by framing the problem as Bayesian inference of hyperparameters in a structured sparse regression model. We propose an active sampling method to efficiently explore the hyperparameter space and avoid exponential scaling in the dimension of the hyperparameters.
We show in simulation that our method can distinguish between possible explanations of the observed enrichments, and we characterize potential biases in the estimates. Together, our results can help guide the development of new models of disease and gene regulation and discovery of biologically meaningful, but currently undetectable regulatory loci underlying a number of common diseases.
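As a reading aid for the term "enriched" used throughout the abstract, the following is a minimal sketch of one common way to quantify enrichment of disease-associated variants in a functional annotation: a fold enrichment together with a one-sided hypergeometric test. This is not the method developed in the thesis, which uses joint selection and a structured sparse regression model; the function name `hypergeom_enrichment` and all counts are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment(n_total, n_annotated, n_associated, n_overlap):
    """Return (fold enrichment, one-sided hypergeometric p-value).

    n_total      -- number of variants tested (hypothetical)
    n_annotated  -- variants falling in the annotation, e.g. predicted enhancers
    n_associated -- variants associated with the disease
    n_overlap    -- variants that are both annotated and associated
    """
    # Overlap expected if association were independent of the annotation.
    expected = n_associated * n_annotated / n_total
    fold = n_overlap / expected

    # P(overlap >= n_overlap) when n_associated variants are drawn
    # without replacement from n_total, of which n_annotated are annotated.
    upper = min(n_annotated, n_associated)
    numerator = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_associated - k)
        for k in range(n_overlap, upper + 1)
    )
    p_value = numerator / comb(n_total, n_associated)
    return fold, p_value


# Hypothetical counts: 10,000 variants, 500 annotated, 200 associated, 30 in both.
fold, p_value = hypergeom_enrichment(10_000, 500, 200, 30)
print(f"fold enrichment = {fold:.1f}, p-value = {p_value:.3g}")
```

A test of this kind treats variants as independent and ignores linkage disequilibrium, which is one reason model-based approaches such as the one described above are used instead.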
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. Cataloged from PDF version of thesis. Includes bibliographical references (pages 101-107).