Principled Methods and Models for Deep Learning Based Functional Genomics
Author(s)
Krismer, Konstantin
Advisor
Gifford, David K.
Abstract
Many advances in functional genomics, and in biology more broadly, can be attributed to the rise of massively parallel sequencing technology and its derivatives. As the volume of sequencing and other high-throughput experimental data grows exponentially, so does the need for computational methods that analyze and condense these vast amounts of data and help explain the underlying phenomena. In this thesis, I describe five projects that introduce novel methods in functional genomics.
The first project introduces a simulation-based framework for investigating neural network architectures trained on biological sequence data, as is common in functional genomics. The second project describes a two-pronged approach to studying the determinants of cell type-specific chromatin accessibility: an ensemble of neural networks trained on DNase-seq data predicts chromatin accessibility, and MIAA, the multiplexed integrated accessibility assay, experimentally validates these in silico predictions. The third project presents a method to identify long-range genomic interactions from ChIA-PET and HiChIP data. Building on this method, the fourth project aims to provide a means to identify reproducible long-range genomic interactions. The fifth project continues the analysis of long-range interactions by performing co-enrichment analysis of transcription factor sequence motifs.
Collectively, these methods provide new approaches to a range of problems in functional genomics, from finding appropriate neural network architectures for sequence-based prediction tasks to uncovering patterns in long-range genomic interactions.
Date issued
2021-09
Department
Massachusetts Institute of Technology. Department of Biological Engineering
Publisher
Massachusetts Institute of Technology