Algorithms for Reconstructing Biological History from Genomic Data
Author(s)
Kim, Younhun
Advisor
Berger, Bonnie
Abstract
In this thesis, we study several problems in computational biology that share a central theme: inferring temporally-spaced events from noisy measurements. The first half studies two theoretical problems concerning the history of human populations at different scales. First, we present sample complexity results for learning population structures given pairwise coalescence data. Second, we study pedigree reconstruction, proving that there is a sample-efficient algorithm for reconstructing a “family tree” given a population-wide collection of genomic information.
The second half of the thesis concerns models for the microbiome and practical algorithms that emphasize scalability and interpretability. We present work on strain tracking, in which one is asked to reconstruct a time-series profile of bacterial strain ratios from shotgun-sequenced reads. We describe an algorithm designed to scale to large data, discuss some real-world considerations that make the problem particularly challenging, and present empirical results. Finally, we present collaborative work on dynamical systems modeling of the microbiome, in which we discuss how one can learn a large, yet interpretable, Lotka-Volterra model from time-series measurements of the microbiome.
Date issued
2023-02
Department
Massachusetts Institute of Technology. Department of Mathematics
Publisher
Massachusetts Institute of Technology