Genomic variety estimation with Bayesian nonparametric hierarchies
Author(s)
Masoero, Lorenzo.
Download: 1102050333-MIT.pdf (5.065Mb)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Tamara Broderick.
Abstract
The recent availability of large genomic studies, with tens of thousands of observations, opens up the intriguing possibility of investigating and understanding the effect of rare genetic variants in human biological evolution, as well as their impact on the development of rare diseases. To do so, it is imperative to develop a statistical framework to assess what fraction of the overall variation present in the human genome is not yet captured by available datasets. In this thesis we introduce a novel and rigorous methodology to estimate how many new variants are yet to be observed in the context of genomic projects, using a nonparametric Bayesian hierarchical approach that allows prediction tasks to handle multiple subpopulations jointly. Moreover, our method performs well on extremely small as well as very large datasets, a desirable property given the variability in size of available datasets. As a byproduct of the Bayesian formulation, our estimation procedure also naturally provides uncertainty quantification for the estimates produced.
Description
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from PDF version of thesis. Includes bibliographical references (pages 75-83).
Date issued
2019
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.