Theory and Applications of Matrix Completion in Genomics Datasets

Stefanakis, George

Advisor

Uhler, Caroline

Terms of use

In Copyright - Educational Use Permitted; Copyright MIT (http://rightsstatements.org/page/InC-EDU/1.0/)

Abstract

The advent of rapid and efficient biological screening and sequencing technologies has enabled high-throughput data collection, opening the door to improvements in drug discovery, disease identification, and personalized medicine, among other areas. The size and scope of such datasets are unprecedented, and their increasing availability over the past decade, together with rapid advances in statistical inference and machine learning, has fueled an explosion of research. Still, many problems in this space remain unexplored or in their infancy, owing either to limited data availability or to a lack of computationally efficient or high-accuracy methods for modeling and prediction. In this work, we develop theory and present empirical results for the use of the novel Neural Tangent Kernel (NTK) in matrix completion. We derive the functional form of the NTK for a single-hidden-layer, infinite-width neural network with ReLU activation, and develop a framework that applies the NTK to matrix completion. We explore a specific application of this framework to the Connectivity Map dataset, which contains gene expression data for various cells and perturbations, and demonstrate results competitive with other methods. Additionally, we analyze our contributions through the auxiliary lens of performance engineering and develop concrete algorithms for accurate, performant, and intuitive biological imputation.

Date issued

2022-05

URI

https://hdl.handle.net/1721.1/144547

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses