Machine Learning Methods for Learning Genetic Dependencies
Author(s)
Cai, Cathy
Advisor
Uhler, Caroline
Abstract
Synthetic lethality refers to a genetic interaction in which the simultaneous perturbation of a pair of genes leads to cell death. Synthetically lethal gene pairs (SL pairs) provide a potential avenue for selectively targeting cancer cells based on genetic vulnerabilities. The rise of large-scale gene perturbation screens such as the Cancer Dependency Map (DepMap) offers the opportunity to identify SL pairs automatically using machine learning. We build on a recently developed class of feature learning kernel machines known as Recursive Feature Machines (RFMs) to develop a pipeline for identifying SL pairs based on CRISPR viability data from DepMap. In particular, we first train RFMs to predict viability scores for a given CRISPR gene knockout from cell line embeddings consisting of gene expression and mutation features. After training, RFMs use a statistical operator known as the average gradient outer product to assign each feature a weight indicating its importance in predicting cellular viability. We subsequently apply correlation-based filters to re-weight RFM feature importances and identify those features that are most indicative of low cellular viability. Our resulting pipeline is computationally efficient, taking under 3 minutes to analyze all 17,453 knockouts from DepMap for candidate SL pairs. We show that our pipeline more accurately recovers experimentally verified SL pairs than prior approaches. Moreover, our pipeline finds new candidate SL pairs, thereby opening novel avenues for identifying genetic vulnerabilities in cancer.
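The training and feature-weighting steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a Gaussian kernel (the abstract does not state which kernel is used), and the function names, the `bandwidth` and `ridge` parameters, and the trace rescaling are hypothetical choices made for illustration. It alternates between fitting a kernel ridge regressor and replacing the feature matrix `M` with the average gradient outer product of the fitted predictor. The correlation-based filtering step is not covered.

```python
import numpy as np


def gaussian_kernel(X, Z, M, bandwidth):
    """Gaussian kernel exp(-(x - z)^T M (x - z) / (2 * bandwidth^2)) for all pairs."""
    XM = X @ M
    sq = (XM * X).sum(axis=1)[:, None] + ((Z @ M) * Z).sum(axis=1)[None, :] - 2.0 * XM @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def fit_rfm_agop(X, y, n_iters=3, bandwidth=2.0, ridge=1e-3):
    """Return a (d, d) feature matrix M learned by RFM-style iterations.

    Assumption: Gaussian kernel and trace rescaling are illustrative choices,
    not taken from the thesis.
    """
    n, d = X.shape
    M = np.eye(d)  # start from the ordinary (unweighted) kernel
    for _ in range(n_iters):
        # Step 1: kernel ridge regression with the current feature matrix.
        K = gaussian_kernel(X, X, M, bandwidth)
        alpha = np.linalg.solve(K + ridge * np.eye(n), y)

        # Step 2: gradient of f(x) = sum_j alpha_j K(x, x_j) at each training point:
        #   grad f(x_i) = -(1 / bandwidth^2) * M @ sum_j alpha_j K_ij (x_i - x_j)
        W = K * alpha[None, :]                          # W_ij = alpha_j * K_ij
        diffs = X * W.sum(axis=1, keepdims=True) - W @ X  # row i: sum_j W_ij (x_i - x_j)
        grads = -(diffs @ M) / bandwidth**2             # M is symmetric

        # Step 3: average gradient outer product (AGOP).
        M = grads.T @ grads / n

        # Rescale so trace(M) == d, keeping the kernel width comparable across iterations.
        trace = np.trace(M)
        if trace > 0:
            M = M * (d / trace)
    return M
```

Under these assumptions, the diagonal of the returned matrix can be read as a per-feature importance score: given a cell line feature matrix `X` and viability scores `y` for one knockout, `np.diag(fit_rfm_agop(X, y))` ranks the expression and mutation features by how strongly the fitted predictor varies along each of them.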
Date issued
2024-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology