Few Shot Learning for Rare Disease Diagnosis

Alsentzer, Emily

Advisor

Kohane, Isaac

Szolovits, Peter

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

Rare diseases affect 300-400 million people worldwide, yet each individual disease has very low prevalence, affecting no more than 50 per 100,000 individuals. Many patients with rare genetic conditions remain undiagnosed because clinicians lack experience with the individual diseases and because clinical presentations are highly heterogeneous. Machine-assisted diagnosis offers the opportunity to shorten diagnostic delays for rare disease patients. Recent advances in deep learning have considerably improved the accuracy of medical diagnosis. However, much of this success depends on large annotated datasets containing thousands of examples per condition for training machine learning models. Machine-assisted diagnosis of rare diseases presents unique challenges: approaches must learn from limited data and extrapolate beyond the training distribution to novel genetic conditions. The goal of this thesis is to develop few-shot learning methods that overcome the data limitations of deep learning approaches in order to diagnose patients with rare genetic conditions. Motivated by the need to infuse external knowledge into models, we first develop novel graph neural network methods for subgraph representation learning that encode how subgraphs (e.g., a set of patient phenotypes) relate to a larger knowledge graph. To address data scarcity, we next develop a framework for simulating realistic rare disease patients with novel genetic conditions and demonstrate that these simulated patients are similar to real rare disease patients. Finally, we leverage these advances to develop SHEPHERD, a few-shot method for diagnosing patients with rare genetic conditions in the Undiagnosed Diseases Network. SHEPHERD reasons over biomedical knowledge via geometric deep learning to learn generalizable representations of rare disease patients.
SHEPHERD can be applied at multiple stages of the rare disease diagnosis process: performing causal gene discovery, retrieving "patients-like-me" with the same causal gene or disease, and providing interpretable characterizations of novel disease presentations. Our work illustrates the potential for deep learning methods to accelerate molecular diagnosis and shorten the diagnostic odyssey for rare disease patients.
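This record does not describe how SHEPHERD performs "patients-like-me" retrieval, so the following is only a generic sketch of the idea under stated assumptions. It assumes each patient is summarized by a single embedding vector, here a hypothetical mean of knowledge-graph node embeddings for the patient's phenotypes, and that similar patients are found by ranking cosine similarity. The function names (patient_embedding, patients_like_me) and the toy embeddings are hypothetical illustrations, not the thesis's method or data.

```python
import numpy as np


def patient_embedding(phenotype_ids, node_embeddings):
    """Hypothetical simplification: mean-pool the embeddings of a patient's phenotype nodes."""
    return node_embeddings[phenotype_ids].mean(axis=0)


def patients_like_me(query, candidates, k=2):
    """Return indices of the k candidate patients most cosine-similar to the query patient."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity of each candidate to the query
    return np.argsort(-scores)[:k].tolist()  # highest similarity first


if __name__ == "__main__":
    # Toy (made-up) embeddings for 5 phenotype nodes in a 3-dimensional space.
    E = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]], dtype=float)
    cohort = np.stack([patient_embedding(p, E) for p in ([0, 3], [1, 4], [2])])
    query = patient_embedding([0, 1], E)
    print(patients_like_me(query, cohort, k=2))  # prints [0, 1]
```

Mean pooling is the simplest way to turn a set of phenotypes into one vector; the thesis instead describes learned subgraph representations over a knowledge graph, which would replace patient_embedding in a real system while the ranking step stays the same.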

Date issued

2022-09

URI

https://hdl.handle.net/1721.1/147431

Department

Harvard-MIT Program in Health Sciences and Technology

Publisher

Massachusetts Institute of Technology

Collections

Doctoral Theses