Few Shot Learning for Rare Disease Diagnosis
Author(s)
Alsentzer, Emily
Advisor
Kohane, Isaac
Szolovits, Peter
Abstract
Rare diseases affect 300-400 million people worldwide, yet each disease has very low prevalence, affecting no more than 50 per 100,000 individuals. Many patients with rare genetic conditions remain undiagnosed because clinicians lack experience with the individual diseases and because clinical presentations are highly heterogeneous. Machine-assisted diagnosis offers the opportunity to shorten diagnostic delays for rare disease patients. Recent advances in deep learning have considerably improved the accuracy of medical diagnosis. However, much of this success depends on large annotated datasets containing thousands of examples per condition for training machine learning models. Machine-assisted diagnosis of rare diseases presents unique challenges: approaches must learn from limited data and extrapolate beyond the training distribution to novel genetic conditions.
The goal of this thesis is to develop few shot learning methods that overcome the data limitations of deep learning approaches to diagnose patients with rare genetic conditions. Motivated by the need to infuse external knowledge into models, we first develop novel graph neural network methods for subgraph representation learning that encode how subgraphs (e.g., a set of patient phenotypes) relate to a larger knowledge graph. To address data scarcity, we next develop a framework for simulating realistic rare disease patients with novel genetic conditions and demonstrate that these simulated patients are similar to real rare disease patients. Finally, we leverage these advances to develop SHEPHERD, a few shot method for diagnosing patients with rare genetic conditions in the Undiagnosed Diseases Network. SHEPHERD reasons over biomedical knowledge via geometric deep learning to learn generalizable representations of rare disease patients. SHEPHERD can operate at multiple stages of the rare disease diagnosis process: performing causal gene discovery, retrieving "patients-like-me" with the same causal gene or disease, and providing interpretable characterizations of novel disease presentations. Our work illustrates the potential for deep learning methods to accelerate molecular diagnosis and shorten the diagnostic odyssey for rare disease patients.
Date issued
2022-09
Department
Harvard-MIT Program in Health Sciences and Technology
Publisher
Massachusetts Institute of Technology