Uncertainty and Generality of Transfer Learning Models in Predicting Signaling History
Author(s)
Lu, Claire
Advisor
Li, Pulin
Abstract
Proper cell-cell communication is essential for multicellular development, from embryogenesis to stem cell differentiation. To map these networks, we developed IRIS (Intracellular Response to Infer Signaling state), a semi-supervised deep learning method that fits conditional variational autoencoders (CVAE) to single-cell RNA sequencing (scRNA-seq) data. IRIS is able to annotate cellular signaling states of individual cells using only their gene expression. Currently, IRIS has been validated in developmental contexts, including gastrulation, early endoderm organogenesis, and mesoderm lineages in mouse embryos. However, its predictions often show extremely high or extremely low confidence, suggesting a need for methods to prevent overconfidence and better account for uncertainty. To generalize IRIS to broader cell-cell communication problems, we combined engineering and experimental approaches, integrating uncertainty quantification techniques with new biological datasets. We implemented three approaches for estimating uncertainty in IRIS predictions: stochastic sampling, Monte Carlo dropout, and ensemble prediction. These approaches were evaluated on two new endoderm and mesenchyme combinatorial perturbation screens. Across all methods, uncertainty values reliably reflected the varying difficulty of predicting different signaling pathways, driven by both biological complexity and dataset representation. Moreover, higher uncertainty was consistently associated with lower prediction accuracy, confirming uncertainty as a useful proxy for model confidence. All three methods identified similar high-uncertainty cell populations, supporting their consistency and validity. By incorporating uncertainty quantification into IRIS, we provide more robust and interpretable predictions that can guide future experiments and enhance the model’s applicability across diverse biological contexts.
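The three approaches named in the abstract (stochastic sampling, Monte Carlo dropout, and ensemble prediction) all yield several probability vectors per cell, which can be combined into a single uncertainty score. The following is a minimal sketch of one common way to do that, using the entropy of the averaged prediction. It is not taken from the thesis: the function name `predictive_uncertainty`, the choice of entropy as the score, and the example values are all assumptions made for illustration.

```python
import math


def predictive_uncertainty(prob_samples):
    """Combine repeated class-probability predictions for one cell.

    prob_samples: list of probability vectors, one per stochastic forward
    pass, dropout mask, or ensemble member (hypothetical input format).
    Returns (mean_probs, entropy) with entropy measured in nats.
    """
    n_samples = len(prob_samples)
    n_classes = len(prob_samples[0])
    # Average the predicted probability of each class across samples.
    mean_probs = [
        sum(p[k] for p in prob_samples) / n_samples for k in range(n_classes)
    ]
    # Entropy of the averaged prediction; zero-probability terms contribute 0.
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0)
    return mean_probs, entropy


# Illustrative values only: three predictions over two signaling states.
agreeing = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagreeing = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

_, h_agree = predictive_uncertainty(agreeing)
_, h_disagree = predictive_uncertainty(disagreeing)
print(h_agree < h_disagree)
```

In this sketch, predictions that disagree across samples average toward a flatter distribution and so receive a higher entropy, which is the sense in which higher uncertainty would be expected to accompany lower accuracy.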
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology