DSpace@MIT (MIT Libraries)

Unveiling Phenotype–Genotype Interplay with Deep Learning Foundation Models for scRNA-seq: A Quantitative Perspective

Author(s)
Thadawasin, Pakaphol
Download: Thesis PDF (4.370 MB)
Advisor
Edelman, Elazer R.
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Foundation models have emerged as powerful tools for analyzing single-cell RNA sequencing (scRNA-seq) data, leveraging large-scale pretraining to capture complex gene expression patterns. However, a comprehensive quantitative framework for understanding the interplay between phenotypes and genotypes remains underdeveloped. Such a framework is critical not only for validating model performance but also for uncovering previously unrecognized biological relationships. In this work, we present both traditional and deep learning-based quantitative analysis pipelines for PolyGene [1], a transformer-based scRNA-seq foundation model, aimed at disentangling the complex phenotype–genotype relationship. First, we implement a top-k classification and entropy evaluation pipeline to serve as a primary validation framework. Our results demonstrate that the pretrained PolyGene [1] is robust on top-k classification metrics and provides meaningful insights into the entropy landscape of human cells across different life stages. Second, we propose a novel gradient-based deep learning method for gene selection, designed to address limitations of traditional feature selection approaches, such as poor scalability and sensitivity to heterogeneity in high-dimensional data. Through empirical evaluations on benchmark scRNA-seq datasets, we show that our method enhances model interpretability and improves downstream performance, offering a more scalable and biologically relevant alternative to existing techniques. Overall, this work introduces a set of quantitative analysis tools that fill a critical gap in evaluating and interpreting scRNA-seq foundation models, contributing to a deeper understanding of the phenotype–genotype interplay through modern deep learning techniques.
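As a rough illustration of the two validation quantities named in the abstract, the following is a minimal sketch of top-k accuracy and per-sample predictive entropy computed from class-probability vectors. It is not the thesis's pipeline and does not use PolyGene: the function names `top_k_accuracy` and `predictive_entropy`, the choice of Shannon entropy in nats, and the toy probabilities and labels are all assumptions made for this example.

```python
import numpy as np


def top_k_accuracy(probs, labels, k):
    """Fraction of samples whose true label is among the k most probable classes.

    probs  : (n_samples, n_classes) array of predicted class probabilities
    labels : (n_samples,) array of integer class labels
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # Column indices of the k largest probabilities in each row.
    # Order inside the top k does not matter for a hit/miss decision.
    top_k = np.argpartition(-probs, k - 1, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each row of a probability matrix.

    eps guards against log(0) for classes with zero probability.
    """
    probs = np.asarray(probs, dtype=float)
    return -(probs * np.log(probs + eps)).sum(axis=1)


# Hypothetical predictions for four cells over three classes (illustrative only).
toy_probs = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.40, 0.35, 0.25],
    [0.50, 0.30, 0.20],
])
toy_labels = np.array([0, 1, 2, 1])

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"top-{k} accuracy:", top_k_accuracy(toy_probs, toy_labels, k))
    print("entropy per cell:", predictive_entropy(toy_probs))
```

In this sketch, higher entropy means the predicted distribution is spread more evenly across classes, so a uniform prediction over three classes reaches the maximum of ln 3. How the thesis aggregates entropy across cells or life stages is not stated in the abstract and is not assumed here.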
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/162920
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted.