Show simple item record

dc.contributor.advisorEdelman, Elazer R.
dc.contributor.authorThadawasin, Pakaphol
dc.date.accessioned2025-10-06T17:34:45Z
dc.date.available2025-10-06T17:34:45Z
dc.date.issued2025-05
dc.date.submitted2025-06-23T14:03:55.271Z
dc.identifier.urihttps://hdl.handle.net/1721.1/162920
dc.description.abstractFoundation models have emerged as powerful tools for analyzing single-cell RNA sequencing (scRNA-seq) data, leveraging large-scale pretraining to capture complex gene expression patterns. However, a comprehensive quantitative framework for understanding the interplay between phenotypes and genotypes remains underdeveloped. Such a framework is critical not only for validating model performance but also for uncovering previously unrecognized biological relationships. In this work, we present both traditional and deep learning-based quantitative analysis pipelines for PolyGene [1], a transformer-based scRNA-seq foundation model, aimed at disentangling the complex phenotype–genotype relationship. First, we implement a top-k classification and entropy evaluation pipeline to serve as a primary validation framework. Our results demonstrate that the pretrained PolyGene [1] is robust in top-k classification metrics and provides meaningful insights into the entropy landscape of human cells across different life stages. Second, we propose a novel deep learning gradientbased gene selection method designed to address limitations in traditional feature selection approaches, such as poor scalability and sensitivity to heterogeneity in high-dimensional data. Through empirical evaluations on benchmark scRNA-seq datasets, we show that our method enhances model interpretability and improves downstream performance, offering a more scalable and biologically relevant alternative to existing techniques. Overall, this work introduces a set of quantitative analysis tools that fill a critical gap in evaluating and interpreting scRNA-seq foundation models, contributing to a deeper understanding of the genotype–phenotype interplay through modern deep learning techniques.
dc.publisherMassachusetts Institute of Technology
dc.rightsIn Copyright - Educational Use Permitted
dc.rightsCopyright retained by author(s)
dc.rights.urihttps://rightsstatements.org/page/InC-EDU/1.0/
dc.titleUnveiling Phenotype–Genotype Interplay with Deep Learning Foundation Models for scRNA-seq: A Quantitative Perspective
dc.typeThesis
dc.description.degreeM.Eng.
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degreeMaster
thesis.degree.nameMaster of Engineering in Electrical Engineering and Computer Science


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record