Leveraging Single-Cell ATAC-Seq for Genomic Language Models and Multimodal Foundation Models
Author(s)
Kim, Dong Young
DownloadThesis PDF (3.132Mb)
Advisor
Zamparo, Lee
Hrvatin, Siniša
Terms of use
Metadata
Show full item recordAbstract
Single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) has emerged as a powerful tool for profiling chromatin accessibility at single-cell resolution. By capturing epigenomic landscapes, scATAC-seq provides critical insights into the regulatory elements that govern gene expression. However, the sparsity of scATAC-seq data, resulting from its low sequencing depth relative to the genome’s potential complexity, poses significant challenges for effective and accurate modeling. To advance the utility of scATAC-seq in modern biology, we explore its integration into deep learning frameworks through two innovative applications. First, we demonstrate how incorporating scATAC data enhances the performance of existing genomic language models by providing complementary context about chromatin accessibility. Specifically, we introduce scATAC to improve SegmentNT, a DNA segmentation model that leverages the Nucleotide Transformer (NT) to predict 14 types of genomic and regulatory elements from DNA sequences up to 30kb at single-nucleotide resolution. Second, we introduce a novel multimodal foundation model that extends existing scRNA-seq foundation models by integrating scATAC-seq data. This model captures crossmodal relationships between gene expression and chromatin accessibility, establishing a unified framework that can be fine-tuned for diverse downstream tasks, including cell type classification and cross-modal imputation. Our work highlights the potential of incorporating scATAC-seq data into existing genomics deep learning strategies, providing a framework for integrating regulatory DNA analysis more seamlessly into genomic modeling.
Date issued
2025-02Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology