DSpace@MIT (MIT Libraries)


Leveraging Single-Cell ATAC-Seq for Genomic Language Models and Multimodal Foundation Models

Author(s)
Kim, Dong Young
Advisor
Zamparo, Lee
Hrvatin, Siniša
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) Copyright retained by author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
Single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) has emerged as a powerful tool for profiling chromatin accessibility at single-cell resolution. By capturing epigenomic landscapes, scATAC-seq provides critical insights into the regulatory elements that govern gene expression. However, the sparsity of scATAC-seq data, resulting from its low sequencing depth relative to the genome’s potential complexity, poses significant challenges for effective and accurate modeling. To advance the utility of scATAC-seq in modern biology, we explore its integration into deep learning frameworks through two applications. First, we demonstrate how incorporating scATAC-seq data enhances the performance of existing genomic language models by providing complementary context about chromatin accessibility. Specifically, we use scATAC-seq data to improve SegmentNT, a DNA segmentation model that leverages the Nucleotide Transformer (NT) to predict 14 types of genomic and regulatory elements from DNA sequences up to 30 kb at single-nucleotide resolution. Second, we introduce a novel multimodal foundation model that extends existing scRNA-seq foundation models by integrating scATAC-seq data. This model captures cross-modal relationships between gene expression and chromatin accessibility, establishing a unified framework that can be fine-tuned for diverse downstream tasks, including cell type classification and cross-modal imputation. Our work highlights the potential of incorporating scATAC-seq data into existing genomics deep learning strategies, providing a framework for integrating regulatory DNA analysis more seamlessly into genomic modeling.
Date issued
2025-02
URI
https://hdl.handle.net/1721.1/159110
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
