Show simple item record

dc.contributor.advisorZamparo, Lee
dc.contributor.advisorHrvatin, Siniša
dc.contributor.authorKim, Dong Young
dc.date.accessioned2025-04-14T14:06:13Z
dc.date.available2025-04-14T14:06:13Z
dc.date.issued2025-02
dc.date.submitted2025-04-03T14:06:18.693Z
dc.identifier.urihttps://hdl.handle.net/1721.1/159110
dc.description.abstractSingle-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) has emerged as a powerful tool for profiling chromatin accessibility at single-cell resolution. By capturing epigenomic landscapes, scATAC-seq provides critical insights into the regulatory elements that govern gene expression. However, the sparsity of scATAC-seq data, resulting from its low sequencing depth relative to the genome’s potential complexity, poses significant challenges for effective and accurate modeling. To advance the utility of scATAC-seq in modern biology, we explore its integration into deep learning frameworks through two innovative applications. First, we demonstrate how incorporating scATAC data enhances the performance of existing genomic language models by providing complementary context about chromatin accessibility. Specifically, we introduce scATAC to improve SegmentNT, a DNA segmentation model that leverages the Nucleotide Transformer (NT) to predict 14 types of genomic and regulatory elements from DNA sequences up to 30kb at single-nucleotide resolution. Second, we introduce a novel multimodal foundation model that extends existing scRNA-seq foundation models by integrating scATAC-seq data. This model captures crossmodal relationships between gene expression and chromatin accessibility, establishing a unified framework that can be fine-tuned for diverse downstream tasks, including cell type classification and cross-modal imputation. Our work highlights the potential of incorporating scATAC-seq data into existing genomics deep learning strategies, providing a framework for integrating regulatory DNA analysis more seamlessly into genomic modeling.
dc.publisherMassachusetts Institute of Technology
dc.rightsAttribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rightsCopyright retained by author(s)
dc.rights.urihttps://creativecommons.org/licenses/by-nc-nd/4.0/
dc.titleLeveraging Single-Cell ATAC-Seq for Genomic Language Models and Multimodal Foundation Models
dc.typeThesis
dc.description.degreeM.Eng.
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degreeMaster
thesis.degree.nameMaster of Engineering in Electrical Engineering and Computer Science


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record