dc.contributor.advisor       Liang, Paul
dc.contributor.author        Chen, Peilin
dc.date.accessioned          2025-09-18T14:28:56Z
dc.date.available            2025-09-18T14:28:56Z
dc.date.issued               2025-05
dc.date.submitted            2025-06-23T14:01:22.587Z
dc.identifier.uri            https://hdl.handle.net/1721.1/162719
dc.description.abstract      We present a multimodal clinical AI framework that integrates time series, images, and text to support robust diagnostic reasoning across diverse input combinations. We first introduce ECG-JEPA, a self-supervised encoder pretrained on multiple ECG datasets to learn generalizable time series representations. This unimodal pretraining improves ECG classification, achieving a 23-point AUC gain on the underrepresented Ga dataset. We then align and fuse these ECG embeddings with chest X-rays and EHR text using a vision–language model backbone, enabling end-to-end multimodal inference. Our results show that incorporating ECG signals meaningfully improves diagnostic performance, highlighting the value of multitask time series pretraining and modular fusion for clinical AI.
dc.publisher                 Massachusetts Institute of Technology
dc.rights                    In Copyright - Educational Use Permitted
dc.rights                    Copyright retained by author(s)
dc.rights.uri                https://rightsstatements.org/page/InC-EDU/1.0/
dc.title                     Self-Supervised ECG Learning for Multimodal Clinical Tasks
dc.type                      Thesis
dc.description.degree        M.Eng.
dc.contributor.department    Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree            Master
thesis.degree.name           Master of Engineering in Electrical Engineering and Computer Science

