Self-Supervised ECG Learning for Multimodal Clinical Tasks
Author(s)
Chen, Peilin
Advisor
Liang, Paul
Abstract
We present a multimodal clinical AI framework that integrates time series, images, and text to support robust diagnostic reasoning across diverse input combinations. We first introduce ECG-JEPA, a self-supervised encoder pretrained on multiple ECG datasets to learn generalizable time series representations. This unimodal pretraining improves ECG classification, achieving a 23-point AUC gain on the underrepresented Ga dataset. We then align and fuse these ECG embeddings with chest X-rays and EHR text using a vision–language model backbone, enabling end-to-end multimodal inference. Our results show that incorporating ECG signals meaningfully improves diagnostic performance, highlighting the value of multitask time series pretraining and modular fusion for clinical AI.
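The thesis itself is not reproduced here, but the JEPA family of methods (joint-embedding predictive architectures) is well established: a context encoder sees a masked view of the input, an exponential-moving-average target encoder sees the full input, and a predictor regresses the latent representations of the masked regions. Below is a minimal sketch of such a pretraining step for 12-lead ECG, assuming a toy 1-D convolutional patch encoder; all module names, shapes, masking choices, and hyperparameters are illustrative assumptions, not the actual ECG-JEPA configuration.

```python
# Sketch of a JEPA-style pretraining step for ECG time series (assumed setup,
# not the thesis's exact architecture or hyperparameters).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Toy 1-D conv patch encoder: (batch, leads, time) -> (batch, patches, dim)."""
    def __init__(self, leads=12, dim=128, patch=16):
        super().__init__()
        self.conv = nn.Conv1d(leads, dim, kernel_size=patch, stride=patch)
    def forward(self, x):
        return self.conv(x).transpose(1, 2)  # (B, time // patch, dim)

encoder = Encoder()                        # context encoder, trained by gradient descent
target_encoder = copy.deepcopy(encoder)    # target encoder, updated only by EMA
for p in target_encoder.parameters():
    p.requires_grad_(False)
predictor = nn.Sequential(nn.Linear(128, 128), nn.GELU(), nn.Linear(128, 128))
opt = torch.optim.AdamW(
    list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

def jepa_step(ecg, mask_ratio=0.5, ema=0.996, patch=16):
    """One step: predict latent targets of masked patches from the visible context."""
    B, _, L = ecg.shape
    T = L // patch
    mask = torch.rand(B, T, device=ecg.device) < mask_ratio   # which patches are hidden
    with torch.no_grad():                                     # latent targets: full signal
        targets = target_encoder(ecg)
    keep = (~mask).repeat_interleave(patch, dim=1)            # time-domain visibility mask
    z = encoder(ecg * keep.unsqueeze(1).float())              # encode only visible context
    pred = predictor(z)                                       # predict latents everywhere
    loss = F.smooth_l1_loss(pred[mask], targets[mask])        # score only masked patches
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                                     # EMA update of target encoder
        for p_t, p in zip(target_encoder.parameters(), encoder.parameters()):
            p_t.lerp_(p.detach(), 1.0 - ema)
    return loss.item()

loss = jepa_step(torch.randn(4, 12, 1024))  # 4 synthetic 12-lead ECGs, 1024 samples each
```

Because the loss lives in latent space rather than on raw samples, the encoder is free to discard unpredictable signal noise, which is one reason JEPA-style objectives transfer well to downstream classification.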
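The "modular fusion" step can likewise be pictured as a lightweight adapter that projects the pretrained ECG encoder's output into the token space of a vision-language backbone, where self-attention then mixes ECG, image, and text tokens. The sketch below assumes sequence-concatenation fusion with an average-pooling adapter; the module names and widths are hypothetical, not drawn from the thesis.

```python
# Sketch of modality fusion: map frozen ECG encoder tokens into the token
# width of a vision-language backbone (assumed design, not the thesis's).
import torch
import torch.nn as nn

class ECGAdapter(nn.Module):
    """Compresses and projects ECG tokens (B, T, d_ecg) to the VLM width d_model."""
    def __init__(self, d_ecg=128, d_model=768, n_tokens=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(n_tokens)   # T patches -> n_tokens summaries
        self.proj = nn.Linear(d_ecg, d_model)
    def forward(self, ecg_tokens):
        x = self.pool(ecg_tokens.transpose(1, 2)).transpose(1, 2)  # (B, n_tokens, d_ecg)
        return self.proj(x)                                        # (B, n_tokens, d_model)

adapter = ECGAdapter()
ecg_tokens = torch.randn(2, 64, 128)    # output of the (frozen) pretrained ECG encoder
img_tokens = torch.randn(2, 196, 768)   # vision-tower output for a chest X-ray (assumed width)
txt_tokens = torch.randn(2, 32, 768)    # embedded EHR text tokens
# Fusion by concatenation along the sequence axis; the backbone attends across all three.
fused = torch.cat([adapter(ecg_tokens), img_tokens, txt_tokens], dim=1)  # (2, 236, 768)
```

A design like this keeps the fusion modular: any subset of modalities can be dropped at inference simply by omitting its token span, matching the abstract's goal of robust reasoning across diverse input combinations.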
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology