Decentralized AI for Methylation Data with Applications to Precision Health
Author(s)
Jamee, Mehrab S.
Advisor
Raskar, Ramesh
Abstract
Advances in precision health rely on integrating large-scale genomic data to identify biomarkers and predict health outcomes. However, sharing sensitive patient data between institutions such as hospitals poses significant privacy and security challenges, limiting collaboration and the development of robust machine learning models. This thesis proposes a decentralized artificial intelligence framework for analyzing DNA methylation data, enabling institutions to collaboratively train models without exchanging sensitive information. By taking advantage of generative deep learning techniques and federated learning paradigms, the framework aims to impute missing biomarkers in fragmented datasets and improve the accuracy of downstream predictive tasks, such as predicting chronological age, mortality, and cancer data. Two intermediate models are implemented and evaluated in this thesis. The first predicts age from DNA methylation data and can be used to evaluate the imputation model. The second is an imputation model that uses a conditional autoencoder architecture to reconstruct missing biomarker data in clinical datasets; it is designed to take advantage of contextual methylation embeddings made available by recently published pretrained epigenomics foundation models. This work seeks to advance the use of decentralized AI in epigenomics, with the ultimate goal of improving personalized healthcare while preserving patient privacy.
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology