Multimodal Representation Learning for Medical Image Analysis
My thesis develops machine learning methods that exploit multimodal clinical data to improve medical image analysis. Medical images capture rich information about a patient's physiological and disease status and are central to clinical practice and research. Computational models, such as artificial neural networks, enable automatic and quantitative medical image analysis, which may offer timely diagnosis in low-resource settings, advance precision medicine, and facilitate large-scale clinical research. Developing such image models demands large amounts of training data. Although digital medical images have become increasingly available, the limited availability of structured image labels for model training remains a bottleneck. To overcome this challenge, I build machine learning algorithms that develop medical image models by exploiting other clinical data. Clinical data are often multimodal, including images, text (e.g., radiology reports, clinical notes), and numerical signals (e.g., vital signs, laboratory measurements). These multimodal sources of information reflect different yet correlated manifestations of a subject's underlying physiological processes. I propose machine learning methods that take advantage of the correlations between medical images and other clinical data to yield accurate computer vision models. I use mutual information to capture these correlations and develop novel algorithms for multimodal representation learning that leverage local data features. The experiments described in this thesis demonstrate the advantages of these multimodal learning approaches in chest X-ray analysis.
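The abstract describes using mutual information to capture correlations between images and other clinical data. A common way to maximize a lower bound on the mutual information between paired modalities is an InfoNCE-style contrastive objective, where matched image-text pairs are scored against in-batch negatives. The sketch below is a hypothetical illustration of that general idea in NumPy; it is not the thesis's actual objective (which additionally leverages local data features), and all function and variable names are my own.

```python
import numpy as np

def infonce_loss(img_emb, txt_emb, temperature=0.1):
    """Symmetric InfoNCE contrastive loss between paired embeddings.

    Minimizing this loss maximizes a lower bound on the mutual
    information between the two modalities. Hypothetical sketch,
    not the thesis's exact formulation.
    """
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (N, N); matches on the diagonal
    labels = np.arange(len(logits))

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image retrieval directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

As a sanity check, embeddings from well-aligned pairs should incur a lower loss than embeddings paired with unrelated samples, since the matched pair dominates its row of the similarity matrix.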
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher: Massachusetts Institute of Technology