Patient clustering using electronic medical records
Author(s): Shea, Andrew (Andrew L.)
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Electronic health records (EHR) and their wealth of patient health information present new opportunities for understanding relationships between patients and their conditions. However, EHR data sparsity, quality, and accessibility present various computational challenges. To address these challenges, we apply spectral clustering and variational autoencoders to obtain compact patient representations and clusters from EHR in an unsupervised manner. We apply these methods to the MIMIC dataset, from which we only use ICD-9 diagnostic codes to ensure data accessibility. After obtaining clusters, we conduct high-resolution analysis by examining the 5 most frequent phenotypes within each cluster. We then conduct low-resolution analysis by examining the distribution of phenotypes within each cluster, examining the relationships amongst the most prevalent phenotypes in each cluster by constructing a cluster network, and comparing our findings to existing medical literature. While preliminary, these results suggest that learning from sparse EHR data is sufficient for uncovering associations between conditions and diseases.
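The clustering and top-phenotype steps described in the abstract can be illustrated with a minimal sketch. Nothing below is taken from the thesis itself: the patient records are made-up toy data (not MIMIC), the cosine-similarity affinity and the two-way split on the Fiedler vector are assumptions chosen for brevity, the function names are hypothetical, and the variational autoencoder step is omitted.

```python
from collections import Counter

import numpy as np

# Toy, made-up records: one list of ICD-9 codes per patient (not MIMIC data).
patients = [
    ["250.00", "401.9", "272.4"],
    ["250.00", "401.9"],
    ["250.00", "272.4", "401.9"],
    ["486", "518.81", "038.9"],
    ["486", "518.81"],
    ["038.9", "486", "518.81"],
]


def binary_matrix(records):
    """Encode records as a sparse-style 0/1 patient-by-code matrix."""
    codes = sorted(set().union(*records))
    index = {code: j for j, code in enumerate(codes)}
    X = np.zeros((len(records), len(codes)))
    for i, record in enumerate(records):
        for code in record:
            X[i, index[code]] = 1.0
    return X, codes


def spectral_bipartition(X, eps=1e-3):
    """Split patients into two clusters using the normalized graph Laplacian.

    Assumption: cosine similarity between code vectors is the affinity.
    The small eps keeps the similarity graph connected.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    W = Xn @ Xn.T + eps
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - (d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :])
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]  # second-smallest eigenvector, rescaled
    return (fiedler > 0).astype(int)


def top_codes(records, labels, k=5):
    """Return the k most frequent codes per cluster, ties broken by code."""
    counts = {}
    for record, label in zip(records, labels):
        counts.setdefault(int(label), Counter()).update(record)
    return {
        label: sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        for label, c in counts.items()
    }


X, codes = binary_matrix(patients)
labels = spectral_bipartition(X)
for label, ranked in sorted(top_codes(patients, labels).items()):
    print(label, ranked)
```

Which cluster receives label 0 or 1 depends on the sign of the eigenvector returned by the solver, so only the grouping, not the label value, is meaningful. For more than two clusters, a common variant runs k-means on the first few eigenvectors instead of thresholding a single one.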
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020
Cataloged from the official PDF of thesis.
Includes bibliographical references (pages 61-66).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.