Predicting unknown adverse drug reactions using an unsupervised node embedding algorithm
Author(s)
Das, Sourav.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Lalana Kagal.
Abstract
Defined as undesirable effects of a medication that occur during or after usual clinical use, Adverse Drug Reactions (ADRs) pose a major health risk and result in the hospitalization of millions of patients each year. While pre-marketing clinical trials evaluate the safety and efficacy of a new drug, post-marketing surveillance identifies and monitors ADRs that were not identified during trials. Traditionally, most approaches focus on ADR detection in the post-marketing phase. In addition, current approaches mostly use supervised machine learning, which requires significant data preprocessing and feature engineering. I developed a customizable framework based on unsupervised learning that allows users to run prediction tasks on different types of labeled graph data. The framework first creates a knowledge graph from the data, then uses an unsupervised algorithm to create embeddings (vector representations) of the nodes in the knowledge graph, and finally runs the prediction task. The framework enables an embedding to be learned for any newly added node as long as it is connected with the other nodes, and users can create embeddings for any pre-marketed drug as long as its related drug attributes are present in the knowledge graph. Using DrugBank and FAERS, I created a knowledge graph of drugs and drug attributes. To emulate drugs in the pre-marketing stage, I removed all the drug-ADR edges in the test dataset. Then, I experimented with different parameters of the node embedding algorithm and three different classifiers, namely MLP, KNN, and random forest. The models were trained to predict 9 different ADR associations for any drug, and the results showed that the MLP classifier was the best model, with an AUROC score of 0.79, which is comparable to existing approaches but with much greater customizability. This approach has the potential to improve how ADRs are predicted and to allow them to be detected at a far earlier stage, thus improving patient safety.
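The three stages named in the abstract (knowledge graph, unsupervised node embeddings, prediction task) can be illustrated with a minimal sketch. This is not the thesis implementation: the drug and attribute names, the labels, and the helper functions build_graph, embed, and knn_predict are all hypothetical, and the embedding step substitutes normalized random-walk visit counts for the node embedding algorithm used in the thesis, which the abstract does not name. KNN is used because it is one of the three classifiers the abstract mentions.

```python
import random
from collections import defaultdict
from math import sqrt

# Hypothetical toy data (not from DrugBank or FAERS): drug -> attributes.
DRUG_ATTRS = {
    "drug_a": ["target_1", "enzyme_1"],
    "drug_b": ["target_1", "enzyme_1"],
    "drug_c": ["target_2", "enzyme_2"],
    "drug_d": ["target_2", "enzyme_2"],
    # No ADR label for this drug: emulates a drug in the pre-marketing stage.
    "new_drug": ["target_1", "enzyme_1"],
}
# Hypothetical labels for a single ADR (1 = associated, 0 = not associated).
ADR_LABELS = {"drug_a": 1, "drug_b": 1, "drug_c": 0, "drug_d": 0}


def build_graph(drug_attrs):
    """Stage 1: undirected knowledge graph of drugs and their attributes."""
    graph = defaultdict(set)
    for drug, attrs in drug_attrs.items():
        for attr in attrs:
            graph[drug].add(attr)
            graph[attr].add(drug)
    return graph


def embed(graph, walks_per_node=50, walk_length=6, seed=0):
    """Stage 2 (stand-in): unit-length vectors of random-walk visit counts."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    index = {node: i for i, node in enumerate(nodes)}
    vectors = {}
    for start in nodes:
        counts = [0.0] * len(nodes)
        for _ in range(walks_per_node):
            current = start
            for _ in range(walk_length):
                # Sort neighbors so the walk is reproducible for a given seed.
                current = rng.choice(sorted(graph[current]))
                counts[index[current]] += 1.0
        norm = sqrt(sum(c * c for c in counts))
        vectors[start] = [c / norm for c in counts]
    return vectors


def knn_predict(vectors, labels, drug, k=2):
    """Stage 3: fraction of the k most similar labeled drugs with the ADR."""
    def cosine(u, v):
        # Vectors are unit length, so the dot product is the cosine similarity.
        return sum(a * b for a, b in zip(u, v))

    ranked = sorted(labels, key=lambda d: cosine(vectors[drug], vectors[d]),
                    reverse=True)
    top = ranked[:k]
    return sum(labels[d] for d in top) / len(top)


graph = build_graph(DRUG_ATTRS)
vectors = embed(graph)
score = knn_predict(vectors, ADR_LABELS, "new_drug", k=2)
print(score)
```

In this toy graph, new_drug shares attributes only with drug_a and drug_b, so those two are its nearest labeled neighbors and the predicted score for the hypothetical ADR comes from their labels alone. A new node needs no ADR edges to receive an embedding, only connections to attribute nodes already in the graph, which mirrors the pre-marketing setting described above.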
Description
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 67-68).
Date issued
2019
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.