Predicting unknown adverse drug reactions using an unsupervised node embedding algorithm

Das, Sourav.

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Lalana Kagal.

Terms of use

MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

Defined as undesirable effects of a medication that occur during or after usual clinical use, adverse drug reactions (ADRs) pose a major health risk and result in the hospitalization of millions of patients each year. Pre-marketing clinical trials evaluate the safety and efficacy of a new drug. Post-marketing surveillance then identifies and monitors ADRs that were not found during the trials. Traditionally, most approaches have focused on ADR detection in the post-marketing phase. Moreover, current approaches mostly use supervised machine learning, which requires significant data preprocessing and feature engineering. I developed a customizable framework based on unsupervised learning that lets users run prediction tasks on different types of labeled graph data. The framework works in three steps. It first creates a knowledge graph from the data. It then uses an unsupervised algorithm to create embeddings (vector representations) of the nodes in the knowledge graph. Finally, it runs the prediction task. An embedding can be learned for any newly added node as long as that node is connected to the other nodes. As a result, users can create embeddings for any drug in the pre-marketing stage as long as its related drug attributes are present in the knowledge graph. Using DrugBank and FAERS, I created a knowledge graph of drugs and drug attributes. To emulate drugs in the pre-marketing stage, I removed all drug-ADR edges in the test dataset. I then experimented with different parameters of the node embedding algorithm and with three classifiers: a multilayer perceptron (MLP), k-nearest neighbors (KNN), and random forest. The models were trained to predict 9 different ADR associations for any drug. The MLP classifier performed best, with an AUROC score of 0.79. This result is comparable to existing approaches, and the framework offers much greater customizability. This approach has the potential to improve how ADRs are predicted and to allow them to be detected at a far earlier stage, thus improving patient safety.
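
The abstract does not name the node embedding algorithm or its parameters. The following is therefore only a minimal sketch of the same pipeline shape, built under stated assumptions and not the thesis's implementation. All drug, attribute, and ADR names are hypothetical toy data, not DrugBank or FAERS records. The steps are as follows:

- Build a small knowledge graph.
- Leave out the drug-ADR edges of held-out "pre-marketing" drugs.
- Generate uniform random walks, as in DeepWalk.
- Turn the walks into node vectors by window co-occurrence counting. This counting step is a simple stand-in for a learned embedding.
- Score each held-out drug with a KNN classifier, one of the three classifiers the abstract names.
- Compute AUROC as the probability that a positive example outscores a negative one.

```python
import random
from collections import defaultdict
from math import sqrt

# Hypothetical toy data (not from DrugBank or FAERS): drug -> attribute
# nodes such as targets and enzymes, and drug -> known ADR nodes.
DRUG_ATTRS = {
    "drugA": ["target1", "enzyme1"], "drugB": ["target1", "enzyme2"],
    "drugC": ["target2", "enzyme1"], "drugD": ["target2", "enzyme2"],
    "drugE": ["target1", "enzyme1"], "drugF": ["target2", "enzyme2"],
}
DRUG_ADRS = {
    "drugA": ["nausea"], "drugB": ["nausea"], "drugC": ["rash"],
    "drugD": ["rash"], "drugE": ["nausea"], "drugF": ["rash"],
}
ADRS = ("nausea", "rash")
TEST_DRUGS = {"drugE", "drugF"}  # treated as pre-marketing drugs


def build_graph(test_drugs):
    """Undirected knowledge graph; drug-ADR edges of test drugs are left out."""
    graph = defaultdict(set)
    edges = [(d, a) for d, attrs in DRUG_ATTRS.items() for a in attrs]
    edges += [(d, r) for d, adrs in DRUG_ADRS.items()
              if d not in test_drugs for r in adrs]
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def random_walks(graph, walks_per_node=20, walk_length=8, seed=0):
    """Uniform random walks starting from every node (DeepWalk-style)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(graph):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(sorted(graph[walk[-1]])))
            walks.append(walk)
    return walks


def embed(walks, window=2):
    """Count-based stand-in for a learned embedding: each node's vector holds
    its co-occurrence counts with every node inside a sliding window."""
    vocab = sorted({n for walk in walks for n in walk})
    index = {n: i for i, n in enumerate(vocab)}
    vecs = {n: [0.0] * len(vocab) for n in vocab}
    for walk in walks:
        for i, node in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    vecs[node][index[walk[j]]] += 1.0
    return vecs


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_score(vecs, drug, adr, train_drugs, k=3):
    """Fraction of the k most similar training drugs known to cause `adr`."""
    ranked = sorted(train_drugs, key=lambda d: cosine(vecs[drug], vecs[d]),
                    reverse=True)
    neighbours = ranked[:k]
    return sum(adr in DRUG_ADRS[d] for d in neighbours) / len(neighbours)


def auroc(labels, scores):
    """Probability a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


graph = build_graph(TEST_DRUGS)
vecs = embed(random_walks(graph))
train = sorted(set(DRUG_ATTRS) - TEST_DRUGS)
labels, scores = [], []
for drug in sorted(TEST_DRUGS):
    for adr in ADRS:
        labels.append(adr in DRUG_ADRS[drug])  # ground truth, hidden from graph
        scores.append(knn_score(vecs, drug, adr, train))
auc = auroc(labels, scores)
print(f"AUROC on held-out drugs: {auc:.2f}")
```

Held-out drugs reach the graph only through their attribute nodes, which mirrors how the abstract describes embedding a pre-marketing drug. A fuller version would replace the counting step with a learned embedding, for example a skip-gram model trained on the walks as DeepWalk does. It would also swap the KNN scorer for whichever classifier performs best.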

Description

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019

Cataloged from student-submitted PDF version of thesis.

Includes bibliographical references (pages 67-68).

Date issued

2019

URI

https://hdl.handle.net/1721.1/124239

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Graduate Theses