Predicting unknown adverse drug reactions using an unsupervised node embedding algorithm

Das, Sourav.

dc.contributor.advisor	Lalana Kagal.	en_US
dc.contributor.author	Das, Sourav.	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2020-03-24T15:35:50Z
dc.date.available	2020-03-24T15:35:50Z
dc.date.copyright	2019	en_US
dc.date.issued	2019	en_US
dc.identifier.uri	https://hdl.handle.net/1721.1/124239
dc.description	This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.	en_US
dc.description	Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019	en_US
dc.description	Cataloged from student-submitted PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 67-68).	en_US
dc.description.abstract	Defined as undesirable effects of a medication that occur during or after usual clinical use, Adverse Drug Reactions (ADRs) pose a major health risk and result in the hospitalization of millions of patients each year. While pre-marketing clinical trials evaluate the safety and efficacy of a new drug, post-marketing surveillance identifies and monitors ADRs that were not previously identified during trials. Traditionally, most approaches tend to focus on ADR detection in the post-marketing phase. Also, current approaches mostly use supervised machine learning, requiring significant preprocessing of the data and feature engineering. I developed a customizable framework based on unsupervised learning that allows users to run prediction tasks on different types of labeled graph data. The framework first creates a knowledge graph from the data and then uses an unsupervised algorithm to create embeddings (vector representations) of the nodes in the knowledge graph, and finally runs the prediction task. The framework enables an embedding to be learned for any newly added node as long as it is connected with the other nodes, and users can create embeddings for any pre-marketed drug as long as its related drug attributes are present in the knowledge graph. Using DrugBank and FAERS, I created a knowledge graph of drugs and drug attributes. To emulate drugs in the pre-marketing stage, I removed all the drug-ADR edges in the test dataset. Then, I experimented with different parameters of the node embedding algorithm and three different classifiers, namely MLP, KNN, and random forest. The models were trained to predict 9 different ADR associations for any drug, and our results showed that the MLP classifier was the best model with an AUROC score of 0.79, which is comparable to existing approaches but with much greater customizability.
This approach has the potential to improve how ADRs are predicted and allow them to be detected at a far earlier stage, thus improving patient safety.	en_US
dc.description.statementofresponsibility	by Sourav Das.	en_US
dc.format.extent	68 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Predicting unknown adverse drug reactions using an unsupervised node embedding algorithm	en_US
dc.type	Thesis	en_US
dc.description.degree	M. Eng.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.identifier.oclc	1144999394	en_US
dc.description.collection	M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science	en_US
dspace.imported	2020-03-24T15:35:49Z	en_US
mit.thesis.degree	Master	en_US
mit.thesis.department	EECS	en_US

Files in this item

Name: 1144999394-MIT.pdf
Size: 6.555 MB
Format: PDF

This item appears in the following Collection(s)

Graduate Theses