Systems Pharmacology – Machine Learning Approaches in Profiling Oncology Drug Candidates

Author(s)

Ujwal, ML

Advisor

Moser, Bryan R.

Seering, Warren

Terms of use

In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/

Abstract

Although this thesis is framed from a systems thinking perspective, its main focus is drug discovery and the application of machine learning approaches to profiling oncology drug candidates for a select subset of validated targets in oncogenesis pathways. In this study, we built in-silico models to predict prospective drug candidates from compound libraries. Robust predictive models help save substantial experimental and resource overheads and compress product cycle times. We built models with several machine learning algorithms: logistic regression (LR), support vector machines (SVMs), Naïve Bayes, artificial neural networks (ANNs), decision trees (classification and regression trees, CART), and multi-tree majority-voting ensemble techniques, i.e., random forest and XGBoost. The feature sets for these models were extracted by computing chemical fingerprints and quantum chemical descriptors. We generated both sparse and dense matrices for modeling. We cross-validated the models, tuned their hyperparameters, and evaluated their performance on several statistical metrics, including receiver operating characteristic (ROC) curves. Using LR, we compared full and reduced models obtained through feature engineering to assess model stability. We evaluated regularization techniques, namely LASSO, Ridge, Elastic Net, and neural dropout, to prevent overfitting in both LR and ANN models. We evaluated SVM kernels and showed that the non-linear radial basis function (RBF) kernel performed better than the others. We also showed that adding hidden layers beyond three to the ANN model with the Adam optimizer did not improve performance. In addition, multi-tree ensemble models were superior to single-tree models (CART). Finally, we benchmarked the performance metrics of these machine learning algorithms side by side and concluded that the random forest ensemble produced the lowest mean misclassification error.
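
The side-by-side benchmarking described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis code: the labels, scores, model names (`model_a`, `model_b`), and the 0.5 decision threshold are all hypothetical values chosen for illustration. The sketch computes two of the metrics the abstract names, misclassification error and the area under the ROC curve, and then picks the model with the lowest error.

```python
from typing import Sequence


def misclassification_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that disagree with the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen active (label 1) scores
    higher than a randomly chosen inactive (label 0); ties count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = active compound, 0 = inactive compound.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical predicted probabilities from two unnamed models.
model_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1],
    "model_b": [0.6, 0.4, 0.7, 0.5, 0.3, 0.8, 0.2, 0.9],
}

results = {}
for name, scores in model_scores.items():
    # The 0.5 threshold is an assumption made for this illustration.
    predicted = [1 if s >= 0.5 else 0 for s in scores]
    results[name] = {
        "error": misclassification_error(labels, predicted),
        "auc": roc_auc(labels, scores),
    }

# Select the model with the lowest misclassification error.
best = min(results, key=lambda name: results[name]["error"])
print(best, results[best])
```

In practice, these metrics would be averaged over cross-validation folds rather than computed on a single small set, and a library implementation would normally replace the hand-written functions.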

Date issued

2021-06

URI

https://hdl.handle.net/1721.1/139551

Department

System Design and Management Program.

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses