Systems Pharmacology – Machine Learning Approaches in Profiling Oncology Drug Candidates
Author(s)
Ujwal, ML
Advisor
Moser, Bryan R.
Seering, Warren
Abstract
Although this thesis is framed from a systems thinking perspective, its main focus is drug discovery and the application of machine learning approaches to profiling oncology drug candidates for a select subset of validated targets in oncogenesis pathways. In this study, we built in silico predictive models to identify prospective drug candidates from compound libraries. Robust predictive models help save enormous experimental and resource overheads and compress product cycle times. We built models with several machine learning algorithms: logistic regression (LR), support vector machines (SVMs), Naïve Bayes, artificial neural networks (ANNs), decision trees (classification and regression trees, CART), and multi-tree majority-voting ensemble techniques, namely random forest and XGBoost. The feature sets for these models were extracted by computing chemical fingerprints and quantum chemical descriptors. We generated both sparse and dense matrices for modeling. We cross-validated the models, tuned their hyperparameters, and evaluated their performance on several statistical metrics, including receiver operating characteristic (ROC) curves.
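As an illustration of the ROC-based evaluation mentioned above, the following is a minimal sketch of how the area under the ROC curve (AUC) can be computed from predicted scores. It is not code from the thesis: the function name `roc_auc` and the toy labels and scores are hypothetical, and the pairwise formulation is only one of several equivalent ways to compute AUC.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    AUC equals the probability that a randomly chosen active compound
    (label 1) receives a higher score than a randomly chosen inactive
    compound (label 0); tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = active compound, 0 = inactive compound.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.5]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes, which is why it is a convenient single number for comparing classifiers.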
We investigated full and reduced LR models, obtained through feature engineering, for model stability. We evaluated regularization techniques, namely LASSO, Ridge, Elastic Net, and neural dropout, to prevent overfitting in both LR and ANN models. We evaluated SVM kernels and showed that the non-linear radial basis function (RBF) kernel performed better than the others. We also showed that adding hidden layers beyond three to the ANN model with the Adam optimizer did not improve performance. In addition, multi-tree ensemble models were superior to single-tree models (CART). Finally, we benchmarked the performance metrics of these machine learning algorithms side by side and concluded that the random forest ensemble produced the lowest mean misclassification error.
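The comparison between single-tree and multi-tree models can be illustrated with a minimal sketch of majority voting and misclassification error. This is not the thesis's implementation: the helper names and the three hypothetical "tree" prediction vectors are assumptions made for illustration, and a real random forest additionally trains each tree on a bootstrap sample with random feature subsets, which is not shown here.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine class predictions from several models by majority vote."""
    n_samples = len(predictions_per_model[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions_per_model)
        voted.append(votes.most_common(1)[0][0])
    return voted


def misclassification_error(y_true, y_pred):
    """Fraction of samples whose predicted class differs from the true class."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Hypothetical toy data: each "tree" makes different mistakes.
y_true = [1, 0, 1, 1, 0]
tree_a = [1, 0, 0, 1, 0]
tree_b = [1, 1, 1, 1, 0]
tree_c = [0, 0, 1, 1, 1]

ensemble = majority_vote([tree_a, tree_b, tree_c])
for name, pred in [("tree_a", tree_a), ("tree_b", tree_b),
                   ("tree_c", tree_c), ("ensemble", ensemble)]:
    print(name, misclassification_error(y_true, pred))
```

In this toy case each tree errs on different samples, so the vote corrects the individual mistakes. An odd number of voters is used so that binary votes cannot tie.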
Date issued
2021-06
Department
System Design and Management Program.
Publisher
Massachusetts Institute of Technology