Beyond predictive modeling : new computational aspects for deep learning based biological applications
Author(s): Liu, Ge, Ph. D., Massachusetts Institute of Technology.
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Next generation sequencing and large-scale synthetic DNA synthesis have enabled advances in biological studies by providing high-throughput data that can efficiently train machine learning models. Deep learning models have proven to provide state-of-the-art performance for predictive tasks across many biological applications. However, black-box predictive modeling is not sufficient for scientific discovery in biology. For discovery it is important to find the mechanisms that underlie outcomes. Mechanism discovery requires the visualization and interpretation of black-box predictive models. Discovery further requires analyzing data from exploratory experiments, and such experiments may produce data that is dissimilar from previous observations and thus be outside of a model's training distribution. Recognizing and quantifying the uncertainty of model predictions on out-of-distribution data is crucial for proper experiment interpretation. Moreover, therapeutic molecular design usually involves iterations of proposing and testing new candidates, which require sequential decision making and directed optimization of molecules in a multiplexed fashion. Finally, certain machine learning design tasks such as vaccine design need to meet objectives such as population coverage which require efficient algorithms for combinatorial optimization. This thesis investigates and proposes novel techniques in four areas: model interpretation, model uncertainty, generating optimized antibody candidates, and optimization of vaccines with population coverage objectives. We first present Deep-Resolve, a novel analysis framework for deep convolutional models of genome function that visualizes how input features contribute individually and combinatorially to network decisions.
Unlike other methods, Deep-Resolve does not depend upon the analysis of a predefined set of inputs. Rather, it uses gradient ascent to stochastically explore intermediate feature maps to 1) discover important features, 2) visualize their contribution and interaction patterns, and 3) analyze feature sharing across tasks that suggests shared biological mechanisms. Next, we introduce Maximize Overall Diversity (MOD), an approach to improve ensemble-based uncertainty estimates by encouraging larger overall diversity in deep ensemble predictions across all possible inputs. We also explore variations of MOD utilizing adversarial techniques (MOD-Adv) and data density estimation (MOD-R). We show that for out-of-distribution test examples, MOD improves predictive performance and uncertainty calibration on multiple regression and Bayesian Optimization tasks. Third, we use ensembles of deep learning models and gradient-based optimization in antibody sequence design. We optimize antibodies for improved binding affinity and specificity, and experimentally confirm our optimization results. Last, we combine deep learning models for predicting peptide MHC display with population frequency objectives to create a novel vaccine design tool, OptiVax, that estimates and optimizes the population coverage of peptide vaccines to facilitate robust immune responses. We used OptiVax to design peptide vaccines for SARS-CoV-2 and achieved superior predicted population coverage when compared to 29 public baseline designs. Collectively, our studies will enable the application of deep learning in a broad range of scenarios in biological studies.
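As a rough illustration of the ensemble-based uncertainty estimates mentioned in the abstract, the sketch below trains a small ensemble and uses the spread of its predictions as an uncertainty score. It is not the MOD method itself, which adds a diversity-encouraging objective for deep ensembles; here the members are plain least-squares lines fit to bootstrap resamples of toy data. The function names (`fit_line`, `train_ensemble`, `predict_with_uncertainty`) and the data are hypothetical and chosen only for this example.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def train_ensemble(xs, ys, n_members=20, seed=0):
    """Fit one line per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    members = []
    while len(members) < n_members:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip degenerate resamples with a single distinct x
        members.append(fit_line(bx, [ys[i] for i in idx]))
    return members


def predict_with_uncertainty(members, x):
    """Return the ensemble mean and the standard deviation across members."""
    preds = [a * x + b for a, b in members]
    return statistics.fmean(preds), statistics.pstdev(preds)


# Toy training data (an assumption for illustration): y = 2x + noise on [0, 1].
noise = random.Random(1)
train_x = [i / 10 for i in range(11)]
train_y = [2 * x + noise.gauss(0, 0.1) for x in train_x]

members = train_ensemble(train_x, train_y)
mean_in, std_in = predict_with_uncertainty(members, 0.5)    # inside the training range
mean_out, std_out = predict_with_uncertainty(members, 10.0)  # far outside it
print(f"in-distribution:     mean={mean_in:.3f}, std={std_in:.3f}")
print(f"out-of-distribution: mean={mean_out:.3f}, std={std_out:.3f}")
```

In this toy setting the members disagree more at the out-of-distribution input than inside the training range, which is the behavior an ensemble-based uncertainty estimate relies on.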
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September, 2020. Cataloged from student-submitted PDF of thesis. Includes bibliographical references (pages 165-182).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.