
dc.contributor.advisor: David Gifford. (en_US)
dc.contributor.author: Liu, Ge, Ph. D. Massachusetts Institute of Technology. (en_US)
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. (en_US)
dc.date.accessioned: 2021-01-06T19:36:10Z
dc.date.available: 2021-01-06T19:36:10Z
dc.date.copyright: 2020 (en_US)
dc.date.issued: 2020 (en_US)
dc.identifier.uri: https://hdl.handle.net/1721.1/129259
dc.description: Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September, 2020 (en_US)
dc.description: Cataloged from student-submitted PDF of thesis. (en_US)
dc.description: Includes bibliographical references (pages 165-182). (en_US)
dc.description.abstract: Next generation sequencing and large-scale synthetic DNA synthesis have enabled advances in biological studies by providing high-throughput data that can efficiently train machine learning models. Deep learning models have proven to provide state-of-the-art performance for predictive tasks across many biological applications. However, black-box predictive modeling is not sufficient for scientific discovery in biology. For discovery it is important to find the mechanisms that underlie outcomes. Mechanism discovery requires the visualization and interpretation of black-box predictive models. Discovery further requires analyzing data from exploratory experiments, and such experiments may produce data that is dissimilar from previous observations and thus be outside of a model's training distribution. Recognizing and quantifying the uncertainty of model predictions on out-of-distribution data is crucial for proper experiment interpretation. (en_US)
dc.description.abstract: Moreover, therapeutic molecular design usually involves iterations of proposing and testing new candidates, which require sequential decision making and directed optimization of molecules in a multiplexed fashion. Finally, certain machine learning design tasks such as vaccine design need to meet objectives such as population coverage which require efficient algorithms for combinatorial optimization. This thesis investigates and proposes novel techniques in four areas: model interpretation, model uncertainty, generating optimized antibody candidates, and optimization of vaccines with population coverage objectives. We first present Deep-Resolve, a novel analysis framework for deep convolutional models of genome function that visualizes how input features contribute individually and combinatorially to network decisions. Unlike other methods, Deep-Resolve does not depend upon the analysis of a predefined set of inputs. (en_US)
dc.description.abstract: Rather, it uses gradient ascent to stochastically explore intermediate feature maps to 1) discover important features, 2) visualize their contribution and interaction patterns, and 3) analyze feature sharing across tasks that suggests shared biological mechanism. Next, we introduce Maximize Overall Diversity (MOD), an approach to improve ensemble-based uncertainty estimates by encouraging larger overall diversity in deep ensemble predictions across all possible inputs. We also explore variations of MOD utilizing adversarial techniques (MOD-Adv) and data density estimation (MOD-R). We show that for out-of-distribution test examples, MOD improves predictive performance and uncertainty calibration on multiple regression and Bayesian Optimization tasks. Thirdly, we use ensembles of deep learning models and gradient based optimization in antibody sequence design. We optimize antibodies for optimized binding affinity and specificity, and experimentally confirm our optimization results. (en_US)
dc.description.abstract: Last, we combine deep learning models for predicting peptide MHC display with population frequency objectives to create a novel vaccine design tool, OptiVax, that estimates and optimizes the population coverage of peptide vaccines to facilitate robust immune responses. We used OptiVax to design peptide vaccines for SARS-CoV-2 and achieved superior predicted population coverage when compared to 29 public baseline designs. Collectively our studies will enable the application of deep learning in a broad range of scenarios in biological studies. (en_US)
dc.description.statementofresponsibility: by Ge Liu. (en_US)
dc.format.extent: 182 pages (en_US)
dc.language.iso: eng (en_US)
dc.publisher: Massachusetts Institute of Technology (en_US)
dc.rights: MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. (en_US)
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 (en_US)
dc.subject: Electrical Engineering and Computer Science. (en_US)
dc.title: Beyond predictive modeling : new computational aspects for deep learning based biological applications (en_US)
dc.type: Thesis (en_US)
dc.description.degree: Ph. D. (en_US)
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (en_US)
dc.identifier.oclc: 1227520587 (en_US)
dc.description.collection: Ph.D. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science (en_US)
dspace.imported: 2021-01-06T19:36:06Z (en_US)
mit.thesis.degree: Doctoral (en_US)
mit.thesis.department: EECS (en_US)

