dc.contributor.advisor: Joseph M. Jacobson. [en_US]
dc.contributor.author: Karydis, Thrasyvoulos [en_US]
dc.contributor.other: Program in Media Arts and Sciences (Massachusetts Institute of Technology) [en_US]
dc.date.accessioned: 2017-06-06T19:23:55Z
dc.date.available: 2017-06-06T19:23:55Z
dc.date.copyright: 2017 [en_US]
dc.date.issued: 2017 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/109659
dc.description: Thesis: S.M., Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2017. [en_US]
dc.description: Cataloged from PDF version of thesis. [en_US]
dc.description: Includes bibliographical references (pages 75-79). [en_US]
dc.description.abstract: This thesis lays the foundation for an integrated machine learning framework for the evolutionary analysis, search and design of proteins, based on a hierarchical decomposition of proteins into a set of functional motif embeddings. We introduce CoMET (Convolutional Motif Embeddings Tool), a machine learning framework that allows the automated extraction of nonlinear motif representations from large sets of protein sequences. At the core of CoMET lies a Deep Convolutional Neural Network, trained to learn a basis set of motif embeddings by minimizing any desired objective function. CoMET is successfully trained to extract all known motifs across Transcription Factors and CRISPR Associated proteins, without requiring any prior knowledge about the nature of the motifs or their distribution. We demonstrate that motif embeddings can efficiently model inter- and intra-family relationships. Furthermore, we provide novel protein meta-family clusters, formed by taking into account a hierarchical conserved motif phylogeny for each protein instead of a single ultra-conserved region. Lastly, we investigate the generative ability of CoMET and develop computational methods that allow the directed evolution of proteins towards altered or novel functions. We trained a highly accurate predictive model on the DNA recognition code of the Type II restriction enzymes. Based on the promising prediction results, we used the trained models to generate de novo restriction enzymes, paving the way towards the computational design of a restriction enzyme that will cut a given arbitrary DNA sequence with high precision. [en_US]
dc.description.statementofresponsibility: by Thrasyvoulos Karydis. [en_US]
dc.format.extent: 79 pages [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Program in Media Arts and Sciences () [en_US]
dc.title: Learning hierarchical motif embeddings for protein engineering [en_US]
dc.type: Thesis [en_US]
dc.description.degree: S.M. [en_US]
dc.contributor.department: Program in Media Arts and Sciences (Massachusetts Institute of Technology) [en_US]
dc.identifier.oclc: 987250344 [en_US]

