Show simple item record

dc.contributor.author: Goldman, Samuel
dc.contributor.author: Das, Ria
dc.contributor.author: Yang, Kevin K
dc.contributor.author: Coley, Connor Wilson
dc.date.accessioned: 2022-07-12T20:43:04Z
dc.date.available: 2022-03-28T19:31:43Z
dc.date.available: 2022-07-12T20:43:04Z
dc.date.issued: 2022-02
dc.identifier.uri: https://hdl.handle.net/1721.1/141379.2
dc.description.abstract: Biocatalysis is a promising approach to sustainably synthesize pharmaceuticals, complex natural products, and commodity chemicals at scale. However, the adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates. While machine learning and in silico directed evolution are well-posed for this predictive modeling challenge, efforts to date have primarily aimed to increase activity against a single known substrate, rather than to identify enzymes capable of acting on new substrates of interest. To address this need, we curate 6 different high-quality enzyme family screens from the literature that each measure multiple enzymes against multiple substrates. We compare machine learning-based compound-protein interaction (CPI) modeling approaches from the literature used for predicting drug-target interactions. Surprisingly, comparing these interaction-based models against collections of independent (single task) enzyme-only or substrate-only models reveals that current CPI approaches are incapable of learning interactions between compounds and proteins in the current family level data regime. We further validate this observation by demonstrating that our no-interaction baseline can outperform CPI-based models from the literature used to guide the discovery of kinase inhibitors. Given the high performance of non-interaction based models, we introduce a new structure-based strategy for pooling residue representations across a protein sequence. Altogether, this work motivates a principled path forward in order to build and evaluate meaningful predictive models for biocatalysis and other drug discovery applications. [en_US]
dc.language.iso: en
dc.publisher: Public Library of Science (PLoS) [en_US]
dc.relation.isversionof: 10.1371/journal.pcbi.1009853 [en_US]
dc.rights: Creative Commons Attribution 4.0 International license [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: 625865 [en_US]
dc.title: Machine learning modeling of family wide enzyme-substrate specificity screens [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Goldman, Samuel, Das, Ria, Yang, Kevin K and Coley, Connor W. 2022. "Machine learning modeling of family wide enzyme-substrate specificity screens." PLOS Computational Biology, 18 (2). [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computational and Systems Biology Program [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Chemical Engineering [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.relation.journal: PLOS Computational Biology [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2022-03-28T18:32:41Z
dspace.orderedauthors: Goldman, S; Das, R; Yang, KK; Coley, CW [en_US]
dspace.date.submission: 2022-03-28T18:32:43Z
mit.journal.volume: 18 [en_US]
mit.journal.issue: 2 [en_US]
mit.license: PUBLISHER_CC
mit.metadata.status: Publication Information Needed [en_US]

