Machine Learning Approaches to Modeling the Physiochemical Properties of Small Peptides

Research and Teaching Output of the MIT Community

dc.contributor.author Jensen, Kyle
dc.contributor.author Styczynski, Mark
dc.contributor.author Stephanopoulos, Gregory
dc.date.accessioned 2005-12-16T14:52:55Z
dc.date.available 2005-12-16T14:52:55Z
dc.date.issued 2006-01
dc.description.abstract Peptide and protein sequences are most commonly represented as strings: a series of letters selected from the twenty-character alphabet of abbreviations for the naturally occurring amino acids. Here, we experiment with representations of small peptide sequences that incorporate more physiochemical information. Specifically, we develop three different physiochemical representations for a set of roughly 700 HIV-1 protease substrates. These representations are used as input to an array of six different machine learning models, which predict whether a given peptide is likely to be an acceptable substrate for the protease. Our results show that, in general, higher-dimensional physiochemical representations tend to perform better than representations incorporating fewer dimensions selected on the basis of high information content. We contend that such representations are more biologically relevant than simple string-based representations and are likely to capture functionally important peptide characteristics more accurately. en
dc.description.sponsorship Singapore-MIT Alliance (SMA) en
dc.format.extent 331891 bytes
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation.ispartofseries Molecular Engineering of Biological and Chemical Systems (MEBCS) en
dc.subject Machine learning en
dc.subject peptides en
dc.subject modeling en
dc.subject physio-chemical properties en
dc.title Machine Learning Approaches to Modeling the Physiochemical Properties of Small Peptides en
dc.type Article en
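
The contrast the abstract draws between string-based and physiochemical representations can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which properties the three representations use, so the sketch substitutes the published Kyte-Doolittle hydropathy scale, and the function names and the example octapeptide are hypothetical rather than taken from the paper.

```python
# Illustrative only: the paper's actual representations are not given in the abstract.
# Kyte-Doolittle hydropathy index, a standard published per-residue scale.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# The twenty-letter amino acid alphabet, in a fixed order.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(peptide):
    """String-style encoding: 20 indicator values per residue."""
    vec = []
    for residue in peptide:
        vec.extend(1.0 if residue == letter else 0.0 for letter in ALPHABET)
    return vec


def hydropathy(peptide):
    """Physiochemical encoding: one hydropathy value per residue."""
    return [KD_HYDROPATHY[residue] for residue in peptide]


peptide = "SQNYPIVQ"  # example octapeptide, chosen for illustration
string_features = one_hot(peptide)      # 8 residues x 20 letters = 160 values
physchem_features = hydropathy(peptide)  # 8 values, one per residue
```

Either feature vector could then be passed to a classifier. Stacking several property scales per residue would give a higher-dimensional physiochemical representation of the kind the abstract compares.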

Files in this item

Name Size Format Description
MEBCS010.pdf 324.1Kb PDF
