
dc.contributor.advisor: Keating, Amy E.
dc.contributor.author: Nyiam, Nten P.
dc.date.accessioned: 2024-09-03T21:09:47Z
dc.date.available: 2024-09-03T21:09:47Z
dc.date.issued: 2024-05
dc.date.submitted: 2024-07-11T14:36:08.946Z
dc.identifier.uri: https://hdl.handle.net/1721.1/156589
dc.description.abstract: Identifying and validating short linear motifs (SLiMs) is challenging due to their low sequence complexity and high prevalence across the proteome. Many false positives (sequences that match the pattern of the SLiM but are not involved in the biological functions typically associated with SLiMs) complicate this task. Distinguishing functional SLiMs from false positives requires an approach that incorporates not just sequence analysis but also biological, structural, and evolutionary context. This thesis presents a framework designed to annotate candidate SLiMs and differentiate true binders from false positives. The proposed framework uses several annotation metrics, including sequence conservation, post-translational modifications (PTMs), structural context derived from AlphaFold model scores, and the proximity of neighboring motifs. We evaluate each of these metrics using a test dataset sampled from the Eukaryotic Linear Motif (ELM) protein database. Our results indicate that sequence conservation has a consistent but moderate ability to differentiate true binders from unverified candidate motifs. Additionally, integrating AlphaFold’s structural data may help reduce false positives arising from predictions of disordered regions when sampling the motif data. We show that the tool currently underestimates the number of PTMs, suggesting a need for integrating additional PTM databases or predictive tools to improve motif annotation accuracy. Finally, we find that known functional SLiMs tend to cluster more closely than potential false positives, indicating that spatial proximity may help identify true SLiMs in motifs that serve specific roles. These findings highlight the importance of a context-based approach in SLiM annotation and open routes for future research and development.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Developing a Contextual Annotation Framework for Short Linear Motifs in Proteins
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Computer Science and Molecular Biology

