Developing a Contextual Annotation Framework for Short Linear Motifs in Proteins

Nyiam, Nten P.

Advisor

Keating, Amy E.

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

Identifying and validating short linear motifs (SLiMs) is challenging because of their low sequence complexity and high prevalence across the proteome. The task is complicated by many false positives: sequences that match a SLiM pattern but are not involved in the biological functions typically associated with SLiMs. Distinguishing functional SLiMs from false positives requires an approach that incorporates not just sequence analysis but also biological, structural, and evolutionary context. This thesis presents a framework designed to annotate candidate SLiMs and differentiate true binders from false positives. The framework uses several annotation metrics: sequence conservation, post-translational modifications (PTMs), structural context derived from AlphaFold model scores, and the proximity of neighboring motifs. We evaluate each metric on a test dataset sampled from the Eukaryotic Linear Motif (ELM) protein database. Our results indicate that sequence conservation has a consistent but moderate ability to differentiate true binders from unverified candidate motifs. Additionally, integrating AlphaFold’s structural data may help reduce false positives arising from predictions of disordered regions when sampling the motif data. We show that the tool currently underestimates the number of PTMs, suggesting a need to integrate additional PTM databases or predictive tools to improve motif annotation accuracy. Finally, we find that known functional SLiMs tend to cluster more closely than potential false positives, indicating that spatial proximity may help identify true SLiMs in motifs that serve specific roles. These findings highlight the importance of a context-based approach to SLiM annotation and open routes for future research and development.
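To make the pattern-matching and motif-proximity ideas in the abstract concrete, the following is a minimal Python sketch and not the thesis's implementation. It scans a sequence for regular-expression motif patterns and reports, for each hit, the number of residues separating it from the nearest other hit. The names `MotifHit`, `find_motif_hits`, and `nearest_neighbor_gap`, the two toy patterns, and the example sequence are all assumptions made for illustration; they are not ELM class definitions or data from the thesis.

```python
import re
from dataclasses import dataclass


@dataclass
class MotifHit:
    name: str   # label of the pattern that matched (hypothetical)
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    match: str  # matched residues


def find_motif_hits(sequence, patterns):
    """Return all regex matches, including overlapping ones, for each named pattern."""
    hits = []
    for name, pattern in patterns.items():
        # Wrapping the pattern in a lookahead lets finditer report matches
        # that overlap one another, at most one per start position.
        for m in re.finditer(f"(?=({pattern}))", sequence):
            hits.append(MotifHit(name, m.start(1), m.end(1), m.group(1)))
    return sorted(hits, key=lambda h: (h.start, h.name))


def nearest_neighbor_gap(hit, hits):
    """Residues between `hit` and the closest other hit.

    Returns 0 when two hits overlap or touch, and None when there is
    no other hit to compare against.
    """
    gaps = [
        max(other.start - hit.end, hit.start - other.end, 0)
        for other in hits
        if other is not hit
    ]
    return min(gaps) if gaps else None


if __name__ == "__main__":
    # Toy patterns and sequence, assumed for illustration only.
    toy_patterns = {"toy_PxxP": "P..P", "toy_STP": "[ST]P"}
    toy_sequence = "MAPSAPKKTPLLSPQ"
    hits = find_motif_hits(toy_sequence, toy_patterns)
    for h in hits:
        print(h.name, h.start, h.end, h.match, nearest_neighbor_gap(h, hits))
```

A pure regex scan like this returns every pattern match, which is exactly why false positives are common; the gap value is only one simple stand-in for the proximity metric, and the other contextual annotations (conservation, PTMs, structural scores) would need external data sources.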

Date issued

2024-05

URI

https://hdl.handle.net/1721.1/156589

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses