DSpace@MIT (MIT Libraries)

Integrating Functional Knowledge into Protein Design: A Novel Approach to Tokenization and Noise Injection for Function-Aware Protein Language Models

Author(s)
Tang, Adrina
Download: Thesis PDF (1.904Mb)
Advisor
Berger, Bonnie
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Designing novel proteins with specific biological functions remains a fundamental challenge in computational biology. While recent advances in protein language models have enabled powerful sequence-based representations, most models, including state-of-the-art systems like ESM3, fall short of effectively encoding functional context during protein generation. In this work, we present a multimodal protein co-design framework that conditions sequence generation on fine-grained functional annotations, specifically residue-level Gene Ontology (GO) term labels on sequences from the UniRef100 database. By explicitly associating functional signals with individual residues, our model learns to generate function-conditioned protein sequences that are biologically plausible and semantically consistent. Unlike prior approaches, which treat function as a secondary feature or a classification task, our method reasons jointly over function and sequence during the design process. This closes a critical gap in the current landscape of protein design tools, offering a scalable and generalizable approach to co-designing protein sequences with user-specified functional profiles.
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/162913
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted.