DSpace@MIT (MIT Libraries)

Learning the language of biomolecular interactions

Author(s)
Sledzieski, Samuel
Advisor
Berger, Bonnie
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
Proteins are the primary functional units of the cell, and their interactions drive cellular function. Interactions between proteins are responsible for a wide variety of functions ranging from catalytic activity to cellular transport and signaling, and interactions between small molecules and proteins are the foundation of many therapeutics. However, the experimental determination of these interactions is expensive and relatively slow, limiting the ability to model interactions at genome scale. It is therefore critical to develop computational approaches for modeling these interactions. Unsupervised language models trained on amino acid sequences, namely protein language models, learn patterns in sequence evolution that encode protein structure and function. These protein language models are thus a powerful tool for extracting features of proteins, enabling the adoption of lightweight downstream models. Here, we present novel machine learning techniques for adapting protein language modeling to the prediction of protein interactions at scale, enabling de novo interaction network inference and large-scale drug compound screening. We show that these methods achieve state-of-the-art performance and allow us to discover new biology and therapeutic candidates. In addition, we introduce methods for efficient training and adaptation of these models, and outline several applications that take advantage of the scale enabled by lightweight models. As a whole, this thesis demonstrates how computational advances in language modeling and the massive growth of data brought about by the sequencing revolution can be leveraged to tackle the genotype-to-phenotype challenge in biology, and lays the groundwork for more widespread adoption of these techniques for proteomic modeling.
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/156633
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
