MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Graduate Theses
  • View Item
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Graduate Theses
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

ChaperoNet: Distillation of Language Model Semantics to Folded Three-Dimensional Protein Structures

Author(s)
dos Santos Costa, Allan
Thumbnail
DownloadThesis PDF (7.257Mb)
Advisor
Jacobson, Joseph M.
Terms of use
In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/
Metadata
Show full item record
Abstract
Determining the structure of proteins has been a long-standing goal in biology. Lan- guage models have been recently deployed to capture the evolutionary semantics of protein sequences, and as an emergent property, were found to be structural learn- ers. Enriched with multiple sequence alignments (MSA), these transformer models were able to capture significant information about a protein’s tertiary structure. In this work, we show how such structural information can be recovered by processing language model embeddings, and introduce a two-stage folding pipeline to directly es- timate three-dimensional folded structures from protein sequences. We envision that this pipeline will provide a basis for efficient, end-to-end protein structure prediction through protein language modeling.
Date issued
2021-09
URI
https://hdl.handle.net/1721.1/142842
Department
Program in Media Arts and Sciences (Massachusetts Institute of Technology)
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.