Enhanced Potts Models for Improved Computational Protein Design
Author(s)
Lu, Mindren D.
Advisor
Keating, Amy E.
Abstract
Proteins are the fundamental building blocks of life, contributing to the structure, function, and regulation of all living cells. The ability to computationally design proteins for specific functions is therefore of particular interest to the bioengineering and biomedical fields. TERMinator is a recently developed neural protein design framework that outperforms state-of-the-art models in native sequence recovery. For a target structure, the model outputs a Potts model: an energy table describing the self and pairwise energetic contributions of every amino acid at every position.
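The energy table described above can be read as a function that scores any candidate sequence on the target structure. The following is a minimal sketch, assuming the common Potts convention E(s) = Σ_i h_i(s_i) + Σ_{i<j} J_ij(s_i, s_j), with lower energy meaning more favorable, and assuming plain nested lists indexed by position and amino acid rather than TERMinator's actual tensor layout. The helper names `potts_energy` and `native_sequence_recovery` are hypothetical; the latter uses the usual definition of recovery as the fraction of positions where the designed residue matches the native one.

```python
import itertools

# The 20 canonical amino acids, one-letter codes (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def potts_energy(seq, h, J):
    """Energy of `seq` under a Potts model (lower = more favorable, by assumption).

    h[i][a]:       self energy of amino acid index a at position i
    J[i][j][a][b]: pairwise energy of a at position i and b at position j;
                   only entries with i < j are read
    """
    idx = [AA_INDEX[aa] for aa in seq]
    # Self (single-site) contributions.
    energy = sum(h[i][a] for i, a in enumerate(idx))
    # Pairwise contributions, each unordered pair of positions counted once.
    for i, j in itertools.combinations(range(len(idx)), 2):
        energy += J[i][j][idx[i]][idx[j]]
    return energy


def native_sequence_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native one."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(d == n for d, n in zip(designed, native)) / len(native)


if __name__ == "__main__":
    L = 3  # toy 3-residue structure
    h = [[0.0] * 20 for _ in range(L)]
    J = [[[[0.0] * 20 for _ in range(20)] for _ in range(L)] for _ in range(L)]
    J[0][1][AA_INDEX["A"]][AA_INDEX["C"]] = -1.0  # favorable A/C pair at positions 0 and 1
    h[2][AA_INDEX["G"]] = -0.5                    # G preferred at position 2
    print(potts_energy("ACG", h, J))              # -1.5
    print(native_sequence_recovery("ACG", "ACD"))
```

Because the table fixes every self and pairwise term in advance, scoring a sequence is a lookup-and-sum; a design procedure can then search for low-energy sequences, and recovery compares the result with the native sequence.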
In this thesis, I investigate approaches for enhancing the Potts models output by TERMinator to improve computational protein design. I find that directly regularizing the Potts model parameters leads to higher native sequence recovery. In addition, I use experimental energetic data to benchmark TERMinator’s zero-shot ability to predict the physical properties of proteins. Furthermore, I show that finetuning TERMinator on this experimental data with a correlational loss function improves its performance on orthogonal energetic benchmarks. Finally, I describe an observed disconnect between accuracy on energetic benchmarks and native sequence recovery, illustrating that native sequence recovery alone is an insufficient measure of model performance.
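The two training ingredients named above, parameter regularization and a correlational loss, can be sketched as plain functions. Both forms are assumptions for illustration, since the abstract does not specify them: this sketch uses an L2 (sum-of-squares) penalty on the self and pairwise parameters, and a loss of 1 - r, where r is the Pearson correlation between predicted energies and experimental measurements oriented so that a perfect model gives r = +1. The names `l2_potts_penalty`, `pearson_r`, and `correlation_loss` are hypothetical. In an actual finetuning loop these quantities would be computed on autograd tensors (for example in PyTorch) so that gradients flow back into the network; pure Python is used here only to make the arithmetic explicit.

```python
import math


def l2_potts_penalty(h, J, weight):
    """Hypothetical L2 regularizer: weight * (sum of squared self and pairwise energies)."""
    self_term = sum(v * v for row in h for v in row)
    pair_term = sum(v * v for J_i in J for J_ij in J_i for row in J_ij for v in row)
    return weight * (self_term + pair_term)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(sxx * syy)


def correlation_loss(predicted, measured):
    """Hypothetical correlational loss: 0 for perfect correlation, 2 for perfect anti-correlation.

    Assumes `predicted` and `measured` are oriented so that agreement means r = +1
    (e.g. negate Potts energies if lower energy should mean higher measured stability).
    """
    return 1.0 - pearson_r(predicted, measured)


if __name__ == "__main__":
    print(l2_potts_penalty([[1.0, 2.0]], [[[[0.5]]]], weight=0.1))
    print(correlation_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 0.0
```

A correlation-based loss rewards getting the ranking and relative spacing of measured values right without requiring predicted energies to share the experimental units or offset, which is one plausible reason to prefer it over a squared-error loss when finetuning on heterogeneous energetic data.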
Date issued
2022-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology