Using Co-evolutionary Information to Improve Protein Language Modelling
Author(s)
Ram, Soumya
Advisor
Bepler, Tristan
Abstract
Protein engineering has the potential to solve complex global problems in medicine, clean energy, and manufacturing. However, current protein engineering efforts are hampered by a lack of supervised data. We help rectify this issue by developing supervised models that perform well in data-constrained settings by generalizing across protein engineering tasks and better incorporating coevolutionary and structural information. We also develop an unsupervised language model that conditions the target sequence on its multiple sequence alignment, allowing us to better model protein families.
Date issued
2021-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology