MEng Thesis: Incorporating Structured Commonsense into Language Models

Author(s)
Yin, Claire
Download: Thesis PDF (1.531 MB)
Advisor
Katz, Boris
Lieberman, Henry
Terms of use
In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Machine learning has a wide variety of applications in natural language processing (NLP), one of which is fine-tuning large pre-trained models for downstream tasks. In this work, we propose methods to enhance these large language models by infusing them with information found in commonsense knowledge bases. Commonsense is basic knowledge about the world that humans are expected to have and that is needed for efficient communication. Oftentimes, to understand a text, a person must use their commonsense to make implicit inferences based on what is explicitly stated. We harness the power of relational graph convolutional networks (RGCNs) to encode meaningful commonsense information from graphs, and we introduce three simple methods for injecting this knowledge to improve the contextual representations produced by transformer-based language models. We show that the representations learned by the RGCN are useful for link prediction in a commonsense knowledge base. Additionally, we show that the methods we introduce for combining structured commonsense representations with a transformer-based language model yield promising results on a downstream information retrieval task, and in most combinations outperform a baseline transformer-based language model. Lastly, we show that the representations learned by an RGCN, although trained on considerably less data, still prove useful in a downstream information retrieval task when combined with a transformer-based language model.
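
To make the architecture in the abstract concrete, the following is a minimal PyTorch sketch, not the thesis code, of the three pieces it describes: an RGCN layer over a relational graph, a DistMult-style scorer as one standard choice for link prediction, and simple concatenation as one plausible way to fuse graph and transformer representations (the abstract does not specify which three combination methods the thesis uses). All class names, dimensions, and the choice of DistMult and concatenation are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RGCNLayer(nn.Module):
    """One relational graph convolution: each relation type has its own
    transform, and messages from neighbors are aggregated per node."""
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        # One weight matrix per relation type, plus a self-loop transform.
        self.rel_w = nn.Parameter(torch.randn(num_relations, in_dim, out_dim) * 0.01)
        self.self_w = nn.Linear(in_dim, out_dim)

    def forward(self, h, edge_index, edge_type):
        # h: (N, in_dim); edge_index: (2, E) source/target node ids;
        # edge_type: (E,) relation id per edge.
        src, dst = edge_index
        # Relation-specific message for every edge: h[src] @ W[relation].
        msg = torch.einsum('ei,eio->eo', h[src], self.rel_w[edge_type])
        # Normalize by in-degree (a simplification of the per-relation
        # normalization in the original RGCN formulation).
        deg = torch.zeros(h.size(0), device=h.device).index_add_(
            0, dst, torch.ones(dst.size(0), device=h.device))
        msg = msg / deg.clamp(min=1.0)[dst].unsqueeze(-1)
        # Self-loop term plus aggregated neighbor messages.
        out = self.self_w(h).index_add(0, dst, msg)
        return F.relu(out)

class DistMultScorer(nn.Module):
    """Scores (head, relation, tail) triples for link prediction over a
    commonsense knowledge base, given RGCN node embeddings."""
    def __init__(self, dim, num_relations):
        super().__init__()
        self.rel_emb = nn.Embedding(num_relations, dim)

    def forward(self, h_head, rel_ids, h_tail):
        # Higher score = more plausible triple.
        return (h_head * self.rel_emb(rel_ids) * h_tail).sum(-1)

class ConcatFusion(nn.Module):
    """One simple way to combine a transformer text vector with a pooled
    graph vector: concatenate and project."""
    def __init__(self, lm_dim, graph_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(lm_dim + graph_dim, out_dim)

    def forward(self, lm_vec, graph_vec):
        return self.proj(torch.cat([lm_vec, graph_vec], dim=-1))

In this sketch, the RGCN embeddings would be trained with the triple scorer on the knowledge base, then the resulting node vectors fused with the language model's contextual representations for the downstream retrieval task; the specific fusion variants and training details are those of the thesis, not this illustration.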
Date issued
2022-05
URI
https://hdl.handle.net/1721.1/145141
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
