DSpace@MIT

Truthfulness in Large Language Models

Author(s)
Liu, Kevin
Download: Thesis PDF (1.539 MB)
Advisor
Andreas, Jacob
Hadfield-Menell, Dylan
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Large language models (LLMs) have seen a rapid rise in utility, accessibility, and popularity, but there are still many areas in which they can improve. One such area is their truthfulness. We seek to improve the truthfulness of LLMs by probing their internal representations. We find that a linear probe on the last hidden layer representation is able to improve a model's accuracy by reducing its confidence in incorrect answers. However, the probe is less effective at perturbing the model to change its behavior and drive it toward correct answers.
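
As a rough illustration of the probing setup the abstract describes (a sketch, not the thesis code), the snippet below fits a logistic-regression probe on a model's last hidden layer representation to predict whether an answer is correct. The model name ("gpt2") and the toy labeled examples are assumptions for illustration; the thesis's model and dataset are not specified on this page.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from sklearn.linear_model import LogisticRegression

    model_name = "gpt2"  # assumption: a small stand-in model for illustration
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
    model.eval()

    def last_hidden(text):
        """Return the last hidden layer's representation at the final token."""
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)
        # out.hidden_states[-1] has shape (batch, seq_len, hidden_dim)
        return out.hidden_states[-1][0, -1].numpy()

    # Hypothetical (question + answer, correct?) pairs, for illustration only.
    examples = [
        ("Q: What is the capital of France? A: Paris", 1),
        ("Q: What is the capital of France? A: Lyon", 0),
        ("Q: How many legs does a spider have? A: Eight", 1),
        ("Q: How many legs does a spider have? A: Six", 0),
    ]

    X = [last_hidden(text) for text, _ in examples]
    y = [label for _, label in examples]
    probe = LogisticRegression(max_iter=1000).fit(X, y)

    # The probe's estimated probability of correctness can then be used to
    # down-weight the model's confidence in answers it flags as likely wrong.
    print(probe.predict_proba([last_hidden(examples[1][0])])[0, 1])

In the abstract's framing, this kind of read-only probe is what reduces confidence in incorrect answers; using the same probe direction to perturb the model's representations toward correct answers is the harder intervention it reports as less effective.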
Date issued
2023-06
URI
https://hdl.handle.net/1721.1/151345
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
