
dc.contributor.advisor: Andreas, Jacob
dc.contributor.advisor: Hadfield-Menell, Dylan
dc.contributor.author: Liu, Kevin
dc.date.accessioned: 2023-07-31T19:32:58Z
dc.date.available: 2023-07-31T19:32:58Z
dc.date.issued: 2023-06
dc.date.submitted: 2023-06-06T16:34:59.367Z
dc.identifier.uri: https://hdl.handle.net/1721.1/151345
dc.description.abstract: Large language models (LLMs) have seen a rapid rise in utility, accessibility, and popularity, but there are still many areas in which they can improve. One such area is their truthfulness. We seek to improve the truthfulness of LLMs by probing their internal representations. We find that a linear probe on the last hidden layer's representation can improve a model's accuracy by reducing its confidence in incorrect answers. However, the probe is less effective when used to perturb the model and drive it toward correct answers.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Truthfulness in Large Language Models
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science
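
The abstract above describes training a linear probe on a model's last hidden-layer representation to gauge answer correctness. The thesis record does not specify the model, dataset, or probe implementation, so the following is only a minimal sketch under assumptions: a Hugging Face causal LM ("gpt2" here as a stand-in), the final token's last-layer hidden state as the probed representation, a logistic-regression probe, and a small hypothetical set of labeled true/false statements.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

# Assumption: any causal LM that exposes hidden states; the thesis's actual model is not stated here.
MODEL_NAME = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def last_hidden_representation(text: str) -> torch.Tensor:
    """Return the last hidden layer's representation of the final token."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states[-1] is the last layer; take the final token position.
    return outputs.hidden_states[-1][0, -1, :]

# Hypothetical labeled data: statements marked correct (1) or incorrect (0).
examples = [
    ("Q: What is the capital of France? A: Paris", 1),
    ("Q: What is the capital of France? A: Berlin", 0),
]
X = torch.stack([last_hidden_representation(t) for t, _ in examples]).numpy()
y = [label for _, label in examples]

# The linear probe itself: logistic regression over hidden representations.
probe = LogisticRegression(max_iter=1000).fit(X, y)

# At inference time, the probe's probability of "incorrect" could be used to
# down-weight the model's confidence in answers it is likely getting wrong.
print(probe.predict_proba(X))

In this sketch the probe only reads the representation; the abstract's second finding, that using the probe direction to perturb the model and steer it toward correct answers is less effective, would require writing modified activations back into the forward pass, which is not shown here.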

