DSpace@MIT

Addressing Misalignment in Language Model Deployments through Context-Specific Evaluations

Author(s)
Soni, Prajna
Download: Thesis PDF (557.8 KB)
Advisor
Hadfield-Menell, Dylan
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) Copyright retained by author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
Language model-based applications are increasingly being deployed in the real world across a variety of contexts. While their rapid success has realized benefits for society, ensuring that they are trained to perform according to societal values and expectations is imperative given their potential to shape societal values, norms, and power dynamics. Evaluation plays a key role in language model (LM) alignment and policy-making. Presently, LM alignment and evaluations are based on developer- and researcher-prescribed attributes, with many benchmarks focusing on performance as dictated by generalized or primarily Western datasets that may not accurately reflect the deployment context. This results in an inevitable misalignment where a model trained on human preference proxies in context A is deployed in context B. Existing evaluation measures and alignment techniques are heavily biased towards the values and perspectives of model developers. In this thesis, I argue that in order to ensure that alignment efforts are specific to their deployment contexts, it is necessary and feasible to design open-ended and participatory methods to elicit a broader range of context-specific axes. I demonstrate the viability of this through CALMA, a non-prescriptive and grounded participatory process that successfully elicits distinct and context-specific alignment axes for evaluation datasets through in-context studies with two different communities. I further explore the ways in which broader participation can enable more effective adaptive AI regulation, given the crucial role of evaluations in addressing the technology-policy lag.
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/156962
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Institute for Data, Systems, and Society; Technology and Policy Program
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
