dc.contributor.advisor | Hadfield-Menell, Dylan | |
dc.contributor.author | Soni, Prajna | |
dc.date.accessioned | 2024-09-24T18:23:21Z | |
dc.date.available | 2024-09-24T18:23:21Z | |
dc.date.issued | 2024-05 | |
dc.date.submitted | 2024-07-25T14:17:41.853Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/156962 | |
dc.description.abstract | Language model-based applications are increasingly being deployed in the real world across a variety of contexts. While their rapid success has realized benefits for society, ensuring that they are trained to perform according to the values and expectations of the communities they serve is imperative, given their potential to shape societal norms and power dynamics. Evaluation plays a key role in language model (LM) alignment and policy-making. Presently, LM alignment and evaluation are based on developer- and researcher-prescribed attributes, with many benchmarks measuring performance against generalized or primarily Western datasets that may not accurately reflect the deployment context. The result is an inevitable misalignment: a model trained on human preference proxies in context A is deployed in context B.
Existing evaluation measures and alignment techniques are heavily biased towards the values and perspectives of model developers. In this thesis, I argue that to ensure alignment efforts are specific to their deployment contexts, it is both necessary and feasible to design open-ended, participatory methods that elicit a broader range of context-specific axes. I demonstrate the viability of this approach through CALMA, a non-prescriptive, grounded participatory process that elicits distinct, context-specific alignment axes for evaluation datasets, validated through in-context studies with two different communities. I further explore how broader participation can enable more effective adaptive AI regulation, given the crucial role of evaluations in addressing the technology-policy lag. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) | |
dc.rights | Copyright retained by author(s) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.title | Addressing Misalignment in Language Model Deployments through Context-Specific Evaluations | |
dc.type | Thesis | |
dc.description.degree | S.M. | |
dc.description.degree | S.M. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.contributor.department | Massachusetts Institute of Technology. Institute for Data, Systems, and Society | |
dc.contributor.department | Technology and Policy Program | |
dc.identifier.orcid | https://orcid.org/0009-0005-3379-5334 | |
mit.thesis.degree | Master | |
thesis.degree.name | Master of Science in Electrical Engineering and Computer Science | |
thesis.degree.name | Master of Science in Technology and Policy | |