DSpace@MIT

Addressing Misalignment in Language Model Deployments through Context-Specific Evaluations

Author(s)
Soni, Prajna
Download: Thesis PDF (557.8 KB)
Advisor
Hadfield-Menell, Dylan
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) Copyright retained by author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
Language model-based applications are increasingly being deployed in the real world across a variety of contexts. While their rapid success has realized benefits for society, ensuring that they are trained to perform according to societal values and expectations is imperative given their potential to shape societal values, norms, and power dynamics. Evaluation plays a key role in language model (LM) alignment and policy-making. Presently, LM alignment and evaluations are based on developer- and researcher-prescribed attributes, with many benchmarks focusing on performance as dictated by generalized or primarily Western datasets that may not accurately reflect the deployment context. This results in an inevitable misalignment where a model trained on human preference proxies in context A is deployed in context B. Existing evaluation measures and alignment techniques are heavily biased towards the values and perspectives of model developers. In this thesis, I argue that in order to ensure that alignment efforts are specific to their deployment contexts, it is necessary and feasible to design open-ended and participatory methods to elicit a broader range of context-specific axes. I demonstrate the viability of this through CALMA, a non-prescriptive and grounded participatory process that successfully elicits distinct and context-specific alignment axes for evaluation datasets through in-context studies with two different communities. I further explore the ways in which broader participation can enable more effective adaptive AI regulation, given the crucial role of evaluations in addressing the technology-policy lag.
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/156962
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Institute for Data, Systems, and Society; Technology and Policy Program
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
