Show simple item record

dc.contributor.advisor: Hadfield-Menell, Dylan
dc.contributor.author: Soni, Prajna
dc.date.accessioned: 2024-09-24T18:23:21Z
dc.date.available: 2024-09-24T18:23:21Z
dc.date.issued: 2024-05
dc.date.submitted: 2024-07-25T14:17:41.853Z
dc.identifier.uri: https://hdl.handle.net/1721.1/156962
dc.description.abstract: Language model-based applications are increasingly being deployed in the real world across a variety of contexts. While their rapid success has realized benefits for society, ensuring that they are trained to perform according to societal values and expectations is imperative given their potential to shape societal values, norms, and power dynamics. Evaluation plays a key role in language model (LM) alignment and policy-making. Presently, LM alignment and evaluations are based on developer- and researcher-prescribed attributes, with many benchmarks focusing on performance as dictated by generalized or primarily Western datasets that may not accurately reflect the deployment context. This results in an inevitable misalignment where a model trained on human preference proxies in context A is deployed in context B. Existing evaluation measures and alignment techniques are heavily biased towards the values and perspectives of model developers. In this thesis, I argue that in order to ensure that alignment efforts are specific to their deployment contexts, it is necessary and feasible to design open-ended and participatory methods to elicit a broader range of context-specific axes. I demonstrate the viability of this through CALMA, a non-prescriptive and grounded participatory process that successfully elicits distinct and context-specific alignment axes for evaluation datasets through in-context studies with two different communities. I further explore the ways in which broader participation can enable more effective adaptive AI regulation, given the crucial role of evaluations in addressing the technology-policy lag.
dc.publisher: Massachusetts Institute of Technology
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Addressing Misalignment in Language Model Deployments through Context-Specific Evaluations
dc.type: Thesis
dc.description.degree: S.M.
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department: Massachusetts Institute of Technology. Institute for Data, Systems, and Society
dc.contributor.department: Technology and Policy Program
dc.identifier.orcid: https://orcid.org/0009-0005-3379-5334
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science
thesis.degree.name: Master of Science in Technology and Policy

