dc.contributor.advisor | Hadfield-Menell, Dylan | |
dc.contributor.author | Soni, Prajna | |
dc.date.accessioned | 2024-09-24T18:23:21Z | |
dc.date.available | 2024-09-24T18:23:21Z | |
dc.date.issued | 2024-05 | |
dc.date.submitted | 2024-07-25T14:17:41.853Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/156962 | |
dc.description.abstract | Language model-based applications are increasingly being deployed in the real world across a variety of contexts. While their rapid success has realized benefits for society, ensuring that they are trained to perform according to the values and expectations of the communities they serve is imperative, given their potential to shape societal norms and power dynamics. Evaluation plays a key role in language model (LM) alignment and policy-making. Presently, LM alignment and evaluation are based on developer- and researcher-prescribed attributes, with many benchmarks measuring performance against generalized or primarily Western datasets that may not accurately reflect the deployment context. The result is an inevitable misalignment: a model trained on human preference proxies in context A is deployed in context B.
Existing evaluation measures and alignment techniques are heavily biased towards the values and perspectives of model developers. In this thesis, I argue that to ensure alignment efforts are specific to their deployment contexts, it is both necessary and feasible to design open-ended, participatory methods that elicit a broader range of context-specific axes. I demonstrate the viability of this approach through CALMA, a non-prescriptive, grounded participatory process that elicits distinct, context-specific alignment axes for evaluation datasets, validated through in-context studies with two different communities. I further explore how broader participation can enable more effective adaptive AI regulation, given the crucial role of evaluations in addressing the technology-policy lag. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) | |
dc.rights | Copyright retained by author(s) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.title | Addressing Misalignment in Language Model Deployments through Context-Specific Evaluations | |
dc.type | Thesis | |
dc.description.degree | S.M. | |
dc.description.degree | S.M. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.contributor.department | Massachusetts Institute of Technology. Institute for Data, Systems, and Society | |
dc.contributor.department | Technology and Policy Program | |
dc.identifier.orcid | https://orcid.org/0009-0005-3379-5334 | |
mit.thesis.degree | Master | |
thesis.degree.name | Master of Science in Electrical Engineering and Computer Science | |
thesis.degree.name | Master of Science in Technology and Policy | |