Simple item record

dc.contributor.author	Boggust, Angie
dc.contributor.author	Bang, Hyemin
dc.contributor.author	Strobelt, Hendrik
dc.contributor.author	Satyanarayan, Arvind
dc.date.accessioned	2025-09-19T18:02:54Z
dc.date.available	2025-09-19T18:02:54Z
dc.date.issued	2025-04-25
dc.identifier.isbn	979-8-4007-1394-1
dc.identifier.uri	https://hdl.handle.net/1721.1/162767
dc.description	CHI ’25, Yokohama, Japan	en_US
dc.description.abstract	While interpretability methods identify a model’s learned concepts, they overlook the relationships between concepts that make up its abstractions and inform its ability to generalize to new data. To assess whether models have learned human-aligned abstractions, we introduce abstraction alignment, a methodology to compare model behavior against formal human knowledge. Abstraction alignment externalizes domain-specific human knowledge as an abstraction graph, a set of pertinent concepts spanning levels of abstraction. Using the abstraction graph as a ground truth, abstraction alignment measures the alignment of a model’s behavior by determining how much of its uncertainty is accounted for by the human abstractions. By aggregating abstraction alignment across entire datasets, users can test alignment hypotheses, such as which human concepts the model has learned and where misalignments recur. In evaluations with experts, abstraction alignment differentiates seemingly similar errors, improves the verbosity of existing model-quality metrics, and uncovers improvements to current human abstractions.	en_US
dc.publisher	ACM|CHI Conference on Human Factors in Computing Systems	en_US
dc.relation.isversionof	https://doi.org/10.1145/3706598.3713406	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Association for Computing Machinery	en_US
dc.title	Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships	en_US
dc.type	Article	en_US
dc.identifier.citation	Angie Boggust, Hyemin Bang, Hendrik Strobelt, and Arvind Satyanarayan. 2025. Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI '25). Association for Computing Machinery, New York, NY, USA, Article 417, 1–20.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.identifier.mitlicense	PUBLISHER_POLICY
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2025-08-01T08:07:59Z
dc.language.rfc3066	en
dc.rights.holder	The author(s)
dspace.date.submission	2025-08-01T08:07:59Z
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US
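
The core measurement described in the abstract can be illustrated with a small sketch: propagate a model's class probabilities up a human abstraction graph and compare uncertainty across levels. The Python below is a minimal, hypothetical illustration of that general idea; the toy graph, the propagate/entropy helpers, and the entropy-based comparison are assumptions made for illustration, not the authors' implementation (see the DOI above for the paper's actual method).

    # Minimal sketch, assuming a toy two-level abstraction graph. NOT the
    # paper's implementation; all names and the entropy comparison are
    # illustrative assumptions.
    from collections import defaultdict
    import math

    # Hypothetical abstraction graph: child concept -> parent concept.
    PARENT = {
        "oak": "tree", "maple": "tree",
        "rose": "flower", "tulip": "flower",
    }

    def propagate(leaf_probs):
        """Accumulate each leaf's probability mass into its ancestors.
        (A general DAG would need a topological order; this toy graph
        is a tree, so walking up parent links suffices.)"""
        node_probs = defaultdict(float, leaf_probs)
        for leaf, p in leaf_probs.items():
            node = leaf
            while node in PARENT:   # walk up to the root
                node = PARENT[node]
                node_probs[node] += p
        return dict(node_probs)

    def entropy(probs):
        """Shannon entropy (in bits) of a probability vector."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Example: the model confuses oak vs. maple but is confident it is a tree.
    leaf = {"oak": 0.48, "maple": 0.47, "rose": 0.03, "tulip": 0.02}
    nodes = propagate(leaf)

    print(entropy(leaf.values()))                     # ~1.29 bits at the leaves
    print(entropy([nodes["tree"], nodes["flower"]]))  # ~0.29 bits one level up
    # If entropy collapses at higher levels, the human abstraction accounts
    # for most of the model's uncertainty, i.e., the model's behavior is
    # well aligned with that abstraction.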

