Notice

This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/137062.2

dc.contributor.author: Ghandeharioun, A
dc.contributor.author: Shen, JH
dc.contributor.author: Jaques, N
dc.contributor.author: Ferguson, C
dc.contributor.author: Jones, N
dc.contributor.author: Lapedriza, A
dc.contributor.author: Picard, R
dc.date.accessioned: 2021-11-02T12:16:27Z
dc.date.available: 2021-11-02T12:16:27Z
dc.date.submitted: 2019
dc.identifier.uri: https://hdl.handle.net/1721.1/137062
dc.description.abstract: © 2019 Neural information processing systems foundation. All rights reserved. Building an open-domain conversational agent is a challenging problem. Current evaluation methods, mostly post-hoc judgments of static conversation, do not capture conversation quality in a realistic interactive context. In this paper, we investigate interactive human evaluation and provide evidence for its necessity; we then introduce a novel, model-agnostic, and dataset-agnostic method to approximate it. In particular, we propose a self-play scenario where the dialog system talks to itself and we calculate a combination of proxies such as sentiment and semantic coherence on the conversation trajectory. We show that this metric is capable of capturing the human-rated quality of a dialog model better than any automated metric known to-date, achieving a significant Pearson correlation (r > .7, p < .05). To investigate the strengths of this novel metric and interactive evaluation in comparison to state-of-the-art metrics and human evaluation of static conversations, we perform extended experiments with a set of models, including several that make novel improvements to recent hierarchical dialog generation architectures through sentiment and semantic knowledge distillation on the utterance level. Finally, we open-source the interactive evaluation platform we built and the dataset we collected to allow researchers to efficiently deploy and evaluate dialog models. [en_US]
dc.language.iso: en
dc.relation.isversionof: https://proceedings.neurips.cc/paper/2019/file/fc9812127bf09c7bd29ad6723c683fb5-Paper.pdf [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: Neural Information Processing Systems (NIPS) [en_US]
dc.title: Approximating interactive human evaluation with self-play for open-domain dialog systems [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Ghandeharioun, A, Shen, JH, Jaques, N, Ferguson, C, Jones, N et al. "Approximating interactive human evaluation with self-play for open-domain dialog systems." Advances in Neural Information Processing Systems, 32.
dc.relation.journal: Advances in Neural Information Processing Systems [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2021-07-06T13:41:07Z
dspace.orderedauthors: Ghandeharioun, A; Shen, JH; Jaques, N; Ferguson, C; Jones, N; Lapedriza, A; Picard, R [en_US]
dspace.date.submission: 2021-07-06T13:41:10Z
mit.journal.volume: 32 [en_US]
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
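The abstract above describes a self-play evaluation scheme: the dialog model converses with itself, and proxy metrics such as sentiment and semantic coherence are computed over the resulting conversation trajectory and combined into a quality score. The sketch below illustrates that idea only; the bot, the toy lexicon-based sentiment proxy, the word-overlap coherence proxy, and the equal-weight combination are all hypothetical stand-ins, not the authors' implementation (the paper uses learned models for its proxies and its own combination of them).

```python
def toy_bot(utterance):
    """Hypothetical stand-in for a trained dialog model."""
    replies = {
        "hi there": "hi, how are you today?",
        "hi, how are you today?": "i am doing great, thanks for asking!",
    }
    return replies.get(utterance, "that sounds great, tell me more!")

def sentiment_proxy(utterance):
    """Toy sentiment proxy: fraction of words found in a small positive lexicon."""
    positive = {"great", "good", "thanks", "love", "nice"}
    words = utterance.lower().split()
    return sum(w.strip("!,.?") in positive for w in words) / max(len(words), 1)

def coherence_proxy(prev, curr):
    """Toy semantic-coherence proxy: Jaccard word overlap of consecutive turns."""
    a, b = set(prev.lower().split()), set(curr.lower().split())
    return len(a & b) / max(len(a | b), 1)

def self_play_score(bot, seed="hi there", turns=6):
    """Run self-play, then average the proxies over the whole trajectory."""
    conversation = [seed]
    for _ in range(turns):
        conversation.append(bot(conversation[-1]))
    sentiments = [sentiment_proxy(u) for u in conversation]
    coherences = [coherence_proxy(a, b)
                  for a, b in zip(conversation, conversation[1:])]
    # Equal-weight combination of the two proxies; the paper chooses its own
    # combination and correlates the result with human ratings.
    return 0.5 * (sum(sentiments) / len(sentiments)
                  + sum(coherences) / len(coherences))

score = self_play_score(toy_bot)
```

In this toy form the score lies in [0, 1]; the key design point from the abstract is that the whole procedure needs no human in the loop, which is what lets it approximate interactive human evaluation cheaply.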

