Notice
This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/137062.2
Approximating interactive human evaluation with self-play for open-domain dialog systems
| dc.contributor.author | Ghandeharioun, A | |
| dc.contributor.author | Shen, JH | |
| dc.contributor.author | Jaques, N | |
| dc.contributor.author | Ferguson, C | |
| dc.contributor.author | Jones, N | |
| dc.contributor.author | Lapedriza, A | |
| dc.contributor.author | Picard, R | |
| dc.date.accessioned | 2021-11-02T12:16:27Z | |
| dc.date.available | 2021-11-02T12:16:27Z | |
| dc.date.submitted | 2019 | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/137062 | |
| dc.description.abstract | © 2019 Neural information processing systems foundation. All rights reserved. Building an open-domain conversational agent is a challenging problem. Current evaluation methods, mostly post-hoc judgments of static conversations, do not capture conversation quality in a realistic interactive context. In this paper, we investigate interactive human evaluation and provide evidence for its necessity; we then introduce a novel, model-agnostic, and dataset-agnostic method to approximate it. In particular, we propose a self-play scenario where the dialog system talks to itself, and we calculate a combination of proxies such as sentiment and semantic coherence on the conversation trajectory. We show that this metric is capable of capturing the human-rated quality of a dialog model better than any automated metric known to date, achieving a significant Pearson correlation (r > .7, p < .05). To investigate the strengths of this novel metric and interactive evaluation in comparison to state-of-the-art metrics and human evaluation of static conversations, we perform extended experiments with a set of models, including several that make novel improvements to recent hierarchical dialog generation architectures through sentiment and semantic knowledge distillation on the utterance level. Finally, we open-source the interactive evaluation platform we built and the dataset we collected to allow researchers to efficiently deploy and evaluate dialog models. | en_US |
| dc.language.iso | en | |
| dc.relation.isversionof | https://proceedings.neurips.cc/paper/2019/file/fc9812127bf09c7bd29ad6723c683fb5-Paper.pdf | en_US |
| dc.rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. | en_US |
| dc.source | Neural Information Processing Systems (NIPS) | en_US |
| dc.title | Approximating interactive human evaluation with self-play for open-domain dialog systems | en_US |
| dc.type | Article | en_US |
| dc.identifier.citation | Ghandeharioun, A, Shen, JH, Jaques, N, Ferguson, C, Jones, N et al. "Approximating interactive human evaluation with self-play for open-domain dialog systems." Advances in Neural Information Processing Systems, 32. | |
| dc.relation.journal | Advances in Neural Information Processing Systems | en_US |
| dc.eprint.version | Final published version | en_US |
| dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
| eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
| dc.date.updated | 2021-07-06T13:41:07Z | |
| dspace.orderedauthors | Ghandeharioun, A; Shen, JH; Jaques, N; Ferguson, C; Jones, N; Lapedriza, A; Picard, R | en_US |
| dspace.date.submission | 2021-07-06T13:41:10Z | |
| mit.journal.volume | 32 | en_US |
| mit.license | PUBLISHER_POLICY | |
| mit.metadata.status | Authority Work and Publication Information Needed | en_US |
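The abstract describes a self-play evaluation scheme: the dialog system talks to itself, and proxy scores such as sentiment and semantic coherence are computed over the resulting conversation trajectory and correlated (Pearson) against human ratings. The sketch below illustrates only the general shape of that pipeline with toy stand-ins: a tiny positive-word lexicon for sentiment, word-overlap for semantic coherence, an echo-style placeholder bot, and arbitrary weights. None of these stand-ins are the paper's actual estimators or models.

```python
import math

def self_play(model, seed_utterance, turns=6):
    """Let a dialog model respond to its own last utterance, yielding a trajectory."""
    conversation = [seed_utterance]
    for _ in range(turns - 1):
        conversation.append(model(conversation[-1]))
    return conversation

# Toy sentiment proxy: fraction of words drawn from a small positive lexicon.
POSITIVE = {"great", "good", "love", "nice", "happy", "fun"}

def sentiment_score(utterance):
    words = utterance.lower().split()
    return sum(w in POSITIVE for w in words) / max(len(words), 1)

def coherence_score(prev, curr):
    """Toy semantic coherence: Jaccard word overlap between consecutive turns."""
    a, b = set(prev.lower().split()), set(curr.lower().split())
    return len(a & b) / max(len(a | b), 1)

def hybrid_metric(conversation, w_sent=0.5, w_coh=0.5):
    """Combine per-turn proxies into one score for the whole trajectory."""
    sent = sum(sentiment_score(u) for u in conversation) / len(conversation)
    coh = sum(coherence_score(p, c) for p, c in zip(conversation, conversation[1:]))
    coh /= max(len(conversation) - 1, 1)
    return w_sent * sent + w_coh * coh

def pearson(xs, ys):
    """Pearson correlation, used to validate the metric against human ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Trivial echo-style "model" standing in for a real dialog system.
bot = lambda utt: "that sounds great , tell me more about " + utt.split()[-1]
trajectory = self_play(bot, "i love hiking in the mountains")
score = hybrid_metric(trajectory)
```

In the paper's actual setup the proxies are learned/model-based rather than lexicon- or overlap-based, and the reported r > .7 correlation is against interactive human ratings collected on their open-sourced platform; the weights and turn counts here are placeholders.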
