Do You See What I Mean? Visual Resolution of Linguistic Ambiguities

Berzak, Yevgeni; Barbu, Andrei; Harari, Daniel; Katz, Boris; Ullman, Shimon

Download

CBMM-Memo-051.pdf (2.735Mb)

Terms of use

Attribution-NonCommercial-ShareAlike 3.0 United States http://creativecommons.org/licenses/by-nc-sa/3.0/us/

Abstract

Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene that depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations of each sentence. We address this task by extending a vision model that determines whether a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, allowing sentences to be disambiguated in a unified fashion across the different ambiguity types.
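
The task described in the abstract can be pictured as choosing, among the candidate interpretations of a sentence, the one that a sentence-video compatibility model rates highest for the given video. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `disambiguate` and `toy_score`, the representation of a video as a set of labels, and the example readings are all hypothetical and stand in for a real vision model and real video data.

```python
from typing import Callable, Sequence


def disambiguate(
    interpretations: Sequence[str],
    video: object,
    score: Callable[[str, object], float],
) -> str:
    """Return the interpretation that the scoring model rates highest for the video.

    `score` is an assumed interface: any function mapping an
    (interpretation, video) pair to a compatibility value, higher meaning
    a better match.
    """
    if not interpretations:
        raise ValueError("need at least one candidate interpretation")
    return max(interpretations, key=lambda reading: score(reading, video))


def toy_score(reading: str, video: object) -> float:
    """Hypothetical stand-in for a vision model.

    The "video" here is just a collection of label strings, and the score
    counts how many words of the reading appear among those labels.
    """
    labels = set(video)
    return float(sum(word in labels for word in reading.split()))


# Two made-up readings of an ambiguous sentence, paraphrased so that each
# reading is unambiguous, and a made-up "video" given as a set of labels.
readings = ["claire holds telescope", "bob holds telescope"]
video_labels = {"claire", "holds", "telescope"}
chosen = disambiguate(readings, video_labels, toy_score)
```

Keeping the scoring function as a parameter reflects the idea that the same selection step can be applied regardless of whether the ambiguity is syntactic, semantic or discourse-level; only the model behind `score` needs to understand the video.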

Date issued

2016-06-10

URI

http://hdl.handle.net/1721.1/103400

Publisher

Center for Brains, Minds and Machines (CBMM), arXiv

Citation

arXiv:1603.08079v1 [cs.CV]

Series/Report no.

CBMM Memo Series;051

Keywords

Computer Language, Language understanding, Computer vision

Collections

CBMM Memo Series
