Scene Graph Parsing as Dependency Parsing
Author(s): Wang, Yu-Siang; Liu, Chenxi; Zeng, Xiaohui; Yuille, Alan L.
In this paper, we study the problem of parsing structured knowledge graphs from textual descriptions. In particular, we consider the scene graph representation, which models objects together with their attributes and relations; this representation has proven useful across a variety of vision and language applications. We begin by introducing an alternative but equivalent edge-centric view of scene graphs that connects to dependency parses. Together with a careful redesign of the label and action space, we combine the two-stage pipeline used in prior work (generic dependency parsing followed by simple post-processing) into one, enabling end-to-end training. The scene graphs generated by our learned neural dependency parser achieve an F-score similarity of 49.67% to ground truth graphs on our evaluation set, surpassing the best previous approaches by 5%. We further demonstrate the effectiveness of our learned parser on image retrieval applications.
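To make the F-score similarity between a generated scene graph and a ground truth graph concrete, the following is a minimal sketch, not the evaluation code used in the paper. It assumes a scene graph can be flattened into a set of tuples (one per object, attribute, and relation) and that tuples match only when identical; the actual evaluation may use a softer matching criterion. The function names `graph_to_tuples` and `f_score` and the example sentence are hypothetical.

```python
from typing import Iterable, Set, Tuple

SceneTuple = Tuple[str, ...]


def graph_to_tuples(
    objects: Iterable[str],
    attributes: Iterable[Tuple[str, str]],
    relations: Iterable[Tuple[str, str, str]],
) -> Set[SceneTuple]:
    """Flatten a scene graph into a set of tuples (an assumed representation)."""
    tuples: Set[SceneTuple] = set()
    for obj in objects:
        tuples.add((obj,))                # object tuple
    for obj, attr in attributes:
        tuples.add((obj, attr))           # (object, attribute) tuple
    for subj, pred, obj in relations:
        tuples.add((subj, pred, obj))     # (subject, predicate, object) tuple
    return tuples


def f_score(predicted: Set[SceneTuple], reference: Set[SceneTuple]) -> float:
    """Harmonic mean of precision and recall over exactly matching tuples."""
    matched = len(predicted & reference)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example for the description "a young boy holding a red ball".
reference = graph_to_tuples(
    objects=["boy", "ball"],
    attributes=[("boy", "young"), ("ball", "red")],
    relations=[("boy", "holding", "ball")],
)
# A parse that misses the attribute "young".
predicted = graph_to_tuples(
    objects=["boy", "ball"],
    attributes=[("ball", "red")],
    relations=[("boy", "holding", "ball")],
)
score = f_score(predicted, reference)
print(f"F-score: {score:.4f}")
```

In this example the predicted graph recovers four of the five reference tuples and adds no spurious ones, so precision is 1.0 and recall is 0.8.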
Center for Brains, Minds and Machines (CBMM)
CBMM Memo Series; 082