Neural scene de-rendering

Wu, Jiajun; Tenenbaum, Joshua B; Kohli, Pushmeet

dc.contributor.author	Wu, Jiajun
dc.contributor.author	Tenenbaum, Joshua B
dc.contributor.author	Kohli, Pushmeet
dc.date.accessioned	2020-08-18T20:41:05Z
dc.date.available	2020-08-18T20:41:05Z
dc.date.issued	2017
dc.identifier.isbn	978-1-5386-0457-1
dc.identifier.uri	https://hdl.handle.net/1721.1/126659
dc.description.abstract	We study the problem of holistic scene understanding. We would like to obtain a compact, expressive, and interpretable representation of scenes that encodes information such as the number of objects and their categories, poses, positions, etc. Such a representation would allow us to reason about and even reconstruct or manipulate elements of the scene. Previous works have used encoder-decoder based neural architectures to learn image representations; however, representations obtained in this way are typically uninterpretable, or only explain a single object in the scene. In this work, we propose a new approach to learn an interpretable distributed representation of scenes. Our approach employs a deterministic rendering function as the decoder, mapping a naturally structured and disentangled scene description, which we named scene XML, to an image. By doing so, the encoder is forced to perform the inverse of the rendering operation (a.k.a. de-rendering) to transform an input image to the structured scene XML that the decoder used to produce the image. We use an object proposal based encoder that is trained by minimizing both the supervised prediction and the unsupervised reconstruction errors. Experiments demonstrate that our approach works well on scene de-rendering with two different graphics engines, and our learned representation can be easily adapted for a wide range of applications like image editing, inpainting, visual analogy-making, and image captioning.	en_US
dc.language.iso	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.isversionof	10.1109/CVPR.2017.744	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	MIT web domain	en_US
dc.title	Neural scene de-rendering	en_US
dc.type	Article	en_US
dc.identifier.citation	Wu, Jiajun, Joshua B. Tenenbaum, and Pushmeet Kohli. "Neural scene de-rendering." 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, Hawaii: 7035-43 doi 10.1109/CVPR.2017.744 ©2017 Author(s)	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.relation.journal	IEEE Conference on Computer Vision and Pattern Recognition (CVPR)	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2019-10-08T14:15:44Z
dspace.date.submission	2019-10-08T14:15:50Z
mit.metadata.status	Complete

Files in this item

Name: nsd_cvpr.pdf
Size: 5.301Mb
Format: PDF
Description: Accepted version

This item appears in the following Collection(s)

MIT Open Access Articles
