dc.contributor.author | Zhu, Jun-Yan | |
dc.contributor.author | Zhang, Zhoutong | |
dc.contributor.author | Zhang, Chengkai | |
dc.contributor.author | Wu, Jiajun | |
dc.contributor.author | Torralba, Antonio | |
dc.contributor.author | Tenenbaum, Joshua B. | |
dc.contributor.author | Freeman, William T. | |
dc.date.accessioned | 2021-11-04T19:46:16Z | |
dc.date.available | 2021-11-04T19:46:16Z | |
dc.date.issued | 2018 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/137409 | |
dc.description.abstract | © 2018 Curran Associates Inc. All rights reserved. Recent progress in deep generative models has led to tremendous breakthroughs in image generation. However, while existing models can synthesize photorealistic images, they lack an understanding of our underlying 3D world. We present a new generative model, Visual Object Networks (VON), which synthesizes natural images of objects with a disentangled 3D representation. Inspired by classic graphics rendering pipelines, we unravel our image formation process into three conditionally independent factors (shape, viewpoint, and texture) and present an end-to-end adversarial learning framework that jointly models 3D shapes and 2D images. Our model first learns to synthesize 3D shapes that are indistinguishable from real shapes. It then renders the object's 2.5D sketches (i.e., silhouette and depth map) from its shape under a sampled viewpoint. Finally, it learns to add realistic texture to these 2.5D sketches to generate natural images. VON not only generates images that are more realistic than state-of-the-art 2D image synthesis methods, but also enables many 3D operations, such as changing the viewpoint of a generated image, editing shape and texture, interpolating linearly in texture and shape space, and transferring appearance across different objects and viewpoints. | en_US |
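The abstract describes a three-stage sampling pipeline with independent shape, viewpoint, and texture factors. Below is a minimal PyTorch sketch of that factorization, assuming toy module sizes; the names (ShapeGenerator, TextureNetwork, project_25d), the 90-degree rot90 viewpoints, and the transmittance-style orthographic projection are illustrative stand-ins, not the authors' networks.

# Hypothetical sketch of the VON factorization from the abstract:
# shape code -> voxels -> 2.5D sketch under a viewpoint -> textured image.
# All sizes and modules here are illustrative assumptions.
import torch
import torch.nn as nn

class ShapeGenerator(nn.Module):
    """Maps a shape code to a voxel occupancy grid (the 3D shape stage)."""
    def __init__(self, z_dim=64, res=32):
        super().__init__()
        self.res = res
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, res ** 3), nn.Sigmoid(),  # occupancy in [0, 1]
        )
    def forward(self, z):
        v = self.net(z)
        return v.view(-1, 1, self.res, self.res, self.res)

def project_25d(voxels, view=0):
    """Differentiable orthographic projection of occupancy to a 2.5D
    sketch (silhouette + expected first-hit depth). `view` picks one of
    four 90-degree azimuths, a crude stand-in for continuous viewpoints."""
    v = torch.rot90(voxels, k=view, dims=(2, 4))     # rotate grid to viewpoint
    occ = v.squeeze(1)                               # (B, D, H, W)
    trans = torch.cumprod(1.0 - occ + 1e-6, dim=1)   # transmittance along ray
    hit = occ * torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], 1)
    silhouette = hit.sum(dim=1, keepdim=True)        # prob. the ray hits the object
    d = torch.linspace(0, 1, occ.shape[1], device=occ.device).view(1, -1, 1, 1)
    depth = (hit * d).sum(dim=1, keepdim=True)       # expected hit depth
    return silhouette, depth

class TextureNetwork(nn.Module):
    """Adds appearance to the 2.5D sketch, conditioned on a texture code."""
    def __init__(self, z_dim=64, res=32):
        super().__init__()
        self.fc = nn.Linear(z_dim, res * res)  # broadcast texture code spatially
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )
    def forward(self, silhouette, depth, z_tex):
        b, _, h, w = depth.shape
        code = self.fc(z_tex).view(b, 1, h, w)
        return self.net(torch.cat([silhouette, depth, code], dim=1))

# Sampling one image: independent shape code, viewpoint, and texture code.
shape_net, tex_net = ShapeGenerator(), TextureNetwork()
z_shape, z_tex = torch.randn(1, 64), torch.randn(1, 64)
voxels = shape_net(z_shape)
sil, dep = project_25d(voxels, view=1)
image = tex_net(sil, dep, z_tex)
print(image.shape)  # torch.Size([1, 3, 32, 32])

Because the three factors are sampled independently, the 3D operations listed in the abstract (viewpoint changes, shape/texture edits, appearance transfer) amount to resampling or swapping one latent while holding the others fixed.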
dc.language.iso | en | |
dc.relation.isversionof | https://papers.nips.cc/paper/7297-visual-object-networks-image-generation-with-disentangled-3d-representations | en_US |
dc.rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. | en_US |
dc.source | Neural Information Processing Systems (NIPS) | en_US |
dc.title | Visual object networks: Image generation with disentangled 3D representations | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Zhu, Jun-Yan, Zhang, Zhoutong, Zhang, Chengkai, Wu, Jiajun, Torralba, Antonio et al. 2018. "Visual object networks: Image generation with disentangled 3D representations." Neural Information Processing Systems (NIPS). | |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dc.date.updated | 2019-05-28T11:56:18Z | |
dspace.date.submission | 2019-05-28T11:56:21Z | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |