
dc.contributor.author: Tian, Yuandong
dc.contributor.author: Wu, Jiajun
dc.contributor.author: Xue, Tianfan
dc.contributor.author: Lim, Joseph Jaewhan
dc.contributor.author: Tenenbaum, Joshua B
dc.contributor.author: Torralba, Antonio
dc.contributor.author: Freeman, William T.
dc.date.accessioned: 2018-03-29T17:43:50Z
dc.date.available: 2018-03-29T17:43:50Z
dc.date.issued: 2016-09
dc.identifier.isbn: 978-3-319-46465-7
dc.identifier.isbn: 978-3-319-46466-4
dc.identifier.issn: 0302-9743
dc.identifier.issn: 1611-3349
dc.identifier.uri: http://hdl.handle.net/1721.1/114448
dc.description.abstract: Understanding 3D object structure from a single image is an important but difficult task in computer vision, mostly due to the lack of 3D object annotations in real images. Previous work tackles this problem by either solving an optimization task given 2D keypoint positions, or training on synthetic data with ground truth 3D information. In this work, we propose 3D INterpreter Network (3D-INN), an end-to-end framework which sequentially estimates 2D keypoint heatmaps and 3D object structure, trained on both real 2D-annotated images and synthetic 3D data. This is made possible mainly by two technical innovations. First, we propose a Projection Layer, which projects estimated 3D structure to 2D space, so that 3D-INN can be trained to predict 3D structural parameters supervised by 2D annotations on real images. Second, heatmaps of keypoints serve as an intermediate representation connecting real and synthetic data, enabling 3D-INN to benefit from the variation and abundance of synthetic 3D objects, without suffering from the difference between the statistics of real and synthesized images due to imperfect rendering. The network achieves state-of-the-art performance on both 2D keypoint estimation and 3D structure recovery. We also show that the recovered 3D information can be used in other vision applications, such as image retrieval. Keywords: 3D structure, Single image 3D reconstruction, Keypoint estimation, Neural network, Synthetic data [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (Robust Intelligence 1212849) [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (Big Data 1447476) [en_US]
dc.description.sponsorship: United States. Office of Naval Research. Multidisciplinary University Research Initiative (N00014-16-1-2007) [en_US]
dc.description.sponsorship: Shell Research [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (McGovern Institute for Brain Research at MIT. Center for Brains, Minds, and Machines. STC Award CCF-1231216) [en_US]
dc.language.iso: en_US
dc.publisher: Springer [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1007/978-3-319-46466-4_22 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: MIT Web Domain [en_US]
dc.title: Single Image 3D Interpreter Network [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Wu, Jiajun, et al. “Single Image 3D Interpreter Network.” Computer Vision – ECCV 2016, 8-16 October, 2016, Amsterdam, The Netherlands, edited by Bastian Leibe et al., vol. 9910, Springer International Publishing, 2016, pp. 365–82. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.mitauthor: Wu, Jiajun
dc.contributor.mitauthor: Xue, Tianfan
dc.contributor.mitauthor: Lim, Joseph Jaewhan
dc.contributor.mitauthor: Tenenbaum, Joshua B.
dc.contributor.mitauthor: Torralba, Antonio
dc.contributor.mitauthor: Freeman, William T.
dc.relation.journal: Computer Vision – ECCV 2016 [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dspace.orderedauthors: Wu, Jiajun; Xue, Tianfan; Lim, Joseph J.; Tian, Yuandong; Tenenbaum, Joshua B.; Torralba, Antonio; Freeman, William T. [en_US]
dspace.embargo.terms: N [en_US]
dc.identifier.orcid: https://orcid.org/0000-0002-4176-343X
dc.identifier.orcid: https://orcid.org/0000-0001-5031-6618
dc.identifier.orcid: https://orcid.org/0000-0002-2476-6428
dc.identifier.orcid: https://orcid.org/0000-0002-1925-2035
dc.identifier.orcid: https://orcid.org/0000-0003-4915-0256
dc.identifier.orcid: https://orcid.org/0000-0002-2231-7995
dspace.mitauthor.error: true
mit.license: OPEN_ACCESS_POLICY [en_US]

