dc.contributor.author | Xiao, Jianxiong | |
dc.contributor.author | Russell, Bryan C. | |
dc.contributor.author | Hays, James | |
dc.contributor.author | Ehinger, Krista A. | |
dc.contributor.author | Oliva, Aude | |
dc.contributor.author | Torralba, Antonio | |
dc.date.accessioned | 2014-10-15T13:42:04Z | |
dc.date.available | 2014-10-15T13:42:04Z | |
dc.date.issued | 2012-11 | |
dc.identifier.isbn | 9781450319157 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/90941 | |
dc.description.abstract | An early goal of computer vision was to build a system that could automatically understand a 3D scene just by looking. This requires not only the ability to extract 3D information from image information alone, but also to handle the large variety of different environments that comprise our visual world. This paper summarizes our recent efforts toward these goals. First, we describe the SUN database, which is a collection of annotated images spanning 908 different scene categories. This database allows us to systematically study the space of possible everyday scenes and to establish a benchmark for scene and object recognition. We also explore ways of coping with the variety of viewpoints within these scenes. For this, we have introduced a database of 360° panoramic images for many of the scene categories in the SUN database and have explored viewpoint recognition within the environments. Finally, we describe steps toward a unified 3D parsing of everyday scenes: (i) the ability to localize geometric primitives in images, such as cuboids and cylinders, which often comprise many everyday objects, and (ii) an integrated system to extract the 3D structure of the scene and objects depicted in an image. | en_US |
dc.description.sponsorship | National Science Foundation (U.S.) (Grant 1016862) | en_US |
dc.description.sponsorship | Google (Firm) (Research Award) | en_US |
dc.description.sponsorship | United States. Office of Naval Research. Multidisciplinary University Research Initiative (N000141010933) | en_US |
dc.description.sponsorship | National Science Foundation (U.S.) (Career Award 0747120) | en_US |
dc.language.iso | en_US | |
dc.publisher | Association for Computing Machinery (ACM) | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1145/2407746.2407782 | en_US |
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
dc.source | MIT web domain | en_US |
dc.title | Basic level scene understanding: from labels to structure and beyond | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Jianxiong Xiao, Bryan C. Russell, James Hays, Krista A. Ehinger, Aude Oliva, and Antonio Torralba. 2012. Basic level scene understanding: from labels to structure and beyond. In SIGGRAPH Asia 2012 Technical Briefs (SA '12). ACM, New York, NY, USA, Article 36, 4 pages. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.contributor.mitauthor | Xiao, Jianxiong | en_US |
dc.contributor.mitauthor | Ehinger, Krista A. | en_US |
dc.contributor.mitauthor | Oliva, Aude | en_US |
dc.contributor.mitauthor | Torralba, Antonio | en_US |
dc.relation.journal | SIGGRAPH Asia 2012 Technical Briefs (SA '12) | en_US |
dc.eprint.version | Author's final manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dspace.orderedauthors | Xiao, Jianxiong; Russell, Bryan C.; Hays, James; Ehinger, Krista A.; Oliva, Aude; Torralba, Antonio | en_US |
dc.identifier.orcid | https://orcid.org/0000-0003-4915-0256 | |
mit.license | OPEN_ACCESS_POLICY | en_US |
mit.metadata.status | Complete | |