Semantic Understanding of Scenes Through the ADE20K Dataset

Zhou, Bolei; Zhao, Hang; Puig Fernandez, Xavier; Xiao, Tete; Fidler, Sanja; Barriuso, Adela; Torralba, Antonio

dc.contributor.author	Zhou, Bolei
dc.contributor.author	Zhao, Hang
dc.contributor.author	Puig Fernandez, Xavier
dc.contributor.author	Xiao, Tete
dc.contributor.author	Fidler, Sanja
dc.contributor.author	Barriuso, Adela
dc.contributor.author	Torralba, Antonio
dc.date.accessioned	2020-06-11T20:32:21Z
dc.date.available	2020-06-11T20:32:21Z
dc.date.issued	2018-12
dc.date.submitted	2018-03
dc.identifier.issn	1573-1405
dc.identifier.issn	0920-5691
dc.identifier.uri	https://hdl.handle.net/1721.1/125771
dc.description.abstract	Semantic understanding of visual scenes is one of the holy grails of computer vision. Despite the community's efforts in data collection, there are still few image datasets covering a wide range of scenes and object categories with pixel-wise annotations for scene understanding. In this work, we present ADE20K, a densely annotated dataset that spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. In total, there are 25k images of complex everyday scenes containing a variety of objects in their natural spatial context. On average, there are 19.5 instances and 10.5 object classes per image. Based on ADE20K, we construct benchmarks for scene parsing and instance segmentation. We provide baseline performances on both benchmarks and release open-source re-implementations of state-of-the-art models. We further evaluate the effect of synchronized batch normalization and find that a reasonably large batch size is crucial for semantic segmentation performance. We show that networks trained on ADE20K are able to segment a wide variety of scenes and objects.	en_US
dc.description.sponsorship	NSF (grant 1524817)	en_US
dc.language.iso	en
dc.publisher	Springer Nature	en_US
dc.relation.isversionof	10.1007/s11263-018-1140-0	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	arXiv	en_US
dc.title	Semantic Understanding of Scenes Through the ADE20K Dataset	en_US
dc.type	Article	en_US
dc.identifier.citation	Zhou, Bolei, et al. "Semantic Understanding of Scenes Through the ADE20K Dataset." International Journal of Computer Vision 127 (2019): 302–321. https://doi.org/10.1007/s11263-018-1140-0 © 2018 Author(s)	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.relation.journal	International Journal of Computer Vision	en_US
dc.eprint.version	Original manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2019-07-11T17:39:30Z
dspace.date.submission	2019-07-11T17:39:31Z
mit.journal.volume	127	en_US
mit.metadata.status	Complete

Files in this item

Name: 1608.05442.pdf
Size: 7.638 MB
Format: PDF
Description: Submitted version

This item appears in the following Collection(s)

MIT Open Access Articles