Show simple item record

dc.contributor.author: Recasens, Adrià
dc.contributor.author: Kellnhofer, Petr
dc.contributor.author: Stent, Simon
dc.contributor.author: Matusik, Wojciech
dc.contributor.author: Torralba, Antonio
dc.date.accessioned: 2021-11-09T12:18:19Z
dc.date.available: 2021-11-09T12:18:19Z
dc.date.issued: 2018
dc.identifier.issn: 0302-9743
dc.identifier.issn: 1611-3349
dc.identifier.uri: https://hdl.handle.net/1721.1/137841
dc.description.abstract: © Springer Nature Switzerland AG 2018. We introduce a saliency-based distortion layer for convolutional neural networks that helps to improve the spatial sampling of input data for a given task. Our differentiable layer can be added as a preprocessing block to existing task networks and trained altogether in an end-to-end fashion. The effect of the layer is to efficiently estimate how to sample from the original data in order to boost task performance. For example, for an image classification task in which the original data might range in size up to several megapixels, but where the desired input images to the task network are much smaller, our layer learns how best to sample from the underlying high resolution data in a manner which preserves task-relevant information better than uniform downsampling. This has the effect of creating distorted, caricature-like intermediate images, in which idiosyncratic elements of the image that improve task performance are zoomed and exaggerated. Unlike alternative approaches such as spatial transformer networks, our proposed layer is inspired by image saliency, computed efficiently from uniformly downsampled data, and degrades gracefully to a uniform sampling strategy under uncertainty. We apply our layer to improve existing networks for the tasks of human gaze estimation and fine-grained object classification. Code for our method is available at: http://github.com/recasens/Saliency-Sampler. [en_US]
dc.language.iso: en
dc.publisher: Springer International Publishing [en_US]
dc.relation.isversionof: 10.1007/978-3-030-01240-3_4 [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: Computer Vision Foundation [en_US]
dc.title: Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Recasens, Adrià, Kellnhofer, Petr, Stent, Simon, Matusik, Wojciech and Torralba, Antonio. 2018. "Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks."
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2019-06-21T16:26:05Z
dspace.date.submission: 2019-06-21T16:26:07Z
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
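The abstract describes a differentiable layer that turns an image-saliency map into a non-uniform sampling grid, so that salient regions attract more samples while flat saliency degrades to uniform downsampling. The authors' actual implementation is at the GitHub link above; the following is only an illustrative NumPy sketch of that core idea (not the paper's code): each output pixel's source coordinate is a saliency-weighted, Gaussian-kernel average of grid positions. The function names, the `sigma` parameter, and the nearest-neighbour resampler are assumptions chosen for clarity.

```python
import numpy as np

def saliency_sampling_grid(saliency, sigma=0.3):
    """Map a saliency map to per-pixel source coordinates in [0, 1]^2.

    Each output position's coordinate is a saliency-weighted average of
    nearby grid positions (Gaussian distance kernel), so high-saliency
    regions pull samples toward themselves. With uniform saliency the
    grid stays (approximately) uniform, mirroring the graceful
    degradation described in the abstract. Illustrative sketch only.
    """
    h, w = saliency.shape
    ys, xs = np.meshgrid(np.linspace(0, 1, h),
                         np.linspace(0, 1, w), indexing="ij")
    u = np.empty_like(xs)  # x source coordinates
    v = np.empty_like(ys)  # y source coordinates
    for i in range(h):
        for j in range(w):
            # Gaussian kernel centred on output position (i, j)
            k = np.exp(-((xs - xs[i, j]) ** 2 + (ys - ys[i, j]) ** 2)
                       / (2.0 * sigma ** 2))
            wgt = saliency * k
            u[i, j] = (wgt * xs).sum() / wgt.sum()
            v[i, j] = (wgt * ys).sum() / wgt.sum()
    return u, v

def sample(image, u, v):
    """Nearest-neighbour resampling of `image` at normalised coords (u, v)."""
    H, W = image.shape[:2]
    xi = np.clip(np.round(u * (W - 1)).astype(int), 0, W - 1)
    yi = np.clip(np.round(v * (H - 1)).astype(int), 0, H - 1)
    return image[yi, xi]
```

In the paper this mapping is differentiable and trained end-to-end with the task network (the sketch above uses hard nearest-neighbour lookup purely for brevity); boosting saliency in one region visibly shifts the whole sampling grid toward it, which is what produces the "zoomed, caricature-like" intermediate images the abstract mentions.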

