Connecting Touch and Vision via Cross-Modal Prediction

Li, Yunzhu; Zhu, Jun-Yan; Tedrake, Russ; Torralba, Antonio

dc.contributor.author	Li, Yunzhu
dc.contributor.author	Zhu, Jun-Yan
dc.contributor.author	Tedrake, Russ
dc.contributor.author	Torralba, Antonio
dc.date.accessioned	2021-11-08T12:42:36Z
dc.date.available	2021-11-08T12:42:36Z
dc.date.issued	2019-06
dc.identifier.uri	https://hdl.handle.net/1721.1/137632
dc.description.abstract	© 2019 IEEE. Humans perceive the world using multi-modal sensory inputs such as vision, audition, and touch. In this work, we investigate the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task lies in the significant scale discrepancy between the two: while our eyes perceive an entire visual scene at once, we can only feel a small region of an object at any given moment. To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input. To accomplish our goals, we first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. To close the scale gap, we present a new conditional adversarial model that incorporates the scale and location information of the touch. Human perceptual studies demonstrate that our model can produce realistic visual images from tactile data and vice versa. Finally, we present both qualitative and quantitative experimental results regarding different system designs, as well as visualizations of the learned representations of our model.	en_US
dc.language.iso	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.isversionof	10.1109/CVPR.2019.01086	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	MIT web domain	en_US
dc.title	Connecting Touch and Vision via Cross-Modal Prediction	en_US
dc.type	Article	en_US
dc.identifier.citation	Li, Yunzhu, Zhu, Jun-Yan, Tedrake, Russ and Torralba, Antonio. 2019. "Connecting Touch and Vision via Cross-Modal Prediction." Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019-June.
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.relation.journal	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2021-01-27T17:48:43Z
dspace.orderedauthors	Li, Y; Zhu, J-Y; Tedrake, R; Torralba, A	en_US
dspace.date.submission	2021-01-27T17:48:50Z
mit.journal.volume	2019-June	en_US
mit.license	OPEN_ACCESS_POLICY
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: visgel-paper.pdf
Size: 9.490Mb
Format: PDF
Description: Accepted version

This item appears in the following Collection(s)

MIT Open Access Articles