Show simple item record

dc.contributor.author: Li, Yunzhu
dc.contributor.author: Zhu, Jun-Yan
dc.contributor.author: Tedrake, Russ
dc.contributor.author: Torralba, Antonio
dc.date.accessioned: 2021-11-08T12:42:36Z
dc.date.available: 2021-11-08T12:42:36Z
dc.date.issued: 2019-06
dc.identifier.uri: https://hdl.handle.net/1721.1/137632
dc.description.abstract: © 2019 IEEE. Humans perceive the world using multi-modal sensory inputs such as vision, audition, and touch. In this work, we investigate the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task lies in the significant scale discrepancy between the two: While our eyes perceive an entire visual scene at once, humans can only feel a small region of an object at any given moment. To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input. To accomplish our goals, we first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. To close the scale gap, we present a new conditional adversarial model that incorporates the scale and location information of the touch. Human perceptual studies demonstrate that our model can produce realistic visual images from tactile data and vice versa. Finally, we present both qualitative and quantitative experimental results regarding different system designs, as well as visualizing the learned representations of our model. [en_US]
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) [en_US]
dc.relation.isversionof: 10.1109/CVPR.2019.01086 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: MIT web domain [en_US]
dc.title: Connecting Touch and Vision via Cross-Modal Prediction [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Li, Yunzhu, Zhu, Jun-Yan, Tedrake, Russ and Torralba, Antonio. 2019. "Connecting Touch and Vision via Cross-Modal Prediction." Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019-June.
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.relation.journal: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2021-01-27T17:48:43Z
dspace.orderedauthors: Li, Y; Zhu, J-Y; Tedrake, R; Torralba, A [en_US]
dspace.date.submission: 2021-01-27T17:48:50Z
mit.journal.volume: 2019-June [en_US]
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
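
The abstract above describes a conditional adversarial model that predicts tactile signals from visual input (and vice versa) while conditioning on the scale and location of the touch. The snippet below is only a minimal illustrative sketch of that idea, not the authors' implementation: the layer sizes, the single-channel touch scale/location map, and the name TouchGenerator are assumptions made for this example.

import torch
import torch.nn as nn

class TouchGenerator(nn.Module):
    """Toy conditional generator: visual frame + touch scale/location map -> tactile image."""
    def __init__(self, in_channels=3, cond_channels=1, out_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # Encoder over the visual frame concatenated with a 1-channel
            # map marking where (and at what scale) the touch occurs.
            nn.Conv2d(in_channels + cond_channels, 64, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            # Decoder back to image resolution.
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, out_channels, 4, stride=2, padding=1),
            nn.Tanh(),  # predicted tactile image in [-1, 1]
        )

    def forward(self, vision, touch_map):
        # vision: (B, 3, H, W) visual frame; touch_map: (B, 1, H, W) scale/location conditioning
        return self.net(torch.cat([vision, touch_map], dim=1))

# Example: predict a 64x64 "tactile image" from a random visual frame.
g = TouchGenerator()
fake_touch = g(torch.randn(2, 3, 64, 64), torch.rand(2, 1, 64, 64))
print(fake_touch.shape)  # torch.Size([2, 3, 64, 64])

In the paper's setting such a generator would be trained adversarially against a discriminator on paired vision/tactile sequences; the sketch shows only the conditioning pattern, not the training loop.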

