Music Gesture for Visual Sound Separation

Gan, Chuang; Huang, Deng; Zhao, Hang; Tenenbaum, Joshua B; Torralba, Antonio

dc.contributor.author	Gan, Chuang
dc.contributor.author	Huang, Deng
dc.contributor.author	Zhao, Hang
dc.contributor.author	Tenenbaum, Joshua B
dc.contributor.author	Torralba, Antonio
dc.date.accessioned	2021-04-06T16:27:33Z
dc.date.available	2021-04-06T16:27:33Z
dc.date.issued	2020-08
dc.date.submitted	2020-06
dc.identifier.isbn	9781728171685
dc.identifier.uri	https://hdl.handle.net/1721.1/130393
dc.description.abstract	Recent deep learning approaches have achieved impressive performance on visual sound separation tasks. However, these approaches are mostly built on appearance and optical-flow-like motion feature representations, which exhibit limited ability to find correlations between audio signals and visual points, especially when separating multiple instruments of the same type, such as multiple violins in a scene. To address this, we propose "Music Gesture," a keypoint-based structured representation that explicitly models the body and finger movements of musicians when they perform music. We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals. Experimental results on three music performance datasets show: 1) strong improvements on benchmark metrics for hetero-musical separation tasks (i.e., different instruments); 2) a new ability for effective homo-musical separation of piano, flute, and trumpet duets, which to the best of our knowledge has never been achieved with alternative methods.	en_US
dc.language.iso	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.isversionof	http://dx.doi.org/10.1109/cvpr42600.2020.01049	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	arXiv	en_US
dc.title	Music Gesture for Visual Sound Separation	en_US
dc.type	Article	en_US
dc.identifier.citation	Gan, Chuang et al. "Music Gesture for Visual Sound Separation." 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2020, Seattle, Washington, Institute of Electrical and Electronics Engineers, August 2020. © 2020 IEEE	en_US
dc.contributor.department	MIT-IBM Watson AI Lab	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.relation.journal	2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition	en_US
dc.eprint.version	Original manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2021-01-28T15:51:14Z
dspace.orderedauthors	Gan, C; Huang, D; Zhao, H; Tenenbaum, JB; Torralba, A	en_US
dspace.date.submission	2021-01-28T15:51:19Z
mit.license	OPEN_ACCESS_POLICY
mit.metadata.status	Complete

Files in this item

Name: 2004.09476.pdf
Size: 10.10Mb
Format: PDF
Description: Submitted version

This item appears in the following Collection(s)

MIT Open Access Articles
