Show simple item record

dc.contributor.author | Gan, Chuang
dc.contributor.author | Huang, Deng
dc.contributor.author | Chen, Peihao
dc.contributor.author | Tenenbaum, Joshua B
dc.contributor.author | Torralba, Antonio
dc.date.accessioned | 2021-04-02T14:22:06Z
dc.date.available | 2021-04-02T14:22:06Z
dc.date.issued | 2020-11
dc.identifier.isbn | 9783030586201
dc.identifier.isbn | 9783030586218
dc.identifier.issn | 0302-9743
dc.identifier.issn | 1611-3349
dc.identifier.uri | https://hdl.handle.net/1721.1/130350
dc.description.abstract | In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip about people playing musical instruments. We first identify two key intermediate representations for a successful video-to-music generator: body keypoints from videos and MIDI events from audio recordings. We then formulate music generation from videos as a motion-to-MIDI translation problem. We present a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements. The MIDI events can then be converted to realistic music using an off-the-shelf music synthesizer tool. We demonstrate the effectiveness of our models on videos containing a variety of music performances. Experimental results show that our model outperforms several existing systems in generating music that is pleasant to listen to. More importantly, the MIDI representations are fully interpretable and transparent, thus enabling us to perform music editing flexibly. We encourage the readers to watch the supplementary video with audio turned on to experience the results. | en_US
dc.description.sponsorship | ONR MURI (N00014-16-1-2007) | en_US
dc.language.iso | en
dc.publisher | Springer International Publishing | en_US
dc.relation.isversionof | http://dx.doi.org/10.1007/978-3-030-58621-8_44 | en_US
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US
dc.source | arXiv | en_US
dc.title | Foley Music: Learning to Generate Music from Videos | en_US
dc.type | Book | en_US
dc.identifier.citation | Gan, Chuang et al. "Foley Music: Learning to Generate Music from Videos." ECCV: European Conference on Computer Vision, Lecture Notes in Computer Science, 12356, Springer International Publishing, 2020, 758-775. © 2020 Springer Nature Switzerland AG | en_US
dc.contributor.department | MIT-IBM Watson AI Lab | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US
dc.relation.journal | Lecture Notes in Computer Science | en_US
dc.eprint.version | Original manuscript | en_US
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US
dc.date.updated | 2021-01-28T15:39:50Z
dspace.orderedauthors | Gan, C; Huang, D; Chen, P; Tenenbaum, JB; Torralba, A | en_US
dspace.date.submission | 2021-01-28T15:39:55Z
mit.journal.volume | 12356 | en_US

