Foley Music: Learning to Generate Music from Videos
Author(s): Gan, Chuang; Huang, Deng; Chen, Peihao; Tenenbaum, Joshua B; Torralba, Antonio
In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip of people playing musical instruments. We first identify two key intermediate representations for a successful video-to-music generator: body keypoints from videos and MIDI events from audio recordings. We then formulate music generation from videos as a motion-to-MIDI translation problem. We present a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements. The MIDI events can then be converted to realistic music using an off-the-shelf music synthesizer tool. We demonstrate the effectiveness of our models on videos containing a variety of music performances. Experimental results show that our model outperforms several existing systems in generating music that is pleasant to listen to. More importantly, the MIDI representations are fully interpretable and transparent, thus enabling us to perform music editing flexibly. We encourage readers to watch the supplementary video with audio turned on to experience the results.
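To make the abstract's intermediate representation concrete: MIDI-based music models typically flatten notes into a sequence of discrete events (note-on, note-off, time-shift), which can serve as the target tokens of a motion-to-MIDI translation model. The sketch below illustrates that general idea in plain Python; the event names, tick resolution, and tuple layout are illustrative assumptions, not the paper's exact encoding scheme.

```python
# Illustrative sketch of a MIDI-event encoding (NOT the paper's exact scheme):
# each note (pitch, start_sec, end_sec) is expanded into NOTE_ON / NOTE_OFF
# boundaries, and gaps between events become discrete TIME_SHIFT tokens.

def notes_to_events(notes, tick=0.01):
    """Convert (pitch, start, end) note tuples into a flat event sequence."""
    # Collect timed on/off boundaries for every note.
    boundaries = []
    for pitch, start, end in notes:
        boundaries.append((start, "NOTE_ON", pitch))
        boundaries.append((end, "NOTE_OFF", pitch))
    boundaries.sort()

    events, clock = [], 0.0
    for time, kind, pitch in boundaries:
        # Encode the gap since the previous event as TIME_SHIFT steps.
        steps = round((time - clock) / tick)
        if steps > 0:
            events.append(("TIME_SHIFT", steps))
            clock += steps * tick
        events.append((kind, pitch))
    return events

# Example: two overlapping notes become one interleaved event stream,
# which a sequence model can predict token by token.
events = notes_to_events([(60, 0.0, 0.5), (64, 0.25, 0.75)])
```

A decoder would invert this mapping back to timed notes, after which an off-the-shelf synthesizer renders audio, as the abstract describes.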
Department: MIT-IBM Watson AI Lab; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Series: Lecture Notes in Computer Science
Publisher: Springer International Publishing
Citation: Gan, Chuang et al. "Foley Music: Learning to Generate Music from Videos." ECCV: European Conference on Computer Vision, Lecture Notes in Computer Science, vol. 12356, Springer International Publishing, 2020, pp. 758-775. © 2020 Springer Nature Switzerland AG