Multi-modal reinforcement learning with videogame audio to learn sonic features
Author(s)
Nadeem, Faraaz.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Eran Egozy.
Abstract
Most videogame reinforcement learning (RL) research deals only with the video component of games, even though humans typically play games while experiencing both audio and video. Additionally, most machine learning audio research deals with music or speech data rather than environmental sound. We aim to bridge both of these gaps by learning from in-game audio in addition to video, and by providing an accessible introduction to videogame audio topics, in the hope of motivating further multi-modal videogame research. We present three main contributions. First, we provide an overview of sound design in video games, supplemented with introductions to diegesis theory and Western classical music theory. Second, we provide methods for extracting, processing, visualizing, and hearing gameplay audio alongside video, building on OpenAI's Gym Retro framework. Third, we train RL agents to play different levels of Sonic the Hedgehog for the SEGA Genesis, to understand 1) what kinds of audio features are useful when playing videogames, 2) how learned audio features transfer to unseen levels, and 3) whether and how audio+video agents outperform video-only agents. We show that, in general, agents provided with both audio and video outperform agents with access to only video. Specifically, an agent given the current frame of video and the past 1 second of audio outperforms a video-only agent given the current and previous frames of video and a 55% larger model, by 6.6% on a joint training task and by 20.4% on a zero-shot transfer task. We conclude that game audio informs useful decision making, and that audio features transfer more easily to unseen test levels than video features.
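The audio+video input described in the abstract (current frame plus the past 1 second of audio) can be illustrated with a rolling buffer. This is only a minimal sketch, not the thesis's implementation: the class name `AudioVideoObservation`, the dictionary layout, and the toy sample rate are hypothetical, and how audio samples are obtained from the emulator is deliberately left out.

```python
from collections import deque


class AudioVideoObservation:
    """Pairs the current video frame with the most recent second of audio.

    Hypothetical helper for illustration; sample_rate is the number of
    audio samples per second and is an assumed parameter.
    """

    def __init__(self, sample_rate):
        # Start with one second of silence; maxlen drops the oldest samples.
        self.audio = deque([0] * sample_rate, maxlen=sample_rate)

    def update(self, frame, audio_chunk):
        # Append the samples produced since the previous frame.
        self.audio.extend(audio_chunk)
        return {"video": frame, "audio": list(self.audio)}


# Toy usage with 8 samples per "second" so the window is easy to read.
obs = AudioVideoObservation(sample_rate=8)
print(obs.update("frame0", [1, 2, 3])["audio"])  # [0, 0, 0, 0, 0, 1, 2, 3]
```

A fixed-length buffer keeps the audio input the same size at every step, which is convenient when feeding it to a model alongside a single frame.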
Description
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September, 2020. Cataloged from student-submitted PDF of thesis. Includes bibliographical references (pages 123-129).
Date issued
2020
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.