
DC Field | Value | Language
dc.contributor.advisor | Eran Egozy. | en_US
dc.contributor.author | Nadeem, Faraaz. | en_US
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US
dc.date.accessioned | 2021-01-06T17:40:39Z |
dc.date.available | 2021-01-06T17:40:39Z |
dc.date.copyright | 2020 | en_US
dc.date.issued | 2020 | en_US
dc.identifier.uri | https://hdl.handle.net/1721.1/129110 |
dc.description | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September, 2020 | en_US
dc.description | Cataloged from student-submitted PDF of thesis. | en_US
dc.description | Includes bibliographical references (pages 123-129). | en_US
dc.description.abstract | Most videogame reinforcement learning (RL) research deals only with the video component of games, even though humans typically play games while experiencing both audio and video. Additionally, most machine learning audio research deals with music or speech data rather than environmental sound. We aim to bridge both of these gaps by learning from in-game audio in addition to video, and by providing an accessible introduction to videogame audio-related topics, in the hope of further motivating such multi-modal videogame research. We present three main contributions. First, we provide an overview of sound design in video games, supplemented with introductions to diegesis theory and Western classical music theory. Second, we provide methods for extracting, processing, visualizing, and hearing gameplay audio alongside video, building on OpenAI's Gym Retro framework. Third, we train RL agents to play different levels of Sonic The Hedgehog for the SEGA Genesis, to understand 1) what kinds of audio features are useful when playing videogames, 2) how learned audio features transfer to unseen levels, and 3) if/how audio+video agents outperform video-only agents. We show that, in general, agents provided with both audio and video outperform agents with access to only video. Specifically, an agent with the current frame of video and the past 1 second of audio outperforms an agent with access to the current and previous frames of video, no audio, and a 55% larger model size, by 6.6% on a joint training task and 20.4% on a zero-shot transfer task. We conclude that game audio informs useful decision making, and that audio features transfer more readily to unseen test levels than video features. | en_US
dc.description.statementofresponsibility | by Faraaz Nadeem. | en_US
dc.format.extent | 129 pages | en_US
dc.language.iso | eng | en_US
dc.publisher | Massachusetts Institute of Technology | en_US
dc.rights | MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. | en_US
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US
dc.subject | Electrical Engineering and Computer Science. | en_US
dc.title | Multi-modal reinforcement learning with videogame audio to learn sonic features | en_US
dc.type | Thesis | en_US
dc.description.degree | M. Eng. | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US
dc.identifier.oclc | 1227100688 | en_US
dc.description.collection | M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science | en_US
dspace.imported | 2021-01-06T17:40:38Z | en_US
mit.thesis.degree | Master | en_US
mit.thesis.department | EECS | en_US
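
The abstract above mentions methods for extracting and processing gameplay audio alongside video on top of OpenAI's Gym Retro. As a rough illustration only, not the thesis's actual tooling, the following minimal Python sketch collects synchronized video frames and per-step audio chunks from a Sonic The Hedgehog environment. The game and state identifiers and the env.em.get_audio() accessor are assumptions about the installed gym-retro build.

# Sketch: collect video frames and per-step audio from a Gym Retro environment,
# loosely following the audio+video pipeline described in the abstract.
# Assumptions (not taken from the thesis itself): the Genesis ROM is integrated
# under the id 'SonicTheHedgehog-Genesis', and this gym-retro build exposes raw
# emulator audio via env.em.get_audio().
import numpy as np
import retro

env = retro.make(game='SonicTheHedgehog-Genesis', state='GreenHillZone.Act1')
obs = env.reset()

frames, audio_chunks = [], []
for _ in range(600):  # roughly 10 seconds of gameplay at 60 fps
    action = env.action_space.sample()          # random policy as a placeholder
    obs, reward, done, info = env.step(action)
    frames.append(obs)                          # RGB frame for the video stream
    audio_chunks.append(env.em.get_audio())     # stereo int16 samples for this step
    if done:
        obs = env.reset()

audio = np.concatenate(audio_chunks)            # waveform aligned with the frame list
env.close()

In a setup like this, each step's audio chunk stays aligned with its video frame, so a learning agent could be fed, for example, the current frame together with the most recent second of concatenated audio, as in the experiments summarized above.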

