dc.contributor.advisor: Alex P. Pentland. [en_US]
dc.contributor.author: Dubey, Abhimanyu. [en_US]
dc.contributor.other: Program in Media Arts and Sciences (Massachusetts Institute of Technology) [en_US]
dc.date.accessioned: 2020-01-23T17:01:54Z
dc.date.available: 2020-01-23T17:01:54Z
dc.date.copyright: 2019 [en_US]
dc.date.issued: 2019 [en_US]
dc.identifier.uri: https://hdl.handle.net/1721.1/123636
dc.description: Thesis: S.M., Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2019 [en_US]
dc.description: Cataloged from PDF version of thesis. [en_US]
dc.description: Includes bibliographical references (pages 99-106). [en_US]
dc.description.abstract: In this thesis, I consider the research problem of designing optimal algorithms for two specific settings of the stochastic multi-armed bandit problem. The first setting considers the problem where rewards are drawn from a family of extremely heavy-tailed distributions known as α-stable distributions. For this setting, I extended an existing upper confidence bound algorithm to create an optimal frequentist algorithm, titled α-UCB. Next, I developed a variant of the Bayesian Thompson Sampling algorithm for this setting, titled Robust α-TS, which involved developing an efficient pipeline for posterior inference. I also proved finite-time regret bounds for this algorithm that are optimal up to logarithmic factors. The second problem setting I considered was the networked multi-agent problem, where agents communicate locally and have unique preferences. This problem setting is a generalization of the co-operative multi-agent stochastic bandit problem, and is closely related to the single-agent bandit setting with side observations. For this setting, I developed an optimal upper confidence bound algorithm, titled Net-UCB. I also proved finite-time regret bounds for this algorithm that are logarithmic in the number of rounds and sub-linear in the number of agents. For both settings, I conducted extensive experiments to verify the tightness of the established regret bounds and to compare performance with existing state-of-the-art algorithms. The algorithms proposed in this thesis obtain competitive regret and state-of-the-art performance across a variety of problem settings. [en_US]
dc.description.statementofresponsibility: by Abhimanyu Dubey. [en_US]
dc.format.extent: 106 pages [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Program in Media Arts and Sciences [en_US]
dc.title: Robust sequential decision-making on networks [en_US]
dc.type: Thesis [en_US]
dc.description.degree: S.M. [en_US]
dc.contributor.department: Program in Media Arts and Sciences (Massachusetts Institute of Technology) [en_US]
dc.identifier.oclc: 1136490586 [en_US]
dc.description.collection: S.M. Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences [en_US]
dspace.imported: 2020-01-23T17:01:53Z [en_US]
mit.thesis.degree: Master [en_US]
mit.thesis.department: Media [en_US]
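
Note on the abstract: the upper confidence bound approach it refers to can be illustrated with the textbook UCB1 rule for a single agent with rewards bounded in [0, 1]. The sketch below is only that standard baseline, offered as background. It is not α-UCB, Robust α-TS, or Net-UCB from the thesis, and it does not handle heavy-tailed rewards or networked agents. The function name `ucb1`, the Bernoulli arm probabilities, and the horizon are illustrative assumptions that do not come from this record.

```python
import math
import random


def ucb1(arm_probs, horizon, seed=0):
    """Run textbook UCB1 on Bernoulli arms (illustrative assumption).

    arm_probs: success probability of each arm (hypothetical values).
    horizon:   number of rounds to play.
    Returns (pull counts per arm, empirical means per arm, total reward).
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k      # number of times each arm was pulled
    means = [0.0] * k     # running empirical mean reward of each arm
    total = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each confidence bound is defined.
            arm = t - 1
        else:
            # Choose the arm with the largest mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward

    return counts, means, total


if __name__ == "__main__":
    counts, means, total = ucb1([0.2, 0.5, 0.8], horizon=2000)
    print("pulls per arm:", counts)
    print("empirical means:", [round(m, 3) for m in means])
```

The exploration bonus shrinks as an arm is pulled more often, so pulls concentrate on the arm with the highest mean reward. That bonus is derived for bounded rewards, which is one reason heavy-tailed settings such as the α-stable one named in the abstract call for a different construction.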

