dc.contributor.author Yang, Yibo
dc.contributor.author Blanchard, Antoine
dc.contributor.author Sapsis, Themistoklis
dc.contributor.author Perdikaris, Paris
dc.date.accessioned 2024-04-18T20:50:22Z
dc.date.available 2024-04-18T20:50:22Z
dc.date.issued 2022-04
dc.identifier.issn 1364-5021
dc.identifier.issn 1471-2946
dc.identifier.uri https://hdl.handle.net/1721.1/154219
dc.description.abstract We present a new type of acquisition function for online decision-making in multi-armed and contextual bandit problems with extreme payoffs. Specifically, we model the payoff function as a Gaussian process and formulate a novel type of upper confidence bound acquisition function that guides exploration towards the bandits that are deemed most relevant according to the variability of the observed rewards. This is achieved by computing a tractable likelihood ratio that quantifies the importance of the output relative to the inputs and essentially acts as an attention mechanism that promotes exploration of extreme rewards. Our formulation is supported by asymptotic zero-regret guarantees, and its performance is demonstrated across several synthetic benchmarks, as well as two realistic examples involving noisy sensor network data. Finally, we provide a JAX library for efficient bandit optimization using Gaussian processes. en_US
dc.language.iso en
dc.publisher The Royal Society en_US
dc.relation.isversionof 10.1098/rspa.2021.0781 en_US
dc.rights Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. en_US
dc.source The Royal Society en_US
dc.subject General Physics and Astronomy en_US
dc.subject General Engineering en_US
dc.subject General Mathematics en_US
dc.title Output-weighted sampling for multi-armed bandits with extreme payoffs en_US
dc.type Article en_US
dc.identifier.citation Yang Yibo, Blanchard Antoine, Sapsis Themistoklis and Perdikaris Paris 2022 Output-weighted sampling for multi-armed bandits with extreme payoffs Proc. R. Soc. A. 478 20210781. en_US
dc.contributor.department Massachusetts Institute of Technology. Department of Mechanical Engineering
dc.relation.journal Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences en_US
dc.eprint.version Final published version en_US
dc.type.uri http://purl.org/eprint/type/JournalArticle en_US
eprint.status http://purl.org/eprint/status/PeerReviewed en_US
dc.date.updated 2024-04-18T20:37:28Z
dspace.orderedauthors Yang, Y; Blanchard, A; Sapsis, T; Perdikaris, P en_US
dspace.date.submission 2024-04-18T20:37:30Z
mit.journal.volume 478 en_US
mit.journal.issue 2260 en_US
mit.license PUBLISHER_POLICY
mit.metadata.status Authority Work and Publication Information Needed en_US

