Notice

This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/137415.2


dc.date.accessioned: 2021-11-05T11:09:52Z
dc.date.available: 2021-11-05T11:09:52Z
dc.date.issued: 2019-12
dc.identifier.uri: https://hdl.handle.net/1721.1/137415
dc.description.abstract: © 2019 Neural Information Processing Systems Foundation. All rights reserved. In the classical contextual bandits problem, in each round $t$ a learner observes some context $c$, chooses some action $a$ to perform, and receives some reward $r_{a,t}(c)$. We consider the variant of this problem in which, in addition to receiving the reward $r_{a,t}(c)$, the learner also learns the values of $r_{a,t}(c')$ for all other contexts $c'$, i.e., the rewards that would have been achieved by performing that action under different contexts. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions, a problem that has gained considerable attention as many platforms have switched to running first-price auctions. We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on $C$ and achieve regret $\tilde{O}(\sqrt{KT})$. We simulate our algorithms on real auction data from an ad exchange running first-price auctions and show that they outperform traditional contextual bandit algorithms. [en_US]
dc.language.iso: en
dc.relation.isversionof: https://papers.nips.cc/paper/2019/hash/6aadca7bd86c4743e6724f9607256126-Abstract.html [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: Neural Information Processing Systems (NIPS) [en_US]
dc.title: Contextual bandits with cross-learning [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Balseiro, S., Golrezaei, N., Mahdian, M., Mirrokni, V., and Schneider, J. 2019. "Contextual bandits with cross-learning." Advances in Neural Information Processing Systems, 32.
dc.relation.journal: Advances in Neural Information Processing Systems [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2021-03-26T14:16:07Z
dspace.orderedauthors: Balseiro, S; Golrezaei, N; Mahdian, M; Mirrokni, V; Schneider, J [en_US]
dspace.date.submission: 2021-03-26T14:16:08Z
mit.journal.volume: 32 [en_US]
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
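The abstract describes the cross-learning feedback model only at a high level. The Python sketch below illustrates that feedback model with a UCB1-style learner. It is a hypothetical illustration, not the authors' algorithm: the function names (`ucb_cross_learning`, `make_bernoulli_reward`), the uniform context distribution, and the independent Bernoulli rewards are all assumptions. What it is meant to show is that, because playing action $a$ reveals $r_{a,t}(c')$ for every context $c'$, the sample count for action $a$ is shared across all contexts instead of being split $C$ ways, which suggests why the dependence on $C$ can be removed.

```python
import math
import random


def make_bernoulli_reward(means):
    """Hypothetical reward model: Bernoulli(means[(context, action)]) rewards,
    drawn independently for each context."""
    def reward_fn(action, context, rng):
        return 1.0 if rng.random() < means[(context, action)] else 0.0
    return reward_fn


def ucb_cross_learning(contexts, num_actions, reward_fn, horizon, seed=0):
    """UCB1-style learner that uses cross-learning feedback (illustrative sketch).

    Assumptions: contexts arrive uniformly at random, and
    reward_fn(action, context, rng) returns a reward in [0, 1].
    Returns (total_reward, counts, sums).
    """
    rng = random.Random(seed)
    # Playing action a reveals its reward under *every* context, so the
    # number of samples of (c, a) is the same for all c: one counter per action.
    counts = [0] * num_actions
    # sums[c][a] accumulates observed rewards of action a under context c.
    sums = {c: [0.0] * num_actions for c in contexts}
    total_reward = 0.0

    for t in range(1, horizon + 1):
        context = rng.choice(contexts)
        if t <= num_actions:
            # Initialisation: play each action once so every counter is positive.
            action = t - 1
        else:
            # Per-context empirical mean plus a bonus that shrinks with the
            # shared, context-independent sample count.
            action = max(
                range(num_actions),
                key=lambda a: sums[context][a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Cross-learning feedback: observe r_{a,t}(c') for all contexts c'.
        for c in contexts:
            r = reward_fn(action, c, rng)
            sums[c][action] += r
            if c == context:
                total_reward += r  # only the realised context earns reward
        counts[action] += 1

    return total_reward, counts, sums
```

For example, with `means = {("low", 0): 0.8, ("low", 1): 0.2, ("high", 0): 0.3, ("high", 1): 0.7}`, calling `ucb_cross_learning(["low", "high"], 2, make_bernoulli_reward(means), 5000)` should learn to play action 0 in context `"low"` and action 1 in context `"high"`. Keeping one counter per action, rather than one per (context, action) pair, is the design choice that encodes cross-learning; a classical contextual UCB would instead keep $C \cdot K$ separate counters.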


