On Reinforcement Learning for Turn-based Zero-sum Markov Games

Shah, D; Somani, V; Xie, Q; Xu, Z

dc.contributor.author	Shah, D
dc.contributor.author	Somani, V
dc.contributor.author	Xie, Q
dc.contributor.author	Xu, Z
dc.date.accessioned	2021-11-02T17:41:39Z
dc.date.available	2021-11-02T17:41:39Z
dc.date.issued	2020
dc.identifier.uri	https://hdl.handle.net/1721.1/137142
dc.description.abstract	© 2020 Owner/Author. We consider the problem of finding Nash equilibrium for two-player turn-based zero-sum games. Inspired by the AlphaGo Zero (AGZ) algorithm, we develop a Reinforcement Learning based approach. Specifically, we propose Explore-Improve-Supervise (EIS) method that combines "exploration", "policy improvement" and "supervised learning" to find the value function and policy associated with Nash equilibrium. We identify sufficient conditions for convergence and correctness for such an approach. For a concrete instance of EIS where random policy is used for "exploration", Monte-Carlo Tree Search is used for "policy improvement" and Nearest Neighbors is used for "supervised learning", we establish that this method finds an $\varepsilon$-approximate value function of Nash equilibrium in $\widetilde{O}(\varepsilon^{-(d+4)})$ steps when the underlying state-space of the game is continuous and d-dimensional. This is nearly optimal as we establish a lower bound of $\widetilde{\Omega}(\varepsilon^{-(d+2)})$ for any policy.	en_US
dc.language.iso	en
dc.publisher	ACM	en_US
dc.relation.isversionof	10.1145/3412815.3416888	en_US
dc.rights	Creative Commons Attribution 4.0 International license	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	ACM	en_US
dc.title	On Reinforcement Learning for Turn-based Zero-sum Markov Games	en_US
dc.type	Article	en_US
dc.identifier.citation	Shah, D, Somani, V, Xie, Q and Xu, Z. 2020. "On Reinforcement Learning for Turn-based Zero-sum Markov Games." FODS 2020 - Proceedings of the 2020 ACM-IMS Foundations of Data Science Conference.
dc.contributor.department	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
dc.relation.journal	FODS 2020 - Proceedings of the 2020 ACM-IMS Foundations of Data Science Conference	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2021-06-25T12:46:49Z
dspace.orderedauthors	Shah, D; Somani, V; Xie, Q; Xu, Z	en_US
dspace.date.submission	2021-06-25T12:46:51Z
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: 3412815.3416888.pdf
Size: 1.283Mb
Format: PDF
Description: Published version


This item appears in the following Collection(s)

MIT Open Access Articles
