Winning at Pokémon Random Battles Using Reinforcement Learning

Author(s)

Wang, Jett

Advisor

Tenenbaum, Joshua

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

Pokémon battling is a challenging domain for reinforcement learning techniques because of its massive state space, stochasticity, and partial observability. We demonstrate an agent that employs Monte Carlo Tree Search informed by an actor-critic network trained using Proximal Policy Optimization with experience collected through self-play. The agent peaked at rank 8 (1693 Elo) on the official Pokémon Showdown gen4randombattles ladder, which is the best known performance by any non-human agent in this format. This strong showing lays the foundation for superhuman performance in Pokémon and other complex turn-based games of imperfect information, and it expands the viability of methods that have historically been used in perfect-information games.
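
For readers unfamiliar with Proximal Policy Optimization, the following is a minimal sketch of the standard clipped surrogate objective for a single sampled action. It is not taken from the thesis: the function name `ppo_clip_objective`, its parameter names, and the default `epsilon=0.2` are illustrative assumptions, and the abstract does not state which hyperparameters or PPO variant the agent actually uses.

```python
import math


def ppo_clip_objective(log_prob_new, log_prob_old, advantage, epsilon=0.2):
    """Clipped PPO surrogate for one (state, action) sample.

    All names and the default epsilon are assumptions for illustration;
    they are not taken from the thesis.
    """
    # Probability ratio between the updated policy and the policy that
    # collected the experience (here, through self-play).
    ratio = math.exp(log_prob_new - log_prob_old)

    # Restrict the ratio to [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))

    # Take the more pessimistic of the two estimates, so a single update
    # gains nothing from moving the policy far from the old one.
    return min(ratio * advantage, clipped_ratio * advantage)
```

In training, this quantity would be averaged over a batch and maximized alongside a value-function loss for the critic. With a positive advantage, the objective stops growing once the ratio exceeds `1 + epsilon`; with a negative advantage, it stops growing once the ratio falls below `1 - epsilon`.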

Date issued

2024-02

URI

https://hdl.handle.net/1721.1/153888

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses