An Intent-based Neural Monte Carlo Tree Search Framework for Synthesis of Printed Circuit Boards
Author(s)
Kaphle, Arpan
Download: Thesis PDF (4.449 MB)
Advisor
Wu, Cathy
Hogan, Taylor
Roberto, Luke
Abstract
PCB synthesis is a difficult joint optimization problem that has so far eluded automation in the Electronic Design Automation (EDA) industry. Algorithms that intelligently learn to solve these problems have not yet been widely explored. Cadence Design Systems, with which the author is affiliated, works on solving such problems. This thesis proposes augmenting Monte Carlo Tree Search (MCTS) to improve search and to self-generate datasets, culminating in a process called the Learning Feedback System (LFS). This process uses past data to accelerate MCTS with deep RL models on new or similar board configurations. The generated datasets are consumed by dataset-driven Reinforcement Learning (RL) algorithms, known as 'offline' and 'off-policy' algorithms, to solve this problem in a useful, simplified scope. The problem scope begins where other approaches have left a design with constraint violations. Two baselines are used: an algorithmically improved version of MCTS, and that version further accelerated with PPO, a purely online, non-demonstrator-based deep RL algorithm. The results show that MCTS enables smooth self-generation of datasets, a process inspired by AlphaGo Zero. In addition, we find that off-policy and expert-based RL algorithms such as Adversarial Inverse Reinforcement Learning (AIRL) and Generative Adversarial Imitation Learning (GAIL) can exploit the generated dataset to improve board solving over time and, once properly tuned, perform far better than the baseline trained for the same amount of time. We also find that the complexity of the problem is related to the performance of the baseline. Exploring offline Conservative Q-Learning (CQL) within this MCTS-connected environment, we find that its performance was not up to par, but it was still able to generalize reasonable actions. Comparing overall actions per episode, we find that all approaches can be tuned to further accelerate MCTS's decision making and help it prune better in larger state spaces. Among the methods tried, the proposed neural-accelerated MCTS feedback loop appears to perform best with expert-based RL methods.
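As a rough illustrative sketch (not the thesis implementation), the LFS loop described above can be pictured as MCTS self-play writing (state, action) demonstrations into a buffer that an off-policy or expert-based learner later consumes. All names below (BoardEnv, mcts_search, collect_episode) are hypothetical stand-ins.

```python
# Illustrative sketch only: hypothetical stand-ins for the LFS dataset
# self-generation phase, not the thesis code.
import random
from collections import deque

class BoardEnv:
    """Toy stand-in for a PCB constraint-repair environment."""
    def __init__(self, n_violations=5):
        self.n_violations = n_violations

    def reset(self):
        self.remaining = self.n_violations
        return self.remaining

    def step(self, action):
        # Assume each legal action resolves one constraint violation.
        self.remaining -= 1
        done = self.remaining == 0
        return self.remaining, (1.0 if done else 0.0), done

def mcts_search(state, n_actions=4):
    """Placeholder for the augmented MCTS policy; returns a chosen action."""
    return random.randrange(n_actions)

def collect_episode(env, dataset):
    """Self-generate (state, action) demonstrations, as in the LFS loop."""
    state, done = env.reset(), False
    while not done:
        action = mcts_search(state)
        dataset.append((state, action))
        state, _, done = env.step(action)

if __name__ == "__main__":
    dataset = deque(maxlen=10_000)   # demonstration / replay buffer
    env = BoardEnv()
    for _ in range(100):             # dataset self-generation phase
        collect_episode(env, dataset)
    # The collected demonstrations would then train an off-policy or
    # expert-based learner (e.g. GAIL/AIRL), which in turn biases MCTS.
    print(f"Collected {len(dataset)} state-action demonstrations")
```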
Date issued
2022-05

Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher
Massachusetts Institute of Technology