Scheduling of costly measurements for state estimation using reinforcement learning
Author(s)
Rogers, Keith Eric
Other Contributors
Massachusetts Institute of Technology. Dept. of Aeronautics and Astronautics.
Advisor
Wallace E. Vander Velde.
Abstract
There has long been a significant gap between the theory and practice of measurement scheduling for state estimation problems. Theoretical papers tend to deal rigorously with small-scale, linear problems using methods that are well grounded in optimization theory, while practical applications deal with high-dimensional, nonlinear problems using heuristic policies. The work in this thesis attempts to bridge that gap by using reinforcement learning (RL) to treat real-world problems. In doing so, it makes contributions to the fields of both measurement scheduling and RL.

On the measurement scheduling side, a unified formulation is presented which encompasses the wide variety of problems found in the literature as well as more complex variations. This formulation is used with RL to handle a series of problems of increasing difficulty. Both continuous and discrete action spaces are treated, and RL is shown to be effective with both. The RL-based methods beat alternative methods from the literature in one case, and consistently match or beat heuristics for both high-dimensional linear problems and simple nonlinear problems. Finally, RL is applied to a high-dimensional nonlinear problem in radar tracking and outperforms the best available heuristic by as much as 35%. In treating these problems, a useful synergy is shown to exist between learned and heuristic policies, with each helping to verify and improve the performance of the other.

On the reinforcement learning side, the contribution comes mainly from applying the algorithms in an extremely adverse environment. The measurement scheduling problems treated involve high-dimensional, continuous input spaces and continuous action spaces. The nonlinear cases must use sub-optimal nonlinear filters and are hence non-Markovian. Cost feedback comes in terms of internally propagated states with a sometimes tenuous connection to the environment. In a field where typical applications have both finite state spaces and finite action spaces, these problems test the limits of RL's usability. Some advances are also made in the treatment of problems where the cost differential is much smaller in the action direction than in the state direction. Learning algorithms are presented for a class of transformations to Bellman's equation, of which Advantage Learning represents a special case. Conditions under which Advantage Learning may diverge are described, and an alternative algorithm, called G-Learning, is given which fixes the problem for a sample case.
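To make the scheduling problem concrete: in its simplest form, an estimator propagates an error covariance, and at each step a policy decides whether the reduction in estimation error from a measurement justifies its cost. The sketch below is a minimal, hypothetical instance of that idea, not the thesis's formulation (which treats high-dimensional, continuous inputs): a scalar linear system, a Kalman filter covariance recursion, and tabular Q-learning over a discretized error variance. All parameters, costs, and the discretization are illustrative.

    import numpy as np

    # Toy costly-measurement scheduling problem: a scalar linear system tracked
    # by a Kalman filter. Each step the agent measures (a=1) or skips (a=0);
    # the stage cost is the posterior error variance plus the measurement cost.
    # All values below are illustrative, not taken from the thesis.
    phi, q, r = 0.95, 0.1, 0.5   # state transition, process noise, meas. noise
    c_meas = 0.3                 # cost charged per measurement
    gamma = 0.95                 # discount factor
    n_bins, p_max = 40, 5.0      # discretization of the error variance

    def step_cov(p, measure):
        """Propagate the error variance; apply the update only if measured."""
        p_pred = phi * p * phi + q
        if measure:
            k = p_pred / (p_pred + r)    # Kalman gain
            p_pred = (1.0 - k) * p_pred  # posterior variance
        return min(p_pred, p_max)

    def bin_of(p):
        return min(int(p / p_max * n_bins), n_bins - 1)

    Q = np.zeros((n_bins, 2))
    alpha, eps = 0.1, 0.1
    rng = np.random.default_rng(0)

    for episode in range(2000):
        p = rng.uniform(0.1, p_max)
        for t in range(100):
            s = bin_of(p)
            a = rng.integers(2) if rng.random() < eps else int(np.argmin(Q[s]))
            p_next = step_cov(p, a == 1)
            cost = p_next + c_meas * a   # error variance + measurement cost
            # Standard cost-minimizing Q-learning backup
            Q[s, a] += alpha * (cost + gamma * Q[bin_of(p_next)].min() - Q[s, a])
            p = p_next

    # The learned policy should be a threshold rule: measure only when the
    # propagated variance is large enough to be worth the measurement cost.
    policy = np.argmin(Q, axis=1)
    print("measure from variance bin", int(np.argmax(policy)), "upward")

The point of the toy version is that the optimal policy is a simple variance threshold, which a heuristic can also find; the thesis's contribution lies in the regimes where no such simple structure is available.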
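For background on the Advantage Learning algorithm mentioned above (the thesis generalizes it to a class of transformations of Bellman's equation): in Harmon and Baird's published formulation, the fixed point rescales the temporal-difference term by a constant k, which widens the gap between optimal and sub-optimal action values when that gap is small in the action direction relative to the state direction. The notation below follows that published form and is not necessarily the thesis's.

    % Advantage Learning fixed point (Harmon & Baird's formulation; k = K \Delta t).
    % With k = 1 this reduces to the ordinary Q-learning fixed point.
    A^*(x,u) = \max_{u'} A^*(x,u')
             + \frac{1}{k}\Bigl( R(x,u)
             + \gamma\, \mathbb{E}\bigl[ \max_{u'} A^*(x',u') \bigr]
             - \max_{u'} A^*(x,u') \Bigr)

Scaling by 1/k with k < 1 magnifies the penalty on sub-optimal actions, which is what makes this family attractive for the small action-direction cost differentials described above; the conditions under which it diverges, and the G-Learning alternative, are developed in the thesis itself.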
Description
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 1999. Includes bibliographical references (p. 257-263).
Date issued
1999
Department
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
Publisher
Massachusetts Institute of Technology
Keywords
Aeronautics and Astronautics.