dc.contributor.advisor: Vivek F. Farias.
dc.contributor.author: Gutin, Eli
dc.contributor.other: Massachusetts Institute of Technology. Operations Research Center.
dc.date.accessioned: 2019-02-05T15:17:22Z
dc.date.available: 2019-02-05T15:17:22Z
dc.date.copyright: 2018
dc.date.issued: 2018
dc.identifier.uri: http://hdl.handle.net/1721.1/120191
dc.description: Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2018.
dc.description: This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
dc.description: Cataloged from student-submitted PDF version of thesis.
dc.description: Includes bibliographical references (pages 183-188).
dc.description.abstract: This thesis explores a variety of techniques for large-scale stochastic control. These range from simple heuristics that are motivated by the problem structure and are amenable to analysis, to more general deep reinforcement learning (RL), which applies to broader classes of problems but is harder to reason about. In the first part of this thesis, we explore a lesser-known application of stochastic control: multi-armed bandits. By assuming a Bayesian statistical model, we obtain enough problem structure to formulate an MDP that maximizes total rewards. If the objective were total discounted rewards over an infinite horizon, the celebrated Gittins index policy would be optimal. Unfortunately, that analysis does not carry over to the undiscounted, finite-horizon problem. In this work, we propose a tightening sequence of 'optimistic' approximations to the Gittins index. We show that using these approximations, together with an increasing discount factor, appears to offer a compelling alternative to state-of-the-art algorithms. We prove that these optimistic indices constitute a regret-optimal algorithm, in the sense of meeting the Lai-Robbins lower bound, including matching constants. The second part of the thesis focuses on the collateral management problem (CMP). We study the CMP faced by a prime brokerage through the lens of multi-period stochastic optimization. We find that, for a large class of CMP instances, algorithms that select collateral based on appropriately computed asset prices are near-optimal. In addition, we back-test the method on data from a prime brokerage and find substantial increases in revenue. Finally, in the third part, we propose novel deep reinforcement learning (DRL) methods for option pricing and portfolio optimization problems. Our work on option pricing enables one to compute tighter confidence bounds on the price than existing techniques, using the same number of Monte Carlo samples. We also examine constrained portfolio optimization problems and test policy gradient algorithms that work with somewhat different objective functions. These new objectives measure the performance of a projected version of the policy and penalize constraint violations.
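The first part of the abstract describes an index policy for Bayesian multi-armed bandits. As a rough illustration of that setting only (not the thesis's optimistic Gittins approximations, which are not reproduced here), the Python sketch below runs a Bernoulli bandit with a Bayes-UCB-style optimistic index: each arm is scored by an upper quantile of its Beta posterior, with a quantile level that rises over time as a crude stand-in for the increasing-discount-factor schedule mentioned above. The arm means, the quantile schedule, and the function names are assumptions made for this example.

    # Hypothetical sketch of a Bayesian Bernoulli bandit with an optimistic
    # posterior-quantile index. Illustrative only; not the thesis's algorithm.
    import numpy as np
    from scipy.stats import beta

    def optimistic_index(succ, fail, t):
        # Upper quantile of the Beta(1 + succ, 1 + fail) posterior. The level
        # 1 - 1/(t + 1) rises toward 1 with t (a Bayes-UCB-style choice),
        # standing in for the optimism/discount schedule described above.
        level = 1.0 - 1.0 / (t + 1.0)
        return beta.ppf(level, 1.0 + succ, 1.0 + fail)

    def run_bandit(true_means, horizon, seed=0):
        rng = np.random.default_rng(seed)
        k = len(true_means)
        succ = np.zeros(k)   # observed successes per arm
        fail = np.zeros(k)   # observed failures per arm
        reward = 0.0
        for t in range(1, horizon + 1):
            scores = [optimistic_index(succ[a], fail[a], t) for a in range(k)]
            a = int(np.argmax(scores))                 # play the highest index
            r = float(rng.random() < true_means[a])    # Bernoulli reward
            succ[a] += r
            fail[a] += 1.0 - r
            reward += r
        return reward

    if __name__ == "__main__":
        total = run_bandit([0.3, 0.5, 0.6], horizon=5000)
        print("total reward:", total, "(best-arm benchmark:", 0.6 * 5000, ")")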
dc.description.statementofresponsibility: by Eli Gutin.
dc.format.extent: 188 pages
dc.language.iso: eng
dc.publisher: Massachusetts Institute of Technology
dc.rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582
dc.subject: Operations Research Center.
dc.title: Practical applications of large-scale stochastic control for learning and optimization
dc.type: Thesis
dc.description.degree: Ph. D.
dc.contributor.department: Massachusetts Institute of Technology. Operations Research Center
dc.contributor.department: Sloan School of Management
dc.identifier.oclc: 1082870705

