dc.contributor.advisor: Vivek F. Farias.
dc.contributor.author: Gutin, Eli
dc.contributor.other: Massachusetts Institute of Technology. Operations Research Center.
dc.date.accessioned: 2019-02-05T15:17:22Z
dc.date.available: 2019-02-05T15:17:22Z
dc.date.copyright: 2018
dc.date.issued: 2018
dc.identifier.uri: http://hdl.handle.net/1721.1/120191
dc.description: Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2018.
dc.description: This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
dc.description: Cataloged from student-submitted PDF version of thesis.
dc.description: Includes bibliographical references (pages 183-188).
dc.description.abstract: This thesis explores a variety of techniques for large-scale stochastic control. These range from simple heuristics that are motivated by the problem structure and are amenable to analysis, to more general deep reinforcement learning (RL), which applies to broader classes of problems but is harder to reason about. In the first part of this thesis, we explore a lesser-known application of stochastic control: multi-armed bandits. By assuming a Bayesian statistical model, we obtain enough problem structure to formulate an MDP that maximizes total rewards. If the objective were total discounted rewards over an infinite horizon, the celebrated Gittins index policy would be optimal. Unfortunately, that analysis does not carry over to the undiscounted, finite-horizon problem. In this work, we propose a tightening sequence of 'optimistic' approximations to the Gittins index. We show that using these approximations, together with an increasing discount factor, appears to offer a compelling alternative to state-of-the-art algorithms. We prove that these optimistic indices constitute a regret-optimal algorithm, in the sense of meeting the Lai-Robbins lower bound, including matching constants. The second part of the thesis focuses on the collateral management problem (CMP). We study the CMP faced by a prime brokerage through the lens of multi-period stochastic optimization. We find that, for a large class of CMP instances, algorithms that select collateral based on appropriately computed asset prices are near-optimal. In addition, we back-test the method on data from a prime brokerage and find substantial increases in revenue. Finally, in the third part, we propose novel deep reinforcement learning (DRL) methods for option pricing and portfolio optimization problems. Our work on option pricing enables one to compute tighter confidence bounds on the price than existing techniques, using the same number of Monte Carlo samples. We also examine constrained portfolio optimization problems and test policy gradient algorithms that work with somewhat different objective functions. These new objectives measure the performance of a projected version of the policy and penalize constraint violations.
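The first part of the abstract describes an index policy for Bayesian multi-armed bandits. As a rough illustration of that setting only (not the thesis's optimistic Gittins approximations, which are not reproduced here), the Python sketch below runs a Bernoulli bandit with a Bayes-UCB-style optimistic index: each arm is scored by an upper quantile of its Beta posterior, with a quantile level that rises over time as a crude stand-in for the increasing-discount-factor schedule mentioned above. The arm means, the quantile schedule, and the function names are assumptions made for this example.

    # Hypothetical sketch of a Bayesian Bernoulli bandit with an optimistic
    # posterior-quantile index. Illustrative only; not the thesis's algorithm.
    import numpy as np
    from scipy.stats import beta

    def optimistic_index(succ, fail, t):
        # Upper quantile of the Beta(1 + succ, 1 + fail) posterior. The level
        # 1 - 1/(t + 1) rises toward 1 with t (a Bayes-UCB-style choice),
        # standing in for the optimism/discount schedule described above.
        level = 1.0 - 1.0 / (t + 1.0)
        return beta.ppf(level, 1.0 + succ, 1.0 + fail)

    def run_bandit(true_means, horizon, seed=0):
        rng = np.random.default_rng(seed)
        k = len(true_means)
        succ = np.zeros(k)   # observed successes per arm
        fail = np.zeros(k)   # observed failures per arm
        reward = 0.0
        for t in range(1, horizon + 1):
            scores = [optimistic_index(succ[a], fail[a], t) for a in range(k)]
            a = int(np.argmax(scores))                 # play the highest index
            r = float(rng.random() < true_means[a])    # Bernoulli reward
            succ[a] += r
            fail[a] += 1.0 - r
            reward += r
        return reward

    if __name__ == "__main__":
        total = run_bandit([0.3, 0.5, 0.6], horizon=5000)
        print("total reward:", total, "(best-arm benchmark:", 0.6 * 5000, ")")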
dc.description.statementofresponsibility: by Eli Gutin.
dc.format.extent: 188 pages
dc.language.iso: eng
dc.publisher: Massachusetts Institute of Technology
dc.rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582
dc.subject: Operations Research Center.
dc.title: Practical applications of large-scale stochastic control for learning and optimization
dc.type: Thesis
dc.description.degree: Ph. D.
dc.contributor.department: Massachusetts Institute of Technology. Operations Research Center
dc.contributor.department: Sloan School of Management
dc.identifier.oclc: 1082870705

