dc.contributor.author | Mannor, Shie | |
dc.contributor.author | Tsitsiklis, John N. | |
dc.date.accessioned | 2013-07-01T20:24:53Z | |
dc.date.available | 2013-07-01T20:24:53Z | |
dc.date.issued | 2011-06 | |
dc.identifier.isbn | 9781450306195 | |
dc.identifier.isbn | 1450306195 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/79401 | |
dc.description.abstract | We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for others. We finally offer pseudo-polynomial exact and approximation algorithms. | en_US |
dc.description.sponsorship | National Science Foundation (U.S.) (grant CMMI-0856063) | en_US |
dc.description.sponsorship | Israel Science Foundation (contract 890015) | en_US |
dc.description.sponsorship | Technion, Israel Institute of Technology (Horeb Fellowship) | en_US |
dc.language.iso | en_US | |
dc.publisher | International Machine Learning Society | en_US |
dc.relation.isversionof | http://www.icml-2011.org/papers.php | en_US |
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike 3.0 | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/ | en_US |
dc.source | Tsitsiklis via Amy Stout | en_US |
dc.title | Mean-Variance Optimization in Markov Decision Processes | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Mannor, Shie and John Tsitsiklis. "Mean-Variance Optimization in Markov Decision Processes." In Twenty-Eighth International Conference on Machine Learning, ICML 2011, Jun. 28-Jul. 2, Bellevue, Washington. 2011. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.contributor.mitauthor | Tsitsiklis, John N. | en_US |
dc.relation.journal | Proceedings of the Twenty-Eighth International Conference on Machine Learning, ICML 2011 | en_US |
dc.eprint.version | Author's final manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dspace.orderedauthors | Mannor, Shie; Tsitsiklis, John | en_US |
dc.identifier.orcid | https://orcid.org/0000-0003-2658-8239 | |
mit.license | OPEN_ACCESS_POLICY | en_US |
mit.metadata.status | Complete | |