| dc.contributor.author | Ryou, Gilhyun | |
| dc.contributor.author | Wang, Geoffrey | |
| dc.contributor.author | Karaman, Sertac | |
| dc.date.accessioned | 2026-03-04T15:32:08Z | |
| dc.date.available | 2026-03-04T15:32:08Z | |
| dc.date.issued | 2025-08-22 | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/165006 | |
| dc.description.abstract | High-speed online trajectory planning for UAVs is a significant challenge: it requires precise modeling of complex dynamics while operating under tight computational limitations. This paper presents a multi-fidelity reinforcement learning (MFRL) method that aims to effectively create a realistic dynamics model and simultaneously train a planning policy that can be readily deployed in real-time applications. The proposed method involves the co-training of a planning policy and a reward estimator; the latter predicts the performance of the policy’s output and is trained efficiently through multi-fidelity Bayesian optimization. This optimization approach models the correlation between different fidelity levels, thereby constructing a high-fidelity model on a low-fidelity foundation, which enables accurate development of the reward model with limited high-fidelity experiments. The framework is further extended to include real-world flight experiments in reinforcement learning training, allowing the reward model to precisely reflect real-world constraints and broadening the policy’s applicability to real-world scenarios. We present rigorous evaluations by training and testing the planning policy in both simulated and real-world environments. The resulting trained policy not only generates faster and more reliable trajectories than the baseline snap minimization method, but also achieves trajectory updates in 2 ms on average, while the baseline method takes several minutes. | en_US |
| dc.language.iso | en | |
| dc.publisher | SAGE Publications | en_US |
| dc.relation.isversionof | https://doi.org/10.1177/02783649251364393 | en_US |
| dc.rights | Creative Commons Attribution-Noncommercial | en_US |
| dc.rights.uri | https://creativecommons.org/licenses/by-nc/4.0/ | en_US |
| dc.source | SAGE Publications | en_US |
| dc.title | Multi-fidelity reinforcement learning for time-optimal quadrotor re-planning | en_US |
| dc.type | Article | en_US |
| dc.identifier.citation | Ryou G, Wang G, Karaman S. Multi-fidelity reinforcement learning for time-optimal quadrotor re-planning. The International Journal of Robotics Research. 2025;0(0). | en_US |
| dc.contributor.department | Massachusetts Institute of Technology. Laboratory for Information and Decision Systems | en_US |
| dc.relation.journal | The International Journal of Robotics Research | en_US |
| dc.eprint.version | Final published version | en_US |
| dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
| eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
| dc.date.updated | 2026-03-04T15:27:15Z | |
| dspace.orderedauthors | Ryou, G; Wang, G; Karaman, S | en_US |
| dspace.date.submission | 2026-03-04T15:27:18Z | |
| mit.license | PUBLISHER_CC | |
| mit.metadata.status | Authority Work and Publication Information Needed | en_US |