| dc.contributor.author | Ryou, Gilhyun | |
| dc.contributor.author | Wang, Geoffrey | |
| dc.contributor.author | Karaman, Sertac | |
| dc.date.accessioned | 2026-03-04T15:32:08Z | |
| dc.date.available | 2026-03-04T15:32:08Z | |
| dc.date.issued | 2025-08-22 | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/165006 | |
| dc.description.abstract | High-speed online trajectory planning for UAVs is a significant challenge: it requires precise modeling of complex dynamics while operating under tight computational limitations. This paper presents a multi-fidelity reinforcement learning (MFRL) method that aims to effectively create a realistic dynamics model and simultaneously train a planning policy that can be readily deployed in real-time applications. The proposed method involves the co-training of a planning policy and a reward estimator; the latter predicts the performance of the policy’s output and is trained efficiently through multi-fidelity Bayesian optimization. This optimization approach models the correlation between different fidelity levels, thereby constructing a high-fidelity model on a low-fidelity foundation, which enables accurate development of the reward model with limited high-fidelity experiments. The framework is further extended to include real-world flight experiments in reinforcement learning training, allowing the reward model to precisely reflect real-world constraints and broadening the policy’s applicability to real-world scenarios. We present rigorous evaluations by training and testing the planning policy in both simulated and real-world environments. The resulting trained policy not only generates faster and more reliable trajectories than the baseline snap minimization method, but also achieves trajectory updates in 2 ms on average, while the baseline method takes several minutes. | en_US |
| dc.language.iso | en | |
| dc.publisher | SAGE Publications | en_US |
| dc.relation.isversionof | https://doi.org/10.1177/02783649251364393 | en_US |
| dc.rights | Creative Commons Attribution-Noncommercial | en_US |
| dc.rights.uri | https://creativecommons.org/licenses/by-nc/4.0/ | en_US |
| dc.source | SAGE Publications | en_US |
| dc.title | Multi-fidelity reinforcement learning for time-optimal quadrotor re-planning | en_US |
| dc.type | Article | en_US |
| dc.identifier.citation | Ryou G, Wang G, Karaman S. Multi-fidelity reinforcement learning for time-optimal quadrotor re-planning. The International Journal of Robotics Research. 2025;0(0). | en_US |
| dc.contributor.department | Massachusetts Institute of Technology. Laboratory for Information and Decision Systems | en_US |
| dc.relation.journal | The International Journal of Robotics Research | en_US |
| dc.eprint.version | Final published version | en_US |
| dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
| eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
| dc.date.updated | 2026-03-04T15:27:15Z | |
| dspace.orderedauthors | Ryou, G; Wang, G; Karaman, S | en_US |
| dspace.date.submission | 2026-03-04T15:27:18Z | |
| mit.license | PUBLISHER_CC | |
| mit.metadata.status | Authority Work and Publication Information Needed | en_US |