OPTIMIZING UAV PID FLIGHT CONTROLLERS WITH REINFORCEMENT LEARNING: A TD3 ALGORITHM APPROACH
DOI: https://doi.org/10.62985/j.huit_ojs.vol26.no2E.407
Keywords: Reinforcement learning, UAV, PID controller, learning-based control
Abstract
This paper proposes a reinforcement learning-based approach to optimizing the PID control parameters of a quadcopter unmanned aerial vehicle (UAV) using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. The UAV model is derived taking into account nonlinearity, external disturbances, and strong coupling. The learning agent adaptively updates the PID gains using a reward function that penalizes tracking deviations and excessive control effort. Simulations and comparisons with other methods indicate that the TD3-tuned scheme achieves fast trajectory tracking while improving stability, adaptability, and robustness.
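The abstract names three ingredients: PID gains that the agent updates, a reward that penalizes tracking error and control effort, and TD3 training. The Python sketch below illustrates each one under stated assumptions and is not the authors' implementation. It assumes a single-axis discrete PID whose gain triple (kp, ki, kd) is the agent's action, a quadratic penalty reward with hypothetical weights `w_e` and `w_u`, and hypothetical gain bounds of [0, 10]. It also shows two pieces of the TD3 critic target as described by Fujimoto, van Hoof, and Meger (2018): target policy smoothing, which adds clipped Gaussian noise to the target action, and clipped double-Q learning, which bootstraps from the smaller of two target-critic estimates. All function names and default parameter values are illustrative.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PID:
    """Single-axis discrete PID controller whose gains an RL agent may overwrite."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def set_gains(self, kp: float, ki: float, kd: float) -> None:
        # Assumption: the agent's action is the gain triple (kp, ki, kd).
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt
        # Skip the derivative on the first call to avoid a derivative kick.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def reward(error: float, u: float, w_e: float = 1.0, w_u: float = 0.01) -> float:
    # Quadratic penalty on tracking error and control effort (hypothetical weights).
    return -(w_e * error ** 2 + w_u * u ** 2)


def smoothed_target_action(a_next: List[float], sigma: float = 0.2,
                           noise_clip: float = 0.5, low: float = 0.0,
                           high: float = 10.0,
                           rng: Optional[random.Random] = None) -> List[float]:
    # TD3 target policy smoothing: add clipped Gaussian noise to the target
    # actor's output, then clip each component to the (assumed) gain bounds.
    rng = rng if rng is not None else random.Random()
    out = []
    for a in a_next:
        noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
        out.append(max(low, min(high, a + noise)))
    return out


def td3_target(r: float, done: bool, q1_next: float, q2_next: float,
               gamma: float = 0.99) -> float:
    # Clipped double-Q learning: bootstrap from the smaller target-critic
    # estimate; no bootstrapping past a terminal transition.
    return r + gamma * (0.0 if done else 1.0) * min(q1_next, q2_next)
```

For example, after `pid = PID(1.0, 0.0, 0.0, dt=0.01)` and `pid.set_gains(2.0, 0.5, 0.1)`, a first call `pid.step(1.0, 0.0)` returns 2.0·1.0 + 0.5·0.01 = 2.005, because the derivative term is skipped on the first step. In a full TD3 update, `q1_next` and `q2_next` would be the two target critics evaluated at the smoothed target action. TD3's third mechanism, updating the actor and target networks less often than the critics, is a scheduling choice and is omitted here.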