Inverted Pendulum Control: A Comparative Study from Conventional Control to Reinforcement Learning

Ahmad Ataka; Andreas Sandiwan; Hilton Tnunay; Dzuhri Radityo Utomo; Adha Imam Cahyadi

doi:10.22146/jnteti.v12i3.7065

Ahmad Ataka, Universitas Gadjah Mada
Andreas Sandiwan, Universitas Gadjah Mada
Hilton Tnunay, KU Leuven
Dzuhri Radityo Utomo, Universitas Gadjah Mada
Adha Imam Cahyadi, Universitas Gadjah Mada


Keywords: Reinforcement Learning, Inverted Pendulum, Root Locus, State Feedback, PD Control

Abstract

The rise of deep reinforcement learning in recent years has led to its use in solving a variety of challenging problems, such as the games of chess and Go. Despite this success on highly complex problems, it remains an open question whether this class of methods is the best choice for control problems in general, such as driverless cars, mobile robot control, or industrial manipulator control. This paper presents a comparative study of several classes of control algorithms and reinforcement learning applied to an inverted pendulum system, in order to evaluate how reinforcement learning performs on a control problem. Root locus-based control, state compensator control, proportional-derivative (PD) control, and a reinforcement learning method, namely proximal policy optimization (PPO), were each tested on the task of controlling an inverted pendulum on a cart. The transient responses (overshoot, peak time, and settling time) and the steady-state responses (steady-state error and total energy) were compared. The results show that, given a sufficient amount of training, the reinforcement learning algorithm produced a solution comparable to those of the conventional control algorithms despite having no knowledge of the system’s properties. Reinforcement learning is therefore best used to control plants for which little or no model information is available and for which testing a particular policy is easy and safe. It is also recommended for systems with a clear objective function.
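
The comparison metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name transient_metrics, the 2% settling band, the positive unit reference, and the second-order test response (damping ratio 0.5, natural frequency 2 rad/s) are all assumptions chosen for illustration, and the total energy metric is left out because its definition is not given here.

```python
import math


def transient_metrics(t, y, target, band=0.02):
    """Compute step-response metrics for samples y taken at times t.

    Hypothetical helper: assumes a positive, nonzero target and a
    settling band given as a fraction of the target (2% by default).
    """
    if target <= 0:
        raise ValueError("this sketch assumes a positive target")

    # Peak: the largest sample of the response.
    peak_index = max(range(len(y)), key=lambda i: y[i])
    overshoot_pct = max(0.0, (y[peak_index] - target) / target * 100.0)
    peak_time = t[peak_index]

    # Settling time: the time of the first sample after the last one
    # that lies outside the tolerance band around the target.
    tolerance = band * target
    settling_time = t[0]
    for i in range(len(y) - 1, -1, -1):
        if abs(y[i] - target) > tolerance:
            settling_time = t[i + 1] if i + 1 < len(t) else math.inf
            break

    return {
        "overshoot_pct": overshoot_pct,
        "peak_time": peak_time,
        "settling_time": settling_time,
        "steady_state_error": abs(target - y[-1]),
    }


# Assumed test signal: unit step response of an underdamped
# second-order system, sampled every 1 ms for 10 s.
zeta, omega_n = 0.5, 2.0
omega_d = omega_n * math.sqrt(1.0 - zeta ** 2)
dt = 0.001
t = [k * dt for k in range(10001)]
y = [
    1.0 - math.exp(-zeta * omega_n * tk)
    * (math.cos(omega_d * tk)
       + zeta / math.sqrt(1.0 - zeta ** 2) * math.sin(omega_d * tk))
    for tk in t
]

metrics = transient_metrics(t, y, target=1.0)
print(metrics)
```

The same function could be applied to a logged trajectory from any of the four controllers, provided the response is expressed relative to a nonzero reference; a pendulum angle regulated to zero would need a different normalization for the overshoot.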
