Explainable reinforcement learning using interpretable models

Master Thesis
Author
Lykos, Emmanouil
Date
2024-02
Advisor
Vouros, George
Keywords
Reinforcement learning ; Deep reinforcement learning ; Interpretable machine learning ; Actor-critic methods
Abstract
Deep Reinforcement Learning methods have achieved new milestones in Artificial Intelligence across various domains, such as gaming and autonomous driving. These methods incorporate the capabilities of Deep Neural Networks into well-known function-approximation Reinforcement Learning methods. Although the agents' performance is excellent in many cases, their decision-making mechanisms are considered black boxes; there is therefore a need for software engineers, developers, domain experts, operators, and others to interpret, at different levels, the inner workings of these methods and to provide explanations.
The contribution of this thesis is a method that inherently generates interpretable models of the decision making of Deep Reinforcement Learning agents operating in environments with continuous action spaces. We first specify the problem formally, delineate the scope of the thesis, review current scientific contributions in this direction, and state the contributions of this work. We then provide the background knowledge needed to understand the proposed method: we describe the interpretable models that we use, and then present Twin Delayed Deep Deterministic Policy Gradient (TD3), the Actor-Critic Deep Reinforcement Learning method that we modify in order to generate interpretable policy models. Next, we specify our method, which follows the mimicking paradigm and replaces the target policy neural network model with an interpretable one, along with the various modifications that can be applied.

Our method is then evaluated in various Gymnasium environments and compared with the primary policy model trained by the original TD3 method, both in terms of the learning curve and of the standalone performance of the generated primary neural network policy model and of the interpretable policy model mimicking it, in order to assess the quality of the interpretations. The performance of agents trained with the interpretable method is shown to be competitive with that of agents produced by the original non-interpretable method, albeit with limitations. Finally, we discuss the results, draw our conclusions, and provide directions for future work in this field.
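To illustrate the mimicking paradigm described above, the following is a minimal, hypothetical sketch: a decision tree (a common interpretable model) is fitted to (state, action) pairs produced by a stand-in policy. The toy `neural_policy`, its weights, and the dataset are assumptions made for illustration only; they are not the thesis's actual TD3 actor or training pipeline.

```python
# Hypothetical sketch of policy mimicking with an interpretable surrogate.
# The "neural_policy" below is a toy stand-in, NOT a trained TD3 actor.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def neural_policy(states):
    # Stand-in for a trained actor: maps states to continuous actions in [-1, 1].
    return np.tanh(states @ np.array([[0.5], [-0.3], [0.8]]))

# Collect a dataset of visited states and the actor's corresponding actions.
states = rng.normal(size=(1000, 3))
actions = neural_policy(states).ravel()

# Mimicking step: fit an interpretable surrogate policy on the actor's decisions.
tree_policy = DecisionTreeRegressor(max_depth=4, random_state=0)
tree_policy.fit(states, actions)

# The surrogate can now act in the environment and be inspected directly;
# its fidelity to the actor can be measured, e.g., with the R^2 score.
fidelity = tree_policy.score(states, actions)
print(f"fidelity (R^2) = {fidelity:.3f}")
```

A shallow tree like this trades some fidelity to the neural actor for a model whose decision rules can be read directly, which is the core tension the thesis evaluates.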