Improving human-robot collaborative reinforcement learning through probabilistic policy reuse
Master's Thesis
Author
Τσίτος, Αθανάσιος Χριστόφορος
Tsitos, Athanasios C.
Date
2022-06-21
Supervisor
Δαγιόγλου, Μαρία
Dagioglou, Maria
Keywords
Socially aware robots ; Human-robot co-learning ; Deep reinforcement learning ; Soft actor-critic ; Transfer learning ; Probabilistic policy reuse
Abstract
Socially aware robots should be able, among other things, to support fluent human-robot
collaboration (HRC) in tasks that require interdependent actions in order to be solved.
Similar to human-human collaboration, during HRC the actions of each agent affect
the actions of its partner. Towards enhancing mutual performance, collaborative robots
(cobots) should be equipped with adaptation and learning capabilities. Overall, mutual learning can be a time-consuming process that depends on the computational complexity of the task, the motor and cognitive load demanded, and the skills of the human partner. Nevertheless, cobots should be able to integrate the capabilities of their human partner into their actions and adapt to their strengths and weaknesses. In the
current thesis, we focused on HRC settings where a human and a Deep Reinforcement
Learning (DRL) agent need to learn in real time how to solve a shared task through efficient collaboration. In such scenarios, the performance of the team depends, on the one hand, on the ability of the DRL agent to learn how to solve the task while adapting to its human partner and, on the other, on the ability of the human to understand the strengths and weaknesses of the agent and adapt accordingly. The goal of the thesis
was to investigate how the mutual performance could be improved when the agent needs to collaborate with different humans. The method used was a transfer learning technique called Probabilistic Policy Reuse (PPR), which allows DRL agents to take actions based on
other pre-trained policies. To assess this method, we developed a human-agent game in which the human and a DRL agent, controlled by the Soft Actor-Critic (SAC) algorithm, needed to jointly control the motion of the end-effector of a robotic manipulator and bring it to a goal position. Sixteen people participated in the experiments. Half of them played the game with a naive agent, i.e., an agent that started playing without any prior knowledge of the game, while the other half played with an agent that had access to the actions of an expert agent trained beforehand by the author. In the second group, the agent took actions based on its current policy with probability ψ and actions based on the expert policy with probability 1 − ψ. The performance of the teams was evaluated through the travelled distance
of the end-effector. The results showed a significant difference between the performance of the teams that played without transfer learning and those that played with it. This result indicates that applying transfer learning in HRC scenarios
where the agent needs to collaborate with different humans might improve the mutual
performance of the team.
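To make the policy-reuse mechanism concrete, below is a minimal sketch of the action-selection rule described in the abstract: with probability ψ the agent samples from its own (still-learning) policy, and with probability 1 − ψ it reuses the pre-trained expert policy. This is an illustrative reconstruction, not the thesis code; the names ppr_action, current_policy, and expert_policy are hypothetical stand-ins for the actual SAC actor and expert networks.

```python
import random
from typing import Any, Callable

State = Any
Action = Any

def ppr_action(
    state: State,
    current_policy: Callable[[State], Action],  # the agent's own (learning) policy, e.g. a SAC actor
    expert_policy: Callable[[State], Action],   # pre-trained expert policy
    psi: float,                                 # probability of acting from the current policy
) -> Action:
    """Probabilistic Policy Reuse action selection (illustrative sketch).

    With probability psi, act from the agent's own policy; with
    probability 1 - psi, reuse the expert's action for this state.
    """
    if random.random() < psi:
        return current_policy(state)
    return expert_policy(state)
```

Note that in the original π-reuse algorithm of Fernández and Veloso the reuse probability is typically decayed as the new policy improves; the abstract does not specify whether ψ was fixed or annealed in this work.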