Collaborative reinforcement learning agents
Συνεργατικοί πράκτορες ενισχυτικής μάθησης

Master Thesis
Author
Vytiniotis, Konstantinos
Βυτινιώτης, Κωνσταντίνος
Date
2026-03
Advisor
Spatharis, Christos
Σπαθάρης, Χρήστος
Keywords
QMIX ; Multi-agent ; Reward-shaping ; Credit-assignment ; Non-stationarity ; Ball-balancing ; Partial-observability ; Observability ; Progress-based-reward ; Expert-student-transfer ; Human-agent-interaction ; Cooperative-systems ; Empirical-analysis ; Policy-robustness ; Πολυπρακτορικά-συστήματα ; Διαμόρφωση-ανταμοιβής ; Μη-στασιμότητα ; Μερική-παρατηρησιμότητα ; Συνεργατικά-συστήματα
Abstract
Cooperative multi-agent systems require autonomous entities to coordinate in dynamic environments to achieve shared goals, yet training such agents remains a significant challenge due to non-stationarity, the credit assignment problem, and the difficulty of defining reward functions that balance individual efficiency with team cohesion. This thesis investigates these challenges within a physics-based cooperative ball-balancing task, where two agents must synchronize their actions to guide a ball to a target.
A core contribution of this work is a rigorous empirical analysis of reward shaping and information sharing. Our results demonstrate that partial observability counterintuitively outperforms full observability. Restricting state information fosters distinct complementary roles, whereas fully informed agents frequently fall into local optima characterized by hesitation and hovering. Furthermore, we show that progress-based reward shaping yields superior convergence and stability compared to sparse or penalty-based formulations.
Finally, we validate the robustness of the trained policies through expert-student transfer learning and human-agent interaction experiments. Our findings confirm that agents trained with optimal shaping not only solve the task efficiently but generalize effectively when paired with unpredictable human partners, highlighting the practical applicability of the proposed framework for real-world collaborative systems.


