Reinforcement learning: training and evaluation of agents in a graphical environment

Vintzilaiou, Vasiliki; Βιντζηλαίου, Βασιλική

Master Thesis

Author

Vintzilaiou, Vasiliki

Βιντζηλαίου, Βασιλική

Date

2025-10

Abstract

This thesis investigates cooperative behavior in multi-agent reinforcement learning through a shared-policy formulation based on Proximal Policy Optimization (PPO). The study uses the Pistonball-v6 environment as a case study. In this environment, multiple agents operate under partial observability and decentralized execution while a single shared neural policy controls all of them. The objective is to examine how coordination emerges without learned communication channels, centralized control, or agent-specific role specialization. The implementation uses a convolutional actor-critic architecture trained with PPO, with parameters shared across all agents. The experimental analysis focuses on the effects of hyperparameter selection, rollout length, entropy regularization, fine-tuning strategies, and evaluation methodology. Multiple training configurations and random seeds were evaluated to assess the stability and reproducibility of cooperative behavior. The results demonstrate that non-trivial coordination can emerge despite strong observational and architectural constraints. Several configurations produced sustained cooperative behavior and kept the ball moving collectively for extended periods. At the same time, the experiments revealed substantial sensitivity to initialization, stochasticity, and training dynamics, with policy performance varying considerably across seeds and evaluation settings. In our experiments, stochastic evaluation also consistently produced stronger and more representative behavior than deterministic execution, suggesting that stochasticity may contribute to behavioral adaptability in partially observable cooperative environments. Overall, the findings indicate that shared-policy PPO is a viable framework for studying coordination in cooperative multi-agent reinforcement learning.
Although the approach has important limitations in stability and reproducibility, the results show that meaningful cooperative behavior can emerge from relatively simple shared-policy architectures without explicit communication mechanisms.
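The difference between stochastic and deterministic evaluation of a shared policy can be illustrated with a minimal sketch. It assumes a small discrete action set and takes each agent's action logits as given, standing in for the output of the shared network on that agent's observation. The functions, agent names, and logit values are hypothetical and are not taken from the thesis code. The same selection rule is applied to every agent, mirroring parameter sharing. Stochastic evaluation samples from the categorical policy distribution, while deterministic execution always takes the most probable action.

```python
import math
import random
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one agent's action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits: Sequence[float], stochastic: bool, rng: random.Random) -> int:
    """Choose an action index for a single agent.

    stochastic=True samples from the categorical policy distribution;
    stochastic=False returns the argmax (deterministic execution).
    """
    probs = softmax(logits)
    if stochastic:
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=lambda a: probs[a])


def act_all(agent_logits: Dict[str, Sequence[float]], stochastic: bool, seed: int = 0) -> Dict[str, int]:
    """Apply the same (shared) selection rule to every agent.

    Hypothetical: agent_logits stands in for the shared policy's output
    on each agent's own observation. One seeded RNG is used for all agents.
    """
    rng = random.Random(seed)
    return {agent: select_action(lg, stochastic, rng) for agent, lg in agent_logits.items()}


if __name__ == "__main__":
    # Illustrative logits for two hypothetical agents with three actions each.
    logits = {"piston_0": [0.1, 2.0, -1.0], "piston_1": [3.0, 0.0, 0.0]}
    print("deterministic:", act_all(logits, stochastic=False))
    print("stochastic:   ", act_all(logits, stochastic=True, seed=7))
```

With a fixed seed, stochastic evaluation is reproducible, yet it can still select actions other than the most probable one. Deterministic execution removes exactly this kind of variability.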

Postgraduate Studies Programme

Information Systems and Services

Department

School of Information and Communication Technologies, Department of Digital Systems

Language

English

URI

https://dione.lib.unipi.gr/xmlui/handle/unipi/19389

Collections

Department of Digital Systems