Reinforcement learning : training and evaluation of agents in a graphical environment
Master Thesis
Author
Vintzilaiou, Vasiliki
Βιντζηλαίου, Βασιλική
Date
2025-10
Keywords
Reinforcement learning ; Shared policy ; Agents ; Multi-agent graphic environment
Abstract
This thesis investigates cooperative behavior in multi-agent reinforcement
learning through a shared-policy formulation based on Proximal Policy
Optimization (PPO). The study uses the Pistonball-v6 environment as a case
study, where multiple agents operate under partial observability and
decentralized execution while being controlled by a single shared neural policy.
The objective is to examine how coordination emerges without learned communication channels, centralized control, or agent-specific role specialization.
The implementation is based on a convolutional actor-critic architecture trained
with PPO using parameter sharing across all agents. The experimental analysis
focuses on the effects of hyperparameter selection, rollout length, entropy
regularization, fine-tuning strategies, and evaluation methodology. Multiple
training configurations and random seeds were evaluated in order to examine the
stability and reproducibility of cooperative behavior.
The results demonstrate that non-trivial coordination can emerge despite strong
observational and architectural constraints. Several configurations were able to
produce sustained cooperative behavior and maintain collective ball movement for
extended periods of time. At the same time, the experiments revealed substantial
sensitivity to initialization, stochasticity, and training dynamics, with policy
performance varying considerably across seeds and evaluation settings.
Additionally, in our experiments, stochastic evaluation consistently produced stronger and more
representative behavior than deterministic execution, suggesting that
stochasticity may contribute to behavioral adaptability in partially observable
cooperative environments.
Overall, the findings indicate that shared-policy PPO constitutes a viable
framework for studying coordination in cooperative multi-agent reinforcement
learning. Although the approach exhibits important limitations in terms of
stability and reproducibility, the results demonstrate that meaningful
cooperative behavior can emerge from relatively simple shared-policy
architectures without explicit communication mechanisms.
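
The difference between stochastic and deterministic evaluation under a shared policy can be illustrated with a minimal sketch. It is not the thesis implementation: the linear scoring function stands in for the convolutional actor, and the names `shared_policy_logits` and `select_action`, the observation size, the agent count of 20, and the three discrete actions are assumptions made only for illustration.

```python
import math
import random


def softmax(logits):
    """Turn raw action scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shared_policy_logits(weights, observation):
    """Hypothetical linear stand-in for the shared actor network.

    The same `weights` are used for every agent (parameter sharing);
    only the local observation differs between agents.
    """
    return [sum(w * o for w, o in zip(row, observation)) for row in weights]


def select_action(logits, deterministic, rng):
    """Deterministic: pick the highest-scoring action.
    Stochastic: sample an action from the softmax distribution."""
    if deterministic:
        return max(range(len(logits)), key=lambda a: logits[a])
    return rng.choices(range(len(logits)), weights=softmax(logits), k=1)[0]


# Assumed sizes, chosen only for this illustration.
N_AGENTS, OBS_DIM, N_ACTIONS = 20, 8, 3

rng = random.Random(0)
weights = [[rng.gauss(0, 1) for _ in range(OBS_DIM)] for _ in range(N_ACTIONS)]
observations = [[rng.gauss(0, 1) for _ in range(OBS_DIM)] for _ in range(N_AGENTS)]

# One parameter set produces the action scores for every agent.
all_logits = [shared_policy_logits(weights, obs) for obs in observations]

greedy_actions = [select_action(l, True, rng) for l in all_logits]
sampled_actions = [select_action(l, False, rng) for l in all_logits]
```

In this sketch the agents behave differently only because their observations differ. Deterministic execution always returns the same action for a given observation, whereas stochastic execution can vary between calls.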

