Βελτιστοποίηση αναπλήρωσης αποθεμάτων με ενισχυτική μάθηση : ανάπτυξη και αξιολόγηση περιβάλλοντος προσομοίωσης
Optimization of inventory replenishment with reinforcement learning : development and evaluation of a simulation environment

Keywords
Αναπλήρωση αποθεμάτων ; Ενισχυτική μάθηση ; Reinforcement learning ; A2C ; PPO
Abstract
This thesis investigates the optimization of inventory management through Reinforcement
Learning (RL) techniques. A custom simulation environment is developed, based on
realistic sales and replenishment data, where two RL agents, Proximal Policy
Optimization (PPO) and Advantage Actor-Critic (A2C), are trained to make dynamic
replenishment decisions.
The environment's state includes key variables such as current inventory level, demand,
in-transit stock, reserved quantities, and the most recent replenishment order. The agents
learn to balance the trade-off between overstocking and stockouts by maximizing a
cumulative reward signal, designed to reflect demand satisfaction, stock availability, and
cost efficiency.
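
As an illustration of the state and reward described above, the following is a minimal sketch and not the thesis's actual environment. The class name InventoryState, the function step, the lead-time handling, and all cost coefficients are hypothetical choices made for this example only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InventoryState:
    on_hand: float           # current inventory level
    demand: float            # most recently observed demand
    in_transit: List[float]  # orders arriving in 1, 2, ... periods
    reserved: float          # committed stock, not available for sale
    last_order: float        # most recent replenishment order


def step(state: InventoryState, order_qty: float, demand: float,
         holding_cost: float = 0.1, stockout_cost: float = 1.0,
         order_cost: float = 0.05) -> Tuple[InventoryState, float]:
    """Advance one period; all coefficients are assumed, not from the thesis."""
    # The new order joins the end of the pipeline; the head arrives now.
    pipeline = state.in_transit + [order_qty]
    arriving, remaining = pipeline[0], pipeline[1:]

    on_hand = state.on_hand + arriving
    available = max(on_hand - state.reserved, 0.0)
    sold = min(available, demand)
    unmet = demand - sold
    on_hand -= sold

    # Reward satisfied demand; penalize leftover stock, stockouts, and ordering.
    reward = (sold - holding_cost * on_hand
              - stockout_cost * unmet - order_cost * order_qty)
    next_state = InventoryState(on_hand, demand, remaining,
                                state.reserved, order_qty)
    return next_state, reward
```

In this sketch the length of in_transit acts as the lead time, so an agent ordering too much pays the holding penalty in later periods, while ordering too little incurs the stockout penalty.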
The performance of the PPO and A2C agents is evaluated using quantitative metrics such
as cumulative reward, policy stability, and demand coverage. These results are compared
to traditional inventory control strategies, such as the (s,Q) policy, highlighting the ability of
RL agents to outperform classical methods in environments characterized by uncertainty
and fluctuating demand.
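
For reference, the (s,Q) baseline orders a fixed quantity Q whenever the inventory position falls to or below the reorder point s. The sketch below is an illustration under stated assumptions: the function names, the definition of inventory position as on-hand plus in-transit minus reserved stock, and the fill-rate reading of "demand coverage" are hypothetical and may differ from the thesis's definitions.

```python
from typing import List


def sq_policy(on_hand: float, in_transit: float, reserved: float,
              s: float, Q: float) -> float:
    """Order Q when inventory position is at or below s, else order nothing."""
    inventory_position = on_hand + in_transit - reserved  # assumed definition
    return Q if inventory_position <= s else 0.0


def simulate_sq(demands: List[float], s: float, Q: float,
                initial_stock: float, lead_time: int = 1) -> float:
    """Return the fraction of demand served (requires lead_time >= 1)."""
    on_hand = initial_stock
    pipeline = [0.0] * lead_time  # orders waiting to arrive
    sold_total = 0.0
    for d in demands:
        on_hand += pipeline.pop(0)       # receive the order due this period
        sold = min(on_hand, d)           # unmet demand is lost, not backordered
        on_hand -= sold
        sold_total += sold
        pipeline.append(sq_policy(on_hand, sum(pipeline), 0.0, s, Q))
    total = sum(demands)
    return sold_total / total if total > 0 else 1.0
```

Running the same demand series through a trained agent and through simulate_sq would give one comparable coverage figure per policy.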
This work contributes to the application of reinforcement learning in management science
and demonstrates the practical potential of RL-based approaches for supply chain
optimization under realistic operational constraints.


