Hidden variables’ estimation of trajectories' states using imitation learning
Πρόβλεψη παραμέτρων σε καταστάσεις τροχιών με τη χρήση μιμητικής μάθησης
Master Thesis
Author
Patiniotis Spyropoulos, Dimitrios
Πατηνιώτης Σπυρόπουλος, Δημήτριος
Date
2024-02
Advisor
Vouros, George
Βούρος, Γεώργιος
Keywords
Inverse Reinforcement Learning ; Generative Adversarial Imitation Learning ; Trust Region Policy Optimization ; Proximal Policy Optimization
Abstract
Trajectory prediction is a critical problem with profound implications across various domains, from autonomous vehicles and robotics to aerospace and maritime navigation. This thesis examines flight trajectories from Paris to Constantinople, which entail navigating through various complexities, including diverse airspace configurations, compliance with international air traffic regulations, and adaptability to dynamic weather conditions. Our exploration leverages Imitation Learning, focusing on Generative Adversarial Imitation Learning (GAIL), to tackle the trajectory prediction problem in the field of aviation. Specifically, we study the relative performance of two of the most common policy optimization algorithms, Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), in the context of GAIL. Our findings show that TRPO outperforms PPO within the GAIL framework, which is the main contribution of our research. The comparison is evaluated through the rate of learning per epoch, the training speed of each optimizer, and the final performance of GAIL configured with each algorithm. Additionally, we examine how using more than one agent to predict the hidden variables of the trajectory affects the accuracy of our setup. A further goal of this thesis was not only to imitate spatio-temporal trajectories per se, but also to learn models for imitating critical KPIs (e.g. fuel consumption) of flight trajectories, and to examine the impact of learning to imitate spatio-temporal trajectories on predicting these KPIs.