Σχεδιασμός, βελτιστοποίηση και εφαρμογή αλγορίθμων machine learning και deep learning, με δυνατότητα αυτόματης εκτέλεσης διασταυρούμενης επικύρωσης και ανανέωσης βάσει των διατηρούμενων συνόλων πολυδιάστατων και σύνθετων δεδομένων

Μακρίδης, Γεώργιος

Design, optimization and implementation of machine learning and deep learning algorithms, with the ability to automatically perform cross-validation and update based on the held sets of multidimensional and complex data

Doctoral Thesis

Author

Μακρίδης, Γεώργιος

Date

2023-03

Abstract

This thesis covers pioneering research at the intersection of machine learning and time series forecasting, with application to real industrial process data. The term "time series" refers to a sequence or stream of time-dependent data in which temporal correlation is fundamental. Such time series are ubiquitous in our environment and everyday life, and their importance derives precisely from this temporal correlation. Of particular interest is the case of correlation between different multidimensional time series. Machine learning plays an important role in recent advances in extracting useful information from time series data. At the same time, the impact of research increases when it also brings a financial benefit to industrial actors. Our approach to data processing and analysis comprises four steps: preprocessing the raw data, training a machine learning model on the preprocessed data to predict or classify the desired variables, visualizing the results as time series for temporal analysis, and applying explainable artificial intelligence to better understand the models and their predictions. The thesis analyzes problems that arise at various stages of the production process in the industrial sector and to which machine learning methods have not yet been applied. In these cases we implement the proposed integrated artificial intelligence framework, which covers both classification and regression problems. A specific aim of the work is the design and optimization of mathematical machine learning models, with special emphasis on multi-layer neural network models applied to multidimensional and complex real data from various industrial sectors, such as agriculture, shipping and finance. Our goal is to improve these methods and develop new ones in response to the challenges and limitations arising from the nature of the data.
The thesis focuses on predicting future events and behaviors, enabling businesses to analyze the likely impact of potential changes to their business strategies. Specifically, we address both practical and theoretical aspects of machine learning methods for time series data mining and management. Our research goals are driven by three common "gaps" in the application of machine learning methods to time series: the continuous optimization of machine learning models (and, as a result, of their outputs); the improvement of model results through the use of surrogate data, which enrich the data set and contribute to better generalization of the proposed model; and the optimal selection among different models. The application areas covered in this work are shipping, food safety and banking. In the first area, the research focused on the problem of predictive maintenance of the main engines of merchant ships. In the area of food safety and text analysis, it focused on predicting food recall events from small text stream data, as well as on inferring the correlation of environmental conditions with meat quality. Finally, in the financial sector, we studied the forecasting of cash flows in Small and Medium Enterprises (SMEs) based on historical transaction data. One of our first steps was a comprehensive investigation of the actual conditions, through which we identified the problems to investigate. Ships, like other types of "equipment", are equipped in their factory configuration with sensors that collect information about the overall operation of the particular sector, in this case the marine industry, and about the condition of its equipment, in this case the ships themselves. These sensors provide data streams that can be analyzed in real time with artificial intelligence techniques to gather information about possible faults in the machinery; this is precisely the first pillar of our scientific contributions.
This information is exploited to drive decisions such as ordering spare parts or changing the port of destination in order to replace parts before they fail. This is the second trade-off in a comprehensive, data-driven predictive maintenance approach. In this work, we present an approach for detecting anomalies in time series data by applying machine learning techniques to the ship's sensor data, in order to predict the condition of specific parts of the ship's main engine so that predictive maintenance becomes possible. The presented approach incorporates both several individual models that have been analyzed and applied to the challenge of predictive maintenance in the shipping industry, and a combination of these models that yields clearer information on predictive maintenance outcomes. The second important contribution of this research concerns the analysis and prediction of information about potentially unsafe goods and products. This information is leveraged to drive decision-making, such as identifying which products are most likely to become harmful in the near future, and thereby to optimize the food supply chain. To address this, we introduce a deep learning approach that leverages natural language processing and time series prediction techniques to monitor and analyze the risk associated with each food product category and the corresponding potential recalls. Furthermore, we propose a technique that exploits reinforcement learning to use historical recall announcements of food products to predict their future recalls, thus providing food companies with information about upcoming trends in food recalls that can lead to timely recalls. We also evaluate and demonstrate the effectiveness and added value of the proposed approaches through a real scenario that gives promising results. While several techniques and models have been analyzed and applied to the challenge of food recall prediction, the use of analog/surrogate data has also been studied and evaluated with the aim of obtaining more accurate results.
In parallel with food safety, a real scenario that shows the added value of using the various data collected is that of the unwanted taste and smell that can be present in boar meat, known as "boar taint". Using this information, pig farmers can gain insights into how they need to adjust their management to reduce boar taint. This study examines multiple data-driven predictive approaches combined with explainable artificial intelligence (XAI) methods, evaluating them against various explainability metrics while seeking to generate useful insights and suggestions. Specifically, the use case was modeled as a binary classification task on a highly imbalanced data set. With this approach, some functional characteristics related to farm/stall and abattoir conditions have been derived, such as type of feed, type of ventilation system, medication, type of floor and length of time in storage. Our third scientific contribution concerns the banking industry and, more specifically, Small and Medium Enterprises (SMEs), which face a complex and challenging environment because in most areas they lag behind in their digital transformation. Banks, which maintain a variety of data on their SME customers in order to perform their core activities, could offer a solution by using all available data to provide those customers with a Business Financial Management (BFM) toolkit, thus providing value-added services. Despite the success of deep learning in many areas, the design of such models still relies on trial and error, and a rigorous mathematical theory of overparameterized models is still lacking. To this end, the present work revolves around the development of an intelligent, highly personalized hybrid transaction categorization model, interfaced with a cash flow forecasting model based on recurrent neural networks (RNNs).
As transaction classification is of great importance, this research is extended to explainable artificial intelligence, where the LIME and SHAP frameworks are used to interpret and visualize the ML classification results. Our approach shows promising results in a real-world banking use case and serves as the foundation for the development of further BFM banking microservices, such as transaction fraud detection and budget monitoring. This scientific work is a step towards more reliable and effective machine learning methods for real industry problems, mainly those using time series data. To that end, an innovative XAI model for time series has been integrated into the proposed framework of the thesis. We hope that our findings will motivate future researchers and serve as tools for engineers in high-impact industrial applications.

Department

School of Information and Communication Technologies. Department of Digital Systems

Number of pages

227

Language

Greek

URI

https://dione.lib.unipi.gr/xmlui/handle/unipi/15573
http://dx.doi.org/10.26267/unipi_dione/2995

Collections

Department of Digital Systems

Except where otherwise noted, this item's license is described as
Attribution-NonCommercial-ShareAlike 3.0 Greece