Real-time monitoring of data streams using machine learning techniques for optimizing decision making
Παρακολούθηση ροών δεδομένων σε πραγματικό χρόνο με τη χρήση τεχνικών μηχανικής μάθησης για τη βελτιστοποίηση της λήψης αποφάσεων

Doctoral Thesis
Author
Skarlatos, Kyriakos
Σκαρλάτος, Κυριάκος
Date
2026-03
Keywords
Machine learning; Data streams; Real time monitoring; Cluster validity indices; Artificial Intelligence; Statistics; Multivariate statistical process monitoring; Control charts; Quality control; Clustering; Maritime; Forecasting; Classification; Meteorology; Big data; Curse of dimensionality
Abstract
The growth of big data is primarily driven by the increasing digitization of information
and the widespread use of data-collecting devices. As technological advancements
continue, the volume, variety, and velocity of data generation expand, giving rise to big
data analytics aimed at extracting valuable insights from vast datasets. The growing
volume of data presents both opportunities and significant challenges. In the realm
of Statistical Process Monitoring (SPM), the analysis of high-dimensional data often
leads to the “curse of dimensionality”, where data sparsity hinders the detection of
meaningful patterns and anomalies. Additionally, monitoring complex relationships
among multiple variables requires more advanced methods than traditional univariate
approaches. Multivariate Statistical Process Monitoring (MSPM) addresses this need
by employing tools such as multivariate control charts, notably Hotelling’s T² chart,
to capture the joint behavior of correlated quality variables. However, implementing
these control charts in real-time, high-dimensional settings is particularly difficult
due to the computational demands and the need for rapid decision-making when
processing large streams of data. A potential solution is to combine traditional MSPM
techniques with modern machine learning approaches; even so, this integration poses
challenges related to model interpretability, feature selection, and result integration.
In this study, a modern and robust method inspired by cluster validity indexing
techniques is presented. This method is compared to the traditional multivariate
control charts based on Hotelling’s T² statistic, as well as to other metrics from
the cluster analysis framework, such as the Dunn, Silhouette, Calinski-Harabasz, and
Davies-Bouldin indices. Extensive simulation studies demonstrate that the proposed
method outperforms existing approaches, particularly under mean drift and
density-related changes in data streams with correlated features. Finally,
several real-world application scenarios that use statistical and machine
learning techniques are presented. Ranging from simulations to practical applications,
this dissertation seeks to bridge the concepts of statistics and machine learning, with
the latter inheriting tools and methodologies from the former.
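
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of a Hotelling T² check on streaming observations. It is not the method proposed in the thesis. It assumes the standard statistic T² = (x - m)' S⁻¹ (x - m), with the mean vector m and covariance matrix S estimated from an in-control reference sample. All function names, the reference data, and the false-alarm rate 0.0027 are hypothetical choices made for illustration. The control limit uses the chi-square approximation, which for two variables has the closed form -2 ln(alpha); that approximation presumes well-estimated in-control parameters, which the tiny sample here does not provide.

```python
import math
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length observations."""
    return [fmean(col) for col in zip(*rows)]


def covariance_matrix(rows, mean):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, p = len(rows), len(mean)
    centered = [[r[j] - mean[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * p
    for i in reversed(range(p)):
        s = sum(m[i][j] * y[j] for j in range(i + 1, p))
        y[i] = (m[i][p] - s) / m[i][i]
    return y


def hotelling_t2(x, mean, cov):
    """T^2 = (x - mean)' cov^{-1} (x - mean) for a single observation x."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return sum(di * yi for di, yi in zip(d, solve(cov, d)))


# Hypothetical in-control reference sample with two correlated features.
reference = [[1.0, 2.1], [1.2, 2.3], [0.9, 1.8],
             [1.1, 2.2], [0.8, 1.9], [1.0, 2.0]]
mu = mean_vector(reference)
sigma = covariance_matrix(reference, mu)

# Chi-square limit for p = 2 variables; alpha = 0.0027 is an assumed rate.
ucl = -2.0 * math.log(0.0027)

# Score each incoming observation as it arrives and flag limit violations.
for x in [[1.0, 2.0], [1.6, 1.4]]:
    t2 = hotelling_t2(x, mu, sigma)
    print(x, round(t2, 2), "ALARM" if t2 > ucl else "ok")
```

Solving the linear system for each observation avoids forming an explicit inverse, but the cost still grows with the cube of the number of variables, which illustrates the computational burden in high-dimensional streams that the abstract describes.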

