Automata learning for complex event recognition and forecasting

Master Thesis
Author
Baou, Evangelia
Μπάου, Ευαγγελία
Date
2025-09
Abstract
This thesis presents a comparative study of two approaches to event prediction in data streams, DISC and Wayeb. DISC uses mixed-integer linear programming (MILP) to automatically train models for early event prediction. The training outputs of DISC are then used as inputs to Wayeb, a method based on probabilistic automata that supports online prediction on continuous data streams. The work examines system behavior under different configurations, emphasizing two critical Wayeb parameters: the threshold, which sets the confidence cutoff for whether a prediction is considered positive, and the order, which determines the length of history (number of past events) the model takes into account. System behavior is analyzed on two datasets (MIT and Alfred) by varying the proportion of training data (trace percentage) and evaluating performance with the F1 score. The results show that Wayeb's performance depends strongly on proper parameterization: medium threshold values (0.5–0.7) perform consistently well, while higher order values provide improvements only when sufficient data are available. Notably, on the Alfred dataset performance is high even with small training percentages, in contrast to MIT, where increasing the order is not accompanied by a significant performance gain. The thesis highlights DISC's potential as an automatic training tool and Wayeb's effectiveness for real-time prediction, offering a combination that is both flexible and powerful for event prediction problems in sequential data.
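As a rough illustration of how the threshold interacts with the F1 score, the sketch below treats a forecast as positive when its probability reaches the threshold and then computes F1 against ground-truth labels. It is not Wayeb's or DISC's API: the function name `f1_at_threshold`, the `>=` comparison, and the sample probabilities and labels are all assumptions made up for this example.

```python
def f1_at_threshold(probs, labels, threshold):
    """F1 score when a forecast counts as positive if prob >= threshold.

    probs:  forecast probabilities (hypothetical values, one per forecast)
    labels: ground truth, True if the event actually occurred
    """
    preds = [p >= threshold for p in probs]
    tp = sum(1 for y_hat, y in zip(preds, labels) if y_hat and y)
    fp = sum(1 for y_hat, y in zip(preds, labels) if y_hat and not y)
    fn = sum(1 for y_hat, y in zip(preds, labels) if not y_hat and y)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up data, for illustration only (not taken from the thesis).
probs = [0.9, 0.6, 0.4, 0.8, 0.2]
labels = [True, True, False, False, True]

for threshold in (0.5, 0.7, 0.95):
    print(threshold, round(f1_at_threshold(probs, labels, threshold), 3))
```

Raising the threshold makes the predictor more conservative: it tends to trade recall for precision, which is one plausible reason a mid-range cutoff can balance the two.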

