Interpretable machine learning for undeclared work prediction
Ερμηνεύσιμη μηχανική μάθηση για πρόβλεψη αδήλωτης εργασίας
Doctoral Thesis
Author
Alogogianni, Eleni
Αλογογιάννη, Ελένη
Date
2024-07
Advisor
Virvou, Maria
Βίρβου, Μαρία
Keywords
Data mining ; Machine learning ; Interpretable machine learning ; Predictive modeling ; Decision support ; Classification ; Associative classification ; Class imbalance ; Class overlap ; CRISP-DM methodology ; Interpretability ; Targeting ; Risk assessment ; Employment ; Undeclared work ; Labor law violations ; Labor inspectorate ; Social security ; Informal economy ; Tax evasion ; Public Authority
Abstract
Undeclared work, a fundamental aspect of the informal economy, poses a complex socioeconomic challenge that undermines the well-being of workers, businesses, and the stability of the welfare state. Labor Inspectorates are tasked with combating this illegal employment practice, yet they often prove ineffective, struggling with limited resources and inadequate tools.
This doctoral thesis seeks to enhance the capabilities of public labor law enforcement authorities in addressing undeclared work and other labor law violations through the design and development of an innovative machine learning system. This system generates interpretable predictive models targeting these violations and offers three primary benefits. First, by accurately predicting which businesses are likely to employ undeclared workers, it aids the efficient allocation of resources, facilitating effective planning of targeted onsite inspections and other deterrent and preventive measures. Second, by providing clear explanations for these predictions, it improves labor inspectors’ domain knowledge, actively involves them in the inspection planning process, and increases their acceptance of and trust in the models’ outputs and recommendations. Third, by identifying and extracting the predominant patterns associated with each violation, it offers actionable insights into labor market trends, thereby supporting efficient and strategic decision-making for policy interventions.
This machine learning project is tailored to the business needs and available data of the Hellenic Labour Inspectorate, though the suggested approaches and techniques can be applied in any labor law enforcement authority. It is detailed across the six phases of the CRISP-DM methodology. It uses a dataset integrating real inspection, business, and employment data, and employs four algorithms of Associative Classification, an advanced machine learning method known for its high predictive accuracy and enhanced interpretability. Over three CRISP-DM life cycle iterations, the project develops a total of 96 classifiers targeting undeclared work and other violations. To improve the classifiers’ predictive strength and interpretability, it applies various data engineering techniques that address the class imbalance and class overlap issues often inherent in labor inspection datasets. It evaluates the models for both prediction performance and the clarity of the extracted local and global explanations. Furthermore, the project emphasizes the adaptability of machine learning techniques within the operational framework of a public institution, analyzing challenges and proposing solutions to facilitate the smooth integration and acceptance of such innovative approaches within the public sector.
Overall, this research represents a comprehensive and detailed machine learning project designed for application within the business environment of a public institution tasked with effectively combating the illegal practice of undeclared work. It emphasizes advanced techniques and methods that achieve superior prediction performance, enhanced interpretability, and seamless integration into existing business processes. The ultimate aim is to promote the design, implementation, and eventual incorporation of such innovative approaches in responsible public institutions, thereby enhancing their ability to address labor law violations more effectively and strategically.