Digital audio processing methods for voice pathology detection
Μέθοδοι ψηφιακής επεξεργασίας ηχητικού σήματος για την ανίχνευση παθολογίας στην ομιλία

Doctoral Thesis
Author
Miliaresi, Ioanna
Μηλιαρέση, Ιωάννα
Date
2025-01
Advisor
Pikrakis, Angelos
Πικράκης, Άγγελος
Keywords
Machine learning ; Deep learning architectures ; Convolutional neural networks ; Electroglottographic signal ; Digital signal processing ; Audio processing ; Voice pathology classification ; COVID-19 ; Dysphonia ; Vocal palsy ; Phonotrauma ; Neoplasm ; FEMH ; SVD dataset ; SPRsound ; Virufy ; Coswara ; Respiratory sounds
Abstract
Voice pathology refers to a wide range of disorders and diseases that affect voice quality and production, posing significant challenges for accurate diagnosis and classification. This dissertation focuses on the development of innovative machine learning approaches for the automatic classification of vocal and respiratory pathologies, leveraging multimodal data and advanced neural network architectures. Key challenges addressed include limited data availability, effective feature extraction, and the need for models with strong adaptability and generalization capabilities. To this end, five deep learning models are proposed, integrating acoustic, medical, and electroglottographic data. Techniques such as variable-length audio processing, data augmentation, and attention mechanisms are employed to enhance performance. The results demonstrate significant improvements in diagnostic accuracy and robustness across diverse datasets, confirming the potential of flexible, multimodal architectures in the field of voice pathology classification.