Πλατφόρμα συγκριτικής ανάλυσης μοντέλων ASR για την ελληνική γλώσσα : Whisper vs wav2vec2
Comparative analysis platform for ASR models in the Greek language: Whisper vs wav2vec2

Keywords
ASR models; Whisper Large 3; wav2vec2; Microservices architecture

Abstract
This Master's thesis describes the development of a web-based platform for comparing two leading Automatic Speech Recognition (ASR) models on Greek: OpenAI's Whisper large-v3 and wav2vec2-greek (lighteternal/wav2vec2-large-xlsr-53-greek). Greek poses particular challenges for ASR systems due to its complex system of diacritical marks, extensive morphological variation, and limited availability of training data, which makes a dedicated evaluation of the available solutions necessary.
The implementation relies on a microservices architecture consisting of seven independent services, with programmable GPU resource allocation (60% for Whisper, 40% for wav2vec2) to ensure fair comparison. The frontend was developed using Angular 19 with PrimeNG, while the backend includes a Flask server with SocketIO for WebSocket communication and FastAPI services for each model. The platform supports transcription through audio/video file uploads, YouTube URL input, and live browser recording, while providing real-time progress updates.
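As an illustration of the 60%/40% GPU split described above, the following is a minimal sketch, not the thesis's actual code. The names `GpuShare`, `plan_gpu_shares`, and `apply_share` are hypothetical, and the use of PyTorch's `torch.cuda.set_per_process_memory_fraction` as the capping mechanism is an assumption; the abstract does not say how the allocation is enforced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuShare:
    """Hypothetical record: which service gets which fraction of GPU memory."""
    service: str
    fraction: float


def plan_gpu_shares(shares: dict[str, float]) -> list[GpuShare]:
    """Validate that the requested fractions are positive and fit on one GPU."""
    if any(frac <= 0 for frac in shares.values()):
        raise ValueError("every share must be positive")
    if sum(shares.values()) > 1.0 + 1e-9:
        raise ValueError("shares exceed 100% of the GPU")
    return [GpuShare(name, frac) for name, frac in shares.items()]


def apply_share(share: GpuShare, device: int = 0) -> bool:
    """Cap this process's GPU memory; returns False when no CUDA GPU is usable.

    Assumption: each model service runs in its own process and calls this once
    at startup. The thesis may use a different mechanism.
    """
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    torch.cuda.set_per_process_memory_fraction(share.fraction, device=device)
    return True


# The split quoted in the abstract: 60% for Whisper, 40% for wav2vec2.
PLAN = plan_gpu_shares({"whisper": 0.60, "wav2vec2": 0.40})
```

Under this assumption, each model service would call `apply_share` with its own entry from `PLAN` before loading weights, so that neither model can starve the other of memory during a side-by-side run.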
Experimental results showed that Whisper maintains consistent performance between 85% and 90% across different recording conditions, whereas wav2vec2 demonstrates excellent accuracy on clean audio with single speakers but degrades significantly in noisy environments or with multiple speakers. The optimizations implemented for handling Greek diacritics and morphological analysis proved crucial for the final transcription accuracy.
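To make concrete why diacritic handling affects measured accuracy, here is a minimal sketch of a diacritic-insensitive word error rate (WER) for Greek. It rests on assumptions: the abstract does not name the metric behind the 85% to 90% figures, and the functions `strip_diacritics`, `normalize`, and `wer` are illustrative names, not the platform's code.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (tonos, dialytika) via Unicode decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def normalize(text: str) -> list[str]:
    """Lowercase, drop diacritics and punctuation, unify final sigma, tokenize."""
    text = strip_diacritics(text).lower().replace("ς", "σ")
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return cleaned.split()


def wer(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

With this normalization, a transcript that differs from the reference only in accent marks or capitalization scores a WER of 0, whereas a comparison on raw strings would count every unaccented word as an error. Whether to normalize this way is a design choice: it separates recognition errors from accentuation errors, at the cost of hiding the latter.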


