Πλατφόρμα συγκριτικής ανάλυσης μοντέλων ASR για την ελληνική γλώσσα : Whisper vs wav2vec2

Κυπραίος, Χαρίτων

Comparative analysis platform for ASR models in the Greek language: Whisper vs wav2vec2

Master's Thesis

Author

Κυπραίος, Χαρίτων

Date

2025-10

Abstract

This Master's thesis describes the development of a web-based platform for comparing two leading Automatic Speech Recognition (ASR) models on Greek: OpenAI's Whisper large-v3 and wav2vec2-greek (lighteternal/wav2vec2-large-xlsr-53-greek). Greek poses particular challenges for ASR systems because of its complex system of diacritical marks, its extensive morphological variation, and the limited availability of training data, so the available solutions need specialized evaluation. The implementation relies on a microservices architecture of seven independent services, with programmable GPU resource allocation (60% for Whisper, 40% for wav2vec2) to ensure a fair comparison. The frontend was developed in Angular 19 with PrimeNG. The backend consists of a Flask server that uses SocketIO for WebSocket communication, plus one FastAPI service per model. The platform accepts audio/video file uploads, YouTube URLs, and live browser recordings as input for transcription, and provides real-time progress updates. Experimental results showed that Whisper maintains consistent performance between 85% and 90% across different recording conditions, whereas wav2vec2 achieves excellent accuracy on clean, single-speaker audio but degrades significantly in noisy environments or with multiple speakers. The optimizations implemented for handling Greek diacritics and for morphological analysis proved crucial to the final transcription accuracy.
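The abstract does not state which accuracy metric or which diacritic handling the thesis uses. As an illustration only, the sketch below shows one common way to compare a transcript against a reference for Greek: strip diacritics with Unicode normalization, then compute a word error rate (WER) by word-level edit distance. The function names `strip_diacritics` and `word_error_rate` are hypothetical and are not taken from the thesis.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Lowercase text and remove combining marks such as tonos and dialytika."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = strip_diacritics(reference).split()
    hyp = strip_diacritics(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # A missing accent is ignored; a wrong word counts as one error.
    print(word_error_rate("καλημέρα σας κύριε", "καλημερα σας κυρία"))
```

Stripping diacritics makes the score insensitive to accent errors, which also hides real distinctions in Greek (for example πότε versus ποτέ), so an evaluation may want to report the score both with and without this normalization.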

Postgraduate Studies Programme

Informatics

Department

School of Information and Communication Technologies. Department of Informatics

Language

Greek

URI

https://dione.lib.unipi.gr/xmlui/handle/unipi/18501

Collections

Department of Informatics

Except where otherwise noted, this item's license is described as
Attribution-NonCommercial-ShareAlike 3.0 Greece