Meta-learning for few-shot audio classification problems
Τεχνικές μετα-μάθησης σε προβλήματα ταξινόμησης ήχου λίγων παραδειγμάτων
Bachelor Dissertation
Author
Marmaras, Georgios
Μαρμαράς, Γεώργιος
Date
2025-09
Supervisor
Pikrakis, Angelos
Πικράκης, Άγγελος
Keywords
Few-shot learning ; Meta-learning ; Audio classification ; Environmental sound classification ; Model agnostic meta-learning ; Audio spectrogram transformer ; Prototypical networks ; Cross-domain generalization ; Episodic learning ; Modular framework
Abstract
As demand grows for intelligent systems that generalize quickly from minimal training data, few-shot learning and meta-learning have emerged as key paradigms for data-efficient model adaptation. This work introduces a modular and extensible meta-learning framework for few-shot audio and image classification, designed to support reproducible experimentation and systematic evaluation across diverse datasets and neural architectures. It provides components for data handling, episodic task construction, neural network integration, adaptation routines, evaluation metric management, checkpointing and visualization, enabling flexible experimentation and rapid prototyping of meta-learning algorithms. The framework is applied to environmental sound classification, where two gradient-based meta-learning methods, MAML and ProtoMAML, are evaluated on the ESC-50, UrbanSound8K and FSC22 datasets under multiple N-way K-shot scenarios. ProtoMAML consistently outperforms MAML by combining prototype-based initialization with gradient-based inner-loop adaptation. Meta-training significantly improves cross-dataset generalization, demonstrating the framework’s ability to facilitate rapid adaptation to unseen acoustic environments. Comparisons with a frozen pretrained Audio Spectrogram Transformer backbone paired with a Prototypical Network classifier show that, while pretrained embeddings provide strong baselines, explicit meta-learning remains essential for robust few-shot classification performance, particularly in extreme low-shot or high-way configurations.
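To make the terms "N-way K-shot" and "Prototypical Network classifier" concrete, the following is a minimal sketch of how one episode might be sampled and classified from fixed embeddings. It is not the dissertation's framework: the function names (`sample_episode`, `class_prototypes`, `predict`), the list-of-floats embedding format and the squared Euclidean distance are all assumptions made for illustration. The gradient-based inner loop of MAML and ProtoMAML is not shown.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Pick n_way classes, then k_shot support and n_query query indices per class.

    Hypothetical helper: returns two lists of (example_index, episode_label) pairs.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Only classes with enough examples for both support and query sets qualify.
    eligible = sorted(c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query)
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        picked = rng.sample(by_class[cls], k_shot + n_query)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query


def class_prototypes(embeddings, support):
    """Average the support embeddings of each episode class (one prototype per class)."""
    sums, counts = {}, defaultdict(int)
    for idx, lab in support:
        vec = embeddings[idx]
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(embedding, prototypes):
    """Return the episode label whose prototype is nearest (squared Euclidean distance, assumed)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda lab: sq_dist(embedding, prototypes[lab]))
```

In this sketch, a 5-way 1-shot episode would correspond to `n_way=5, k_shot=1`, and the embeddings would stand in for the output of a frozen backbone. In ProtoMAML, prototypes of this kind are used to initialize the classification layer before the inner-loop gradient steps, rather than to classify directly.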