Computational models of multimodal interaction for music generation and music information retrieval
Doctoral Thesis
Author
Kritsis, Kosmas
Date
2023
Advisor
Pikrakis, Angelos
Keywords
Autoregressive models; Recurrent Neural Networks; Convolutional Neural Networks; Sequence generation; Deep learning; Curriculum learning; Human-computer interaction; Gesture recognition; Human motion analysis; Dance motion synthesis; Long short-term memory; Automatic music generation
Abstract
In this dissertation, we explore multiple aspects of computational music generation and interaction, addressing tasks such as musical gesture recognition, virtual instrument interaction, audio-driven dance motion synthesis, jazz improvisation accompaniment generation, and symbolic music encodings. Throughout the studies described here, we gain valuable insights into the capabilities and implications of different computational architectures, particularly recurrent and convolutional models. Our research begins by evaluating computational models for musical gesture recognition, with subsequent experimentation revealing the superiority of deep convolutional architectures in terms of recognition accuracy and computation time. Building upon these findings, we develop a web-based system for real-time interaction with virtual musical instruments, incorporating both convolutional and recurrent architectures to enhance the user experience. We also explore audio-driven dance motion synthesis, where deep convolutional architectures based on a conditional autoencoder with dilated causal highway gates outperform recurrent models in generating diverse and realistic dance motion sequences. Next, in the context of jazz improvisation, we focus on simulating the interplay between human soloists and artificial accompanists, highlighting the challenges and prospects of implicit machine learning approaches in modeling musical interactions. Lastly, we investigate the impact of symbolic music encodings on automatic music generation, emphasizing the importance of careful encoding design in shaping the resulting musical structure. Overall, our research provides valuable insights into the performance and potential of different computational architectures across various tasks in computational music generation and interaction. The successful integration of convolutional and recurrent models demonstrates their ability to model complex musical interactions, and we emphasize the importance of selecting the appropriate computational architecture for task-specific goals and constraints. Our findings lay the foundation for future research, encouraging further exploration of advanced architectures, larger datasets, and more diverse tasks to continue pushing the boundaries of computational music generation and interaction. By doing so, we can unlock new possibilities for creative expression, human-computer collaboration, and the advancement of music technology.
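
As a rough illustration of the dance motion synthesis architecture mentioned in the abstract, the sketch below shows a single dilated causal convolution combined with a highway gate. It is a minimal, generic formulation assuming a PyTorch implementation; the class name `DilatedCausalHighwayBlock`, the channel counts, and the exact gating arrangement are illustrative assumptions and are not taken from the dissertation itself.

```python
import torch
import torch.nn as nn

class DilatedCausalHighwayBlock(nn.Module):
    """One dilated causal convolution with a highway gate (illustrative sketch)."""

    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        # Left-only padding keeps the convolution causal: frame t never sees frames > t.
        self.pad = (kernel_size - 1) * dilation
        self.transform = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.gate = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        padded = nn.functional.pad(x, (self.pad, 0))
        h = torch.tanh(self.transform(padded))   # candidate features
        t = torch.sigmoid(self.gate(padded))     # highway "transform" gate
        return t * h + (1.0 - t) * x             # gated mix with the identity path


# Example: a motion feature sequence of 64 channels over 120 frames.
block = DilatedCausalHighwayBlock(channels=64, kernel_size=3, dilation=2)
out = block(torch.randn(8, 64, 120))  # -> shape (8, 64, 120)
```

Stacking several such blocks with increasing dilation widens the temporal receptive field without breaking causality, which is a common motivation for this kind of design in audio-driven sequence models.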