Deep generation of electronic music

Keywords
GAN; Generative Adversarial Network; AI; Deep learning; Music generation

Abstract
The goal of this thesis is to explore the generation of electronic music using Deep Learning techniques.
The challenge of algorithmically generating music lies in creating authentic and aesthetically pleasing compositions that resonate with listeners. Music is deeply human, rooted in emotion and culture, while algorithms lack this understanding. This project aims to bridge technology and art by providing musicians with tools to explore new expressions and spark creativity. By finding a balance between technology and human emotion, it seeks to enrich musical innovation and inspire new compositions.
To do this, we built our own dataset of images that represent the original music. These images, called spectrograms, are 2D representations of sound: essentially graphs in which the horizontal axis represents time and the vertical axis represents frequency. To address this challenge, Generative Adversarial Networks (GANs) are employed as the modeling approach. GANs are a class of deep learning algorithms that have shown promise in generating realistic images and, by extension, spectrograms. The methodology involves training DCGANs on the dataset of spectrograms to learn the underlying patterns and structures of electronic music.
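As an illustration of this dataset-construction step, a minimal sketch of computing a log-magnitude spectrogram from an audio file might look as follows. The use of librosa, the file path, and the STFT parameters are assumptions for illustration; the abstract does not specify the thesis's exact tooling or settings.

```python
import numpy as np
import librosa

# Load an audio file (path and sample rate are illustrative assumptions)
audio, sr = librosa.load("track.wav", sr=22050, mono=True)

# Short-time Fourier transform: rows are frequency bins, columns are time frames
stft = librosa.stft(audio, n_fft=1024, hop_length=256)

# Magnitude spectrogram on a decibel scale, a common choice for image-like input
spectrogram_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)
```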
Generative Adversarial Networks (GANs) consist of two networks, the Generator and the Discriminator, engaged in a competitive, zero-sum game. In this game, known as minimax in game theory, each network aims to outperform the other: the Generator produces "fake" images, while the Discriminator discerns between "real" and "fake" ones. As the Generator seeks to deceive the Discriminator and the Discriminator tries not to be fooled, both networks improve, resulting in the generation of increasingly realistic images.
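This competition is usually formalized by the standard GAN minimax objective of Goodfellow et al.; the textbook formulation (not quoted from the thesis itself) is

```latex
\min_G \max_D \; V(D, G) =
\mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr] +
\mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

where the Discriminator $D$ maximizes its accuracy on real samples $x$ and generated samples $G(z)$, while the Generator $G$ minimizes the same quantity by making $G(z)$ indistinguishable from real data.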
The methods under examination revolve around the training and optimization of Deep Convolutional Generative Adversarial Networks (DCGANs), which are GANs in which both the Generator and the Discriminator use convolutional layers in their architectures.
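A minimal sketch of what such a pair of convolutional networks could look like is given below. The choice of PyTorch, the layer widths, and the 1x64x64 spectrogram-image shape are assumptions for illustration, not the thesis's exact configuration; the upsampling/downsampling pattern follows the standard DCGAN recipe.

```python
import torch.nn as nn

# Generator: maps a 100-dim noise vector (shape N x 100 x 1 x 1) to a
# 1 x 64 x 64 spectrogram-like image via transposed convolutions.
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 256, 4, 1, 0, bias=False),  # -> 4x4
    nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),  # -> 8x8
    nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),   # -> 16x16
    nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 32, 4, 2, 1, bias=False),    # -> 32x32
    nn.BatchNorm2d(32), nn.ReLU(True),
    nn.ConvTranspose2d(32, 1, 4, 2, 1, bias=False),     # -> 64x64
    nn.Tanh(),
)

# Discriminator: mirrors the Generator with strided convolutions and
# outputs a single real/fake probability per image.
discriminator = nn.Sequential(
    nn.Conv2d(1, 32, 4, 2, 1, bias=False),              # -> 32x32
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(32, 64, 4, 2, 1, bias=False),             # -> 16x16
    nn.BatchNorm2d(64), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 128, 4, 2, 1, bias=False),            # -> 8x8
    nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(128, 256, 4, 2, 1, bias=False),           # -> 4x4
    nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(256, 1, 4, 1, 0, bias=False),             # -> 1x1 score
    nn.Sigmoid(),
)
```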
The generated spectrograms are then converted back into audio, the end goal of the pipeline. To evaluate how realistic the generated sound is, a questionnaire was distributed in which participants were asked to identify which of the songs they heard was AI-generated.
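Converting a magnitude spectrogram back to a waveform requires estimating the phase information that the magnitude representation discards. One common approach, assumed here since the abstract does not state the thesis's exact inversion method, is the Griffin-Lim algorithm as implemented in librosa:

```python
import numpy as np
import librosa
import soundfile as sf

# Hypothetical input: a generated spectrogram on the dB scale, e.g. decoded
# from a model output image and saved as an array.
spectrogram_db = np.load("generated_spectrogram.npy")

# Undo the dB scaling applied when the spectrogram images were created
magnitude = librosa.db_to_amplitude(spectrogram_db)

# Griffin-Lim iteratively estimates the missing phase; the hop and window
# lengths must match those used in the forward STFT.
waveform = librosa.griffinlim(magnitude, n_iter=32, hop_length=256, win_length=1024)

sf.write("generated.wav", waveform, 22050)
```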
In conclusion, this thesis showcases the potential of DCGANs in the domain of electronic music generation. Despite the inherent limitations of GANs, our experimental results demonstrate that our models successfully generated music closely resembling the compositions in our dataset, achieving a realistic sound output.