Grekow J. - Emotional Regularization of the Latent Space of a Variational Autoencoder for Generating Musical Sequences

This article presents a new regularization of the latent space of a variational autoencoder that facilitates the visual selection of specific emotions for generated monophonic musical sequences. The regularization allocates regions of a two-dimensional latent space according to the quadrants of an emotion model. The cosine similarity between the vectors in the latent space and the feature vectors of the training examples is used to calculate an additional loss term, which extends the standard loss of the variational autoencoder. Mapping the emotion model onto the latent space makes this space more interpretable when generating new examples of music. The two-dimensional Russell model was used as the emotion model, with its four quadrants corresponding to the basic emotions happy, angry, sad, and relaxed. Variational autoencoder models employing recurrent neural networks were constructed and trained on a dataset of monophonic music sequences labeled with emotions. The resulting latent space and the generated music files with different emotions were evaluated.
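The idea of extending the standard loss with a cosine-similarity term can be illustrated with a minimal sketch. It is not the article's implementation. It assumes that each emotion label is represented by a hypothetical target direction pointing into its quadrant (x = valence, y = arousal), that the penalty is `1 - cosine similarity`, and that a hypothetical weight `gamma` scales the extra term. The names `QUADRANT_DIRECTIONS`, `emotion_regularization_loss`, and `total_loss` are illustrative only.

```python
import math

# Hypothetical target directions, one per emotion label (assumption:
# x = valence, y = arousal, so e1 is high arousal / high valence, etc.).
QUADRANT_DIRECTIONS = {
    "e1": (1.0, 1.0),    # Q1: high arousal, high valence
    "e2": (-1.0, 1.0),   # Q2: high arousal, low valence
    "e3": (-1.0, -1.0),  # Q3: low arousal, low valence
    "e4": (1.0, -1.0),   # Q4: low arousal, high valence
}


def cosine_similarity(u, v, eps=1e-8):
    """Cosine of the angle between two vectors; eps avoids division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def emotion_regularization_loss(latent_vectors, labels):
    """Mean of (1 - cosine similarity) between each latent vector and the
    target direction of its emotion label: near 0 when aligned, near 2
    when pointing into the opposite quadrant."""
    losses = [
        1.0 - cosine_similarity(z, QUADRANT_DIRECTIONS[label])
        for z, label in zip(latent_vectors, labels)
    ]
    return sum(losses) / len(losses)


def total_loss(reconstruction_loss, kl_loss, latent_vectors, labels, gamma=1.0):
    """Standard VAE loss (reconstruction + KL) plus the weighted extra term.
    gamma is a hypothetical weighting factor, not a value from the article."""
    return reconstruction_loss + kl_loss + gamma * emotion_regularization_loss(
        latent_vectors, labels
    )
```

Under these assumptions, a latent vector such as `(2.0, 2.0)` labeled `e1` contributes almost nothing to the extra term, while `(-1.0, -1.0)` labeled `e1` contributes a value close to 2, which pushes the encoder to place each example in the quadrant of its emotion.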

Training dataset, code, and generated examples

Figure 1. Latent space of the baseline model CVAE-Base, without regularization

Figure 2. Latent space of the CVAE-EmoReg model, after regularization

Figure 3. Examples of generated musical sequences with given emotions (a) e1, (b) e2, (c) e3, and (d) e4

Table 1. Generated MIDI music examples labeled with four emotions, produced using the CVAE-EmoReg model

Example	Emotion	Quadrant in Russell's model / Arousal-Valence	MIDI
Example_1 (musical notation in Fig. 3a)	e1	Q1 / high-high
Example_2	e1	Q1 / high-high
Example_3	e1	Q1 / high-high
Example_4 (musical notation in Fig. 3b)	e2	Q2 / high-low
Example_5	e2	Q2 / high-low
Example_6	e2	Q2 / high-low
Example_7 (musical notation in Fig. 3c)	e3	Q3 / low-low
Example_8	e3	Q3 / low-low
Example_9	e3	Q3 / low-low
Example_10 (musical notation in Fig. 3d)	e4	Q4 / low-high
Example_11	e4	Q4 / low-high
Example_12	e4	Q4 / low-high
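
The correspondence between emotion labels, quadrants, and arousal-valence levels in Table 1 can be written as a small lookup. This is only a sketch: the function name `emotion_for` is hypothetical, and treating a value of exactly zero as "high" is an arbitrary assumption made here to keep the function total.

```python
# Mapping taken from Table 1: (arousal level, valence level) -> (emotion, quadrant).
EMOTION_BY_LEVELS = {
    ("high", "high"): ("e1", "Q1"),
    ("high", "low"): ("e2", "Q2"),
    ("low", "low"): ("e3", "Q3"),
    ("low", "high"): ("e4", "Q4"),
}


def emotion_for(arousal, valence):
    """Return (emotion label, quadrant) for signed arousal and valence values.
    Assumption: values >= 0 count as 'high', negative values as 'low'."""
    arousal_level = "high" if arousal >= 0 else "low"
    valence_level = "high" if valence >= 0 else "low"
    return EMOTION_BY_LEVELS[(arousal_level, valence_level)]


if __name__ == "__main__":
    # High arousal with low valence falls into Q2 (emotion e2).
    print(emotion_for(arousal=0.7, valence=-0.4))
```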