MusicVAE: A hierarchical recurrent variational autoencoder for music.
MusicVAE learns a latent space of musical sequences, providing different modes of interactive musical creation, including:
- random sampling from the prior distribution,
- interpolation between existing sequences,
- manipulation of existing sequences via attribute vectors or a latent constraint model.
For short sequences (e.g., 2-bar "loops"), we use a bidirectional LSTM encoder and LSTM decoder. For longer sequences, we use a novel hierarchical LSTM decoder, which helps the model learn longer-term structures.
We also model the interdependencies among instruments by training multiple decoders on the lowest-level embeddings of the hierarchical decoder.
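Conceptually, "sampling from the prior" means drawing a random latent vector and decoding it into an event sequence. The sketch below is illustrative only (NOT the real MusicVAE API): the dimensions and the toy linear "decoder" are assumptions, standing in for the model's (hierarchical) LSTM decoder.

```python
# Illustrative sketch of prior sampling -- NOT the real MusicVAE API.
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 512   # z-space size (the larger MusicVAE configs use 512)
VOCAB_SIZE = 90    # hypothetical melody event vocabulary
SEQ_LEN = 32       # 2 bars at 16 steps per bar

# Hypothetical per-step decoder weights; the real decoder is a (hierarchical) LSTM.
W = rng.normal(size=(SEQ_LEN, LATENT_DIM, VOCAB_SIZE)) * 0.01

def sample_from_prior():
    """Draw z ~ N(0, I) and greedily decode it into SEQ_LEN event ids."""
    z = rng.standard_normal(LATENT_DIM)
    logits = np.einsum("d,tdv->tv", z, W)   # one set of logits per timestep
    return np.argmax(logits, axis=-1)       # greedy decoding

events = sample_from_prior()  # an array of SEQ_LEN event ids
```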
How To Use
Colab Notebook w/ Pre-trained Models
The easiest way to get started using a MusicVAE model is via our Colab Notebook. The notebook contains instructions for sampling, interpolating, and manipulating musical sequences with pre-trained MusicVAEs for melodies, drums, and three-piece "trios" (melody, bass, drums) of varying lengths.
Generate script w/ Pre-trained Models
We provide a script in our pip package to generate outputs from the command-line.
Before you can generate outputs, you must either train your own model or download pre-trained checkpoints from the table below.
| Description | Download |
| --- | --- |
| 16-bar "trios" (drums, melody, and bass) | download |
| 2-bar drums w/ 9 classes trained for more realistic sampling | download |
| 2-bar drums w/ 9 classes trained for better reconstruction and interpolation | download |
| 2-bar drums w/ 61 classes | download |
| 4-bar groove autoencoder | download |
| 2-bar model that converts a quantized, constant-velocity drum pattern into a "humanized" groove | download |
| 2-bar model that converts a constant-velocity single-drum "tap" pattern into a groove | download |
| 2-bar model that adds (or replaces) closed hi-hat for an existing groove | download |
| 2-bar groove autoencoder, with the input hits provided to the decoder as a conditioning signal | download |
Once you have selected a model, there are two operations you can perform with the generate script: sampling and interpolation.

Sampling decodes random points in the latent space of the chosen model and outputs the resulting sequences in output_dir. Make sure you specify the config that matches the chosen model (see the table above).
For example, download the cat-mel_2bar_big checkpoint above and run the following command to generate 5 2-bar melody samples:
```sh
music_vae_generate \
  --config=cat-mel_2bar_big \
  --checkpoint_file=/path/to/music_vae/checkpoints/cat-mel_2bar_big.tar \
  --mode=sample \
  --num_outputs=5 \
  --output_dir=/tmp/music_vae/generated
```
Perhaps the most impressive samples come from the 16-bar trio model. Download the hierdec-trio_16bar checkpoint above (warning: 2 GB) and run the following command to generate 5 samples:
```sh
music_vae_generate \
  --config=hierdec-trio_16bar \
  --checkpoint_file=/path/to/music_vae/checkpoints/hierdec-trio_16bar.tar \
  --mode=sample \
  --num_outputs=5 \
  --output_dir=/tmp/music_vae/generated
```
To interpolate, you need two MIDI files to interpolate between. Each
model has certain constraints* for these files. For
example, the mel_2bar models only work if the input files are exactly 2 bars
long and contain monophonic non-drum sequences. The trio_16bar models require
16 bars with 3 instruments (based on program numbers): drums, piano or guitar,
and bass.
num_outputs specifies how many points along the path connecting the
two inputs in latent space to decode, including the endpoints.
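The path being decoded can be sketched as follows. Linear interpolation between the two latent vectors is an assumption here; the sketch only illustrates the num_outputs semantics (evenly spaced points, endpoints included).

```python
# Sketch of the latent path that --mode=interpolate decodes (linear
# interpolation assumed; each point would then be decoded to a sequence).
import numpy as np

def latent_path(z1, z2, num_outputs):
    """Return num_outputs evenly spaced latent points from z1 to z2, inclusive."""
    alphas = np.linspace(0.0, 1.0, num_outputs)
    return [(1.0 - a) * z1 + a * z2 for a in alphas]

# With num_outputs=5: z1, three intermediate points, then z2.
path = latent_path(np.zeros(4), np.ones(4), num_outputs=5)
```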
Try setting the inputs to be two of the samples you generated previously.
```sh
music_vae_generate \
  --config=cat-mel_2bar_big \
  --checkpoint_file=/path/to/music_vae/checkpoints/cat-mel_2bar.ckpt \
  --mode=interpolate \
  --num_outputs=5 \
  --input_midi_1=/path/to/input/1.mid \
  --input_midi_2=/path/to/input/2.mid \
  --output_dir=/tmp/music_vae/generated
```
*Note: If you call the generate script with MIDI files that do not match the model's constraints, the script will try to extract any subsequences from the MIDI files that would be valid inputs, and write them to the output directory. You can then listen to the extracted subsequences, decide which two you wish to use as the ends of your interpolation, and then call the generate script again using these valid inputs.
- Beat Blender by Google Creative Lab
- Melody Mixer by Google Creative Lab
- Latent Loops by Google Pie Shop
- Neural Drum Machine by Tero Parviainen
Learn more about the API in its repo.
Training Your Own MusicVAE
If you'd like to train a model on your own data, you will first need to set up
your Magenta environment. Next, convert a collection of MIDI files
into NoteSequences following the instructions in
Building your Dataset. You can then choose one of
the pre-defined Configurations in configs.py or define your own.
Finally, you must execute the training script. Below is an example
command for training the cat-mel_2bar_small configuration, assuming your
examples are stored at /tmp/music_vae/mel_train_examples.tfrecord:
```sh
music_vae_train \
  --config=cat-mel_2bar_small \
  --run_dir=/tmp/music_vae/ \
  --mode=train \
  --examples_path=/tmp/music_vae/mel_train_examples.tfrecord
```
You will likely need to adjust some of the hyperparameters with the --hparams
flag for your particular training set and hardware. For example, if the default
batch size of a config is too large for your GPU, you can reduce the batch size
and learning rate by setting the flag as follows:
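A hedged example of such a flag setting (the comma-separated name=value format follows the training script's hparams convention; the specific values are illustrative, not recommendations):

```sh
--hparams=batch_size=32,learning_rate=0.0005
```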
These models are particularly sensitive to the hparams controlling the KL
loss, such as max_beta. Decreasing the effect of the KL loss (by decreasing
max_beta) results in a model that produces better reconstructions,
but with potentially worse random samples. Increasing the effect of the KL loss
typically results in the opposite. The default config settings of these hparams
are an attempt to reach a balance between good sampling and reconstruction,
but the best settings are dataset-dependent and will likely need to be adjusted.
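The term being weighted here can be sketched with the closed form for a diagonal Gaussian posterior q(z|x) = N(mu, diag(sigma²)) against the N(0, I) prior; the beta value below is illustrative, not a config default.

```python
# Sketch of the KL term that max_beta scales during training.
import numpy as np

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - np.log(sigma**2))

# A larger beta weights this term more heavily, pushing q(z|x) toward the
# prior (better random samples); a smaller beta favors reconstruction.
beta = 0.2  # illustrative weight
kl = kl_to_standard_normal(np.array([0.5, -0.3]), np.array([0.9, 1.1]))
weighted_kl = beta * kl
```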
Finally, you should also launch an evaluation job (using --mode=eval with a
heldout dataset) in order to compute metrics such as accuracy and to monitor
for overfitting to your training set.