# GPT-2 for music - By Dr. Tristan Behrens

This notebook shows you how to generate music with GPT-2.

---

## Find me online

- https://www.linkedin.com/in/dr-tristan-behrens-734967a2/
- https://twitter.com/DrTBehrens
- https://github.com/AI-Guru
- https://huggingface.co/TristanBehrens
- https://huggingface.co/ai-guru


---

## Install dependencies.

The following cell sets up FluidSynth and pyFluidSynth on Google Colaboratory. It also pins bokeh to version 2.4.3.

In [1]:
if "google.colab" in str(get_ipython()):
    print("Installing dependencies...")
    #!pip uninstall -y bokeh
    !apt-get update -qq && apt-get install -qq  build-essential libasound2-dev libjack-dev && apt-get install libfluidsynth3
    !pip install -qU pyfluidsynth

    !pip install --upgrade bokeh==2.4.3

Installing dependencies...
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libinstpatch-1.0-2 timgm6mb-soundfont
Suggested packages:
  fluid-soundfont-gm
The following NEW packages will be installed:
  libfluidsynth3 libinstpatch-1.0-2 timgm6mb-soundfont
0 upgraded, 3 newly installed, 0 to remove and 24 not upgraded.
Need to get 5,913 kB of archives.
After this operation, 7,661 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libinstpatch-1.0-2 amd64 1.1.6-1 [240 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/universe amd64 timgm6mb-soundfont all 1.3-5 [5,427 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libfluidsynth3 amd64 2.2.5-1 [246 kB]
Fetched 5,913 kB in 1s (4,756 kB/s)
Selecting previously unselected package libinstpatch-1.0-2:amd64.
(Reading database ... 121654 files and directories currently installed.

In [2]:
!pip install transformers
!pip install note_seq

Collecting note_seq
  Downloading note_seq-0.0.5-py3-none-any.whl (209 kB)
     209.4/209.4 kB 3.9 MB/s eta 0:00:00
Collecting intervaltree>=2.1.0 (from note_seq)
  Downloading intervaltree-3.1.0.tar.gz (32 kB)
  Preparing metadata (setup.py) ... done
Collecting pretty-midi>=0.2.6 (from note_seq)
  Downloading pretty_midi-0.2.10.tar.gz (5.6 MB)
     5.6/5.6 MB 20.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting protobuf>=4.21.2 (from note_seq)
  Downloading protobuf-4.25.1-cp37-abi3-manylinux2014_x86_64.whl (294 kB)
     294.6/294.6 kB 32.3 MB/s eta 0:00:00
Collecting pydub (from note_seq)
  Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Collecting mido>=1.1.16 (from pretty-midi>=0.2

## Load the tokenizer and the model from 🤗 Hub.

In [3]:
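import os
# Use the pure-Python protobuf implementation. This is a common workaround for
# protobuf version conflicts and must be set before protobuf is first imported.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"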

In [16]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ai-guru/lakhclean_mmmtrack_4bars_d-2048")
model = AutoModelForCausalLM.from_pretrained("ai-guru/lakhclean_mmmtrack_4bars_d-2048")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


## Convert the generated tokens to music that you can listen to.

This uses note_seq, Google Magenta's library for MIDI-like note sequences. It can also load and save MIDI files. Check the Magenta note-seq repo if you want to learn more.


In [17]:
import note_seq

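# Durations in seconds at 120 BPM (one quarter note lasts 0.5 s).
NOTE_LENGTH_16TH_120BPM = 0.25 * 60 / 120  # one sixteenth note: 0.125 s
BAR_LENGTH_120BPM = 4.0 * 60 / 120  # one 4/4 bar: 2.0 s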

def token_sequence_to_note_sequence(token_sequence, use_program=True, use_drums=True, instrument_mapper=None, only_piano=False):

    if isinstance(token_sequence, str):
        token_sequence = token_sequence.split()

    note_sequence = empty_note_sequence()

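    # Render all notes.
    current_program = 1
    current_is_drum = False
    current_instrument = 0
    track_count = 0
    # Initialize the timing state up front so that a malformed sequence
    # (e.g. NOTE_ON before BAR_START) cannot raise a NameError.
    current_bar_index = 0
    current_time = 0.0
    current_notes = {}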
    for token_index, token in enumerate(token_sequence):

        if token == "PIECE_START":
            pass
        elif token == "PIECE_END":
            print("The end.")
            break
        elif token == "TRACK_START":
            # Every track starts again at the first bar.
            current_bar_index = 0
            track_count += 1
        elif token == "TRACK_END":
            pass
        elif token == "KEYS_START":
            pass
        elif token == "KEYS_END":
            pass
        elif token.startswith("KEY="):
            pass
        elif token.startswith("INST"):
            instrument = token.split("=")[-1]
            if instrument != "DRUMS" and use_program:
                if instrument_mapper is not None:
                    if instrument in instrument_mapper:
                        instrument = instrument_mapper[instrument]
                current_program = int(instrument)
                current_instrument = track_count
                current_is_drum = False
            if instrument == "DRUMS" and use_drums:
                current_instrument = 0
                current_program = 0
                current_is_drum = True
        elif token == "BAR_START":
            # Jump to the start of the current bar and forget any open notes.
            current_time = current_bar_index * BAR_LENGTH_120BPM
            current_notes = {}
        elif token == "BAR_END":
            current_bar_index += 1
        elif token.startswith("NOTE_ON"):
            pitch = int(token.split("=")[-1])
            note = note_sequence.notes.add()
            note.start_time = current_time
            # Default length is a quarter note; a matching NOTE_OFF overrides it.
            note.end_time = current_time + 4 * NOTE_LENGTH_16TH_120BPM
            note.pitch = pitch
            note.instrument = current_instrument
            note.program = current_program
            note.velocity = 80
            note.is_drum = current_is_drum
            current_notes[pitch] = note
        elif token.startswith("NOTE_OFF"):
            pitch = int(token.split("=")[-1])
            if pitch in current_notes:
                note = current_notes[pitch]
                note.end_time = current_time
        elif token.startswith("TIME_DELTA"):
            delta = float(token.split("=")[-1]) * NOTE_LENGTH_16TH_120BPM
            current_time += delta
        elif token.startswith("DENSITY="):
            pass
        elif token == "[PAD]":
            pass
        else:
            #print(f"Ignored token {token}.")
            pass

    # Renumber instruments: one index per unique (program, is_drum) pair.
    instruments_drums = []
    for note in note_sequence.notes:
        pair = [note.program, note.is_drum]
        if pair not in instruments_drums:
            instruments_drums += [pair]
        note.instrument = instruments_drums.index(pair)

    if only_piano:
        for note in note_sequence.notes:
            if not note.is_drum:
                note.instrument = 0
                note.program = 0

    return note_sequence

def empty_note_sequence(qpm=120.0, total_time=0.0):
    note_sequence = note_seq.protobuf.music_pb2.NoteSequence()
    note_sequence.tempos.add().qpm = qpm
    note_sequence.ticks_per_quarter = note_seq.constants.STANDARD_PPQ
    note_sequence.total_time = total_time
    return note_sequence
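
The timing rules in the converter above can be checked without note_seq. The following is a minimal sketch, not part of the original notebook: `decode_note_times` is a hypothetical helper that mirrors only the bar, `TIME_DELTA`, `NOTE_ON` and `NOTE_OFF` handling and ignores instruments. It is run on the first bar of the sample output printed later in this notebook.

```python
# Timing constants copied from the notebook: seconds at 120 BPM.
NOTE_LENGTH_16TH_120BPM = 0.25 * 60 / 120  # 0.125 s per sixteenth note
BAR_LENGTH_120BPM = 4.0 * 60 / 120         # 2.0 s per 4/4 bar


def decode_note_times(token_sequence):
    """Hypothetical helper: return (pitch, start, end) tuples for a token string."""
    notes = []       # every note as a mutable [pitch, start, end] entry
    open_notes = {}  # pitch -> entry that may still receive a NOTE_OFF
    bar_index = 0
    current_time = 0.0
    for token in token_sequence.split():
        if token == "TRACK_START":
            bar_index = 0
        elif token == "BAR_START":
            # Each bar starts on a fixed grid, independent of earlier deltas.
            current_time = bar_index * BAR_LENGTH_120BPM
            open_notes = {}
        elif token == "BAR_END":
            bar_index += 1
        elif token.startswith("TIME_DELTA="):
            # Deltas are counted in sixteenth notes.
            current_time += float(token.split("=")[-1]) * NOTE_LENGTH_16TH_120BPM
        elif token.startswith("NOTE_ON="):
            pitch = int(token.split("=")[-1])
            # Default to a quarter note until a NOTE_OFF arrives.
            entry = [pitch, current_time, current_time + 4 * NOTE_LENGTH_16TH_120BPM]
            notes.append(entry)
            open_notes[pitch] = entry
        elif token.startswith("NOTE_OFF="):
            pitch = int(token.split("=")[-1])
            if pitch in open_notes:
                open_notes[pitch][2] = current_time
    return [tuple(entry) for entry in notes]


first_bar = (
    "PIECE_START TRACK_START INST=81 DENSITY=2 BAR_START "
    "NOTE_ON=68 TIME_DELTA=4 NOTE_OFF=68 NOTE_ON=75 TIME_DELTA=4 NOTE_OFF=75 "
    "NOTE_ON=73 TIME_DELTA=4 NOTE_OFF=73 NOTE_ON=70 TIME_DELTA=3 NOTE_OFF=70 BAR_END"
)
decoded = decode_note_times(first_bar)
for pitch, start, end in decoded:
    print(pitch, start, end)
```

The four decoded notes match the first four `notes` entries in the printed note sequence at the end of the notebook (pitch 70 ends at 1.875 s because `TIME_DELTA=3` is three sixteenth notes).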

## Generate music

This section generates one track of music at a time and renders it. The first cell sets the prime to the single token PIECE_START.

In [18]:
generated_sequence = "PIECE_START"
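
With only `PIECE_START` as the prime, the model chooses the instrument itself. As a sketch, and assuming the token order visible in the sample output further down (`TRACK_START`, then `INST=...`, then `DENSITY=...`), a longer prime could fix the instrument of the first track. `build_prime` is a hypothetical helper, not part of the notebook or of any library, and whether a given instrument or density value works well depends on the model's training data.

```python
def build_prime(instrument=None, density=None):
    """Hypothetical helper: build a prime string for the first track.

    instrument: MIDI program number (e.g. 81) or the string "DRUMS".
    density:    integer density level, only used together with instrument.
    """
    tokens = ["PIECE_START"]
    if instrument is not None:
        # Open the first track and pin its instrument.
        tokens += ["TRACK_START", f"INST={instrument}"]
        if density is not None:
            tokens.append(f"DENSITY={density}")
    return " ".join(tokens)


print(build_prime())            # PIECE_START
print(build_prime("DRUMS", 2))  # PIECE_START TRACK_START INST=DRUMS DENSITY=2
```

Assigning the result to `generated_sequence` in place of the plain `"PIECE_START"` string would pass the longer prime to the generation cell.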



Note: Run the following cell multiple times to generate more tracks. Each run feeds the previous output back in as the prompt and stops at the next TRACK_END token.

In [36]:
from google.colab import files
import time
timestr = time.strftime("%Y%m%d-%H%M%S")
fname = "mlmidi-"+timestr+".mid"

# Encode the conditioning tokens.
input_ids = tokenizer.encode(generated_sequence, return_tensors="pt")
#print(input_ids)

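# Generate more tokens. Sampling stops at the first new TRACK_END token
# or once the whole sequence reaches max_length tokens.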
eos_token_id = tokenizer.encode("TRACK_END")[0]
temperature = 1.0
generated_ids = model.generate(
    input_ids,
    max_length=2048,
    do_sample=True,
    temperature=temperature,
    eos_token_id=eos_token_id,
)
generated_sequence = tokenizer.decode(generated_ids[0])
print(generated_sequence)

note_sequence = token_sequence_to_note_sequence(generated_sequence)

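# Plot the piano roll, play it through FluidSynth, write a MIDI file,
# and download that file from Colab.
synth = note_seq.fluidsynth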
note_seq.plot_sequence(note_sequence)
note_seq.play_sequence(note_sequence, synth)
note_seq.sequence_proto_to_midi_file(note_sequence, fname)
files.download(fname)

PIECE_START TRACK_START INST=81 DENSITY=2 BAR_START NOTE_ON=68 TIME_DELTA=4 NOTE_OFF=68 NOTE_ON=75 TIME_DELTA=4 NOTE_OFF=75 NOTE_ON=73 TIME_DELTA=4 NOTE_OFF=73 NOTE_ON=70 TIME_DELTA=3 NOTE_OFF=70 BAR_END BAR_START NOTE_ON=70 TIME_DELTA=3 NOTE_OFF=70 TIME_DELTA=1 NOTE_ON=68 TIME_DELTA=4 NOTE_OFF=68 NOTE_ON=68 TIME_DELTA=8 NOTE_OFF=68 BAR_END BAR_START BAR_END BAR_START TIME_DELTA=4 NOTE_ON=63 TIME_DELTA=1 NOTE_OFF=63 TIME_DELTA=1 NOTE_ON=63 TIME_DELTA=1 NOTE_OFF=63 TIME_DELTA=1 NOTE_ON=68 TIME_DELTA=1 NOTE_OFF=68 TIME_DELTA=1 NOTE_ON=68 TIME_DELTA=4 NOTE_OFF=68 BAR_END TRACK_END TRACK_START INST=81 DENSITY=2 BAR_START NOTE_ON=68 TIME_DELTA=4 NOTE_OFF=68 NOTE_ON=75 TIME_DELTA=4 NOTE_OFF=75 NOTE_ON=73 TIME_DELTA=4 NOTE_OFF=73 NOTE_ON=70 TIME_DELTA=3 NOTE_OFF=70 BAR_END BAR_START NOTE_ON=70 TIME_DELTA=3 NOTE_OFF=70 TIME_DELTA=1 NOTE_ON=68 TIME_DELTA=4 NOTE_OFF=68 NOTE_ON=68 TIME_DELTA=8 NOTE_OFF=68 BAR_END BAR_START BAR_END BAR_START TIME_DELTA=4 NOTE_ON=63 TIME_DELTA=1 NOTE_OFF=63 TIME_DE

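
Because the decoded result is assigned back to `generated_sequence`, every run of the generation cell appends one track to the same piece. If `max_length` is reached before the model emits `TRACK_END`, the last track is cut off. The sketch below shows one way to inspect the tracks and drop an unfinished one before the next run. `split_tracks` and `next_prompt` are hypothetical helpers, not part of the notebook.

```python
def split_tracks(token_sequence):
    """Return one token list per complete TRACK_START ... TRACK_END span."""
    tracks = []
    current = None  # tokens of the track currently being read, if any
    for token in token_sequence.split():
        if token == "TRACK_START":
            current = [token]
        elif current is not None:
            current.append(token)
            if token == "TRACK_END":
                tracks.append(current)
                current = None
    return tracks


def next_prompt(token_sequence):
    """Keep only complete tracks so the next run starts a fresh track."""
    tokens = ["PIECE_START"]
    for track in split_tracks(token_sequence):
        tokens.extend(track)
    return " ".join(tokens)


# One complete track followed by a track that was cut off.
sequence = (
    "PIECE_START TRACK_START INST=81 DENSITY=2 BAR_START NOTE_ON=68 TIME_DELTA=4 "
    "NOTE_OFF=68 BAR_END TRACK_END TRACK_START INST=81 DENSITY=2 BAR_START NOTE_ON=68"
)
tracks = split_tracks(sequence)
print(len(tracks))            # 1
print(next_prompt(sequence))  # the prime plus the single complete track
```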

In [29]:
print(note_sequence)

ticks_per_quarter: 220
tempos {
  qpm: 120.0
}
notes {
  pitch: 68
  velocity: 80
  end_time: 0.5
  program: 81
}
notes {
  pitch: 75
  velocity: 80
  start_time: 0.5
  end_time: 1.0
  program: 81
}
notes {
  pitch: 73
  velocity: 80
  start_time: 1.0
  end_time: 1.5
  program: 81
}
notes {
  pitch: 70
  velocity: 80
  start_time: 1.5
  end_time: 1.875
  program: 81
}
notes {
  pitch: 70
  velocity: 80
  start_time: 2.0
  end_time: 2.375
  program: 81
}
notes {
  pitch: 68
  velocity: 80
  start_time: 2.5
  end_time: 3.0
  program: 81
}
notes {
  pitch: 68
  velocity: 80
  start_time: 3.0
  end_time: 4.0
  program: 81
}
notes {
  pitch: 63
  velocity: 80
  start_time: 6.5
  end_time: 6.625
  program: 81
}
notes {
  pitch: 63
  velocity: 80
  start_time: 6.75
  end_time: 6.875
  program: 81
}
notes {
  pitch: 68
  velocity: 80
  start_time: 7.0
  end_time: 7.125
  program: 81
}
notes {
  pitch: 68
  velocity: 80
  start_time: 7.25
  end_time: 7.75
  program: 81
}
notes {
  pitch: 68
  v