# Undertale & Deltarune Soundtrack Generator

---

## Table of Contents

0. [**Table of Contents**](#Table-of-Contents)

1. [**Imports**](#Imports)

2. [**Data Processing**](#Data-Processing)

    2.1 [Data Loading](#Data-Loading)
    
    2.2 [Data Preprocessing](#Data-Preprocessing)
    
    2.3 [Dataset Definition](#Dataset-Definition)
    
3. [**Model Definition**](#Model-Definition)
    
4. [**Hyperparameters & Instantiation**](#Hyperparameters-&-Instantiation)

5. [**Training**](#Training)
    
    5.1 [Training Function](#Training-Function)
    
    5.2 [Training Session](#Training-Session)

6. [**Saving Trained Model**](#Saving-Trained-Model)

7. [**Generation**](#Generation)

    7.1 [Generation Function](#Generation-Function)
    
    7.2 [Sampling Function](#Sampling-Function)
    
    7.3 [Music Generation](#Music-Generation)

8. [**Final Summary, Notes, and Thoughts**](#Final-Summary,-Notes,-and-Thoughts)

---

## Imports
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

Import the required packages:

* `os` (file handling)
* `itertools` (`chain()` for merging lists)
* `collections` (useful tools such as `Counter` and `OrderedDict`)
* `random` (sequence shuffling)
* `tqdm` (progress bars)
* PyTorch (deep learning framework)
* Matplotlib (plotting)

In [1]:
import os
import itertools
import random
import collections

import tqdm

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

import matplotlib.pyplot as plt
%matplotlib inline

---

## Data Processing
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

### Data Loading
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

Read every `.txt` file in the target directory.

Check that each file contains only digits and spaces, collapse runs of spaces into a single space, and return a dictionary of `{file_path: text}`.

In [2]:
def get_texts(texts_dir):

    if not os.path.isdir(texts_dir):
        raise FileNotFoundError("given text directory not found: {}".format(texts_dir))

    texts = []
    
    for text_path in (file.path for file in os.scandir(texts_dir) if file.is_file() and file.name.endswith(".txt")):
        with open(file=text_path, mode='r', encoding="utf-8") as text_file:
            
            text = text_file.read().strip()

            if not text.replace(' ', '').isdigit():
                raise RuntimeError("one or more characters other than digits and white spaces are detected: {}".format(text_path))

            while "  " in text:
                text = text.replace("  ", ' ')
            
            texts.append((text_path, text))
    
    return dict(texts)


[(os.path.split(text_path)[1], text[:20]) for text_path, text in get_texts("./source/converted_texts").items()]

[('ANOTHER_HIM_-_DeltaRune.txt', '42 46 49 53 0 42 46 '),
 ('A_Town_Called_Hometown_Deltarune_-_Arranged_for_Piano.txt',
  '73 89 0 73 89 0 73 8'),
 ('Basement_Deltarune_-_Arranged_for_Piano.txt', '39 51 0 39 51 0 39 5'),
 ('Before_the_Story_Deltarune_-_Arranged_for_piano_.txt',
  '48 0 48 0 48 0 48 0 '),
 ('Card_Castle_Deltarune_-_Arranged_for_Piano.txt', '39 0 39 0 39 0 39 0 '),
 ('Checker_Dance_Deltarune_-_Arranged_for_Piano.txt', '30 0 30 0 30 0 30 0 '),
 ('Deltarune_-_Beginning.txt', '48 55 0 48 55 0 48 5'),
 ('Deltarune_-_Chaos_King.txt', '27 39 0 27 39 0 27 3'),
 ('Deltarune_-_Darkness_Falls.txt', '61 64 71 75 0 61 64 '),
 ('Deltarune_-_Dont_Forget_Ending_Theme_Solo_Piano_Version.txt',
  '77 0 77 0 77 0 77 0 '),
 ('Deltarune_-_Friendship.txt', '74 0 74 0 74 0 74 0 '),
 ('Deltarune_-_Gallery.txt', '32 36 39 68 0 32 36 '),
 ('Deltarune_-_Lancer_Battle.txt', '62 0 62 0 62 0 0 0 0'),
 ('DELTARUNE_-_Lancer_piano_solo.txt', '0 0 0 62 0 62 0 62 0'),
 ('Deltarune_-_Lantern.txt', '49 0 4
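The per-file cleanup in `get_texts` can be tried on an in-memory string. The sketch below is a hypothetical helper, `clean_text`, that applies the same steps (strip, digits-and-spaces check, collapse repeated spaces) without touching the file system. One consequence worth knowing: a file with line breaks inside it is rejected, because `"\n"` is neither a digit nor a space and `strip()` only removes whitespace at the ends.

```python
def clean_text(text):
    """Hypothetical helper that mirrors the per-file checks in get_texts."""
    text = text.strip()
    # After removing spaces, only digits may remain.
    if not text.replace(' ', '').isdigit():
        raise RuntimeError("one or more characters other than digits and white spaces are detected")
    # Collapse runs of spaces into a single space.
    while "  " in text:
        text = text.replace("  ", ' ')
    return text


print(clean_text("  42  46   0 42 46 0\n"))  # 42 46 0 42 46 0

try:
    clean_text("42 46\n0")  # an inner line break fails the digit check
except RuntimeError as error:
    print("rejected:", error)
```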

### Data Preprocessing
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

Split each text on spaces and convert it into a list of ints.

These lists can be fed to the model directly, or processed further to compress or simplify the sequences.

In [3]:
def texts_to_intlists(text_list):
    
    intlists = []
    
    for i, text in enumerate(text_list):
        
        int_strings = text.split(' ')
        
        if not all(int_str.isdigit() for int_str in int_strings):
            raise RuntimeError("non-digit string detected in text {}".format(i))

        ints = [int(int_str) for int_str in int_strings]
        
        intlists.append(ints)
        
    return intlists


print([ints[:10] for ints in texts_to_intlists(get_texts("./source/converted_texts").values())])

[[42, 46, 49, 53, 0, 42, 46, 49, 53, 0], [73, 89, 0, 73, 89, 0, 73, 89, 0, 73], [39, 51, 0, 39, 51, 0, 39, 51, 0, 39], [48, 0, 48, 0, 48, 0, 48, 0, 48, 0], [39, 0, 39, 0, 39, 0, 39, 0, 39, 0], [30, 0, 30, 0, 30, 0, 30, 0, 30, 0], [48, 55, 0, 48, 55, 0, 48, 55, 0, 48], [27, 39, 0, 27, 39, 0, 27, 39, 0, 27], [61, 64, 71, 75, 0, 61, 64, 71, 75, 0], [77, 0, 77, 0, 77, 0, 77, 0, 77, 0], [74, 0, 74, 0, 74, 0, 74, 0, 74, 0], [32, 36, 39, 68, 0, 32, 36, 39, 68, 0], [62, 0, 62, 0, 62, 0, 0, 0, 0, 65], [0, 0, 0, 62, 0, 62, 0, 62, 0, 62], [49, 0, 49, 0, 49, 0, 49, 0, 49, 0], [31, 43, 0, 31, 43, 0, 31, 43, 0, 31], [24, 31, 0, 24, 31, 0, 24, 31, 0, 24], [45, 57, 0, 45, 57, 0, 45, 57, 0, 45], [39, 0, 39, 0, 39, 0, 39, 0, 39, 0], [46, 0, 46, 0, 46, 0, 46, 0, 46, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [58, 70, 0, 58, 70, 0, 58, 70, 0, 58], [37, 49, 0, 37, 49, 0, 37, 49, 0, 37], [44, 68, 0, 44, 68, 0, 44, 68, 0, 44], [67, 0, 67, 0, 67, 0, 67, 0, 67, 0], [61, 0, 61, 0, 61, 0, 61, 0, 61, 0], [49, 0, 49, 0, 

To use "words" instead of "characters" as the input and output units, treat each `0` as a space and collect all words that occur in the texts. Each word (the sorted group of nonzero values between two `0`s) becomes a "token".

We can also encode how many times in a row a word is repeated as a separate repetition token, which removes the long runs where the same word appears several times consecutively.
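A minimal sketch of this idea in plain Python, using a hypothetical `encode_words` function. It only produces the token tuples; it does not build a vocabulary or map tokens to indices. It assumes the same conventions as described above: a word is the sorted tuple of values between `0`s, `max_repeat=0` disables repetition tokens, and `-1` means no limit. It also emits a repeat counter that is still open when the sequence ends, so trailing repeats are not lost.

```python
def encode_words(intlist, max_repeat=0):
    """Hypothetical sketch: split at 0s into words, optionally run-length encode repeats."""
    tokens = []
    word = []
    last_word = None
    n_repeats = 0
    for value in intlist:
        if value != 0:
            word.append(value)
            continue
        # A 0 ends the current word; sorting makes the note order inside a chord irrelevant.
        current = tuple(sorted(word))
        word = []
        if max_repeat != 0 and current == last_word:
            if n_repeats == max_repeat:
                # The counter is full: emit it and start a new counter with this repeat.
                tokens.append(("<REPEAT>", n_repeats))
                n_repeats = 1
            else:
                n_repeats += 1
            continue
        if n_repeats:
            tokens.append(("<REPEAT>", n_repeats))
        tokens.append(current)
        last_word = current
        n_repeats = 0
    # Emit a repeat counter that is still open at the end of the sequence.
    if n_repeats:
        tokens.append(("<REPEAT>", n_repeats))
    return tokens


notes = [42, 46, 0, 46, 42, 0, 42, 46, 0, 50, 0]
print(encode_words(notes))                 # [(42, 46), (42, 46), (42, 46), (50,)]
print(encode_words(notes, max_repeat=15))  # [(42, 46), ('<REPEAT>', 2), (50,)]
```

With a small limit such as `max_repeat=1`, the same run is split into several counters: `[(42, 46), ('<REPEAT>', 1), ('<REPEAT>', 1), (50,)]`.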

In [4]:
def tokenize(intlists, max_repeat_encoding=0):
    assert isinstance(max_repeat_encoding, int) and max_repeat_encoding >= -1 # -1 for no limit
    
    encode_repetition = (max_repeat_encoding != 0)
    if encode_repetition:
        observed_repeats = []
    
    counter = collections.Counter() # Note: repetition tokens are not counted. They are appended to the dictionary later.
    tokenized_lists = []
    
    for intlist in intlists:
        if encode_repetition:
            last_token = None
            n_repeats = 0
        token = []
        tokenized = []
        for int_val in intlist:
            if int_val != 0:
                token.append(int_val)
            else:
                token = tuple(sorted(token))
                
                if encode_repetition:
                    if last_token == token:
                        if n_repeats == max_repeat_encoding:
                            tokenized.append(("<REPEAT>", n_repeats))
                            n_repeats = 1
                        else:
                            n_repeats += 1
                            if n_repeats not in observed_repeats:
                                observed_repeats.append(n_repeats)
                    else:
                        if n_repeats != 0:
                            tokenized.append(("<REPEAT>", n_repeats))
                        counter.update((token,))
                        tokenized.append(token)
                        last_token = token
                        n_repeats = 0

                else:
                    counter.update((token,))
                    tokenized.append(token)
                token = []
        tokenized_lists.append(tokenized)
    
    tokens_token_to_idx = collections.OrderedDict((token_key, i) for i, (token_key, _) in enumerate(counter.most_common()))
    if encode_repetition:
        tokens_token_to_idx.update([(("<REPEAT>", r), i) for i, r in enumerate(observed_repeats, len(tokens_token_to_idx))])
    tokens_idx_to_token = collections.OrderedDict((i, token_key) for token_key, i in tokens_token_to_idx.items())
    print(len(tokens_idx_to_token), "tokens")
    
    for tokenized in tokenized_lists:
        for i, token_key in enumerate(tokenized):
            tokenized[i] = tokens_token_to_idx[token_key]

    return tokenized_lists, tokens_idx_to_token

max_repeats = 15

intlists = texts_to_intlists(get_texts("./source/converted_texts").values())
tokenized_lists, tokens_idx_to_token = tokenize(intlists, max_repeat_encoding=max_repeats)
print("\nPart of tokenized sequences:")
print([tokenized_list[:10] for tokenized_list in tokenized_lists[:10]])
print("\nOriginal lengths:")
print([len(intlist) for intlist in intlists[:10]])
print("\nTokenized lengths (with maximum repetition of {}):".format("0 (no repetition tokens)" if max_repeats == 0
                                                                    else "infinity (unlimited length)" if max_repeats == -1
                                                                    else max_repeats))
print([len(tokenized_list) for tokenized_list in tokenized_lists[:10]])
print("\nSome of the most frequent tokens + repetition tokens if used:")
print(list(tokens_idx_to_token.items())[:5] + list(tokens_idx_to_token.items())[-5:])

7536 tokens

Part of tokenized sequences:
[[764, 7535, 7535, 7535, 7535, 7535, 7535, 7535, 7535, 7535], [2871, 7535, 1336, 7535, 991, 1336, 7533, 262, 7521, 2872], [59, 7534, 0, 59, 7535, 7535, 7535, 7535, 7522, 2888], [25, 7535, 7521, 0, 50, 7535, 7521, 12, 7535, 7535], [23, 7524, 0, 7523, 23, 7524, 0, 7535, 7527, 121], [52, 7529, 0, 52, 7522, 0, 7535, 7535, 7535, 7535], [221, 7524, 0, 7522, 221, 7526, 0, 221, 7525, 0], [78, 7530, 0, 78, 7524, 0, 7525, 78, 7525, 0], [622, 7532, 166, 7526, 622, 7525, 166, 1374, 7528, 166], [16, 7535, 7533, 0, 7521, 26, 7535, 7533, 0, 1009]]

Original lengths:
[9070, 15651, 4602, 13185, 8139, 10348, 7383, 30509, 10821, 6219]

Tokenized lengths (with maximum repetition of 15):
[246, 892, 119, 641, 673, 1451, 719, 4066, 684, 307]

Some of the most frequent tokens + repetition tokens if used:
[(0, ()), (1, (44,)), (2, (69,)), (3, (66,)), (4, (63,)), (7531, ('<REPEAT>', 11)), (7532, ('<REPEAT>', 12)), (7533, ('<REPEAT>', 13)), (7534, ('<REPEAT>', 14)), (753

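If you try different *'max_repeat_encoding'* values (for example 0, 5, 10, 20, or -1 for unlimited repetition length), you should see large reductions in sequence length whenever repetition encoding is enabled. With a limit of 15, for example, the first sequence above shrinks from 9070 values to 246 tokens.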
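The size of the reduction can be worked out per run. A word followed by `r` identical repeats costs `1 + r` tokens without repetition encoding, `1 + ceil(r / m)` tokens with a limit `m > 0` (each full counter holds `m` repeats and the last one holds the remainder), and 2 tokens with no limit. The hypothetical helper `run_cost` below does this arithmetic; it assumes the run is closed, i.e. its counter is actually emitted.

```python
import math


def run_cost(n_repeats, max_repeat):
    """Hypothetical helper: tokens needed for one word plus n_repeats identical repeats."""
    if max_repeat == 0:   # no repetition tokens: every repeat is written out again
        return 1 + n_repeats
    if n_repeats == 0:    # a word that is not repeated costs one token
        return 1
    if max_repeat == -1:  # unlimited: a single counter holds the whole run
        return 2
    return 1 + math.ceil(n_repeats / max_repeat)


# A word held for 100 extra steps, under the limits suggested above.
for m in [0, 5, 10, 20, -1]:
    print(m, run_cost(100, m))
```

This gives 101, 21, 11, 6 and 2 tokens respectively, which is why even a modest limit shortens the sequences so much.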

### Dataset Definition
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

Create a `Dataset` class from which training data can be sampled.

Each sample shuffles the order of the encoded sequences and concatenates them into one tensor, so the songs appear in a different order every time while the patterns inside each song stay untouched. The target tensor is the input shifted by one token.

In [12]:
class UndertaleDeltaruneDataset(Dataset):
    def __init__(self, texts_dir, batch_size, max_repeats):
        self.texts = get_texts(texts_dir) # read and get a dictionary of {file_paths: text_contents}
        self.sequences, self.tokens = tokenize(texts_to_intlists(self.texts.values()), max_repeat_encoding=max_repeats) # convert and tokenize

        self.texts_dir = texts_dir
        self.batch_size = batch_size

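    def __len__(self):
        # Number of samples per pass over a DataLoader; each sample is a freshly shuffled copy of all sequences
        return self.batch_size

    def data_len(self):
        return sum([len(sequence) for sequence in self.sequences])

    def __getitem__(self, index):
        # The index is ignored: shuffle the order of the sequences, then flatten them into one long list
        shuffled_sequences = random.sample(self.sequences, len(self.sequences))
        shuffled_list = list(itertools.chain.from_iterable(shuffled_sequences))
        # The target is the input shifted by one token
        return torch.LongTensor(shuffled_list[:-1]), torch.LongTensor(shuffled_list[1:])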
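The shuffle-and-shift logic can be checked without PyTorch. The sketch below uses a hypothetical `shuffled_pair` function and plain lists instead of tensors. It shows the two properties a sample should have: `itertools.chain.from_iterable` flattens the shuffled list of songs into one sequence (whereas `itertools.chain(list_of_lists)` only yields the inner lists back), and the target is the input shifted by one token.

```python
import itertools
import random


def shuffled_pair(sequences, rng=random):
    """Hypothetical sketch of one sample: shuffle song order, flatten, shift by one."""
    order = rng.sample(sequences, len(sequences))
    # Flatten the list of songs into a single list of tokens.
    flat = list(itertools.chain.from_iterable(order))
    return flat[:-1], flat[1:]


songs = [[1, 2, 3], [10, 11], [20]]
inputs, targets = shuffled_pair(songs, random.Random(0))
print(inputs, targets)

# chain() with a single list argument does not flatten it.
print(list(itertools.chain(songs)))  # [[1, 2, 3], [10, 11], [20]]
```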

---

## Model Definition
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

Define the model architecture.

The input is a sequence of token indices. For every position, the output is the negative Euclidean distance between the output of the final fully-connected layer and each embedding vector of the embedding layer, so the token whose embedding is closest gets the highest score.

In [13]:
class UDNet(nn.Module):
    """Undertale-Deltarune Network"""
    def __init__(self, n_tokens, embedding_dim, hidden_dim, dropout=0., batch_first=True):
        super(UDNet, self).__init__()
        
        # Stored variables
        self.n_tokens = n_tokens
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.dropout = dropout
        self.batch_first = batch_first
        
        # Overall architecture
        self.embed = nn.Embedding(num_embeddings=n_tokens,      embedding_dim=embedding_dim)
        self.lstm0 = nn.LSTM(     input_size    =embedding_dim, hidden_size  =hidden_dim,   batch_first=batch_first)
        self.ln0   = nn.LayerNorm(hidden_dim)
        self.lstm1 = nn.LSTM(     input_size    =hidden_dim,    hidden_size  =hidden_dim,   batch_first=batch_first)
        self.ln1   = nn.LayerNorm(hidden_dim)
        self.lstm2 = nn.LSTM(     input_size    =hidden_dim,    hidden_size  =hidden_dim,   batch_first=batch_first)
        self.ln2   = nn.LayerNorm(hidden_dim)
        self.fc    = nn.Linear(   in_features   =hidden_dim,    out_features =embedding_dim)
        
        # Parameterized initial hidden(hidden, cell) states
        self.hidden0_0 = nn.Parameter(torch.zeros(hidden_dim))
        self.  cell0_0 = nn.Parameter(torch.zeros(hidden_dim))
        self.hidden0_1 = nn.Parameter(torch.zeros(hidden_dim))
        self.  cell0_1 = nn.Parameter(torch.zeros(hidden_dim))
        self.hidden0_2 = nn.Parameter(torch.zeros(hidden_dim))
        self.  cell0_2 = nn.Parameter(torch.zeros(hidden_dim))
        
        # Dropout and Activation layers
        self.dropout_layer     = nn.Dropout(p=dropout)
        self.hidden_activation = nn.ReLU()

    def forward(self, x, hidden_states):
        hiddens, cells = hidden_states
        hiddens, cells = list(hiddens), list(cells)

        x                         = self.embed(x)
        x, (hiddens[0], cells[0]) = self.lstm0(         self.dropout_layer(x), (hiddens[0], cells[0]))
        # No residual around lstm0: its input (embedding_dim) and output (hidden_dim) sizes differ
        shortcut                  = x
        x, (hiddens[1], cells[1]) = self.lstm1(self.ln0(self.dropout_layer(x)), (hiddens[1], cells[1]))
        shortcut, x               = x, shortcut + x
        x, (hiddens[2], cells[2]) = self.lstm2(self.ln1(self.dropout_layer(x)), (hiddens[2], cells[2]))
        x                         =    shortcut + x
        x                         = self.fc(   self.ln2(self.dropout_layer(x)))
        # Euclidean distance in the embedding space: one row per position, one column per token.
        # torch.cdist avoids building a (batch, seq_len, n_tokens, embedding_dim) intermediate tensor.
        distances = torch.cdist(x.reshape(-1, self.embedding_dim), self.embed.weight)
        x = distances.view(*x.shape[:-1], self.n_tokens)

        # Negate so that the closest embedding gets the highest score
        x = x.neg()

        return x, (tuple(hiddens), tuple(cells))

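    def init_hidden(self, cuda=None, batch_size=1):
        if cuda is None:
            device = self.hidden0_0.device
        else:
            device = torch.device('cuda') if cuda else torch.device('cpu')

        # nn.LSTM expects initial states shaped (num_layers, batch, hidden_dim), even with batch_first=True,
        # so each learned (hidden_dim,) vector is repeated across the batch
        def expand(state):
            return state.view(1, 1, -1).expand(1, batch_size, -1).contiguous().to(device)

        hiddens = (
            expand(self.hidden0_0),
            expand(self.hidden0_1),
            expand(self.hidden0_2)
        )

        cells = (
            expand(self.cell0_0),
            expand(self.cell0_1),
            expand(self.cell0_2)
        )

        return (hiddens, cells)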
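The distance-based output layer can be illustrated on toy numbers. In this plain-Python sketch (hypothetical helpers `negative_distance_logits` and `softmax`, made-up 2-dimensional embeddings for a 3-token vocabulary), the token whose embedding is closest to the network's output gets the highest score. Feeding these scores to a softmax, or to a loss that applies one internally such as PyTorch's `F.cross_entropy`, therefore favors the nearest embedding.

```python
import math


def negative_distance_logits(output, embeddings):
    """Score every embedding vector by minus its Euclidean distance to `output`."""
    return [-math.dist(output, e) for e in embeddings]


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


embeddings = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]  # toy embedding table (3 tokens)
output = [0.9, 1.2]                               # toy output of the final linear layer
scores = negative_distance_logits(output, embeddings)
probs = softmax(scores)
print([round(s, 3) for s in scores])
print(max(range(len(probs)), key=probs.__getitem__))  # 2, the closest embedding
```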

---

## Hyperparameters & Instantiation
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

Set hyperparameters and instantiate a dataset and a model.

In [14]:
max_repeats = 15

embedding_dim = 128
hidden_dim = 256
dropout = 0.2

ud_dataset = UndertaleDeltaruneDataset("./source/converted_texts", 1, max_repeats)

model = UDNet(len(ud_dataset.tokens), embedding_dim, hidden_dim, dropout)

print(ud_dataset.data_len())
print(model)

7536 tokens
88709
UDNet(
  (embed): Embedding(7536, 128)
  (lstm0): LSTM(128, 256, batch_first=True)
  (ln0): LayerNorm(torch.Size([256]), eps=1e-05, elementwise_affine=True)
  (lstm1): LSTM(256, 256, batch_first=True)
  (ln1): LayerNorm(torch.Size([256]), eps=1e-05, elementwise_affine=True)
  (lstm2): LSTM(256, 256, batch_first=True)
  (ln2): LayerNorm(torch.Size([256]), eps=1e-05, elementwise_affine=True)
  (fc): Linear(in_features=256, out_features=128, bias=True)
  (dropout_layer): Dropout(p=0.2)
  (hidden_activation): ReLU()
)
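As a sanity check on model size, the parameter count follows from the layer shapes printed above. The hypothetical helper below does the arithmetic, assuming single-layer `nn.LSTM`s with both bias vectors (4·h·(input + h) + 8·h parameters each), affine `LayerNorm`s, and the six learned initial-state vectors. For the settings used here it gives 2,448,512, which should agree with `sum(p.numel() for p in model.parameters())`; about 39% of it is the 7536 × 128 embedding table.

```python
def udnet_param_count(n_tokens, embedding_dim, hidden_dim):
    """Hypothetical helper: count the trainable parameters of UDNet as defined above."""
    def lstm(input_size):
        # weight_ih (4h x input), weight_hh (4h x h), bias_ih (4h), bias_hh (4h)
        return 4 * hidden_dim * (input_size + hidden_dim) + 8 * hidden_dim

    embed = n_tokens * embedding_dim
    lstms = lstm(embedding_dim) + 2 * lstm(hidden_dim)
    layer_norms = 3 * 2 * hidden_dim               # weight and bias for each of ln0, ln1, ln2
    fc = hidden_dim * embedding_dim + embedding_dim
    initial_states = 6 * hidden_dim                # hidden and cell vectors for 3 LSTMs
    return embed + lstms + layer_norms + fc + initial_states


print(udnet_param_count(7536, 128, 256))  # 2448512
```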


---

## Training
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

### Training Function
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

### Training Session
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

---

## Saving Trained Model
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

---

## Generation
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

### Generation Function
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

### Sampling Function
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

### Music Generation
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

---

## Final Summary, Notes, and Thoughts
[(go to top)](#Undertale-&-Deltarune-Soundtrack-Generator)

---