#### Student Name:
#### Student ID:

# Assignment 7

### Mozart Dice Game RNN

Instructions: 

* This notebook is an interactive assignment; please read and follow the instructions in each cell. 

* Cells that require your input (in the form of code or written response) will have 'Question #' above.

* After completing the assignment, please submit this notebook and a copy as a PDF.

Please note that managing packages may take some time to set up. 
If you are using the UCSD Datahub, students have had success using a CPU instance, and installing packages in this order:

pip install music21

pip install --upgrade pip

pip install tensorflow

pip install --upgrade tensorflow


# Generating Music with RNN

In the next section, you will practice using Keras to create a generative model based on the music that you and your classmates generated with the Mozart Dice Game in Assignment 1. 

You will construct an RNN by filling in some missing lines of code and answering questions about Keras and model performance. 

The overall goal of this model is to be able to predict the next note of a sequence, given a sequence of 4 notes. (This sequence length of 4 was chosen arbitrarily; please feel free to experiment with this number). 
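
To make the "predict the next note from the previous 4" setup concrete, here is a minimal sketch in plain Python. The helper `sliding_windows` and the toy melody are hypothetical and are not part of the assignment code; they only illustrate how a sequence can be cut into (input window, next note) pairs.

```python
def sliding_windows(sequence, window_length):
    """Return (inputs, targets): each input holds window_length consecutive
    items, and its target is the item that immediately follows the window."""
    inputs, targets = [], []
    for i in range(len(sequence) - window_length):
        inputs.append(sequence[i:i + window_length])
        targets.append(sequence[i + window_length])
    return inputs, targets


# Hypothetical toy melody, used only for illustration.
toy_melody = ["C4", "E4", "G4", "C5", "G4", "E4"]
inputs, targets = sliding_windows(toy_melody, 4)

for window, target in zip(inputs, targets):
    print(window, "->", target)
```

A melody of 6 notes with a window of 4 yields 2 training pairs; in general a sequence of length N yields N minus the window length.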

First, let's define the RNN model we will use. 
A base LSTM layer has been included below.

##### Question 1 (30 points)

Define & compile the rest of the network as follows:

The additional layers of your network will be:

    1. Another LSTM layer, with 512 units of output which drops 3/10 of the units. 
    2. A batch normalization layer.
    3. A layer which drops 3/10 of the units. 
    4. A fully connected layer with 256 units of output.
    5. A ReLU activation layer.
    6. A batch normalization layer.
    7. A layer which drops 3/10 of the units. 
    8. A fully connected layer with number of units of output equal to the vocabulary space of the input. 
    9. A softmax activation layer which uses a temperature parameter. (We have provided this for you -- note that you must define this as two separate 'layers' in Keras). 
    
After creating your network, compile the model with categorical cross entropy loss and an optimizer of your choice. 
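
As a rough illustration of the last item in the list, the sketch below applies a temperature-scaled softmax to a small vector of scores in plain Python. The function name `softmax_with_temperature` and the toy scores are assumptions made for this example; in the notebook the same two steps are carried out by the provided `Lambda` and `Activation('softmax')` layers.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide the raw scores by the temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for a 3-note vocabulary.
toy_logits = [2.0, 1.0, 0.1]

for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Running it with a few temperatures shows how the same scores turn into differently shaped probability distributions.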


In [2]:
import tensorflow as tf
print(tf.__version__)
from music21 import converter, instrument, note, chord, stream
import glob
import pickle
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.layers import Activation
from keras.layers import Lambda
from keras.layers import BatchNormalization as BatchNorm
from keras.utils import to_categorical
from keras.callbacks import ModelCheckpoint
from numpy.random import choice
from collections import OrderedDict


2.12.0


In [None]:
def create_network(network_input, n_vocab, temperature):

    model = Sequential()
    model.add(LSTM(
        512,
        input_shape=(network_input.shape[1], network_input.shape[2]),
        recurrent_dropout=0.3,
        return_sequences=True
    ))
    
    ''' Your Code Here '''


    
    model.add(Lambda(lambda x: x/temperature))
    model.add(Activation('softmax'))
    
    model.compile(loss='CHANGE_ME', optimizer='CHANGE_ME')
    
    return model

Next, we need to structure our input data in a way the network can consume. We can't pass a MIDI file directly to a network, so we must come up with an encoding. Read the code below:

In [None]:
def get_list_of_all_possible_notes(glob_query):
    '''
    Cycles through all MIDI files matching the given glob_query, collecting every distinct note (consisting of pitch and duration) found in each file.
    Builds a list of these unique notes in order of first appearance, dumps it into the pickle file notes.p, and returns the list.
    '''
    notes = OrderedDict()
    for file in glob.glob(glob_query): 
        midi = converter.parse(file)
        print("Parsing %s" % file)
        notes_to_parse = None
        try: # file has instrument parts
            s2 = instrument.partitionByInstrument(midi)
            notes_to_parse = s2.parts[0].recurse()
        except Exception: # file has notes in a flat structure
            notes_to_parse = midi.flat.notes
        for element in notes_to_parse:
            if isinstance(element, note.Note):
                note_str = str(element.duration) + "-" + str(element.pitch)
                if note_str not in notes:
                    notes[note_str] = len(notes) # use len(notes) as the value to ensure that each note has a unique index
            elif isinstance(element, chord.Chord):
                chord_str = str(element.duration) +"-"+ '.'.join(str(n) for n in element.normalOrder)
                if chord_str not in notes:
                    notes[chord_str] = len(notes)

    note_list = list(notes.keys())
    with open('notes.p', 'wb') as f:
        pickle.dump(note_list, f)

    return note_list


def get_sequences(glob_query):
    '''
    Cycles through all MIDI files matching the given glob_query, collecting every note (consisting of pitch and duration) in the order it appears in each file.
    Creates a list of these notes in their sequence order, and returns this list.
    The output sequences are in "human" form (i.e. strings of the form duration-pitch)
    '''
    notes = []
    for file in glob.glob(glob_query):
        midi = converter.parse(file)
        #print("Parsing %s" % file)
        notes_to_parse = None
        try: # file has instrument parts
            s2 = instrument.partitionByInstrument(midi)
            notes_to_parse = s2.parts[0].recurse()
        except Exception: # file has notes in a flat structure
            notes_to_parse = midi.flat.notes
        for element in notes_to_parse:
            if isinstance(element, note.Note):
                notes.append(str(element.duration) + "-" + str(element.pitch))
            elif isinstance(element, chord.Chord):
                notes.append(str(element.duration) +"-"+ '.'.join(str(n) for n in element.normalOrder))
    return notes

def prepare_sequences(sequence): 
    """ Prepare the sequences used by the Neural Network.
        Note that "sequence" is one giant sequence representing all of the input sequences, concatenated together as a "mega sequence".
        This is something that should be fixed in future iterations, as this will create some instances of "false transitions", having the last note of one piece jump to the first note of the next.
        It will work as 'ideally' intended only if trained on just one song. We will use this function as-is for the homework, but you are welcome to adjust it if you like.
        First, converts the sequences from human-readable form to integer form (i.e. "note, duration" is replaced by a numeric index)
        Then, converts to a one-hot encoding.
        Returns the one-hot encoded input sequences and the one-hot encoded target notes.
    """
    input_sequence_length = 4  # matches the sequence length of 4 used in the text and in prepare_sequences_prediction

    # Form the "note_to_int" lookup dictionary using notes.p, the file where we have stored the list of all the notes in our vocabulary.
    with open("notes.p", "rb") as f:
        pitchnames = pickle.load(f)

    # note_to_int maps each note string in pitchnames to its index in that list.
    note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
    n_vocab = len(pitchnames)

    network_input = []
    network_output = []

    # Slide a window over the sequence: i runs from 0 up to (but not including)
    # len(sequence) - input_sequence_length, in steps of 1.
    # input_sequence_length is the number of notes in each input window.
    for i in range(0, len(sequence) - input_sequence_length, 1):

        # sequence_in is the slice of the sequence from index i up to (but not including) i + input_sequence_length.
        sequence_in = sequence[i:i + input_sequence_length]

        # sequence_out is the single note that immediately follows the window.
        sequence_out = sequence[i + input_sequence_length]

        # Convert the window and its target note to integer indices using note_to_int.
        network_input.append([note_to_int[char] for char in sequence_in])
        network_output.append(note_to_int[sequence_out])

    # number of (input window, target note) training patterns
    n_patterns = len(network_input)

    # reshape the input into a format compatible with LSTM layers
    network_input = np.reshape(network_input, (n_patterns, input_sequence_length, 1))
    network_input = to_categorical(network_input, num_classes = n_vocab)

    network_output = to_categorical(network_output, num_classes = n_vocab)

    return (network_input, network_output)


vocabulary = get_list_of_all_possible_notes("dice_songs/*.mid")
sequence = get_sequences("dice_songs/*.mid")
network_input, network_output = prepare_sequences(sequence)

##### Question 2 (10 points)

How is the data from the MIDI file encoded as input to the network? Be specific in your explanation; make sure you address details such as which data type is used to represent a note in the input layer and how chords are handled, as well as what information is lost by using this encoding. 

[Hint: Try to print some of the variables to visualize their data.]
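
If you want a small object to print before inspecting the real variables, the toy sketch below walks one window through the two steps named in the `prepare_sequences` docstring (integer index, then one-hot vector). The vocabulary, the window, and the helper `one_hot` are hypothetical and are written in plain Python; the notebook itself uses `to_categorical` for the second step.

```python
def one_hot(index, n_classes):
    """Return a list of n_classes floats with a 1.0 at position index."""
    return [1.0 if j == index else 0.0 for j in range(n_classes)]


# Hypothetical 3-symbol vocabulary and a 4-symbol input window.
toy_vocab = ["C4", "E4", "G4"]
toy_note_to_int = {name: i for i, name in enumerate(toy_vocab)}
toy_window = ["E4", "G4", "C4", "E4"]

encoded = [one_hot(toy_note_to_int[name], len(toy_vocab)) for name in toy_window]

for name, row in zip(toy_window, encoded):
    print(name, toy_note_to_int[name], row)
```

Comparing this toy output with the shapes and contents of `network_input` and `network_output` is one way to approach the question.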

``` Your response here ```

Now, we are ready to train the network.

##### Question 3 (10 points)

Add a line of code to begin the training of the model.
Please train for at least 50 epochs (you are welcome to experiment with the duration of training, batch size, and other hyperparameters). 

In [None]:
def train_network():
    """ Train a Neural Network to generate music """
    
    vocabulary = get_list_of_all_possible_notes("dice_songs/*.mid")

    n_vocab = len(set(vocabulary))
    
    sequence = get_sequences("dice_songs/*.mid")
    network_input, network_output = prepare_sequences(sequence)
    
    model = create_network(network_input, n_vocab, 1)
 
    checkpoint = ModelCheckpoint(
        "weights2-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5",
        monitor='loss',
        verbose=0,
        save_best_only=True,
        mode='min'
    )
    
    callbacks_list = [checkpoint]

    # Your line of code here
    
train_network()
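
The checkpoint callback above writes one file per improving epoch, with the epoch number and loss filled into the filename template. The sketch below shows how that template expands and how the file with the lowest loss could be picked from a list of names. The helper `best_checkpoint` and the example filenames are hypothetical, and the regular expression assumes the exact template used in `train_network`.

```python
import re

template = "weights2-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
print(template.format(epoch=7, loss=0.123456))


def best_checkpoint(filenames):
    """Return the filename whose embedded loss is lowest, or None if none match."""
    pattern = re.compile(r"weights2-improvement-(\d+)-(\d+\.\d+)-bigger\.hdf5$")
    scored = []
    for name in filenames:
        match = pattern.search(name)
        if match:
            scored.append((float(match.group(2)), name))
    return min(scored)[1] if scored else None


# Hypothetical filenames, for illustration only.
example_files = [
    "weights2-improvement-01-4.2100-bigger.hdf5",
    "weights2-improvement-12-1.0534-bigger.hdf5",
    "weights2-improvement-05-2.7781-bigger.hdf5",
]
best = best_checkpoint(example_files)
print(best)
```

In practice the list of names could come from `glob.glob("weights2-improvement-*.hdf5")`, which is already imported in this notebook.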

Now that we have a trained network to make predictions, it's time to use the network to generate music!

##### Question 4 (10 points)

To make the predictions, you will need to complete the line in the generate_notes function below.

[Hint: what function does Keras use to make predictions?]

In [None]:
def prepare_sequences_prediction(notes, pitchnames, n_vocab):

    note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
    sequence_length = 4
    network_input = []
    output = []
    for i in range(0, len(notes) - sequence_length, 1):
        sequence_in = notes[i:i + sequence_length]
        sequence_out = notes[i + sequence_length]
        network_input.append([note_to_int[char] for char in sequence_in])
        output.append(note_to_int[sequence_out])

    n_patterns = len(network_input)

    # reshape the input into a format compatible with LSTM layers, then one-hot encode it
    normalized_input = to_categorical(np.reshape(network_input, (n_patterns, sequence_length, 1)), num_classes = n_vocab)
    
    return (network_input, normalized_input)

def generate_notes(model, network_input, pitchnames, n_vocab):
    """ Generate notes from the neural network based on a sequence of notes """
    # Starts the melody by picking a random sequence from the input as a starting point
    start = np.random.randint(0, len(network_input)-1)

    int_to_note = dict((number, note) for number, note in enumerate(pitchnames))

    pattern = network_input[start]
    prediction_output = []

    for note_index in range(200):
        prediction_input = np.reshape(pattern, (1, len(pattern), 1))
        prediction_input = to_categorical(prediction_input, num_classes = n_vocab)
        
        ### Complete the line below
        prediction = ''' Your Code Here'''

        index = np.argmax(prediction)
        result = int_to_note[index]
        prediction_output.append(result)

        pattern.append(index)
        pattern = pattern[1:len(pattern)]

    return prediction_output

Now we have our model set up and can create a sequence to use as a query for predicting the next note, but we aren't ready to make predictions yet, since our model does not contain the trained weights!

##### Question 5 (10 points)

Add a line below to load the weights from your network training. 

[Hint: What Keras function is used to load weights?]

In [None]:
def generate():
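    with open('notes.p', 'rb') as f:
        notes = pickle.load(f)
    # Keep the order stored in notes.p: prepare_sequences built its note-to-index
    # mapping by enumerating this list, so sorting here would change the indices.
    pitchnames = list(notes)
    n_vocab = len(set(notes))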

    network_input, normalized_input = prepare_sequences_prediction(notes, pitchnames, n_vocab)
    model = create_network(normalized_input, n_vocab, 1)
    
    ### Add a line to load the weights here
    
    
    
    prediction_output = generate_notes(model, network_input, pitchnames, n_vocab)
    create_midi(prediction_output)

In [None]:
def create_midi(prediction_output):
    offset = 0
    output_notes = []
    for pattern_outer in prediction_output:
        # Split on the first hyphen only: music21 spells flats with a hyphen (e.g. "B-4").
        duration_str, pattern = pattern_outer.split("-", 1)
        duration = float(duration_str.split(" ")[1][:-1])
        if ('.' in pattern) or pattern.isdigit():
            notes_in_chord = pattern.split('.')
            notes = []
            for current_note in notes_in_chord:
                new_note = note.Note(int(current_note), quarterLength=duration)
                new_note.storedInstrument = instrument.Piano()
                notes.append(new_note)
            new_chord = chord.Chord(notes)
            new_chord.offset = offset
            output_notes.append(new_chord)
        else:
            new_note = note.Note(pattern, quarterLength=duration)
            new_note.offset = offset
            new_note.storedInstrument = instrument.Piano()
            output_notes.append(new_note)
        offset += duration  # advance by this note's duration
    midi_stream = stream.Stream(output_notes)
    midi_stream.write('midi', fp='test_output.mid')
    
generate()
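
The strings handled by `create_midi` combine a duration and a pitch (or chord) in one token. The sketch below shows one way such a token can be taken apart without music21, splitting on the first hyphen only so that flat pitch names, which music21 spells with a hyphen (e.g. `B-4`), stay intact. The helper `split_encoded_note` is hypothetical, and the example strings assume that `str(element.duration)` looks like `<music21.duration.Duration 0.5>`.

```python
def split_encoded_note(encoded):
    """Split 'duration-pitch' into (quarter_length, pitch_or_chord_string)."""
    # maxsplit=1 keeps any later hyphens (flats such as "B-4") in the pitch part.
    duration_part, pitch_part = encoded.split("-", 1)
    # duration_part looks like "<music21.duration.Duration 0.5>" (assumed format):
    # take the text after the space and drop the trailing ">".
    quarter_length = float(duration_part.split(" ")[1][:-1])
    return quarter_length, pitch_part


# Hypothetical tokens: a flat note and a chord in normal-order form.
print(split_encoded_note("<music21.duration.Duration 0.5>-B-4"))
print(split_encoded_note("<music21.duration.Duration 1.0>-0.4.7"))
```

Durations that music21 prints as fractions (for example triplets) would need extra handling, which this sketch does not attempt.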

##### Question 6 (10 points)

Listen to your MIDI output. You will probably notice that at some point the music falls into a repeating cycle. Why might this be happening? 

``` Your response here ```

##### Question 7 (10 points)

The generate_notes function is copied below. Please add the same prediction line you wrote above, and then modify the generate_notes function in a way that allows for a non-cyclic composition that still resembles the original input. 

[Hint: think about what we learned in HW 2 while exploring Markov Chains with the Beatles.]

In [None]:
def generate_notes(model, network_input, pitchnames, n_vocab):
    """ Generate notes from the neural network based on a sequence of notes """
    # Starts the melody by picking a random sequence from the input as a starting point
    start = np.random.randint(0, len(network_input)-1)

    int_to_note = dict((number, note) for number, note in enumerate(pitchnames))

    pattern = network_input[start]
    prediction_output = []

    for note_index in range(200):
        prediction_input = np.reshape(pattern, (1, len(pattern), 1))
        prediction_input = to_categorical(prediction_input, num_classes = n_vocab)

        
        ### Copy the line below from your above implementation.
        prediction = ''' Your Code Here'''

        index = np.argmax(prediction)
        result = int_to_note[index]
        prediction_output.append(result)

        pattern.append(index)
        pattern = pattern[1:len(pattern)]

    return prediction_output

generate()

##### Question 8 (5 points)

While we may have improved our ability to avoid getting stuck in cycles, there is more we can do to improve this! 
Think about the improvement you made above. Describe (in general) the probability condition under which the method is least effective.



``` Your Response Here ```

##### Question 9 (5 points)

We can use temperature to help this further. Select a temperature parameter to replace the `1` below. Explore different parameters until you notice a difference in performance, then state your selected value and describe the difference (and why it occurs) below. 

In [None]:
def generate():
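    with open('notes.p', 'rb') as f:
        notes = pickle.load(f)
    # Keep the order stored in notes.p: prepare_sequences built its note-to-index
    # mapping by enumerating this list, so sorting here would change the indices.
    pitchnames = list(notes)
    n_vocab = len(set(notes))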

    network_input, normalized_input = prepare_sequences_prediction(notes, pitchnames, n_vocab)
    model = create_network(normalized_input, n_vocab, 1) # Modify the temperature parameter here!
    
    ### Add a line to load the weights here
    
    
    
    prediction_output = generate_notes(model, network_input, pitchnames, n_vocab)
    create_midi(prediction_output)
    
generate()

``` Your Response Here ```

##### Bonus Question 10 (10 points, but your total will not exceed 100)

There are many other ways in which this model could be improved for the goal of creating music that sounds like the training set. Identify two shortcomings of the model performance, and propose an idea you would use to overcome each of the shortcomings. 

``` Your response here ```