In [1]:
import pandas as pd
import json
import plotly.express as px


In [2]:
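def show_graph(df, metric, title):
    """Plot one metric against epoch as a line chart."""
    fig = px.line(df, x='epoch', y=metric, title=title)
    fig.show()


def log_to_pandas(path_to_log_file):
    """Load a JSON log into a DataFrame with one row per epoch."""
    with open(path_to_log_file, 'r') as f:
        log = json.load(f)
    # One row per top-level log entry, each holding that entry's metrics.
    log_df = pd.DataFrame([log]).T
    # Expand each entry's metrics into separate columns.
    normalized = pd.json_normalize(log_df[0])
    # Turn the index into an explicit 'epoch' column.
    log_df = normalized.reset_index().rename({'index': 'epoch'}, axis='columns')
    return log_df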

## Training Results for 26-04-2022
I trained for 100 epochs using the same parameters as the Zhou PPI training.

In [13]:
training_log = log_to_pandas(
    'results/train_log.json')
validation_log = log_to_pandas(
    'results/valid_log.json')
testing_log = log_to_pandas(
    'results/test_log.json')

In [14]:
show_graph(training_log, metric='loss', title='Training Loss')
show_graph(training_log, metric='acc', title='Training Accuracy')
show_graph(validation_log, metric='loss', title='Validation Loss')
show_graph(validation_log, metric='acc', title='Validation Accuracy')
show_graph(testing_log, metric='loss', title='Test Loss')

show_graph(testing_log, metric='acc', title='Test Accuracy')
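
To compare the splits directly, the per-split logs can be stacked into one long table and drawn on a single figure. The following is a minimal sketch, assuming each log frame has an `epoch` column plus metric columns such as `acc`; `combine_logs` and the toy values are hypothetical and not part of the original notebook.

```python
import pandas as pd


def combine_logs(logs):
    """Stack per-split log frames into one long frame with a 'split' column.

    `logs` maps a split name (e.g. 'train') to a DataFrame that has an
    'epoch' column plus metric columns such as 'loss' and 'acc'.
    """
    frames = [df.assign(split=name) for name, df in logs.items()]
    return pd.concat(frames, ignore_index=True)


# Toy data standing in for the real logs (values are made up).
train = pd.DataFrame({'epoch': [0, 1, 2], 'acc': [0.50, 0.70, 0.90]})
valid = pd.DataFrame({'epoch': [0, 1, 2], 'acc': [0.50, 0.60, 0.55]})
combined = combine_logs({'train': train, 'valid': valid})
print(combined)
```

With the real logs, passing `combine_logs({'train': training_log, 'valid': validation_log})` to `px.line(..., x='epoch', y='acc', color='split')` should draw both curves on one figure, which makes any gap between training and validation easier to see.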

### Result: Training accuracy decreases at epoch 29

The above graphs for the 26-04-2022 training show a marked decrease in training accuracy and an increase in loss at epoch 29.

Epoch 28 produced the best training, validation, and test metrics; the metrics then dropped off sharply.
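
The best epoch and the sharpest epoch-to-epoch change can also be read from the log frames rather than from the graphs. This is a rough sketch, assuming a frame with `epoch` and metric columns like the ones returned by `log_to_pandas`; `summarize_metric` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd


def summarize_metric(df, metric, higher_is_better=True):
    """Find the best epoch and the epoch with the largest one-step worsening."""
    series = df.set_index('epoch')[metric]
    best_epoch = series.idxmax() if higher_is_better else series.idxmin()
    # Change relative to the previous epoch (NaN for the first epoch,
    # which idxmin/idxmax skip by default).
    change = series.diff()
    worst_epoch = change.idxmin() if higher_is_better else change.idxmax()
    return {
        'best_epoch': best_epoch,
        'worst_change_epoch': worst_epoch,
        'worst_change': change.loc[worst_epoch],
    }


# Toy accuracy curve that improves and then collapses (made-up values).
toy = pd.DataFrame({'epoch': [0, 1, 2, 3], 'acc': [0.6, 0.8, 0.9, 0.4]})
summary = summarize_metric(toy, 'acc')
print(summary)
```

For a loss column, call it with `higher_is_better=False`, so that the best epoch is the minimum and the worst change is the largest increase.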

Some suggestions on the cause and remedies can be found in this [Stack Overflow](https://stackoverflow.com/questions/53242875/accuracy-decreasing-with-higher-epochs) post. Most comments say this is a result of **overfitting**.

The post suggests the following remedies:

1. Reduce the learning rate to a very small number. I don't think this is the cause, since my learning rate is already 1e-05.
2. Set dropout rates to 0.2 and keep them uniform across the network.
3. Decrease the batch size. I have already decreased the batch size to 2. Perhaps 1?
4. Use an appropriate optimizer. Try different optimizers on the same network and select the one that gives the lowest loss.
5. Provide more data.

Of the suggestions, I think number 5 points to the most likely cause of the overfitting: too little data, with only 16,000 examples given for training.

The other suggestions seem less likely to help, since the model trains successfully on the Zhou dataset, which was constructed in a similar manner.

I am concerned because the same overfitting pattern was observed in a previous training attempt and was then corrected.

### Augmenting the Data
I think I should add more data to the dataset anyway. I must remember to log the process as closely as possible. I am forgetting a whole bunch of stuff again.
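
One lightweight way to keep such a record is to append a small JSON line for every run. The sketch below is only an illustration: `append_run_record`, the file name `run_notes.jsonl`, and the field names are hypothetical, while the values are the ones mentioned in these notes.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def append_run_record(path, record):
    """Append one run description to `path` as a single JSON line."""
    entry = {'logged_at': datetime.now(timezone.utc).isoformat(), **record}
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(entry) + '\n')
    return entry


# Demo in a temporary directory; a real notebook would use a fixed path.
notes_path = os.path.join(tempfile.mkdtemp(), 'run_notes.jsonl')
append_run_record(notes_path, {
    'run': '26-04-2022',
    'epochs': 100,
    'learning_rate': 1e-05,
    'batch_size': 2,
    'n_train_examples': 16000,
    'note': 'accuracy drops at epoch 29',
})

# Read the records back, one JSON object per line.
with open(notes_path, encoding='utf-8') as f:
    records = [json.loads(line) for line in f]
print(records)
```

Because each run is one line, the file can later be loaded with `pd.read_json(path, lines=True)` to compare runs side by side.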

### Model was saved
This model was saved on the external drive.