# PAR Transformer XL Trainer

This notebook makes it easy to train on Google Colab so that you can use a GPU/TPU.

## Colab-specific setup

In [1]:
%%capture
!pip install tensorflow_text

In [2]:
!git clone https://github.com/Jmkernes/PAR-Transformer-XL.git
%cd PAR-Transformer-XL/

Cloning into 'PAR-Transformer-XL'...
remote: Enumerating objects: 263, done.
remote: Counting objects: 100% (263/263), done.
remote: Compressing objects: 100% (160/160), done.
remote: Total 263 (delta 130), reused 211 (delta 86), pack-reused 0
Receiving objects: 100% (263/263), 11.26 MiB | 23.49 MiB/s, done.
Resolving deltas: 100% (130/130), done.
/content/PAR-Transformer-XL


## Load TensorBoard (re-run this cell before every run to reload it)

This sets up metric tracking. It is not required, since the training code prints the loss every 100 steps and also writes it to a log file. TensorBoard, however, shows additional metrics such as the learning rate, perplexity, and validation metrics.
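For a language model trained with cross-entropy, perplexity is conventionally the exponential of the mean per-token cross-entropy, so it tracks the loss curve directly. The sketch below assumes the logged loss is natural-log cross-entropy averaged over tokens; this is an assumption, not taken from the repo's code, and `perplexity` is a hypothetical helper.

```python
import math

def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity from the mean per-token cross-entropy.

    Assumes the loss uses the natural log (nats). If it were in bits,
    the equivalent would be 2 ** loss instead.
    """
    return math.exp(mean_cross_entropy_nats)

# A model that is uniformly unsure over a 12,000-token vocabulary has a
# cross-entropy of ln(12000) per token, i.e. a perplexity of 12,000.
vocab_size = 12000
uniform_loss = math.log(vocab_size)
print(round(perplexity(uniform_loss)))  # 12000
```

The tokenizer file name `wiki2_12k.model` suggests a vocabulary of roughly this size, so a perplexity far below 12,000 means the model is doing much better than uniform guessing.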

In [4]:
%load_ext tensorboard
# Remove logs/plots from any previous run (-f: don't fail if they are missing)
!rm -rf logs plots
!mkdir logs
%tensorboard --logdir logs


## Run the model

To change the model, adjust the parameters in the `base_model.sh` script.
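The run directories in the outputs below (e.g. `logs/dmodel128_dffn512_blocks6`) appear to encode the model dimension, feed-forward dimension, and number of blocks, so runs with different settings presumably end up in differently named directories. Assuming that naming pattern holds, a small hypothetical helper like this one can recover the settings from a directory name when comparing runs:

```python
import re

def parse_run_name(run_name: str) -> dict:
    """Split a run name such as 'dmodel128_dffn512_blocks6' into its fields.

    Hypothetical helper: assumes every underscore-separated field is a
    lowercase label immediately followed by an integer value.
    """
    params = {}
    for field in run_name.split('_'):
        match = re.fullmatch(r'([a-z]+)(\d+)', field)
        if match is None:
            raise ValueError(f'Unexpected field {field!r} in {run_name!r}')
        params[match.group(1)] = int(match.group(2))
    return params

print(parse_run_name('dmodel128_dffn512_blocks6'))
# {'dmodel': 128, 'dffn': 512, 'blocks': 6}
```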

In [4]:
!./base_model.sh

=== Setting up configuration ===
=== Beginning training ===
2022-01-14 12:48:18,938 [INFO] 

~~~~~~~~ Importing Modules ~~~~~~~~

I0114 12:48:21.775961 140604169783168 train.py:100] 
Loading training data from: data/wikitext2_bsz32_seqlen32_tfrecords_train
Loading tokenizer from tokenizer/wiki2_12k.model...
2022-01-14 12:48:22.477174: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
Loading tfrecords from directory
I0114 12:48:22.962116 140604169783168 train.py:104] 
Loading validation data from: data/wikitext2_bsz32_seqlen32_tfrecords_valid
Loading tokenizer from tokenizer/wiki2_12k.model...
Loading tfrecords from directory
I0114 12:48:23.075972 140604169783168 train.py:108] 
Loading testing data from: data/wikitext2_bsz32_seqlen32_tfrecords_test

Loading tokenizer from tokenizer/wiki2_12k.model...
Loading tfrecords from directory
I0114 12:48:23.

In [6]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

In [7]:
!rm -rf ./logs/ 

In [8]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [12]:
!tensorboard dev upload \
  --logdir /content/drive/MyDrive/Study/NLP/PAR-Transformer-XL/logs/dmodel128_dffn512_blocks6/train \
  --name "(optional) My latest experiment" \
  --description "(optional) Simple comparison of several hyperparameters"


Upload started and will continue reading any new data as it's added to the logdir.

To stop uploading, press Ctrl-C.

New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/efR95Ga5SYiBq6GIOnlQfA/

[2022-01-14T13:23:37] Started scanning logdir.
[2022-01-14T13:24:32] Total uploaded: 189259 scalars, 0 tensors, 0 binary objects
Listening for new data in logdir...

Interrupted. View your TensorBoard at https://tensorboard.dev/experiment/efR95Ga5SYiBq6GIOnlQfA/


## (Optional) Save results

The checkpoints directory can contain a lot of data, so the cell below zips it but does not download it. Rather than downloading the whole directory, it is better to take only the checkpoints you want.
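One way to take only the checkpoints you want is to archive a single checkpoint. A TensorFlow checkpoint such as `ckpt-20` is stored as a `ckpt-20.index` file plus one or more data shards like `ckpt-20.data-00000-of-00001` (as in the zip output below). This is a minimal sketch; `zip_checkpoint` is a hypothetical helper, not part of the repo.

```python
import glob
import os
import zipfile

def zip_checkpoint(ckpt_dir: str, step: int, zip_path: str) -> list:
    """Zip only the files belonging to checkpoint `ckpt-{step}` in `ckpt_dir`.

    Matches `ckpt-{step}.index` and every `ckpt-{step}.data-*` shard;
    `ckpt-2.*` does not match `ckpt-20.*` because the '.' is literal.
    """
    pattern = os.path.join(ckpt_dir, f'ckpt-{step}.*')
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f'No files match {pattern}')
    run_name = os.path.basename(os.path.normpath(ckpt_dir))
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # Keep the run directory name inside the archive so runs don't collide.
            zf.write(path, arcname=os.path.join(run_name, os.path.basename(path)))
    return paths
```

In Colab this could be called as `zip_checkpoint('checkpoints/train/dmodel128_dffn512_blocks6', 20, 'ckpt-20.zip')` followed by `files.download('ckpt-20.zip')`. Because TensorFlow's `checkpoint` bookkeeping file is left out, restore from the explicit prefix after unzipping (for example `ckpt.restore('dmodel128_dffn512_blocks6/ckpt-20')`) rather than relying on `latest_checkpoint`.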

If the code runs to completion (about 37 minutes on a single GPU with the default model settings), you also have the option of downloading the exported SavedModel, which can be loaded back into a fully functional model with `tf.keras.models.load_model('saved_models')`.

In [5]:
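from google.colab import files

# Zip the TensorBoard logs, the plots, and the checkpoints.
!zip -r logs.zip logs
!zip -r plots.zip plots
!zip -r checkpoints.zip checkpoints  # can be large, so it is not downloaded below

# Download only the plots and logs.
files.download('plots.zip')
files.download('logs.zip')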

  adding: logs/ (stored 0%)
  adding: logs/dmodel128_dffn512_blocks6/ (stored 0%)
  adding: logs/dmodel128_dffn512_blocks6/train/ (stored 0%)
  adding: logs/dmodel128_dffn512_blocks6/train/events.out.tfevents.1642164041.0a061e494cad.135.0.v2 (deflated 75%)
  adding: logs/dmodel128_dffn512_blocks6/train/events.out.tfevents.1642164504.0a061e494cad.236.0.v2 (deflated 76%)
  adding: logs/dmodel128_dffn512_blocks6/test/ (stored 0%)
  adding: logs/dmodel128_dffn512_blocks6/test/events.out.tfevents.1642164504.0a061e494cad.236.1.v2 (deflated 67%)
  adding: logs/dmodel128_dffn512_blocks6/test/events.out.tfevents.1642164041.0a061e494cad.135.1.v2 (deflated 5%)
  adding: plots/ (stored 0%)
  adding: checkpoints/ (stored 0%)
  adding: checkpoints/train/ (stored 0%)
  adding: checkpoints/train/dmodel128_dffn512_blocks6/ (stored 0%)
  adding: checkpoints/train/dmodel128_dffn512_blocks6/ckpt-20.data-00000-of-00001 (deflated 18%)
  adding: checkpoints/train/dmodel128_dffn512_blocks6/ckpt-18.index (defl


## Interactive session

Since this is a notebook, you can load different checkpoints of the model (or the final version) and experiment with them interactively.
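To choose among the saved checkpoints, it helps to list them in step order. Each TensorFlow checkpoint has a `<prefix>.index` file, and that prefix (e.g. `.../ckpt-18`) is what `tf.train.Checkpoint.restore` expects. A minimal sketch, assuming the `ckpt-<step>` naming seen in the zip output above; `list_checkpoints` is a hypothetical helper:

```python
import glob
import os
import re

def list_checkpoints(ckpt_dir: str) -> list:
    """Return checkpoint prefixes in `ckpt_dir`, sorted numerically by step.

    Sorting by the integer step keeps 'ckpt-9' before 'ckpt-18', which a
    plain string sort would not.
    """
    found = []
    for index_path in glob.glob(os.path.join(ckpt_dir, 'ckpt-*.index')):
        prefix = index_path[: -len('.index')]
        match = re.search(r'ckpt-(\d+)$', prefix)
        if match:
            found.append((int(match.group(1)), prefix))
    return [prefix for _, prefix in sorted(found)]
```

Any returned prefix can be passed to `ckpt.restore(...)` in place of `ckpt_manager.latest_checkpoint`. `tf.train.CheckpointManager` also exposes a `checkpoints` property listing the prefixes it manages, oldest first.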

In [None]:
import os
import json
import numpy as np
import tensorflow as tf
import tensorflow_text as tf_text
import matplotlib.pyplot as plt
from data_utils import DataManager
from utils import visualize_pi_weights
from par_model import PARTransformerXL
from par_model import create_lookahead_mask, positional_encoding

The function below loads a model from a checkpoint. To do this, you have to:

1) recreate a model with exactly the same architecture.

2) create a checkpoint object with the keyword argument `model=model`. The key `model` was fixed when the first model was checkpointed, so the model must always be stored under that name.

3) restore the checkpoint object from a checkpoint path with `ckpt.restore(PATH)`. This updates the variables of `model` in place: `ckpt` does not keep a copy of the model, it keeps a reference to it.

All of this is done in the `load_from_checkpoint` function. Have fun!

In [None]:
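def load_from_checkpoint(ckpt_path):
    """Rebuild the model from its saved config and restore the latest checkpoint."""
    # 1) Recreate a model with the same architecture from the saved config.
    with open(os.path.join(ckpt_path, 'config.json'), 'r') as file:
        config = json.load(file)
    model = PARTransformerXL(**config)
    # 2) The checkpoint key must match the one used when saving: `model`.
    ckpt = tf.train.Checkpoint(model=model)
    ckpt_manager = tf.train.CheckpointManager(ckpt, ckpt_path, max_to_keep=5)
    if ckpt_manager.latest_checkpoint is None:
        raise FileNotFoundError(f'No checkpoint found in {ckpt_path}')
    # 3) Restore in place; `ckpt` holds a reference to `model`, not a copy.
    ckpt.restore(ckpt_manager.latest_checkpoint)
    return model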

def load_from_savedmodel(path):
    return tf.keras.models.load_model(path)
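The three checkpoint steps described above can be tried in isolation with a toy stand-in for `PARTransformerXL`. This is a minimal sketch assuming only that TensorFlow 2 is installed; `TinyModel` is a hypothetical placeholder, not part of the repo. It saves a checkpoint under the key `model`, rebuilds a fresh model with different values, and shows that `restore` overwrites the fresh model's variables in place.

```python
import os
import tempfile

import tensorflow as tf

class TinyModel(tf.Module):
    """Hypothetical stand-in for PARTransformerXL: a single variable."""
    def __init__(self):
        super().__init__()
        self.w = tf.Variable([1.0, 2.0, 3.0])

save_dir = tempfile.mkdtemp()

# Save a "trained" model under the key `model`.
trained = TinyModel()
save_path = tf.train.Checkpoint(model=trained).save(os.path.join(save_dir, 'ckpt'))

# 1) Recreate a model with the same structure (here with different values).
fresh = TinyModel()
fresh.w.assign([0.0, 0.0, 0.0])

# 2) Wrap it in a checkpoint under the same key, `model`.
ckpt = tf.train.Checkpoint(model=fresh)

# 3) Restore: the variables of `fresh` itself are overwritten.
ckpt.restore(save_path)
print(fresh.w.numpy())  # [1. 2. 3.]
```

Nothing is returned from `restore` that you need to reassign: the `fresh` object you already hold now carries the saved weights, which is why `load_from_checkpoint` can simply return `model`.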