<a href="https://colab.research.google.com/github/wangcongcong123/ttt/blob/master/ttt_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## TTT: Fine-tuning Transformers with TPU or GPU acceleration, written in TensorFlow 2.0+

TTT is a package for fine-tuning **T**ransformers with **T**PUs, written in **T**ensorFlow 2.0+. I built it because of bugs I found tricky to solve when using the [xla library](https://github.com/pytorch/xla) with PyTorch. As a newcomer to the TF world, I am keen to learn from the community, so the code is open-sourced [here](https://github.com/wangcongcong123/ttt).

This notebook shows how to fine-tune transformers with the ttt [library](https://github.com/wangcongcong123/ttt) in two ways:
1. Train from code, which lets you customize your dataset and configure model parameters for fine-tuning
2. Run ready-to-go commands to fine-tune a transformer on single-sequence classification datasets (browse the [nlp viewer](https://huggingface.co/nlp/viewer/) to get a sense of which datasets fit this feature)

## Prepare

In [None]:
!git clone https://github.com/wangcongcong123/ttt.git
%cd ttt
!pip install -e .

Cloning into 'ttt'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 28 (delta 1), reused 28 (delta 1), pack-reused 0
Unpacking objects: 100% (28/28), done.
/content/ttt
Obtaining file:///content/ttt
Collecting tensorboardX
  Downloading https://files.pythonhosted.org/packages/af/0c/4f41bcd45db376e6fe5c619c01100e9b7531c55791b7244815bac6eac32c/tensorboardX-2.1-py2.py3-none-any.whl (308kB)
Collecting nlp
  Downloading https://files.pythonhosted.org/packages/09/e3/bcdc59f3434b224040c1047769c47b82705feca2b89ebbc28311e3764782/nlp-0.4.0-py3-none-any.whl (1.7MB)
Collecting transformers
  Downloading https://files.pythonhosted.org/packages/ae/05/c8c55b600308dc04e95100dc8ad8a244dd800fe75dfafcf1d6348c6f6209/transformers-3.1.0-py3-none-any.w

## Train on TPU

### Make sure you are in a TPU environment

* On the main menu, click **Runtime** > **Change runtime type** and set "TPU" as the hardware accelerator.

In [None]:
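# check that this Colab runtime has a TPU attached
import os
# .get() returns None instead of raising KeyError, so the assertion message is shown
assert os.environ.get('COLAB_TPU_ADDR'), 'Make sure to select TPU from Runtime > Change runtime type > Hardware accelerator'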

In [None]:
import tensorflow as tf
print("Tensorflow version " + tf.__version__)

try:
  tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
  print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
  # RuntimeError rather than BaseException, so ordinary `except Exception` handlers still work
  raise RuntimeError('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')

# keep only the host part of the first worker's "host:port" address
TPU_ADDRESS=tpu.cluster_spec().as_dict()['worker'][0].split(":")[0]
TPU_ADDRESS

Tensorflow version 2.3.0
Running on TPU  ['10.86.132.2:8470']


'10.86.132.2'
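`TPU_ADDRESS` keeps only the host part of the first worker entry (the port `8470` is dropped), and `create_model` below prepends `grpc://` to it. The string handling can be tried without a TPU; the helper names `tpu_host` and `grpc_target` are hypothetical, and the worker string is the one printed above.

```python
def tpu_host(worker: str) -> str:
    """Return the host part of a 'host:port' worker address."""
    host, _sep, _port = worker.rpartition(":")
    return host or worker  # no ':' in the string: treat all of it as the host


def grpc_target(host: str) -> str:
    """Build the same target string that create_model passes to TPUClusterResolver."""
    return "grpc://" + host


worker = "10.86.132.2:8470"
host = tpu_host(worker)
print(host)               # 10.86.132.2
print(grpc_target(host))  # grpc://10.86.132.2
```

For an IPv4 address this gives the same result as the notebook's `split(":")[0]`.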

### Start fine-tuning using T5-small
- the following uses SST2 as the dataset
- try others: replace `args.data_path="data/glue/sst2"` with `"data/20newsgroup"`, `"data/ag_news"`, `"data/imdb"`, or `"data/sentiment140"`.
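To fine-tune on your own data, the comments in `run()` below say that `args.data_path` should contain `train.json`, `val.json` and `test.json`, with one JSON object per line holding `text` and `label` fields. The following minimal sketch writes such a directory with the standard library. The directory name `data/my_dataset` and the example sentences are made up, and it does not check whether ttt expects any further fields.

```python
import json
from pathlib import Path


def write_jsonl(path, examples):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")


def make_dataset(data_path, splits):
    """Create data_path/<split>.json for every split name in `splits`."""
    out = Path(data_path)
    out.mkdir(parents=True, exist_ok=True)
    for split, examples in splits.items():
        write_jsonl(out / f"{split}.json", examples)
    return out


# Hypothetical toy examples; real splits would be much larger.
splits = {
    "train": [{"text": "a gripping, well-acted film", "label": "positive"},
              {"text": "dull and far too long", "label": "negative"}],
    "val": [{"text": "a pleasant surprise", "label": "positive"}],
    "test": [{"text": "i want those two hours back", "label": "negative"}],
}
data_dir = make_dataset("data/my_dataset", splits)

# Read one split back to check the format.
with open(data_dir / "train.json", encoding="utf-8") as f:
    print([json.loads(line)["label"] for line in f])  # ['positive', 'negative']
```

You would then set `args.data_path = "data/my_dataset"` in `run()`.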

In [None]:
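# wildcard import: run() below relies on names it provides (Args, logger, get_model, T2TTrainer, ...)
from ttt import *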

In [None]:
def create_model(args, logger, model_getter):
    if args.use_tpu:
        # Create distribution strategy
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + args.tpu_address)
        tf.config.experimental_connect_to_cluster(tpu)
        tf.tpu.experimental.initialize_tpu_system(tpu)
        logger.info("All TPU devices: ")
        for each_device in tf.config.list_logical_devices('TPU'):
            logger.info(each_device)
        strategy = tf.distribute.TPUStrategy(tpu)
        # Create model
        with strategy.scope():
            model = model_getter(args)
    else:
        if args.use_gpu:
            # Create a MirroredStrategy.
            strategy = tf.distribute.MirroredStrategy()
            logger.info("Number of GPU devices: {}".format(strategy.num_replicas_in_sync))
            # Open a strategy scope.
            with strategy.scope():
                model = model_getter(args)
        else:
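            raise ValueError("CPU-only training is not supported yet: set args.use_tpu or args.use_gpu")
    # model.summary() prints and returns None, so send each summary line to the logger instead
    model.summary(print_fn=logger.info)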
    args.num_replicas_in_sync = strategy.num_replicas_in_sync
    write_args(args.output_path, args)
    return model, strategy

def run():
  args = Args()
  # check what args are available
  logger.info(f"args: {json.dumps(args.__dict__, indent=2)}")
  ############### customize args
  # args.use_gpu = True  # to train on GPUs instead, make sure you are in a GPU runtime first
  args.use_tpu = True
  args.do_train = True
  args.use_tb = True
  # any one from MODELS_SUPPORT (check:ttt/args.py)
  args.model_select = "t5-small"
  # select a dataset: if it comes from nlp, load it here and save it locally to data_path,
  # or prepare your own data in data_path (train.json, val.json, test.json) with examples in jsonl format,
  # each line representing one example like this: {"text": "...", "label": "..."}
  args.data_path = "data/glue/sst2"
  # any one from TASKS_SUPPORT (check:ttt/args.py)
  args.task = "t2t"
  args.log_steps = 400
  # any one from LR_SCHEDULER_SUPPORT (check:ttt/args.py)
  args.scheduler = "warmuplinear"
  # set do_eval = False if your data has no validation set; in that case patience and early stopping have no effect
  args.do_eval = True
  args.tpu_address = TPU_ADDRESS
  ############### end customize args
  # to have a sanity check for the args
  sanity_check(args)
  # seed everything, make deterministic
  set_seed(args.seed)
  tokenizer = get_tokenizer(args)
  inputs = get_inputs(tokenizer, args)
  model, strategy = create_model(args, logger, get_model)
  # start training; here we use T2TTrainer for more control and flexibility
  trainer = T2TTrainer(args)
  trainer.train(model, strategy, tokenizer, inputs)
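
The comment on `args.do_eval` mentions patience and early stopping, and the training log reports the best checkpoint "based on eval_on acc". The following is a generic sketch of that kind of selection loop, not ttt's own code: keep the step with the highest validation accuracy, and stop once `patience` evaluations in a row fail to improve on it. The function name `select_best` is hypothetical and the accuracy values are invented.

```python
from typing import Iterable, Optional, Tuple


def select_best(evals: Iterable[Tuple[int, float]],
                patience: int) -> Tuple[Optional[int], float, bool]:
    """Scan (global_step, val_acc) pairs in evaluation order.

    Returns (best_step, best_acc, stopped_early). A new best resets the
    counter; `patience` evaluations in a row without improvement stop the scan.
    """
    best_step, best_acc = None, float("-inf")
    bad_evals = 0
    for step, acc in evals:
        if acc > best_acc:
            best_step, best_acc = step, acc  # a real trainer would save the weights here
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_step, best_acc, True
    return best_step, best_acc, False


# Invented accuracies, one evaluation every 400 steps as with args.log_steps = 400.
history = [(400, 0.85), (800, 0.87), (1200, 0.86), (1600, 0.865), (2000, 0.88)]
print(select_best(history, patience=2))  # (800, 0.87, True)
print(select_best(history, patience=3))  # (2000, 0.88, False)
```

A larger patience tolerates noisy validation scores at the cost of more training time.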


In [None]:
run()

- during fine-tuning, the model is evaluated on the validation set every 400 steps (`args.log_steps`), and the best weights are selected and saved
- this may take a while to finish. Go grab a cuppa and enjoy the game.
- after training, all training details and model weights are saved to `tmp/t5-small_t2t_glue-sst2`
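With `task = "t2t"` the labels are presumably handled as text: the validation report lists the classes `negative` and `positive`. The step-800 report in the command-line run below (872 SST-2 validation sentences, 428 negative and 444 positive) is consistent with exactly one confusion matrix, back-calculated here from its recall and support columns rather than read from ttt. The sketch recomputes accuracy and per-class precision, recall and F1 from that matrix.

```python
# Confusion matrix back-calculated from the step-800 report:
# outer key = gold label, inner key = predicted label.
confusion = {
    "negative": {"negative": 380, "positive": 48},
    "positive": {"negative": 63, "positive": 381},
}
labels = list(confusion)

total = sum(sum(row.values()) for row in confusion.values())
correct = sum(confusion[c][c] for c in labels)
print(f"accuracy: {correct / total:.4f}")  # accuracy: 0.8727

for c in labels:
    tp = confusion[c][c]
    support = sum(confusion[c].values())              # gold examples of class c
    predicted = sum(confusion[g][c] for g in labels)  # examples predicted as c
    precision = tp / predicted
    recall = tp / support
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{c:>8}  p={precision:.4f}  r={recall:.4f}  f1={f1:.4f}  n={support}")
# negative  p=0.8578  r=0.8879  f1=0.8726  n=428
# positive  p=0.8881  r=0.8581  f1=0.8729  n=444
```

These values match the report's precision, recall and f1-score columns. Checkpoint selection uses only the accuracy line.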

### Run with commands
- remember to pass `--tpu_address` (here it is `10.86.132.2`, the `TPU_ADDRESS` value printed above)
- this may take a while to finish. Go grab a cuppa and enjoy the game.
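A rough estimate of how many optimizer steps the command below runs. This relies on three assumptions that this notebook does not confirm. First, as is usual with `tf.distribute`, the global batch is `--per_device_train_batch_size` times the number of replicas (8 TPU cores on a Colab TPU runtime, the value `create_model` stores in `args.num_replicas_in_sync`). Second, `data/glue/sst2` holds the full 67,349-sentence GLUE SST-2 training split. Third, evaluation happens every 400 steps as in `run()` above.

```python
import math

per_device_batch = 8      # --per_device_train_batch_size
num_replicas = 8          # assumed: 8 TPU cores on a Colab TPU runtime
num_epochs = 6            # --num_epochs_train
train_examples = 67_349   # assumed: full GLUE SST-2 training split
log_steps = 400           # assumed: same evaluation interval as args.log_steps in run()

global_batch = per_device_batch * num_replicas               # examples per optimizer step
steps_per_epoch = math.ceil(train_examples / global_batch)   # last partial batch counted
total_steps = steps_per_epoch * num_epochs
num_evals = total_steps // log_steps                         # validation passes during training

print(global_batch, steps_per_epoch, total_steps, num_evals)  # 64 1053 6318 15
```

If the input pipeline drops the last partial batch (common on TPUs, which need static shapes), an epoch is 1,052 steps instead.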

In [None]:
!python3 run.py --model_select t5-small --data_path data/glue/sst2 --task t2t --per_device_train_batch_size 8 --num_epochs_train 6 --max_seq_length 128 --lr 5e-5 --schedule warmuplinear --do_train --do_eval --do_test --use_tpu --tpu_address 10.86.132.2



evaluating...: 100% 4/4 [00:17<00:00,  4.48s/it]
2020-09-03 21:49:31.793 INFO t2t_trainer - evaluate: 

2020-09-03 21:49:31.793 INFO t2t_trainer - evaluate: *******eval at global_step = 800 on validation dataset*********
2020-09-03 21:49:31.793 INFO t2t_trainer - evaluate: val_acc: 0.8727064220183486
2020-09-03 21:49:31.803 INFO t2t_trainer - evaluate: val_cls_report:               precision    recall  f1-score   support

    negative     0.8578    0.8879    0.8726       428
    positive     0.8881    0.8581    0.8729       444

    accuracy                         0.8727       872
   macro avg     0.8729    0.8730    0.8727       872
weighted avg     0.8732    0.8727    0.8727       872

2020-09-03 21:49:31.803 INFO t2t_trainer - evaluate: so far the best check point at global_step=800 based on eval_on acc
2020-09-03 