# Resources
- Transformer Adapter example Colab notebooks: https://github.com/Adapter-Hub/adapter-transformers/tree/master/notebooks


# Setup

**TODO:** Create a directory in your Google Drive called `dl-group-project`. In it,
- Upload the contents of the project repo `dl-group-project`
- Upload the unzipped `data` folder

## Mount Google Drive

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Next, change into the `dl-group-project` folder.

**TODO:** Change the `cd` statement below to match the path to your `dl-group-project` directory in Google Drive.

In [2]:
%cd /content/drive/MyDrive/School/semesters/spring_2022/CS_7643_DL/dl-group-project
%ls

/content/drive/MyDrive/School/semesters/spring_2022/CS_7643_DL/dl-group-project
[0m[01;34mcheckpoints[0m/    modelT5Adapter.py    requirements.txt       train_adapter.ipynb
[01;34mdata[0m/           modelT5.py           testing_adapter.ipynb  [01;34mtraining_output[0m/
dataprep.ipynb  PT_T5_adapter.ipynb  testing.ipynb          train.ipynb
[01;34mdataprovider[0m/   PT_T5.ipynb          testing.txt
data.zip        [01;34m__pycache__[0m/         TF_T5.ipynb
[01;34mexamples[0m/       README.md            tf_test.ipynb


## Install necessary packages

In [3]:
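# adapter-transformers is a fork of transformers that installs under the same
# `transformers` package name, so install it on its own rather than alongside transformers
!pip install -U adapter-transformers
!pip install datasets
!pip install pytorch_lightning
!pip install sentencepiece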



## Check GPU

In [4]:
!nvidia-smi

Wed Apr 27 20:19:32 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# Imports

## Imports

In [5]:
%load_ext autoreload
%autoreload 2

In [6]:
import random
import torch
import numpy as np

from transformers import (
    AdamW,
    T5Tokenizer,
    T5AdapterModel,
    get_linear_schedule_with_warmup
)

device = 'cuda' if torch.cuda.is_available() else 'cpu'

## Custom Imports

In [7]:
# Custom Imports
from dataprovider.DataProvider import DatasetProvider
from modelT5Adapter import MyT5AdapterModel

## Set Random Seed

In [8]:
def set_seed(seed):
  random.seed(seed)
  np.random.seed(seed)
  torch.manual_seed(seed)
  if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

set_seed(42)
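Seeding makes every draw from a random number generator repeatable. The stdlib-only sketch below leaves out NumPy and PyTorch so it runs anywhere, and shows the effect on Python's `random` module: reseeding with the same value reproduces the same sequence. The helper name `draws_after_seed` is illustrative only.

```python
import random

def draws_after_seed(seed, n=3):
    """Reseed Python's global RNG, then return n floats drawn from it."""
    random.seed(seed)
    return [random.random() for _ in range(n)]

first = draws_after_seed(42)
second = draws_after_seed(42)
print(first == second)  # True: same seed, same sequence
```

Seeding alone does not make every GPU kernel deterministic; PyTorch also exposes `torch.backends.cudnn.deterministic` and `torch.use_deterministic_algorithms` for stricter reproducibility.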

# Load Data

## Load Training Data

In [9]:
# Load data
train_dataset = DatasetProvider('t5-small', 'data/COMBINED', 'train')

data/COMBINED/train.tsv


  f"This sequence already has {self.eos_token}. In future versions this behavior may lead to duplicated eos tokens being added."
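The warning above comes from the T5 tokenizer, which appends the end-of-sequence token `</s>` automatically and warns when a sequence already ends with it. Because the `DatasetProvider` code is not shown here, the cause is an assumption: a common one is appending `</s>` to the raw text before tokenizing. A hypothetical pre-processing helper that strips such a suffix might look like this:

```python
EOS_TOKEN = "</s>"  # T5's end-of-sequence token

def strip_trailing_eos(text, eos=EOS_TOKEN):
    """Hypothetical helper: remove one manually appended EOS marker before tokenizing."""
    text = text.rstrip()
    if text.endswith(eos):
        text = text[: -len(eos)].rstrip()
    return text

print(strip_trailing_eos("How do I learn Python? </s>"))  # How do I learn Python?
```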


## Load Test Data

In [10]:
# Load test data (same model name as the training data)
test_dataset = DatasetProvider("t5-small", "data/COMBINED", "test")

data/COMBINED/test.tsv


  f"This sequence already has {self.eos_token}. In future versions this behavior may lead to duplicated eos tokens being added."


In [11]:
# Show example of data
train_dataset[0]

{'attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
 'decoder_attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      

In [12]:
# Combine into dictionary
dataset = {
    "train": train_dataset,
    "test": test_dataset
}
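In the `train_dataset[0]` example above, `attention_mask` is 1 for real tokens and 0 for padding, so the model ignores the padded positions. A minimal sketch of how such a mask is produced when a sequence is right-padded to a fixed length, assuming T5's pad token id of 0; the token ids and `max_length` below are made up for illustration, since the settings used inside `DatasetProvider` are not shown.

```python
def pad_and_mask(token_ids, max_length, pad_id=0):
    """Truncate or right-pad token_ids to max_length and build the matching attention mask."""
    ids = list(token_ids[:max_length])
    mask = [1] * len(ids)           # 1 marks a real token
    padding = max_length - len(ids)
    ids += [pad_id] * padding
    mask += [0] * padding           # 0 marks a padded position
    return ids, mask

# made-up token ids ending in T5's EOS id (1)
ids, mask = pad_and_mask([8774, 6, 149, 1], max_length=8)
print(ids)   # [8774, 6, 149, 1, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```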

# Model and Tokenizer

## Load Model

In [13]:
# Load model
model = T5AdapterModel.from_pretrained("t5-small")

Some weights of the model checkpoint at t5-small were not used when initializing T5AdapterModel: ['decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight']
- This IS expected if you are initializing T5AdapterModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5AdapterModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of T5AdapterModel were not initialized from the model checkpoint at t5-small and are newly initialized: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Add Adapter Layers

In [14]:
# Add Adapter layers
model.add_adapter("paraphrase")

# Freeze the pretrained model weights so that only the adapter layers are trained
model.train_adapter("paraphrase")

In [15]:
# See active adapters
model.active_adapters

Stack[paraphrase]
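Because `train_adapter` freezes the pretrained T5 weights, only the small adapter modules receive gradients. For a feel of how small they are, here is a rough parameter count for one bottleneck adapter (down-projection, nonlinearity, up-projection, both projections with biases), assuming t5-small's hidden size of 512 and a reduction factor of 16. Both the reduction factor and this exact layout are assumptions about the default config used by `add_adapter`; the library may add further parameters such as layer norms.

```python
def bottleneck_adapter_params(d_model, reduction_factor):
    """Parameters of one bottleneck adapter: Linear(d, d/r) + Linear(d/r, d), both with bias."""
    bottleneck = d_model // reduction_factor
    down = d_model * bottleneck + bottleneck  # weights + bias of the down-projection
    up = bottleneck * d_model + d_model       # weights + bias of the up-projection
    return down + up

per_adapter = bottleneck_adapter_params(d_model=512, reduction_factor=16)
print(per_adapter)  # 33312
```

If one such adapter sat in each of t5-small's 12 blocks (6 encoder + 6 decoder), that would be roughly 0.4M trainable parameters against about 60M in the frozen base model; the real figure depends on the adapter config.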

## Load Tokenizer

In [16]:
# Load tokenizer
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Train

## Setup Training

In [17]:
from transformers import TrainingArguments, AdapterTrainer, EvalPrediction

training_args = TrainingArguments(
    output_dir="./examples", 
    do_train=True,
    remove_unused_columns=False,
    learning_rate=5e-4,
    num_train_epochs=3,
)


trainer = AdapterTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
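Since `TrainingArguments` does not set `per_device_train_batch_size`, it falls back to its default of 8, which matches the training log below. The total number of optimization steps then follows from the dataset size: each epoch runs ceil(num_examples / batch_size) batches, floor-divided by the gradient-accumulation steps. A quick check against the logged numbers (58,618 examples, 3 epochs, batch size 8, no accumulation); the function name is illustrative.

```python
import math

def total_optimization_steps(num_examples, batch_size, epochs, grad_accum_steps=1):
    """Batches per epoch (last partial batch included), floor-divided by accumulation, times epochs."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return steps_per_epoch * epochs

print(total_optimization_steps(58618, batch_size=8, epochs=3))  # 21984
```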



## Run training

In [18]:
trainer.train()

***** Running training *****
  Num examples = 58618
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 21984


KeyError: ignored
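The error message itself is not preserved in this output, so the cause of the `KeyError` cannot be confirmed from it. One plausible cause, stated as an assumption: the Hugging Face `Trainer` reads the loss from the model output, and a seq2seq model only returns a loss when the batch contains a `labels` key and the model has a language-modeling head. `T5AdapterModel` is a flexible-head model with no head until one is added (adapter-transformers provides `add_seq2seq_lm_head` for this). On the data side, a hypothetical wrapper that renames the dataset's target field to `labels` and replaces padding ids with -100 (the index the cross-entropy loss ignores) could look like this; the source key `target_ids` is a guess, since the dataset keys beyond the two masks are cut off above.

```python
class LabeledDataset:
    """Hypothetical wrapper exposing a `labels` key for Hugging Face seq2seq models.

    `target_key` is an assumption: set it to the field DatasetProvider
    actually uses for the target token ids.
    """

    IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss

    def __init__(self, base, target_key="target_ids", pad_id=0):
        self.base = base
        self.target_key = target_key
        self.pad_id = pad_id  # T5's padding token id

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        item = dict(self.base[idx])  # shallow copy so the wrapped dataset is not mutated
        targets = item.pop(self.target_key)
        if hasattr(targets, "masked_fill"):  # torch.Tensor
            labels = targets.masked_fill(targets == self.pad_id, self.IGNORE_INDEX)
        else:                                # plain Python list
            labels = [self.IGNORE_INDEX if t == self.pad_id else t for t in targets]
        item["labels"] = labels
        return item
```

Under these assumptions, the wrapper would be applied as `LabeledDataset(dataset["train"], target_key=...)` before building the `AdapterTrainer`, with `model.add_seq2seq_lm_head("paraphrase")` called before `model.train_adapter("paraphrase")`.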

# Save

In [None]:
model.save_adapter("adapter_paraphrase", "paraphrase")