In [None]:
!pip install -q bitsandbytes datasets accelerate loralib transformers peft

In [None]:
import os
import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM

Loading the base model. Facebook's OPT-1.3b model is used as the base model here, loaded in 8-bit mode.
Setting device_map to 'auto' automatically distributes different parts of the model across all available devices.
This might raise an error in Kaggle notebooks, so if you are using multiple GPUs on Kaggle and want to use all of them for fine-tuning, you might have to specify the device map manually.
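
`from_pretrained` also accepts a dict that maps module names to device indices as `device_map`. The sketch below builds one for a manual split. It assumes the module names of Hugging Face's `OPTForCausalLM` (`model.decoder.layers.<i>`, `lm_head`, and so on) and OPT-1.3b's 24 decoder layers. Check both against `model.named_modules()` for your transformers version. `build_opt_device_map` is a hypothetical helper, not part of transformers or accelerate.

```python
def build_opt_device_map(num_layers: int = 24, num_gpus: int = 2) -> dict:
    """Hypothetical helper: return a {module_name: gpu_index} dict for device_map.

    Assumes the Hugging Face OPTForCausalLM module layout; verify the names
    against model.named_modules() before relying on it.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    device_map = {
        # Embeddings sit on the first GPU. lm_head ties its weight to
        # embed_tokens, so it is kept on the same device.
        "model.decoder.embed_tokens": 0,
        "model.decoder.embed_positions": 0,
        "lm_head": 0,
    }
    # Split decoder layers as evenly as possible; when the split is uneven,
    # the earlier GPUs each take one extra layer.
    base, extra = divmod(num_layers, num_gpus)
    layer = 0
    for gpu in range(num_gpus):
        for _ in range(base + (1 if gpu < extra else 0)):
            device_map[f"model.decoder.layers.{layer}"] = gpu
            layer += 1
    # The final layer norm runs right after the last decoder layer.
    device_map["model.decoder.final_layer_norm"] = num_gpus - 1
    return device_map


print(build_opt_device_map(num_gpus=2))
```

The result would be passed as `device_map=build_opt_device_map(num_gpus=2)` in place of `'auto'`. A hand-written map trades accelerate's automatic balancing for a placement you control and can debug.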

The base model's tokenizer is also loaded. It handles tokenizing, preprocessing, and postprocessing.

In [None]:
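from transformers import BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights via bitsandbytes
    device_map='auto',
)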

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")

Loading the dataset and using a map function to tokenize and preprocess each data item in the dataset.

Notice the max_length parameter passed to the tokenizer. Without it, training ran out of GPU memory at the 94th step (with a per-device batch size of 8). The failure kept happening at the same step no matter how other parameters, such as batch size or gradient accumulation steps, were changed. The working assumption is that, without a cap on the number of tokens, some inputs become very long and produce very large input tensors. Setting max_length=1024 together with truncation=True limits each tokenized example to at most 1,024 tokens.
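
Here is a back-of-the-envelope look at why one long input can matter this much. The data collator used later pads every batch to its longest sequence, so a single long example enlarges the whole batch. With standard (eager) attention, each layer's score tensor also grows with the square of the sequence length. The token lengths below are made up, 32 is OPT-1.3b's number of attention heads, and both helpers are hypothetical.

```python
def padded_batch_shape(lengths, max_length=None):
    """(batch_size, seq_len) after optional truncation and padding to the longest row."""
    if max_length is not None:
        lengths = [min(n, max_length) for n in lengths]
    return len(lengths), max(lengths)


def attention_score_elements(batch_size, seq_len, num_heads=32):
    """Elements in one layer's attention-score tensor: batch x heads x seq x seq."""
    return batch_size * num_heads * seq_len * seq_len


# Hypothetical token counts for a batch of 4 examples, one of them very long.
lengths = [48, 120, 2000, 75]
for cap in (None, 1024):
    batch, seq = padded_batch_shape(lengths, max_length=cap)
    print(cap, (batch, seq), attention_score_elements(batch, seq))
```

With these numbers, the 1,024-token cap shrinks the per-layer score tensor from 512,000,000 to 134,217,728 elements, roughly 3.8x smaller.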

In [None]:
import transformers
from datasets import load_dataset
data = load_dataset("databricks/databricks-dolly-15k")
data = data.map(lambda samples: tokenizer(samples['instruction'], max_length=1024, truncation=True), batched=True)
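
Note that the cell above tokenizes only the `instruction` field, so the model sees prompts but never their answers. If the goal is instruction following, one common alternative is to combine Dolly's `instruction`, `context`, and `response` fields into a single training string. That is an assumption here, not what this notebook does. The `### ...` section headers and `format_dolly_batch` are hypothetical.

```python
def format_dolly_batch(samples, eos_token=""):
    """Hypothetical: turn a batched Dolly slice (dict of lists) into one string per row."""
    texts = []
    for instruction, context, response in zip(
        samples["instruction"], samples["context"], samples["response"]
    ):
        parts = [f"### Instruction:\n{instruction}"]
        if context:  # many Dolly rows have an empty context field
            parts.append(f"### Context:\n{context}")
        # An optional EOS marks where the answer ends.
        parts.append(f"### Response:\n{response}{eos_token}")
        texts.append("\n\n".join(parts))
    return texts


example = {"instruction": ["Say hi"], "context": [""], "response": ["Hi!"]}
print(format_dolly_batch(example)[0])
```

This could stand in for the lambda above, for example `data.map(lambda samples: tokenizer(format_dolly_batch(samples, tokenizer.eos_token), max_length=1024, truncation=True), batched=True)`.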

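Freeze the original model weights so that fine-tuning does not alter them; an adapter will be trained instead. The cell below also casts small one-dimensional parameters (such as LayerNorm weights) to fp32 for stability. It enables gradient checkpointing to reduce the number of activations kept in memory, and casts the output of the LM head to fp32.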

In [None]:
for param in model.parameters():
    param.requires_grad = False  # freeze the model - train adapters later
    if param.ndim == 1:
        # cast the small parameters (e.g. layernorm) to fp32 for stability
        param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()  # reduce number of stored activations
model.enable_input_require_grads()  # embedding outputs require grad, so checkpointed layers can backpropagate

class CastOutputToFloat(nn.Sequential):
    """Wrap lm_head so that its logits are returned in fp32."""
    def forward(self, x):
        return super().forward(x).to(torch.float32)

model.lm_head = CastOutputToFloat(model.lm_head)

### Applying LoRA

Configuring LoRA to train the model. Feel free to try different values of r (the rank of the low-rank update matrices), which changes the number of trainable parameters in the adapter. These configs will later be exported to a JSON file along with the adapter weights file.
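
Here is a rough sense of how r drives adapter size. LoRA keeps a frozen weight W0 and learns a low-rank update, computing h = W0 x + (lora_alpha / r) * B A x. In that formula, A is r x d_in and B is d_out x r, and lora_dropout is applied to the input of the low-rank path. Each adapted linear layer therefore adds r * (d_in + d_out) trainable weights. The sketch assumes OPT-1.3b's published config values: hidden size 2048 and 24 decoder layers, with q_proj and v_proj both mapping 2048 to 2048. If those hold, print_trainable_parameters should report 1,572,864 trainable parameters for r=8. The helper names are hypothetical.

```python
def lora_trainable_params(d_in, d_out, r, num_layers, modules_per_layer):
    """LoRA adds A (r x d_in) and B (d_out x r) to every adapted linear layer."""
    return num_layers * modules_per_layer * r * (d_in + d_out)


def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / r


# Assumed OPT-1.3b shapes: hidden size 2048, 24 layers, q_proj + v_proj adapted.
for r in (4, 8, 16):
    print(r, lora_trainable_params(2048, 2048, r, num_layers=24, modules_per_layer=2),
          lora_scaling(32, r))
```

Doubling r doubles the adapter size. With lora_alpha held at 32, doubling r also halves the scale applied to the update.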

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

from peft import LoraConfig, get_peft_model 

config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

### Defining training arguments

Configuring model training arguments for the LoRA model - feel free to experiment with different train batch sizes and gradient accumulation steps (depending on available resources).
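
Each optimizer step sees per_device_train_batch_size x gradient_accumulation_steps x number of GPUs examples, which is 4 x 4 = 16 on a single GPU here. The hypothetical helper below turns that into an approximate number of optimizer steps. This helps when choosing between num_train_epochs and max_steps, or when sizing warmup_steps. The Trainer's exact count can differ by a step depending on how it rounds a final partial accumulation.

```python
import math


def optimizer_steps(num_examples, per_device_batch, grad_accum, num_devices=1, epochs=1):
    """Hypothetical: (effective batch size, approximate total optimizer steps)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


print(optimizer_steps(15_000, per_device_batch=4, grad_accum=4, epochs=2))
```

For this notebook the call would be `optimizer_steps(len(data['train']), 4, 4, epochs=2)`. With the roughly 15,000 rows of the Dolly train split, that comes to just under 1,900 steps, so warmup_steps=100 covers about the first 5% of training.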

In [None]:
trainer = transformers.Trainer(
    model=model, 
    train_dataset=data['train'],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4, 
        gradient_accumulation_steps=4,
        warmup_steps=100, 
        num_train_epochs=2,  # set either num_train_epochs or max_steps; a positive max_steps overrides epochs
        # max_steps=1000, 
        learning_rate=2e-4, 
        fp16=True,
        logging_steps=10, 
        output_dir='outputs'
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False  # the KV cache is incompatible with gradient checkpointing; re-enable it for inference
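
With `mlm=False`, `DataCollatorForLanguageModeling` pads each batch to its longest sequence and copies `input_ids` into `labels`. Positions equal to the pad id are replaced with -100 so the loss ignores them, and the model shifts labels by one position internally to predict the next token. The pure-Python sketch below reproduces that labeling logic. `causal_lm_collate` is hypothetical, and the token ids are made up, apart from 1, the pad id used by OPT's tokenizer.

```python
def causal_lm_collate(batch_ids, pad_token_id, padding_side="right"):
    """Hypothetical sketch: pad to the longest row and build causal-LM labels."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch_ids:
        pad = [pad_token_id] * (max_len - len(ids))
        if padding_side == "right":
            row, mask = ids + pad, [1] * len(ids) + [0] * len(pad)
        else:
            row, mask = pad + ids, [0] * len(pad) + [1] * len(ids)
        input_ids.append(row)
        attention_mask.append(mask)
        # Like the real collator, mask by comparing token ids with the pad id;
        # -100 is the index PyTorch's cross-entropy loss ignores.
        labels.append([-100 if tok == pad_token_id else tok for tok in row])
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


print(causal_lm_collate([[2, 10, 11], [2, 12]], pad_token_id=1))
```

Because masking is done by id comparison, a tokenizer whose pad token is the same as its EOS token would also hide EOS from the loss. OPT avoids this by using separate ids.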

### Training

If you are running this on Google Colab, this might be a good time to say a prayer: Colab sessions can disconnect unexpectedly while a process is still running. Also, keep an eye on resource usage to see how much GPU RAM the training process is using at any given time.
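
Resuming from a saved checkpoint softens the impact of a disconnect. By default the Trainer writes `checkpoint-<step>` folders into `output_dir` every `save_steps` steps, and `trainer.train(resume_from_checkpoint=True)` resumes from the latest one. transformers also provides `transformers.trainer_utils.get_last_checkpoint` for finding that folder. The hypothetical `latest_checkpoint` below mimics the lookup in plain Python to show what it does. On Colab, pointing `output_dir` at a mounted Google Drive folder should keep the checkpoints if the runtime is reset.

```python
import os
import re

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Hypothetical: path of the highest-numbered checkpoint-<step> folder, or None."""
    if not os.path.isdir(output_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path):  # ignore files and unrelated folders
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, path
    return best_path


print(latest_checkpoint("outputs"))
```

Calling `trainer.train(resume_from_checkpoint=latest_checkpoint("outputs"))` starts from scratch when the helper returns None. In contrast, `resume_from_checkpoint=True` raises an error if no checkpoint exists yet.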

In [None]:
trainer.train(resume_from_checkpoint=False)

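Saving the model. This creates a directory containing the adapter config JSON file (adapter_config.json) and the adapter weights (adapter_model.bin, or adapter_model.safetensors in newer PEFT versions). Only the adapter is saved, not the base model weights.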

In [None]:
model.save_pretrained("lora-muwa-1.3b-opt")