# Transfer learning with BERT

<img src="./img/transfer_learning.png" alt="transfer_learning" style="width: 600px;"/>

Transfer learning is the reuse of a pre-trained model on a new problem.

More generally, it is the paradigm of leveraging knowledge gained in one domain or task to improve performance in a related one.

It can be achieved not only by fine-tuning an LLM but also with:

* Zero-shot learning - the model does not see any examples from the target classes during training.
* One-shot learning - the model is exposed to a single labeled example of each target class.
* Few-shot (k-shot) learning - the model is exposed to only a few (k) labeled examples of each target class.
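
The settings above differ only in how many labeled examples of the target class the model gets. With a generative LLM, one common way to apply them is in-context prompting: the k examples are placed in the prompt and no weights are updated. The hypothetical `build_prompt` helper and its prompt format below are illustrative assumptions, not any specific model's API.

```python
# Hypothetical helper: builds a text prompt containing k labeled demonstrations.
# k = 0 gives a zero-shot prompt, k = 1 a one-shot prompt, k > 1 a few-shot prompt.
def build_prompt(examples, query, k):
    """Return a prompt with the first k (text, label) examples followed by the query."""
    parts = ["Classify the sentiment of each review as positive or negative."]
    for text, label in examples[:k]:
        parts.append(f"Review: {text}\nSentiment: {label}")
    # The query is left unlabeled; the model is expected to complete it.
    parts.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(parts)


demos = [
    ("I love this!", "positive"),
    ("This is terrible.", "negative"),
    ("Amazing experience!", "positive"),
]

zero_shot = build_prompt(demos, "Not my cup of tea.", k=0)
one_shot = build_prompt(demos, "Not my cup of tea.", k=1)
few_shot = build_prompt(demos, "Not my cup of tea.", k=2)
print(few_shot)
```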


Below is an example of transfer learning with a pre-trained language model.

**BERT** (Bidirectional Encoder Representations from Transformers) is pre-trained on large text corpora with a masked language-modeling objective (plus next-sentence prediction).

It is a stack of Transformer encoder layers. Here we fine-tune it for sentiment analysis, a binary classification task (positive vs. negative).

In [1]:
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from datasets import load_dataset

In [2]:
model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
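
The warning says that only `classifier.weight` and `classifier.bias` are new: the encoder (including its pooler) comes from the checkpoint, while the classification head is a freshly initialized linear layer mapping BERT's pooled `[CLS]` vector (768 dimensions for `bert-base-uncased`) to `num_labels` logits. The plain-Python sketch below illustrates that head; the normal(0, 0.02) weight initialization with zero bias is an assumption modeled on BERT's default `initializer_range`, and the pooled vector is random stand-in data.

```python
import random

HIDDEN_SIZE = 768  # hidden size of bert-base-uncased
NUM_LABELS = 2     # negative / positive

random.seed(0)
# Newly initialized head (assumed init): weights ~ N(0, 0.02), bias = 0.
classifier_weight = [[random.gauss(0.0, 0.02) for _ in range(HIDDEN_SIZE)]
                     for _ in range(NUM_LABELS)]
classifier_bias = [0.0] * NUM_LABELS


def classify(pooled_output):
    """Linear layer: logits[j] = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w * x for w, x in zip(row, pooled_output)) + b
            for row, b in zip(classifier_weight, classifier_bias)]


# Stand-in for BERT's pooled [CLS] vector (tanh output, so values lie in [-1, 1]).
pooled = [random.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]
logits = classify(pooled)
print(logits)
```

Only 768 × 2 + 2 = 1,538 parameters are untrained, which is why a small amount of labeled data can already move the predictions.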


In [11]:
sample_texts = [
    'I love this!',
    'This is terrible.',
    'Amazing experience!',
    'Not my cup of tea.'
]

labels = [1, 0, 1, 0]

In [49]:
inputs = tokenizer(sample_texts, padding=True, truncation=True, return_tensors='pt', max_length=32)

In [50]:
inputs['labels'] = torch.tensor(labels)

In [51]:
optimizer = torch.optim.AdamW(model.parameters(), lr=0.00001)
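
`torch.optim.AdamW` is Adam with decoupled weight decay; its defaults are `betas=(0.9, 0.999)`, `eps=1e-8` and `weight_decay=0.01`. The hypothetical `adamw_step` function below is a single-scalar sketch of one update with those defaults and the `lr=0.00001` used above, not PyTorch's implementation.

```python
import math


def adamw_step(p, grad, m, v, t, lr=1e-5, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter at step t (1-based); returns (p, m, v)."""
    beta1, beta2 = betas
    p = p * (1 - lr * weight_decay)              # decoupled weight decay
    m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad    # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)                 # bias corrections for zero-initialized moments
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


p, m, v = adamw_step(p=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(p)
```

On the first step `m_hat / sqrt(v_hat)` is about ±1, so each weight moves by roughly `lr` regardless of the gradient's magnitude.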

In [52]:
_ = model.train()
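
`model.train()` puts the model's modules into training mode, which mainly affects dropout (BERT's `hidden_dropout_prob` defaults to 0.1). The hypothetical `dropout` helper below is a plain-Python sketch of inverted dropout, the scheme `torch.nn.Dropout` uses: in training mode each value is zeroed with probability `p` and survivors are scaled by `1/(1-p)`; in eval mode it is the identity.

```python
import random


def dropout(values, p=0.1, training=True, rng=random):
    """Inverted dropout: zero each value with probability p, rescale the rest by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)  # eval mode: values pass through unchanged
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else x * scale for x in values]


random.seed(42)
hidden = [1.0, 2.0, 3.0, 4.0, 5.0]
print(dropout(hidden, training=True))   # some entries zeroed, the rest scaled up
print(dropout(hidden, training=False))  # identical to the input
```

Because training mode is random, the model should be switched to `model.eval()` before inference to get deterministic predictions.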

In [57]:
# training
for epoch in range(2):
    outputs = model(**inputs)
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"Epoch: {epoch+1}, Loss: {loss.item()}")

Epoch: 1, Loss: 0.44676530361175537
Epoch: 2, Loss: 0.4095667004585266
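
When integer `labels` are passed and `num_labels=2`, `BertForSequenceClassification` returns the mean cross-entropy between the logits and the labels. Note that each "epoch" above is a single gradient step on the whole 4-example batch. The sketch below recomputes cross-entropy in plain Python on made-up logits (not the model's actual outputs) to show what the printed loss measures.

```python
import math


def cross_entropy(batch_logits, labels):
    """Mean softmax cross-entropy, like torch.nn.CrossEntropyLoss with reduction='mean'."""
    total = 0.0
    for logits, label in zip(batch_logits, labels):
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
        total += log_sum_exp - logits[label]  # equals -log(softmax(logits)[label])
    return total / len(labels)


# Made-up logits for the four sample texts, each favoring the correct class.
batch_logits = [[-0.5, 0.7], [0.6, -0.4], [-0.2, 0.9], [0.3, 0.1]]
labels = [1, 0, 1, 0]
loss = cross_entropy(batch_logits, labels)
print(loss)
```

A head that outputs equal logits for both classes gives a loss of log 2 ≈ 0.693, so losses below that mean the correct class is, on average, getting more than half of the probability mass.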


In [60]:
# evaluating
model.eval()  # disable dropout for inference
text = "I had a nice day!"
input_eval = tokenizer(text, padding=True, truncation=True, return_tensors='pt', max_length=32)
with torch.no_grad():  # no gradients are needed for inference
    outputs_eval = model(**input_eval)
predictions = torch.nn.functional.softmax(outputs_eval.logits, dim=-1)
predicted_label = torch.argmax(predictions, dim=-1).item()  # 1 = positive, 0 = negative
predicted_label

1
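
Softmax turns logits into probabilities that sum to 1, but it is monotonic, so the argmax (the predicted label) is the same whether it is taken over logits or probabilities; softmax is only needed when a confidence score is wanted. A plain-Python sketch on made-up logits:

```python
import math


def softmax(logits):
    """Convert logits to probabilities (max-subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


logits = [-0.8, 1.1]  # made-up logits for [negative, positive]
probs = softmax(logits)
print(probs, argmax(probs), argmax(logits))
```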

# Fine-tuning pre-trained BERT

Fine-tuning takes a pre-trained model and continues training it on domain-specific data to solve a particular downstream task, with the goal of improving its performance on that task. Here the downstream task is sentiment classification of IMDB movie reviews.

In [None]:
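data = load_dataset('imdb')  # splits: train, test, unsupervised

def tokenize_func(batch):
    # pad/truncate every review to the model's maximum input length (512 tokens for BERT)
    return tokenizer(batch["text"], padding="max_length", truncation=True)

tokenized_dataset = data.map(tokenize_func, batched=True)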
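
`padding=True` (used in the earlier cell) pads each batch to its longest sequence, while `padding="max_length"` pads every example to a fixed length; without an explicit `max_length` that is the model's limit, 512 tokens for BERT. The hypothetical `encode_batch` helper below sketches both modes on already-tokenized ids. It assumes `[CLS]=101`, `[SEP]=102`, `[PAD]=0` as in the `bert-base-uncased` vocabulary, and the input ids are illustrative only.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # assumed bert-base-uncased special-token ids


def encode_batch(batch, max_length, padding="max_length"):
    """Add [CLS]/[SEP], truncate to max_length, then pad to max_length or to the longest sequence."""
    # Truncate the body so that [CLS] + body + [SEP] fits in max_length.
    encoded = [[CLS_ID] + ids[: max_length - 2] + [SEP_ID] for ids in batch]
    target = max_length if padding == "max_length" else max(len(ids) for ids in encoded)
    input_ids, attention_mask = [], []
    for ids in encoded:
        n_pad = target - len(ids)
        input_ids.append(ids + [PAD_ID] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    return input_ids, attention_mask


# Illustrative word-piece ids standing in for two short reviews.
batch = [[1045, 2293, 2023], [2023, 2003, 6659, 1012, 2025, 2026]]
ids_longest, mask_longest = encode_batch(batch, max_length=32, padding="longest")
ids_max, mask_max = encode_batch(batch, max_length=32, padding="max_length")
print(ids_longest)
print(mask_longest)
```

Padding every IMDB review to 512 tokens keeps shapes uniform but makes each training step cost as much as the longest possible input; the attention mask is what tells the model to ignore the padding.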

In [14]:
from transformers import Trainer, TrainingArguments

In [None]:
training_args = TrainingArguments(
    output_dir='./finetunned_model',
    use_cpu=True,                  # train on CPU even if a GPU is available
    per_device_train_batch_size=8,
    num_train_epochs=5,
    evaluation_strategy="steps",   # newer transformers releases name this eval_strategy
    eval_steps=500,                # evaluate every 500 optimizer steps
    save_steps=500,                # save a checkpoint every 500 steps
    logging_dir="./logs",
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
)
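
A quick estimate of how much work these arguments imply, assuming a single CPU device, no gradient accumulation, and the IMDB split sizes of 25,000 train and 25,000 test reviews:

```python
import math

train_examples = 25_000  # IMDB "train" split
eval_examples = 25_000   # IMDB "test" split, used here as eval_dataset
batch_size = 8           # per_device_train_batch_size on one device
epochs = 5               # num_train_epochs
eval_steps = 500         # evaluate every eval_steps optimizer steps

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
num_evaluations = total_steps // eval_steps          # evaluations at steps 500, 1000, ...
eval_forward_passes = num_evaluations * eval_examples
print(steps_per_epoch, total_steps, num_evaluations, eval_forward_passes)
```

That is 15,625 training steps plus 31 full passes over the test split, all on 512-token inputs on CPU, so evaluating on a subset of the test split (or less often) can shorten the run considerably.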

In [None]:
trainer.train()

In [None]:
tokenized_sample = tokenizer('Probably my all-time favorite movie, a story of selflessness, sacrifice and dedication', truncation=True, return_tensors='pt')

In [None]:
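model.eval()  # disable dropout for inference
with torch.no_grad():
    output = model(**tokenized_sample)
predicted_label = torch.argmax(output.logits, dim=1).item()  # IMDB labels: 0 = negative, 1 = positive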

In [None]:
# save model and tokenizer
model.save_pretrained('./finetunned_model')
tokenizer.save_pretrained('./finetunned_model')