# Hugging Face Transformers

## Text generation

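This example uses two pretrained GPT-2 components from Hugging Face:

* GPT2LMHeadModel is GPT-2 with a language-modeling head, used for text generation

* GPT2Tokenizer
  * converts text to token IDs and decodes IDs back to text
  * handles subword tokenization, so rare words are split into smaller known pieces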
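
To make "subword tokenization" concrete, here is a minimal sketch of byte-pair-encoding (BPE) style merging, the family of algorithm the GPT-2 tokenizer belongs to. The merge table and the helper name `toy_bpe` are invented for illustration and are not GPT-2's real vocabulary or API.

```python
# Toy BPE sketch. TOY_MERGES is a made-up merge table (an assumption);
# the real GPT-2 tokenizer uses a much larger table learned over bytes.
TOY_MERGES = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]


def toy_bpe(word, merges=TOY_MERGES):
    """Split a word into characters, then apply the merges in priority order."""
    symbols = list(word)
    for left, right in merges:
        merged = []
        i = 0
        while i < len(symbols):
            # Join an adjacent pair that matches the current merge rule.
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


print(toy_bpe("lower"))   # ['lower']
print(toy_bpe("lowest"))  # ['low', 'e', 's', 't']
```

A word covered by the merge table collapses into a single token, while an unfamiliar word falls back to smaller pieces instead of an unknown-word placeholder.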

In [1]:
import torch

In [2]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

In [3]:
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

In [4]:
sample_text = 'The cat sat on the mat'

In [5]:
# return_tensors='pt' returns the token IDs as a PyTorch tensor
input_ids = tokenizer.encode(sample_text, return_tensors='pt')

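Arguments:

* max_length - maximum total length of the output in tokens, including the prompt
* do_sample - enables sampling; without it, decoding is greedy and temperature has no effect
* temperature - rescales the next-token distribution before sampling; values below 1 reduce randomness
* no_repeat_ngram_size - prevents any n-gram of this size from appearing twice in the output; with a value of 2, no pair of consecutive tokens is repeated
* pad_token_id - GPT-2 has no dedicated padding token, so it is set to the ID of the end-of-sequence (EOS) token; any sequence that finishes before the others in a batch is filled with this token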
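
The effect of temperature can be sketched without the model: the raw scores (logits) are divided by the temperature before the softmax. The logits below are hypothetical values for three candidate tokens, and `softmax_with_temperature` is an illustrative helper, not a Transformers function.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores, not real model output
for t in (1.0, 0.7, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature drops, more probability mass concentrates on the highest-scoring token, so sampled text becomes less varied.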

In [6]:
output = model.generate(
    input_ids, 
    max_length=40, 
    temperature=0.7, 
    no_repeat_ngram_size=2,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True
)
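
The idea behind `no_repeat_ngram_size` can be sketched as follows: before each step, find every token that would complete an n-gram already present in the sequence and exclude it. This is a simplified illustration on word strings; the library works on token IDs, and `banned_next_tokens` is a hypothetical helper name.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would complete an n-gram already seen in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size:
        return set()
    # The last (n - 1) tokens are the prefix that the next token would extend.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


so_far = "the cat sat on the".split()
print(banned_next_tokens(so_far, 2))  # {'cat'}
```

With a size of 2, "the cat" has already occurred, so "cat" may not follow the second "the".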

In [7]:
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
generated_text

"The cat sat on the mat, its legs pulled under the covers and its paws folded back on its knees and held his paws. The cat's eyes went wide, the cat was still breathing heavily,"

## Translation

`t5-small` is a Text-to-Text Transfer Transformer (T5) model. It was trained to translate from English to French, Romanian, and German; the task is selected with a text prefix in the input.
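
Because T5 treats every task as text in, text out, the task is chosen by the prefix placed in front of the input. A small sketch of building such prompts follows; the helper name `build_t5_translation_prompt` is hypothetical, and only the prefix format is taken from the example in this section.

```python
def build_t5_translation_prompt(text, source="English", target="French"):
    """Prepend the task prefix that tells T5 which translation to perform."""
    return f"translate {source} to {target}: {text}"


print(build_t5_translation_prompt("I love to read books"))
# translate English to French: I love to read books
print(build_t5_translation_prompt("I love to read books", target="German"))
# translate English to German: I love to read books
```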

In [8]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

In [9]:
tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')

In [10]:
sample_text = 'translate English to French: I love to read books'

In [11]:
input_ids = tokenizer.encode(sample_text, return_tensors='pt')

In [12]:
output = model.generate(input_ids, max_length=100)

In [13]:
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
generated_text

'Je lis des livres'

## Evaluating text generation with BLEU and ROUGE

BLEU and ROUGE score generated text against one or more reference texts, giving a quality measure that is closer to how humans judge language than exact string matching.

**BLEU (Bilingual Evaluation Understudy)** compares the generated text with a reference text by examining the occurrence of n-grams. 

In a sentence like 'the cat is on the mat', the 1-grams (unigrams) are the individual words, the 2-grams (bigrams) are 'the cat', 'cat is', and so on. The more n-grams of the generated text that also appear in the reference, the higher the BLEU score. A perfect match gives a score of 1.0, while 0 means no match.
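
The n-gram matching described above can be sketched in plain Python. This is a simplified BLEU (whitespace tokenization, clipped n-gram precision up to 4-grams, closest-reference brevity penalty, no smoothing) and is not the torchmetrics implementation; `simple_bleu` and its helpers are illustrative names.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n):
    """Share of candidate n-grams found in a reference, with clipped counts."""
    cand_counts = Counter(ngrams(candidate, n))
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return matched / max(sum(cand_counts.values()), 1)


def simple_bleu(candidate, references, max_n=4):
    cand = candidate.split()
    refs = [r.split() for r in references]
    precisions = [clipped_precision(cand, refs, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty against the reference closest in length to the candidate.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    # Geometric mean of the n-gram precisions.
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


score = simple_bleu(
    "the cat is on the mat",
    ["a cat is on the mat", "there is a cat on mat"],
)
print(round(score, 4))  # 0.7598
```

For this pair the precisions are 5/6, 4/5, 3/4, and 2/3, whose product is 1/3; the fourth root of 1/3 is about 0.7598.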

In [18]:
from torchmetrics.text import BLEUScore

In [32]:
generated_text = ['the cat is on the mat']

In [37]:
real_text = [['a cat is on the mat', 'there is a cat on mat']]

In [38]:
bleu = BLEUScore()

In [39]:
bleu_score = bleu(generated_text, real_text)

In [40]:
bleu_score.item()

0.7598357200622559

**ROUGE (Recall-Oriented Understudy for Gisting Evaluation)** assesses generated text against reference text in two ways: 

* examines overlapping n-grams, with N representing the n-gram order
* checks for the longest common subsequence (LCS), the longest sequence of words that appears in the same order in both the generated and reference text, not necessarily contiguously
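
The LCS used by ROUGE-L can be computed with the standard dynamic-programming recurrence. A minimal sketch on whitespace-split words follows; `lcs_length` is an illustrative helper, not part of torchmetrics.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, start=1):
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


generated = "the cat is on the mat".split()
print(lcs_length(generated, "a cat is on the mat".split()))    # 5
print(lcs_length(generated, "there is a cat on mat".split()))  # 3
```

Dividing the LCS length by the length of the generated text gives ROUGE-L precision, and dividing by the length of the reference gives ROUGE-L recall.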

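Each ROUGE variant reports three metrics:

* Precision is the fraction of n-grams in the generated text that also appear in the reference text (how many selected items are relevant).

* Recall is the fraction of n-grams in the reference text that also appear in the generated text (how many relevant items are selected).

* F-measure is the harmonic mean of precision and recall.

The prefixes 'rouge1', 'rouge2', and 'rougeL' in the output keys indicate unigram overlap, bigram overlap, and LCS, respectively. 'rougeLsum' is the summary-level LCS variant, computed sentence by sentence; for single-sentence inputs like these it equals 'rougeL'.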
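
Precision, recall, and F-measure for ROUGE-N can be sketched with n-gram counts. This is a simplified version against a single reference with whitespace tokenization, not the torchmetrics implementation; `rouge_n` is an illustrative name.

```python
from collections import Counter


def rouge_n(generated, reference, n=1):
    """Return (precision, recall, f_measure) for n-gram overlap."""
    gen = generated.split()
    ref = reference.split()
    gen_counts = Counter(tuple(gen[i:i + n]) for i in range(len(gen) - n + 1))
    ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((gen_counts & ref_counts).values())
    precision = overlap / max(sum(gen_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


p, r, f = rouge_n("the cat is on the mat", "a cat is on the mat", n=1)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.8333 0.8333 0.8333

# Precision and recall differ when the texts have different lengths.
p, r, f = rouge_n("the cat sat", "the cat sat on the mat", n=1)
print(round(p, 4), round(r, 4), round(f, 4))  # 1.0 0.5 0.6667
```

In the second call every generated word appears in the reference, so precision is 1.0, but only half of the reference is covered, so recall is 0.5.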

In [41]:
from torchmetrics.text import ROUGEScore

In [42]:
rouge = ROUGEScore()

In [43]:
rouge_score = rouge(generated_text, real_text)

In [45]:
rouge_score

{'rouge1_fmeasure': tensor(0.8333),
 'rouge1_precision': tensor(0.8333),
 'rouge1_recall': tensor(0.8333),
 'rouge2_fmeasure': tensor(0.8000),
 'rouge2_precision': tensor(0.8000),
 'rouge2_recall': tensor(0.8000),
 'rougeL_fmeasure': tensor(0.8333),
 'rougeL_precision': tensor(0.8333),
 'rougeL_recall': tensor(0.8333),
 'rougeLsum_fmeasure': tensor(0.8333),
 'rougeLsum_precision': tensor(0.8333),
 'rougeLsum_recall': tensor(0.8333)}