# Learning Hugging Face

Hugging Face Transformers is a library offering numerous pre-trained models primarily for natural language processing tasks, with easy-to-use interfaces and extensive documentation.

Examples from https://www.youtube.com/watch?v=QEaBAZQCtwE

## Imports

In [28]:
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForSequenceClassification

## 1. Load a model and run a simple sentiment classification

In [24]:
c = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
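The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so, reusing the model name and revision hash that the warning itself reports. `PINNED` and `build_classifier` are names introduced here for illustration only, and the import is placed inside the function so that defining it does not load the library.

```python
# Pin the checkpoint so the pipeline default cannot change underneath you.
# The model name and revision are copied from the warning printed above.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_classifier():
    """Return a sentiment pipeline tied to the pinned model and revision."""
    # Imported lazily; requires the transformers package and network or cache access.
    from transformers import pipeline

    return pipeline(PINNED["task"], model=PINNED["model"], revision=PINNED["revision"])
```

Calling `build_classifier()` should then behave like `pipeline("sentiment-analysis")` in the next cell, without the warning.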


In [67]:
s = "getting started with hugging face in 5 minutes is a nightmare"
t = "getting started with hugging face with pytorch is awesome"
r = c(s)
r

[{'label': 'NEGATIVE', 'score': 0.9996582269668579}]

## 2. Using a generator with parameters and a specific model

Generate two alternative continuations of the prompt.

In [26]:
generator = pipeline("text-generation", model="distilgpt2")
r = generator(s, max_length=30, num_return_sequences=2)
r

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'getting started with hugging face in 5 minutes is a nightmare.\n\n\nSo did this guy at the back of this post at all?! I have'},
 {'generated_text': 'getting started with hugging face in 5 minutes is a nightmare for some. It was also a very effective way to help those who needed a calm, self'}]
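Each `generated_text` above starts with the prompt itself, followed by the model's continuation. A small sketch of separating the two, using the first output copied from the cell above. `continuation` is a helper name introduced here, not part of the library; the text-generation pipeline also accepts `return_full_text=False` for the same purpose.

```python
# Prompt and first returned sequence, copied from the cells above.
prompt = "getting started with hugging face in 5 minutes is a nightmare"
result = [
    {"generated_text": "getting started with hugging face in 5 minutes is a nightmare"
                       ".\n\n\nSo did this guy at the back of this post at all?! I have"},
]


def continuation(prompt, generated_text):
    """Return only the newly generated part, or the full text if the prompt is not a prefix."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


new_text = continuation(prompt, result[0]["generated_text"])
print(repr(new_text))
```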

## 3. Using a zero-shot classifier

In [27]:
c = pipeline("zero-shot-classification")
r = c(s,
      candidate_labels=["education", "politics", "skateboarding"])
r

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'getting started with hugging face in 5 minutes is a nightmare',
 'labels': ['education', 'skateboarding', 'politics'],
 'scores': [0.5417985916137695, 0.23624803125858307, 0.22195343673229218]}
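Note that the labels come back sorted by descending score rather than in the order they were passed in, and that in the default single-label mode the scores are normalized across the candidates. A short sketch of reading the result, using the dictionary printed above as literal data (no model is loaded here):

```python
# Result dictionary copied from the output above.
result = {
    "sequence": "getting started with hugging face in 5 minutes is a nightmare",
    "labels": ["education", "skateboarding", "politics"],
    "scores": [0.5417985916137695, 0.23624803125858307, 0.22195343673229218],
}

# Pair each label with its score; the first pair is the best match.
ranked = list(zip(result["labels"], result["scores"]))
top_label, top_score = ranked[0]

# The scores should add up to roughly 1 in single-label mode.
total = sum(result["scores"])

print(top_label, round(top_score, 3))
print(round(total, 6))
```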

## 4. Digging into Pipelines


This section shows how to load a specific model and its matching tokenizer explicitly, then pass both to a pipeline.

In [30]:
model_name = "distilbert-base-uncased-finetuned-sst-2-english"

In [31]:
model = AutoModelForSequenceClassification.from_pretrained(model_name)

In [32]:
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [36]:
c = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
c(s)

[{'label': 'NEGATIVE', 'score': 0.9996582269668579}]

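Print out the tokens. Calling the tokenizer directly returns the input IDs and the attention mask; `tokenize` returns the token strings.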

In [49]:
tokenizer(s)

{'input_ids': [101, 2893, 2318, 2007, 17662, 2227, 1999, 1019, 2781, 2003, 1037, 10103, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

In [51]:
tokens = tokenizer.tokenize(s)
tokens

['getting',
 'started',
 'with',
 'hugging',
 'face',
 'in',
 '5',
 'minutes',
 'is',
 'a',
 'nightmare']

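Print out the IDs. Unlike `input_ids` above, this list omits the special tokens 101 (`[CLS]`) and 102 (`[SEP]`) that the direct tokenizer call adds.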

In [52]:
ids = tokenizer.convert_tokens_to_ids(tokens)
ids

[2893, 2318, 2007, 17662, 2227, 1999, 1019, 2781, 2003, 1037, 10103]

In [53]:
decoded_string = tokenizer.decode(ids)
decoded_string

'getting started with hugging face in 5 minutes is a nightmare'
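The round trip above (text to tokens, tokens to IDs, IDs back to text) can be mimicked with a toy vocabulary. This is only a sketch: it uses a whitespace split instead of the real WordPiece algorithm, and its vocabulary holds just the tokens and IDs printed in this section. `encode` and `decode` are names introduced for illustration.

```python
# Toy vocabulary built only from the tokens and IDs printed above.
tokens = ["getting", "started", "with", "hugging", "face", "in",
          "5", "minutes", "is", "a", "nightmare"]
ids = [2893, 2318, 2007, 17662, 2227, 1999, 1019, 2781, 2003, 1037, 10103]
vocab = dict(zip(tokens, ids))
inverse_vocab = {token_id: token for token, token_id in vocab.items()}

CLS_ID, SEP_ID = 101, 102  # first and last entries of input_ids above


def encode(text, add_special_tokens=True):
    """Whitespace-split stand-in for the real tokenizer (toy vocabulary only)."""
    body = [vocab[word] for word in text.lower().split()]
    return [CLS_ID] + body + [SEP_ID] if add_special_tokens else body


def decode(token_ids):
    """Map IDs back to text, skipping the two special tokens."""
    return " ".join(inverse_vocab[i] for i in token_ids if i not in (CLS_ID, SEP_ID))


s = "getting started with hugging face in 5 minutes is a nightmare"
with_special = encode(s)                               # mirrors tokenizer(s)["input_ids"]
without_special = encode(s, add_special_tokens=False)  # mirrors convert_tokens_to_ids(tokens)
```

The only difference between the two lists is the pair of special tokens at the start and end.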

## 5. Using Transformers with PyTorch

In [55]:
import torch
import torch.nn.functional as F

In [59]:
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [60]:
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

In [68]:
X_train = [s, t]

In [69]:
r = classifier(X_train)
r

[{'label': 'NEGATIVE', 'score': 0.9996582269668579},
 {'label': 'POSITIVE', 'score': 0.9998277425765991}]

In [70]:
batch = tokenizer(X_train,
                 padding=True,
                 truncation=True,
                 max_length=512,
                 return_tensors="pt")

In [72]:
with torch.no_grad():
    outputs = model(**batch)
    predictions = F.softmax(outputs.logits, dim=1)
    labels = torch.argmax(predictions, dim=1)
(predictions, labels)

(tensor([[9.9966e-01, 3.4176e-04],
         [1.7227e-04, 9.9983e-01]]),
 tensor([0, 1]))
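The manual steps above reproduce what the pipeline reported: softmax turns logits into probabilities, argmax picks the class index, and the index maps to a label name. A dependency-free sketch of those steps follows. The logits are made-up values, and the `ID2LABEL` mapping is an assumption consistent with the outputs above (in real code it can be read from `model.config.id2label`).

```python
import math

ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}  # assumed mapping, consistent with the outputs above


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(row):
    """Index of the largest value in the row."""
    return max(range(len(row)), key=row.__getitem__)


# Made-up logits, only to show that each softmax row sums to 1.
hypothetical_logits = [[4.0, -4.0], [-4.3, 4.4]]
probs = [softmax(row) for row in hypothetical_logits]

# Probabilities printed by the cell above.
printed = [[9.9966e-01, 3.4176e-04], [1.7227e-04, 9.9983e-01]]
labels = [ID2LABEL[argmax(row)] for row in printed]
scores = [max(row) for row in printed]
```

The resulting labels and scores line up with the pipeline output for `s` and `t` shown earlier in this section.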

## 6. Working locally by saving models

In [73]:
d = './hf_models'
tokenizer.save_pretrained(d)
model.save_pretrained(d)

In [74]:
local_tokenizer = AutoTokenizer.from_pretrained(d)
local_model = AutoModelForSequenceClassification.from_pretrained(d)
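Before loading from a local directory, it can help to check that the save actually produced the expected files. A rough sketch, assuming the directory contains `config.json` plus one weights file; the exact weights file name depends on the installed library version and save options, so both common names are accepted. `looks_like_saved_model` is a hypothetical helper, demonstrated here on a throwaway directory with empty placeholder files.

```python
import tempfile
from pathlib import Path

# Assumed file names: config.json plus one of these weights files.
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def looks_like_saved_model(directory):
    """Return True if the directory has a config and at least one weights file."""
    root = Path(directory)
    has_config = (root / "config.json").is_file()
    has_weights = any((root / name).is_file() for name in WEIGHT_FILES)
    return has_config and has_weights


# Demonstration on a temporary directory; no real model is saved or loaded.
with tempfile.TemporaryDirectory() as tmp:
    empty_ok = looks_like_saved_model(tmp)
    Path(tmp, "config.json").touch()
    Path(tmp, "model.safetensors").touch()
    filled_ok = looks_like_saved_model(tmp)

print(empty_ok, filled_ok)
```

In the notebook, the same check could be run as `looks_like_saved_model('./hf_models')` after the save cell.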

## 7. Using Models from the Model Hub

Try using a model from the Hub, e.g. https://huggingface.co/facebook/bart-large-cnn

In [76]:
model_name = "facebook/bart-large-cnn"

In [80]:
summarizer = pipeline("summarization", model=model_name)

In [79]:
text = "Nijah Houston is an incredibly talented skateboarder, admired for his remarkable skills and dedication. His awe-inspiring performances consistently captivate audiences, showcasing his passion and commitment to the sport. Nijah's positive attitude and resilience make him a true inspiration and a role model in the skateboarding community."
text

"Nijah Houston is an incredibly talented skateboarder, admired for his remarkable skills and dedication. His awe-inspiring performances consistently captivate audiences, showcasing his passion and commitment to the sport. Nijah's positive attitude and resilience make him a true inspiration and a role model in the skateboarding community."

In [84]:
summarizer(text, max_length=32, min_length=16)

[{'summary_text': 'Nijah Houston is an incredibly talented skateboarder, admired for his remarkable skills and dedication. His positive attitude and resilience make him a true inspiration'}]