<a href="https://colab.research.google.com/github/vyasakhilesh/NLP/blob/main/Transformers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Transformers

Credit
----
Author = Hugging Face

Title = The Hugging Face Course, 2022

howpublished = https://huggingface.co/course

year = 2022

note = Online accessed (today)

----

In [None]:
# The sentencepiece extra also installs transformers itself; quoting keeps the shell from globbing the brackets.
!pip install "transformers[sentencepiece]"

Collecting transformers
  Downloading transformers-4.30.2-py3-none-any.whl (7.2 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

In [None]:
import transformers

[Available Pipelines](https://huggingface.co/transformers/main_classes/pipelines.html)

In [None]:
from transformers import pipeline

# The model is downloaded on first use and cached locally for later runs.
classifier = pipeline(task="sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
classifier(
    ["This movie is great", "I hate this movie!"]
)

[{'label': 'POSITIVE', 'score': 0.9998806715011597},
 {'label': 'NEGATIVE', 'score': 0.9996733665466309}]

In [None]:
from transformers import pipeline

# This pipeline is called zero-shot because you don’t need to fine-tune the model on your data to use it.
# It can directly return probability scores for any list of labels you want!

classifier = pipeline("zero-shot-classification", model='facebook/bart-large-mnli')
classifier(
    "This notebook is about transformer learning",
    candidate_labels=["education", "politics", "business"],
)

{'sequence': 'This notebook is about transformer learning',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.9610645771026611, 0.02773941680788994, 0.01119602657854557]}
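
The zero-shot result keeps labels and scores in two parallel lists, sorted from most to least likely. As a minimal sketch, the output printed above can be turned into a label-to-score mapping with plain Python. The names `result`, `label_scores`, and `best_label` are illustrative only; the values are copied from the cell output rather than recomputed.

```python
# Output copied verbatim from the zero-shot cell above (not recomputed here).
result = {
    "sequence": "This notebook is about transformer learning",
    "labels": ["education", "business", "politics"],
    "scores": [0.9610645771026611, 0.02773941680788994, 0.01119602657854557],
}

# Pair each candidate label with its score.
label_scores = dict(zip(result["labels"], result["scores"]))

# Pick the label with the highest score.
best_label = max(label_scores, key=label_scores.get)
print(best_label, label_scores[best_label])
```

With the default single-label setting the scores are normalised across the candidate labels, which is why the three values above add up to roughly 1.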

In [None]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "How do you solve problem",
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'How do you solve problem after problem?\n\n\n\n\n\n\n\n\n\n'},
 {'generated_text': 'How do you solve problem? Will your solution solve the problem with a new idea and solution? If you see a problem you can solve that problem with'}]

In [None]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This notebook is having learning about <mask>", top_k=2)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.014168459922075272,
  'token': 25776,
  'token_str': ' coding',
  'sequence': 'This notebook is having learning about coding'},
 {'score': 0.012149649672210217,
  'token': 10638,
  'token_str': ' math',
  'sequence': 'This notebook is having learning about math'}]

In [None]:
from transformers import pipeline

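# Named entity recognition.
# aggregation_strategy="simple" merges sub-word tokens into whole entities;
# it replaces the deprecated grouped_entities=True.
ner = pipeline("ner", aggregation_strategy="simple")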
ner("Barack Obama is the president of United States.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'entity_group': 'PER',
  'score': 0.9992219,
  'word': 'Barack Obama',
  'start': 0,
  'end': 12},
 {'entity_group': 'LOC',
  'score': 0.9996251,
  'word': 'United States',
  'start': 33,
  'end': 46}]
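
The `start` and `end` fields in the NER output are character offsets into the input string, so each entity can be recovered by slicing. A small sketch, using the offsets copied from the output above (scores are left out for brevity, and the variable names are illustrative):

```python
text = "Barack Obama is the president of United States."

# Entity offsets copied from the NER cell output above.
entities = [
    {"entity_group": "PER", "word": "Barack Obama", "start": 0, "end": 12},
    {"entity_group": "LOC", "word": "United States", "start": 33, "end": 46},
]

# Slice the original text with each (start, end) pair.
spans = [(e["entity_group"], text[e["start"]:e["end"]]) for e in entities]
for group, span in spans:
    print(group, repr(span))
```

Slicing the original text is handy when the exact casing and spacing of the source matter, for example when highlighting entities in a document.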

In [None]:
from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="When do you work?",
    context="I work in evening.",
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.898702085018158, 'start': 10, 'end': 17, 'answer': 'evening'}

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    Some 117 rescues have been made in Vermont amid extreme flooding, public officials said on Tuesday, with 67 people evacuated from homes, businesses and vehicles and 17 animals rescued.

The city of Montpelier warned that a dam near the state capital is dangerously close to capacity. However, water levels appeared to stabilise by night.

“It looks like it won’t breach,” Montpelier town manager Bill Fraser said. “That is good. That is one less thing we have to have on our front burner.”

Officials had earlier warned that with “very few evacuation options remaining”, people in at-risk areas in the Montpelier area may wish to go to upper floors in their houses.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' 117 rescues have been made in Vermont amid extreme flooding . 67 people evacuated from homes, businesses and vehicles and 17 animals rescued . The city of Montpelier warned that a dam near the state capital is dangerously close to capacity . However, water levels appeared to stabilise by night .'}]

In [None]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce portable est basé sur le modèle de transformateur.")

[{'translation_text': 'This cell phone is based on the transformer model.'}]

The pipelines above are far from comprehensive and are only meant to highlight a few of the different kinds of Transformer models. Broadly, Transformer models can be grouped into three categories:

* GPT-like (also called auto-regressive Transformer models)
* BERT-like (also called auto-encoding Transformer models)
* BART/T5-like (also called sequence-to-sequence Transformer models)
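
As a rough illustration of these three categories, the sketch below groups some of the checkpoints used earlier in this notebook by architecture family. The grouping reflects my reading of each checkpoint's architecture (decoder-only, encoder-only, or encoder-decoder) and is not taken from the course text; `FAMILIES` and `family_of` are hypothetical helper names.

```python
# Checkpoints used earlier in this notebook, grouped by architecture family.
# Assumption: the grouping follows each checkpoint's base architecture.
FAMILIES = {
    "GPT-like (auto-regressive)": [
        "distilgpt2",
    ],
    "BERT-like (auto-encoding)": [
        "distilroberta-base",
        "distilbert-base-cased-distilled-squad",
        "dbmdz/bert-large-cased-finetuned-conll03-english",
    ],
    "BART/T5-like (sequence-to-sequence)": [
        "sshleifer/distilbart-cnn-12-6",
        "Helsinki-NLP/opus-mt-fr-en",
    ],
}


def family_of(checkpoint: str) -> str:
    """Return the family a checkpoint was filed under, or raise KeyError."""
    for family, checkpoints in FAMILIES.items():
        if checkpoint in checkpoints:
            return family
    raise KeyError(checkpoint)


print(family_of("distilgpt2"))
print(family_of("Helsinki-NLP/opus-mt-fr-en"))
```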



An example of a task is predicting the next word in a sentence after reading the n previous words. This is called causal language modeling because the output depends on the past and present inputs, but not on the future ones.
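
To make the "n previous words" idea concrete, here is a toy sketch that builds (context, target) pairs from a sentence. It only illustrates how the data is laid out, not how a real model is trained: whitespace splitting stands in for a proper tokenizer, and `next_word_examples` is a hypothetical helper name.

```python
def next_word_examples(tokens, n):
    """Build (context, target) pairs: the n previous tokens predict the next one."""
    return [(tokens[i - n:i], tokens[i]) for i in range(n, len(tokens))]


# Assumption: naive whitespace tokenization, for illustration only.
tokens = "the cat sat on the mat".split()

for context, target in next_word_examples(tokens, n=2):
    # The context never contains the target or anything after it.
    print(context, "->", target)
```

Each context holds only tokens that come before its target, which is the "causal" constraint described above.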