Introduction
This article discusses a package that lets you perform common NLP operations in just a few lines of code. Even as a beginner, you can run various NLP tasks and fine-tune models for your own task.

Building NLP inference from scratch is tedious because it requires several steps:

1. Process the raw text data with a tokenizer

2. Convert the data into the model’s input format

3. Design the model using pre-trained layers or custom layers

4. Training and validation

5. Inference
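To see how much work these steps involve, here is a toy, pure-Python sketch of all five. It is an illustration only and is not how Transformers works internally. The regex tokenizer, the tiny vocabulary, the made-up four-sentence training set and the multinomial Naive Bayes "model" with add-one smoothing are all assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter

PAD_ID, UNK_ID = 0, 1  # reserved ids for padding and unknown words


def tokenize(text):
    """Step 1: lowercase the raw text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts):
    """Assign an integer id to every token seen in the training texts."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab, max_len=8):
    """Step 2: map tokens to ids, then truncate/pad to a fixed length."""
    ids = [vocab.get(token, UNK_ID) for token in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def train(encoded, labels, vocab_size):
    """Steps 3-4: fit a multinomial Naive Bayes model with add-one smoothing."""
    counts = {label: Counter() for label in dict.fromkeys(labels)}
    for ids, label in zip(encoded, labels):
        counts[label].update(i for i in ids if i > UNK_ID)  # skip padding/unknown
    model = {}
    for label, c in counts.items():
        total = sum(c.values())
        log_prior = math.log(labels.count(label) / len(labels))
        log_likelihood = [math.log((c[i] + 1) / (total + vocab_size)) for i in range(vocab_size)]
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, ids):
    """Step 5: inference, returning the label with the highest log-probability."""
    scores = {
        label: log_prior + sum(log_likelihood[i] for i in ids if i > UNK_ID)
        for label, (log_prior, log_likelihood) in model.items()
    }
    return max(scores, key=scores.get)


# Made-up training data, for illustration only.
texts = ["this restaurant is awesome", "great food and great service",
         "this restaurant is awful", "bad food and slow service"]
labels = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]

vocab = build_vocab(texts)
model = train([encode(tokenize(t), vocab) for t in texts], labels, len(vocab))
print(predict(model, encode(tokenize("awesome food"), vocab)))  # POSITIVE
```

Even this toy needs its own tokenizer, id encoding, padding, training and inference code. With a real neural model, each of these steps gets considerably larger.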

The Transformers package removes much of this hassle. It provides pre-trained models and simple APIs for implementing NLP tasks.

---

<H1> TRANSFORMERS PACKAGE

The Transformers package is provided by Hugging Face (huggingface.co). It addresses common challenges in the NLP field by providing pre-trained models, tokenizers, configs, various APIs and ready-made pipelines for inference. This lets us use pre-trained language models together with their data-processing tools. Most of the models are available in the library for both PyTorch and TensorFlow.

The Transformers package needs PyTorch or TensorFlow as a backend. With a backend installed, it can pre-process text data and train models in just a few lines of code.
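Before installing Transformers, it can help to check which backend is already available. The sketch below only asks Python whether the `torch` (PyTorch) and `tensorflow` packages can be imported. The function name `available_backends` is our own, and the check does not verify that the installed versions are compatible with Transformers.

```python
import importlib.util


def available_backends():
    """Return which of the PyTorch/TensorFlow packages are importable here."""
    # "torch" is PyTorch's import name; "tensorflow" is TensorFlow's.
    return [name for name in ("torch", "tensorflow")
            if importlib.util.find_spec(name) is not None]


backends = available_backends()
if backends:
    print("Found backend(s):", ", ".join(backends))
else:
    print("Install PyTorch or TensorFlow before using transformers.")
```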

Early releases of the Transformers library came with more than 30 pre-trained models covering around 100 languages, built on 8 major architectures for natural language understanding (NLU) and natural language generation (NLG). The current library supports many more. The original 8 were:

BERT (from Google);
GPT-2 (from OpenAI);
GPT (from OpenAI);
Transformer-XL (from Google/CMU);
XLNet (from Google/CMU);
RoBERTa (from Facebook);
XLM (from Facebook);
DistilBERT (from HuggingFace).

---

<H1> Getting Started with Pipeline

The easiest way to use a pre-trained model for prediction on a given NLP task is `pipeline()` from the Transformers package. Pipelines abstract away most of the complex code for data pre-processing and inference, so creating a pipeline for an NLP task is very easy.

Getting a pipeline from the package:

First, install the Transformers package and then import the `pipeline` function from it:

In [None]:
!pip install transformers
from transformers import pipeline

In the Transformers package, `pipeline()` is a wrapper that builds task-specific pipelines, such as those for Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering.

**How to load a Pipeline for a specific Task:**

A pipeline also works with custom models: if you have a trained model, you can pass it to the pipeline. Different models need different tokenizers, so pass the tokenizer that matches the model.

pipe_task = pipeline("task_name", model="model_name", tokenizer="tokenizer_name")

If you don't specify the model and tokenizer, the pipeline loads the default model and tokenizer for the task.

---

<H1> Sentiment Analysis

The sentiment-analysis pipeline classifies a text as POSITIVE or NEGATIVE and returns a confidence score for that label.

Its default model, distilbert-base-uncased-finetuned-sst-2-english, is trained for binary classification only, so it will not work for multi-class classification.

Calling the pipeline

In [4]:
# "sentiment-analysis" is an alias of the "text-classification" task, so one pipeline is enough
pipe = pipeline("sentiment-analysis")
pipe(["This restaurant is awesome", "This restaurant is aweful"])

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)



[{'label': 'POSITIVE', 'score': 0.9998743534088135},
 {'label': 'POSITIVE', 'score': 0.9996858835220337}]
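The pipeline returns one `{'label': ..., 'score': ...}` dict per input. The default model can only answer POSITIVE or NEGATIVE, so one simple post-processing step is to treat low-confidence predictions as uncertain. The helper name, the 0.9 threshold and the "UNCERTAIN" label below are our own assumptions and are not part of Transformers.

```python
def apply_confidence_threshold(predictions, threshold=0.9):
    """Relabel low-confidence predictions as 'UNCERTAIN'.

    `predictions` has the shape the sentiment-analysis pipeline returns:
    a list of {'label': ..., 'score': ...} dicts. The input is not modified.
    """
    return [
        {**p, "label": p["label"] if p["score"] >= threshold else "UNCERTAIN"}
        for p in predictions
    ]


# The first dict is the pipeline output shown above; the second score is made up.
outputs = [{"label": "POSITIVE", "score": 0.9998743534088135},
           {"label": "NEGATIVE", "score": 0.62}]
print(apply_confidence_threshold(outputs))
```

In the notebook, you could call `apply_confidence_threshold(pipe([...]))` directly. A threshold cannot fix confident mistakes, though. The misspelled "aweful" above was labeled POSITIVE with a score above 0.999.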

---

<H1> Loading a Pipeline with a Given Model

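If you have trained a model, or want a specific checkpoint from the Hugging Face Hub, pass its name with the `model` argument. Note that roberta-large-mnli, used below, is fine-tuned for natural language inference (MNLI) rather than sentiment. Its labels are CONTRADICTION, NEUTRAL and ENTAILMENT, and it is meant to score sentence pairs, so the NEUTRAL labels it returns for the single sentences below say nothing about sentiment.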

In [5]:
pipe = pipeline("text-classification", model="roberta-large-mnli")

Some weights of the model checkpoint at roberta-large-mnli were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [6]:
pipe(["This restaurant is awesome", "This restaurant is aweful"])

[{'label': 'NEUTRAL', 'score': 0.7313133478164673},
 {'label': 'NEUTRAL', 'score': 0.8322249054908752}]

In [7]:
pip install -q transformers

In [8]:
sentiment_pipeline = pipeline("sentiment-analysis")
data = ["This restaurant is awesome", "This restaurant is bad"]
sentiment_pipeline(data)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9998743534088135},
 {'label': 'NEGATIVE', 'score': 0.9998098015785217}]

---

<H1> TEXT SUMMARIZATION

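The summarization pipeline condenses a given text by generating new sentences.

This is abstractive summarization: a sequence-to-sequence model writes the summary instead of copying sentences from the input. Encoder-decoder models such as T5 and BART are commonly used for this task.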

In [9]:
context  = r"""
The Mars Orbiter Mission (MOM), also called Mangalyaan ("Mars-craft", from Mangala, "Mars" and yāna, "craft, vehicle")
is a space probe orbiting Mars since 24 September 2014.
It was launched on 5 November 2013 by the Indian Space Research Organisation (ISRO).
It is India's first interplanetary mission and it made India the fourth country to achieve Mars orbit, 
after Roscosmos, NASA, and the European Space Agency.
It also made India the first country to achieve this on its first attempt.
The Mars Orbiter took off from the First Launch Pad at Satish Dhawan Space Centre
(Sriharikota Range SHAR), Andhra Pradesh, using a Polar Satellite Launch Vehicle (PSLV) rocket C25
at 09:08 UTC on 5 November 2013.
The launch window was approximately 20 days long and started on 28 October 2013.
The MOM probe spent about 36 days in Earth orbit, where it made a series of seven apogee-raising 
orbital maneuvers before trans-Mars injection
on 30 November 2013 (UTC). After a 298-day long journey to Mars orbit, 
it was put into Mars orbit on 24 September 2014."""

In [10]:
summarizer = pipeline("summarization", model = "t5-base", tokenizer = "t5-base", framework = "tf")
summary = summarizer(context, max_length = 130, min_length = 60)
print(summary)

All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.

All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at t5-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.



[{'summary_text': "the mars Orbiter Mission (MOM) is a space probe orbiting Mars since 24 September 2014 . it is India's first interplanetary mission and it made India the fourth country to achieve Mars orbit . the probe spent about 36 days in Earth orbit before trans-Mars injection on 30 November 2013 ."}]


Here the summary length is bounded by `max_length=130` and `min_length=60` (counted in tokens), and the model is `t5-base`.
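A summarization model can only read a limited amount of input; for t5-base this is 512 tokens. Inputs longer than that are not summarized reliably. A common workaround is to split a long document into chunks, summarize each chunk, and then join or re-summarize the partial summaries. The hypothetical `chunk_text` helper below packs whole sentences greedily. It uses a word count as a rough stand-in for the model's token count, which is an approximation, so leave some headroom in `max_words`.

```python
import re


def chunk_text(text, max_words=300):
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words still becomes its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


print(chunk_text("One two three. Four five. Six seven eight nine.", max_words=5))
# With the `summarizer` pipeline from the cell above, one possible usage:
# partial = [summarizer(c, max_length=130, min_length=30)[0]["summary_text"]
#            for c in chunk_text(long_text)]
```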

In [12]:
summarizer = pipeline("summarization")  # no model given, so the task's default model is used
text = ("Paris is the capital and most populous city of France, "
        "with an estimated population of 2,175,601 residents as of 2018, "
        "in an area of more than 105 square kilometres (41 square miles). "
        "The City of Paris is the centre and seat of government of the region "
        "and province of Île-de-France, or Paris Region, which has an estimated "
        "population of 12,174,880, or about 18 percent of the population of France as of 2017.")
print(summarizer(text))

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


[{'summary_text': ' Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018 . The city is the centre and seat of government of the region and province of Île-de-France, or Paris Region . Paris Region has an estimated 18 percent of the population of France as of 2017 .'}]

---

<H1> QUESTION & ANSWERING

In question answering, the model is given a context passage and a question, and it extracts the span of the passage that answers the question. Here we reuse the Mars Orbiter Mission passage stored in `context` in the summarization example above.

Calling the pipeline:

In [14]:
nlp = pipeline("question-answering")
result = nlp(question = "When was the Mars mission launched?", context = context)
print(result['answer'])

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


5 November 2013


No model was specified, so the pipeline's default question-answering model, distilbert-base-cased-distilled-squad, was used for inference.
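Besides `answer`, the question-answering result is a dict that also contains `score`, `start` and `end`. The `start` and `end` values are character offsets of the answer in the context. The hypothetical helper below uses them to mark the answer inside the passage. The demo result is made up for illustration; in the notebook you would pass `context` and the real `result` instead.

```python
def highlight_answer(context, result, marker="**"):
    """Wrap the answer span of a question-answering result in `marker`.

    `result` is expected to look like the pipeline's output:
    {'score': float, 'start': int, 'end': int, 'answer': str}.
    """
    start, end = result["start"], result["end"]
    if context[start:end] != result["answer"]:
        raise ValueError("start/end offsets do not match the answer text")
    return context[:start] + marker + context[start:end] + marker + context[end:]


# Made-up result for illustration only.
demo_context = "It was launched on 5 November 2013 by ISRO."
demo_start = demo_context.find("5 November 2013")
demo_result = {"score": 0.9, "start": demo_start,
               "end": demo_start + len("5 November 2013"), "answer": "5 November 2013"}
print(highlight_answer(demo_context, demo_result))
```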

---

<H1> LANGUAGE TRANSLATION

The translation pipeline translates text from a source language into a target language.

The available translation models, and the language pairs they cover, are listed on the Hugging Face Hub: [translation models](https://huggingface.co/models?pipeline_tag=translation).

<H1> AUTO CLASSES:

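The Transformers package provides Auto classes such as AutoTokenizer, AutoModel, AutoConfig and AutoModelForSeq2SeqLM.

Given a checkpoint name, their `from_pretrained()` method loads the matching tokenizer, model or config class automatically. You don't need to know the concrete class, such as the Marian classes behind the Helsinki-NLP models used below.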

In [18]:
!pip install sentencepiece  # the MarianMT (Helsinki-NLP) tokenizers used below require sentencepiece

In [1]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

**ENGLISH TO RUSSIAN LANGUAGE TRANSLATION**

In [2]:
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-ru")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-ru")

In [3]:
translator = pipeline("translation", model = model, tokenizer = tokenizer)

In [4]:
translated = translator('this is me and my name.you  know me very well')[0].get('translation_text')
print(translated)

Это я и мое имя. Ты меня очень хорошо знаешь.
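Under the hood, the translation pipeline tokenizes the input, runs the model's `generate()` method and decodes the output ids. The sketch below does the same three steps directly with the Auto classes. It assumes PyTorch is installed (`return_tensors="pt"`). The function name `translate` and the `max_new_tokens=128` default are our own choices. `transformers` is imported inside the function, so the definition works even where the library is missing.

```python
def translate(texts, model_name="Helsinki-NLP/opus-mt-en-ru", max_new_tokens=128):
    """Translate a list of strings with the Auto classes instead of pipeline()."""
    # Lazy import: only needed when the function is actually called.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    # 1. Tokenize: pad the batch to a common length and return PyTorch tensors.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # 2. Generate output token ids with the sequence-to-sequence model.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # 3. Decode ids back to text, dropping special tokens such as padding.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Usage: `print(translate(["this is me and my name. you know me very well"]))`. This sketch reloads the model on every call. For repeated use, load the tokenizer and model once, as the cells above do.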


**ENGLISH TO HINDI LANGUAGE TRANSLATION**

In [6]:
tokenizer_eng2hin = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-hi")
model_eng2hin = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-hi")

In [7]:
translator = pipeline("translation", model = model_eng2hin, tokenizer = tokenizer_eng2hin)

In [8]:
translated = translator('this is me and my name.you  know me very well')[0].get('translation_text')
print(translated)

यह मैं और मेरा नाम है. आप मुझे बहुत अच्छी तरह से जानते हैं.
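Both checkpoints above follow the same naming pattern, `Helsinki-NLP/opus-mt-{source}-{target}`, with language codes such as `en`, `ru` and `hi`. The hypothetical helpers below build that name and a translation pipeline from a language pair. Not every pair exists on the Hub, so check the translation model list first. As above, `transformers` is imported lazily inside `make_translator`.

```python
def opus_mt_model_id(source_lang, target_lang):
    """Build a Helsinki-NLP OPUS-MT checkpoint name, e.g. 'Helsinki-NLP/opus-mt-en-ru'."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"


def make_translator(source_lang, target_lang):
    """Return a translation pipeline for the pair (downloads the model on first use)."""
    from transformers import pipeline  # lazy import

    # With a model name string, pipeline() also loads the matching tokenizer.
    return pipeline("translation", model=opus_mt_model_id(source_lang, target_lang))


print(opus_mt_model_id("en", "hi"))  # Helsinki-NLP/opus-mt-en-hi
```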
