Learning **NLP** (Natural Language Processing) using libraries from **Hugging Face**

# 1. What is NLP?

**Definition**: NLP is a field of linguistics and machine learning focused on understanding everything related to human language.

Common NLP tasks:
1. Classifying whole sentences
    - Getting the sentiment of a review
    - Detecting if an email is spam
    - Determining if a sentence is grammatically correct, or whether two sentences are logically related
2. Classifying each word in a sentence
    - Identifying the grammatical components of a sentence (noun, verb, adjective)
    - Identifying the named entities (person, location, organization)
3. Generating text content
    - Completing a prompt with auto-generated text
    - Filling in the blanks in a text with masked words
4. Extracting an answer from a text
    - Given a question and a context, extracting the answer to the question based on the information provided in the context
5. Generating a new sentence from an input text
    - Translating a text into another language
    - Summarizing a text

# 2. Transformers

The **Transformers** library provides the functionality to create and use pretrained models for each of the tasks discussed above.

**Pipeline**: The most basic object in the Hugging Face Transformers library is the pipeline() function. It *connects a model* with its *necessary preprocessing and postprocessing steps*, allowing us to *directly input any text and get an intelligible answer*.


There are three main steps involved when you pass some text to a pipeline:

1. The text is preprocessed into a format the model can understand.
2. The preprocessed inputs are passed to the model.
3. The predictions of the model are post-processed, so you can make sense of them.
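
The three steps above can be sketched with a toy example in plain Python. This is only an illustration of the preprocess / model / post-process flow, not how the library implements it: the vocabulary, weights, and function names (`preprocess`, `model`, `postprocess`, `toy_pipeline`) are all made up for this sketch.

```python
import math

# Toy vocabulary and weights: purely illustrative, not taken from any real model.
VOCAB = {"i": 0, "love": 1, "hate": 2, "this": 3, "<unk>": 4}
LABELS = ["NEGATIVE", "POSITIVE"]
# One row per label, one weight per token id.
WEIGHTS = [
    [0.0, -2.0, 2.0, 0.0, 0.0],  # NEGATIVE
    [0.0, 2.0, -2.0, 0.0, 0.0],  # POSITIVE
]


def preprocess(text):
    """Step 1: turn raw text into token ids the toy model understands."""
    words = text.lower().replace("!", " ").replace(".", " ").split()
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in words]


def model(token_ids):
    """Step 2: map token ids to one raw score (logit) per label."""
    return [sum(row[i] for i in token_ids) for row in WEIGHTS]


def postprocess(logits):
    """Step 3: convert logits to probabilities and return a readable label."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three steps, as a pipeline object does for a real model."""
    return postprocess(model(preprocess(text)))


print(toy_pipeline("I hate this!"))
print(toy_pipeline("I love this."))
```

A real pipeline replaces the toy vocabulary with a tokenizer and the weight table with a pretrained network, but the shape of the result (a label plus a score) is the same idea.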

**Pipeline for Sentiment analysis**

In [None]:
!pip install transformers -q

In [None]:
import transformers
from transformers import pipeline

In [None]:
classifier = pipeline("sentiment-analysis")
text_neg = ["I hate this so much!"]
text_pos = ["I've been waiting for a HuggingFace course my whole life."]
print(classifier(text_neg))
print(classifier(text_pos))

**Zero-shot classification**

Zero-shot classification is used to classify texts that haven’t been labelled.
This is a common scenario in real-world projects.
The **zero-shot-classification** pipeline is very powerful: it allows you to specify which labels to use for the classification, so you don’t have to rely on the labels of the pretrained model.

This pipeline is called zero-shot because you don’t need to fine-tune the model on your data to use it. It can directly return probability scores for any list of labels you want!

Example:

In [None]:
classifierZ = pipeline("zero-shot-classification")
classifierZ("This is a course about the Transformers library", 
            candidate_labels = ["education", "politics", "business"])
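
To give a feel for where the probability scores come from, here is a rough sketch. It assumes the pipeline works the way its default configuration appears to: each candidate label is turned into a hypothesis sentence (the default template is `"This example is {}."`), a natural language inference model scores how strongly the text entails each hypothesis, and those scores are normalized with a softmax. The helper names and the logits below are hypothetical, chosen only for illustration.

```python
import math


def build_hypotheses(candidate_labels, template="This example is {}."):
    """Turn each candidate label into an NLI-style hypothesis sentence."""
    return [template.format(label) for label in candidate_labels]


def rank_labels(candidate_labels, entailment_logits):
    """Softmax the per-label entailment logits and sort labels by score."""
    peak = max(entailment_logits)
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    scored = sorted(
        zip(candidate_labels, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return {
        "labels": [label for label, _ in scored],
        "scores": [score for _, score in scored],
    }


labels = ["education", "politics", "business"]
# Hypothetical entailment logits, one per label; a real NLI model would produce these.
toy_logits = [2.0, -1.0, 0.5]

print(build_hypotheses(labels))
print(rank_labels(labels, toy_logits))
```

The real pipeline returns a dictionary with the same `labels` and `scores` keys (plus the input `sequence`), with the labels sorted from most to least likely.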

**Text generation**

The main idea here is that you provide a prompt and the model will auto-complete it by generating the remaining text. 

Text generation involves randomness, so it’s normal if you don’t get the same results as shown below.

In [None]:
generator = pipeline("text-generation")
generator("In this course, we will teach you how to", num_return_sequences = 2, max_length = 15)
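
The randomness mentioned above comes from sampling: at each step the model produces a probability for every possible next token, and one token is drawn at random according to those probabilities. The toy sketch below mimics that with a fixed, made-up distribution (`NEXT_TOKEN_PROBS` and `sample_continuation` are hypothetical names, and a real model recomputes the distribution after every token).

```python
import random

# Made-up next-token probabilities, for illustration only.
NEXT_TOKEN_PROBS = {"build": 0.4, "use": 0.3, "train": 0.2, "debug": 0.1}


def sample_continuation(num_tokens, seed=None):
    """Draw num_tokens tokens at random, weighted by their probabilities."""
    rng = random.Random(seed)  # seed=None gives a different draw on each run
    tokens = list(NEXT_TOKEN_PROBS)
    weights = list(NEXT_TOKEN_PROBS.values())
    return rng.choices(tokens, weights=weights, k=num_tokens)


# Unseeded: results usually differ between calls.
print(sample_continuation(5))
print(sample_continuation(5))

# Seeded: the same seed always reproduces the same draw.
print(sample_continuation(5, seed=0) == sample_continuation(5, seed=0))
```

For reproducible output from the real pipeline, the library provides a `set_seed` helper (`from transformers import set_seed`) that can be called before generating.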

**Using a specific pretrained model in a pipeline**

In [None]:
generatorM = pipeline("text-generation", model = "distilgpt2")
generatorM("In this course, we will teach you how to", 
           num_return_sequences = 2, 
           max_length = 15)

**Mask filling**

In [None]:
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)
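
One thing to watch for: the mask token depends on the model. The default checkpoint used here expects `<mask>`, while BERT-style checkpoints expect `[MASK]`; the token for a loaded pipeline is available as `unmasker.tokenizer.mask_token`. The small helper below is a hypothetical convenience (not part of the library) that keeps prompts model-independent by filling a `{mask}` placeholder.

```python
def make_masked_prompt(template, mask_token):
    """Replace the single '{mask}' placeholder with a model-specific mask token."""
    if template.count("{mask}") != 1:
        raise ValueError("template must contain exactly one '{mask}' placeholder")
    return template.replace("{mask}", mask_token)


template = "This course will teach you all about {mask} models."
print(make_masked_prompt(template, "<mask>"))   # RoBERTa-style token
print(make_masked_prompt(template, "[MASK]"))   # BERT-style token
```

With a real pipeline the call would look like `unmasker(make_masked_prompt(template, unmasker.tokenizer.mask_token), top_k=2)`.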

**Named entity recognition**

In [None]:
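# aggregation_strategy="simple" merges word pieces into whole entities;
# it replaces the older, deprecated grouped_entities=True argument.
ner = pipeline("ner", aggregation_strategy="simple")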
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
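
The grouping option matters because the model tags individual tokens, and a name such as "Sylvain" may be split into several word pieces. The sketch below shows the idea in a simplified form: consecutive tokens with the same entity type are merged into one span. The token list is hand-written in the shape of ungrouped `ner` output (scores omitted), and `group_entities` is a hypothetical helper, not the library's actual aggregation code.

```python
def group_entities(text, token_predictions):
    """Merge consecutive tokens that share an entity type into one span."""
    groups = []
    last_index = None
    for tok in token_predictions:
        entity_type = tok["entity"].split("-")[-1]  # "I-PER" -> "PER"
        contiguous = last_index is not None and tok["index"] == last_index + 1
        if groups and contiguous and groups[-1]["entity_group"] == entity_type:
            groups[-1]["end"] = tok["end"]  # extend the current span
        else:
            groups.append(
                {"entity_group": entity_type, "start": tok["start"], "end": tok["end"]}
            )
        last_index = tok["index"]
    for group in groups:
        # Recover the surface text from the character offsets.
        group["word"] = text[group["start"]:group["end"]]
    return groups


text = "My name is Sylvain and I work at Hugging Face in Brooklyn."
# Hand-written token-level predictions (illustrative; a real model would produce these).
tokens = [
    {"entity": "I-PER", "index": 4, "start": 11, "end": 12},   # S
    {"entity": "I-PER", "index": 5, "start": 12, "end": 14},   # ##yl
    {"entity": "I-PER", "index": 6, "start": 14, "end": 16},   # ##va
    {"entity": "I-PER", "index": 7, "start": 16, "end": 18},   # ##in
    {"entity": "I-ORG", "index": 12, "start": 33, "end": 35},  # Hu
    {"entity": "I-ORG", "index": 13, "start": 35, "end": 40},  # ##gging
    {"entity": "I-ORG", "index": 14, "start": 41, "end": 45},  # Face
    {"entity": "I-LOC", "index": 16, "start": 49, "end": 57},  # Brooklyn
]

for group in group_entities(text, tokens):
    print(group["entity_group"], group["word"])
```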

**Question answering**

In [None]:
question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)
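
This pipeline is extractive: the answer is a span copied out of the context, not newly generated text. The result is a dictionary with `score`, `start`, `end`, and `answer`, where `start` and `end` are character offsets into the context. The sketch below builds such a result by hand (no model involved; `span_result` and `highlight` are hypothetical helpers) to show how the offsets relate to the answer.

```python
def span_result(context, answer):
    """Build a result-like dict whose offsets locate the answer in the context."""
    start = context.find(answer)
    if start == -1:
        raise ValueError("answer must be a substring of the context")
    return {"answer": answer, "start": start, "end": start + len(answer)}


def highlight(context, result):
    """Mark the answer span inside the context using its character offsets."""
    start, end = result["start"], result["end"]
    return context[:start] + "[" + context[start:end] + "]" + context[end:]


context = "My name is Sylvain and I work at Hugging Face in Brooklyn"
result = span_result(context, "Hugging Face")

# Slicing the context with the offsets gives back the answer text.
print(context[result["start"]:result["end"]] == result["answer"])
print(highlight(context, result))
```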

**Summarization**

In [None]:
summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
""",
    max_length=150,
    min_length=50,
)

**Translation**

In [None]:
translator = pipeline("translation_en_to_de")
translator("Hi, my name is Nazi.")