## **INITIALIZATION:**
- I place these three lines at the top of each notebook. The first two reload imported modules automatically, which prevents stale code when re-running the same project, and the third renders Matplotlib plots inline in the notebook.

In [1]:
#@ INITIALIZATION: 
%reload_ext autoreload
%autoreload 2
%matplotlib inline

## **LIBRARIES AND DEPENDENCIES:**
- All libraries and dependencies required for the project are installed and imported in the cells below.
- Install `transformers[sentencepiece]` in order to use `pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")` for the language translation task.

In [2]:
#@ INSTALLING DEPENDENCIES:
!pip install transformers[sentencepiece]



In [3]:
#@ IMPORTING LIBRARIES AND DEPENDENCIES:
import transformers
from transformers import pipeline

## **Natural Language Processing**

- **NLP** is a field of linguistics and machine learning focused on understanding everything related to human language. The aim of **NLP** tasks is not only to understand single words individually, but also to understand the context of those words.

### **SENTIMENT ANALYSIS:**

The line of code `classifier = pipeline("sentiment-analysis")` creates a sentiment analysis classifier using the Hugging Face Transformers library.

The `pipeline()` function is a convenience method provided by the Transformers library that allows developers to quickly instantiate pre-trained models for various natural language processing (NLP) tasks, including sentiment analysis.

In this case, `pipeline()` creates a `sentiment-analysis` pipeline, which bundles a pre-trained transformer model for sentiment analysis with the preprocessing and post-processing it needs.

The resulting `classifier` object is a pipeline instance. It is callable (it implements `__call__()`), so passing text to it directly returns the predicted sentiment.

Overall, this line of code provides a quick and easy way to load a pre-trained sentiment analysis model for use in NLP applications.

### **WHICH MODEL IS USED:**

The model behind `classifier = pipeline("sentiment-analysis")` is whichever default the Hugging Face Transformers library specifies at the time the code is executed.

In this notebook the default is `distilbert-base-uncased-finetuned-sst-2-english`, as the log output of the cell below reports. It is a DistilBERT model, a smaller, faster, and lighter version of the popular BERT (Bidirectional Encoder Representations from Transformers) model. DistilBERT was introduced in the paper "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" by Sanh et al. in 2019.

DistilBERT was pre-trained by distilling knowledge from BERT, with an objective that includes masked language modeling, which involves predicting masked words in text. As the checkpoint name indicates, this model was then fine-tuned for sentiment analysis on SST-2 (the Stanford Sentiment Treebank), which contains sentences from movie reviews labeled as positive or negative.

The sentiment analysis pipeline itself has no dedicated research paper, but the underlying DistilBERT model is evaluated in the paper mentioned above. The Hugging Face Transformers library also provides documentation and examples of how to use the model and pipeline for various NLP tasks.





In [4]:
#@ IMPLEMENTATION OF SENTIMENT ANALYSIS PIPELINE:
classifier = pipeline("sentiment-analysis")                                 # Initializing Classifier Object. 
classifier("I've started the HuggingFace course which fascinates me.")      # Inspecting Sentiment. 

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9997233748435974}]
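The warning printed above recommends naming the model and revision explicitly. A minimal sketch of one way to do that, using the model name and revision copied from that warning; `PINNED` and `build_classifier` are hypothetical names introduced here for illustration, and the import is done lazily so the configuration can be inspected without loading the library.

```python
# Model name and revision copied from the warning printed above.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_classifier():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported lazily; requires the transformers package to be installed.
    from transformers import pipeline

    return pipeline(**PINNED)
```

Calling `build_classifier()("I've started the HuggingFace course which fascinates me.")` should then behave like the cell above, without depending on whatever default the library ships with.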

In [5]:
#@ IMPLEMENTATION OF SENTIMENT ANALYSIS PIPELINE: MULTIPLE:
classifier(
    [
        "I've started the HuggingFace course which fascinates me.",
        "I will no longer read it's documentation.",
        "I think the course is awesome!"
    ]
) # Inspecting Sentiment. 

[{'label': 'POSITIVE', 'score': 0.9997233748435974},
 {'label': 'NEGATIVE', 'score': 0.9994686245918274},
 {'label': 'POSITIVE', 'score': 0.9998465776443481}]

#### **Pipelines**

- The three main steps involved when we pass some text to a `pipeline` are: 
    - The text is preprocessed into a format the model can understand.
    - The preprocessed inputs are passed to the model. 
    - The predictions of the model are post-processed, so we can make sense of them. 
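The three steps above can be sketched with a toy stand-in. This is not how the Transformers library implements them: the word weights, labels, and function names below are hypothetical, and a real pipeline uses a trained tokenizer and transformer model in place of `preprocess` and `model`.

```python
import math

# Hypothetical word weights; a real pipeline uses a trained transformer instead.
WEIGHTS = {"awesome": 2.0, "fascinates": 1.5, "hate": -2.0, "boring": -1.5}
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Step 1: turn raw text into tokens the toy model understands."""
    return [word.strip(".,!?").lower() for word in text.split()]


def model(tokens):
    """Step 2: produce one raw score (logit) per label."""
    positive = sum(WEIGHTS.get(token, 0.0) for token in tokens)
    return [-positive, positive]


def postprocess(logits):
    """Step 3: softmax the logits and report the most likely label."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three steps, as a pipeline object does when called."""
    return postprocess(model(preprocess(text)))


result = toy_pipeline("I think the course is awesome!")
print(result["label"])
```

The returned dictionary has the same `label` and `score` keys as the real pipeline output shown earlier, which is the part of the post-processing step that makes the raw model scores readable.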

### **ZERO-SHOT CLASSIFICATION:**

- The `zero-shot-classification` pipeline is very powerful, as it allows us to specify which labels to use for the classification, so we don't have to rely on the labels of the pretrained model. This `pipeline` is called `zero-shot` because we don't need to fine-tune the model on our data to use it. It can directly return probability scores for any list of labels we want. 

The zero-shot-classification pipeline is a pre-configured pipeline that uses a pre-trained transformer model for classification tasks. Unlike traditional classification pipelines, the zero-shot classification pipeline can classify input text based on classes that are not explicitly defined in the training data.

The pipeline works by framing classification as natural language inference (NLI). The default model in this notebook, `facebook/bart-large-mnli`, is fine-tuned on the MNLI dataset. Each candidate label is inserted into a hypothesis sentence, the input text serves as the premise, and the model scores how strongly the premise entails each hypothesis. The candidate labels are never used to train the model. Instead, the entailment scores are normalized across the labels to give the probabilities that the pipeline returns.
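With an NLI checkpoint such as `facebook/bart-large-mnli`, one way to picture the scoring is the sketch below. Each label becomes a hypothesis sentence, the model produces an entailment logit per hypothesis, and a softmax over those logits yields the scores. The template string is assumed to match the pipeline's default, the logits are hypothetical numbers rather than real model outputs, and the function names are introduced only for illustration.

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def rank_labels(labels, entailment_logits):
    """Softmax the per-label entailment logits and sort labels by score."""
    peak = max(entailment_logits)
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    scored = sorted(
        zip(labels, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return {
        "labels": [label for label, _ in scored],
        "scores": [score for _, score in scored],
    }


labels = ["education", "health", "programming"]
hypotheses = build_hypotheses(labels)
# Hypothetical logits standing in for the NLI model's entailment outputs.
ranking = rank_labels(labels, [2.1, 0.3, 0.8])
print(ranking["labels"])
```

Because the softmax runs across the candidate labels, the scores sum to 1, so adding or removing a label changes the scores of all the others.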



In [6]:
#@ IMPLEMENTATION OF ZERO SHOT CLASSIFICATION PIPELINE: 
classifier = pipeline("zero-shot-classification")                           # Initializing Classifier Object. 
classifier("This is a course about Transformers library", 
           candidate_labels=["education", "health", "programming"])         # Inspecting Classification. 

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course about Transformers library',
 'labels': ['education', 'programming', 'health'],
 'scores': [0.6901501417160034, 0.1908414661884308, 0.1190083846449852]}

### **TEXT GENERATION:**

- The main idea here is that we provide a prompt and the model auto-completes it by generating the remaining text.

In [7]:
#@ IMPLEMENTATION OF TEXT GENERATION PIPELINE: 
generator = pipeline("text-generation")                                 # Initializing Generator Object. 
generator("In this course, you will learn to")                          # Inspecting Generated Text. 

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, you will learn to take on each new project with real-world use cases, making it easier to follow through with your concepts. Students will go through what you need to work with in detail before you start using your new project.'}]

In [8]:
#@ IMPLEMENTATION OF TEXT GENERATION PIPELINE: 
generator("I like python because", num_return_sequences=2, max_length=15)     # Inspecting Generated Sequences. 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'I like python because it\'s open source," said Cipriano.'},
 {'generated_text': 'I like python because of the interface. For that I found myself getting errors'}]

In [9]:
#@ IMPLEMENTATION OF TEXT GENERATION PIPELINE: DISTILGPT2:
generator = pipeline("text-generation", model="distilgpt2")                   # Initializing Generator Object. 
generator("I want to be a programmer so that", 
          num_return_sequences=2, max_length=30)                              # Inspecting Generated Sequences. 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'I want to be a programmer so that this will be used. It\u202cs a real hard work to deal with. That\u202cs a'},
 {'generated_text': 'I want to be a programmer so that you can do anything you want, but with such a great deal of resources and time, I would love to'}]
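To make the roles of `max_length` and `num_return_sequences` concrete, here is a toy sketch of sampling-based auto-completion. The next-word table is hypothetical and stands in for a language model's predictions, and the toy counts whole words, whereas the real pipeline counts tokens (with the prompt included in `max_length`).

```python
import random

# Hypothetical next-word table standing in for a language model's predictions.
NEXT_WORDS = {
    "because": ["it", "of"],
    "it": ["is", "works"],
    "of": ["the"],
    "the": ["syntax", "community"],
    "is": ["readable", "simple"],
}


def generate(prompt, max_length, num_return_sequences, seed=0):
    """Sample continuations word by word until max_length words are reached."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(num_return_sequences):
        words = prompt.split()
        # Stop at the length limit or when the table has no continuation.
        while len(words) < max_length and words[-1] in NEXT_WORDS:
            words.append(rng.choice(NEXT_WORDS[words[-1]]))
        outputs.append({"generated_text": " ".join(words)})
    return outputs


samples = generate("I like python because", max_length=8, num_return_sequences=2)
for sample in samples:
    print(sample["generated_text"])
```

Each sequence starts from the same prompt and is sampled independently, which is why the generated texts in the cells above differ from one another and from run to run.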

### **MASK FILLING:**

- The idea of this task is to fill in the blanks in a given text. 

In [10]:
#@ IMPLEMENTATION OF MASK FILLING PIPELINE:
unmasker = pipeline("fill-mask")                                        # Initializing Mask Filling Object.
unmasker("This course will teach you all about <mask> models.", 
         top_k=2)                                                       # Inspecting Mask Token.

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.19619806110858917,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052723944187164,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]

In [11]:
#@ IMPLEMENTATION OF MASK FILLING PIPELINE:
unmasker = pipeline("fill-mask", model="bert-base-cased")               # Initializing Mask Filling Object.
unmasker("This course will teach you all about [MASK] models.", 
         top_k=2)                                                       # Inspecting Mask Token.

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.25963184237480164,
  'token': 1648,
  'token_str': 'role',
  'sequence': 'This course will teach you all about role models.'},
 {'score': 0.09427252411842346,
  'token': 1103,
  'token_str': 'the',
  'sequence': 'This course will teach you all about the models.'}]
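The two cells above use different mask tokens (`<mask>` for `distilroberta-base`, `[MASK]` for `bert-base-cased`), and `top_k` controls how many candidates are returned. A minimal sketch of both ideas follows. The helper names are hypothetical, the scores for "role" and "the" are rounded from the output above, and the score for "new" is a made-up value added only so that `top_k` has something to discard.

```python
# Mask tokens as used in the two cells above.
MASK_TOKENS = {"distilroberta-base": "<mask>", "bert-base-cased": "[MASK]"}


def make_prompt(template, model_name):
    """Insert the mask token that the chosen checkpoint expects."""
    return template.format(mask=MASK_TOKENS[model_name])


def top_k_fills(prompt, mask_token, token_scores, top_k=2):
    """Keep the top_k highest-scoring tokens and fill each into the prompt."""
    best = sorted(token_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [
        {
            "score": score,
            "token_str": token,
            "sequence": prompt.replace(mask_token, token),
        }
        for token, score in best
    ]


prompt = make_prompt(
    "This course will teach you all about {mask} models.", "bert-base-cased"
)
# "role" and "the" are rounded from the output above; "new" is hypothetical.
fills = top_k_fills(prompt, "[MASK]", {"the": 0.094, "role": 0.260, "new": 0.050})
print(fills[0]["sequence"])
```

With a real pipeline, the mask string can be read from the loaded tokenizer (for example `unmasker.tokenizer.mask_token`) instead of a hand-written table like `MASK_TOKENS`.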

### **NAMED ENTITY RECOGNITION:**

- **NER** is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations. 

In [12]:
#@ IMPLEMENTATION OF NER PIPELINE:
ner = pipeline("ner", grouped_entities=True)                           # Initializing NER Object. 
ner("I am Thinam Tamang and I am from Kathmandu, Nepal.")              # Inspecting Entities. 

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'entity_group': 'PER',
  'score': 0.9967239,
  'word': 'Thinam Tamang',
  'start': 5,
  'end': 18},
 {'entity_group': 'LOC',
  'score': 0.9990222,
  'word': 'Kathmandu',
  'start': 33,
  'end': 42},
 {'entity_group': 'LOC',
  'score': 0.9997008,
  'word': 'Nepal',
  'start': 44,
  'end': 49}]
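The grouped output above can be understood as merging per-word tags into spans. The sketch below illustrates that idea under simplifying assumptions: it works on whole words rather than subword tokens, the `B-`/`I-` tags are hand-written rather than predicted by the model, and `group_entities` is a hypothetical helper, not the library's implementation.

```python
def group_entities(text, tagged_words):
    """Merge consecutive words that share an entity label into one span."""
    groups, cursor, prev_label = [], 0, None
    for word, tag in tagged_words:
        # Locate the word in the text to recover its character offsets.
        start = text.index(word, cursor)
        cursor = end = start + len(word)
        if tag == "O":
            prev_label = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "B" or label != prev_label:
            # A new entity starts here.
            groups.append(
                {"entity_group": label, "word": word, "start": start, "end": end}
            )
        else:
            # Extend the current entity to cover this word as well.
            groups[-1]["end"] = end
            groups[-1]["word"] = text[groups[-1]["start"]:end]
        prev_label = label
    return groups


text = "I am Thinam Tamang and I am from Kathmandu, Nepal."
# Hand-written tags standing in for the model's per-word predictions.
tagged = [
    ("I", "O"), ("am", "O"), ("Thinam", "B-PER"), ("Tamang", "I-PER"),
    ("and", "O"), ("I", "O"), ("am", "O"), ("from", "O"),
    ("Kathmandu", "B-LOC"), (",", "O"), ("Nepal", "B-LOC"), (".", "O"),
]
entities = group_entities(text, tagged)
for entity in entities:
    print(entity)
```

The `start` and `end` values are character offsets into the input string, so `text[5:18]` recovers "Thinam Tamang" exactly as in the pipeline output.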

### **QUESTION ANSWERING:**

- The `question-answering` pipeline answers questions using information from a given context. 

In [13]:
#@ IMPLEMENTATION OF QUESTION ANSWERING PIPELINE: 
question_answerer = pipeline("question-answering")                          # Initialization. 
question_answerer(
    question="What is my name?",
    context="I am Thinam from Nepal.")                                      # Inspecting Answer. 

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.8733125925064087, 'start': 5, 'end': 11, 'answer': 'Thinam'}
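The answer is extracted from the context rather than generated, which is why the output carries `start` and `end` character offsets. A minimal sketch of extractive span selection follows, assuming the model yields one start logit and one end logit per token. The logits below are hypothetical, the tokenization is word-level for simplicity, and `best_span` is an illustrative helper rather than the pipeline's own code.

```python
def best_span(start_logits, end_logits):
    """Return the (start, end) token pair with the highest combined score."""
    candidates = [
        (s + e, i, j)
        for i, s in enumerate(start_logits)
        for j, e in enumerate(end_logits)
        if j >= i  # the answer cannot end before it starts
    ]
    _, i, j = max(candidates)
    return i, j


context = "I am Thinam from Nepal."
# (token, start_char, end_char) triples for the context.
tokens = [
    ("I", 0, 1), ("am", 2, 4), ("Thinam", 5, 11),
    ("from", 12, 16), ("Nepal", 17, 22), (".", 22, 23),
]
# Hypothetical logits standing in for the model's outputs.
start_logits = [0.1, 0.2, 4.0, 0.1, 1.5, 0.0]
end_logits = [0.0, 0.1, 3.8, 0.2, 1.7, 0.1]

i, j = best_span(start_logits, end_logits)
start, end = tokens[i][1], tokens[j][2]
answer = context[start:end]
print(answer, start, end)
```

Slicing the context with the returned offsets gives the answer text, matching the `start`, `end`, and `answer` fields of the output above.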

### **SUMMARIZATION:**

- Summarization is the task of reducing a text into a shorter text while keeping all or most of the important aspects referenced in the text. 

In [14]:
#@ IMPLEMENTATION OF SUMMARIZATION PIPELINE:
summarizer = pipeline("summarization")                                                  # Initializing Summarizer Object. 
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
)                                                                                         # Inspecting Summarized Text. 

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil,    electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .'}]
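The summary above contains a run of spaces in "civil,    electrical", which appears to carry over the indentation of the triple-quoted input. A small sketch of normalizing whitespace before passing text to the summarizer follows; `normalize_whitespace` is a hypothetical helper, and the snippet is a shortened excerpt of the input above.

```python
def normalize_whitespace(text):
    """Collapse newlines and runs of spaces into single spaces."""
    return " ".join(text.split())


# Shortened excerpt of the indented input used in the cell above.
raw = """
    such as mechanical, civil,
    electrical, chemical, and aeronautical engineering
"""
clean = normalize_whitespace(raw)
print(clean)
```

This also removes paragraph breaks, which is usually acceptable for summarization input but would not be appropriate where line structure carries meaning.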

### **TRANSLATION:**

In [15]:
#@ IMPLEMENTATION OF TRANSLATION PIPELINE: 
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")        # Initializing Translator Object. 
translator("Ce cours est produit par Hugging Face.")                            # Inspecting Translation. 




[{'translation_text': 'This course is produced by Hugging Face.'}]