In [1]:
from transformers import pipeline

[Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines) are objects that abstract away most of the library's complex code, offering a simple API for several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction, and Question Answering.

A pipeline connects a model with its necessary preprocessing and postprocessing steps, so we can pass in raw text directly and get an intelligible answer.

## Sentiment Analysis

Sentiment analysis is the task of classifying input text into predefined sentiment categories. It is one of the most common NLP tasks and is often used to identify user sentiment from reviews. Common sentiment classes are *positive*, *negative*, and *neutral*.

In the following examples, we use sentiment analysis to classify news headlines.

In [2]:
data = ["LTIMindtree Q2FY24: Show of strength. Good revenue growth and resilient margin performance",
        "The company expects furloughs to be more pronounced in Q3 and it is guiding to a very weak quarter, with revenue decline between 1.5 percent and 3.5 percent",
        "Arkam Ventures is also an investor in Jai Kisan, one of India’s fastest-growing rural fintech platforms for farmers and retailers, and Jumbotail, India’s leading B2B food and grocery marketplace and retail platform",
       ]

In [3]:
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
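
The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so, reusing the model name and revision hash printed in the warning; `PINNED_SENTIMENT` and `build_classifier` are illustrative names, not part of the library:

```python
# Model name and revision copied from the warning printed by the pipeline.
PINNED_SENTIMENT = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_classifier(config=PINNED_SENTIMENT):
    """Create the pipeline from an explicit model name and revision."""
    # Imported lazily so the configuration can be inspected without
    # loading the library or downloading any weights.
    from transformers import pipeline

    return pipeline(**config)
```

Calling `build_classifier()` should then load the same checkpoint on every run instead of whatever the library's default happens to be.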


#### What happens?

If no model is specified, the default model for sentiment analysis is downloaded and cached. Calling the pipeline then runs three steps that are hidden from the end user:

* Pre-processing: the input text is tokenized and converted into the tensors the model expects.
* Inference: the model produces raw outputs (logits) for the task.
* Post-processing: the raw outputs are converted into labels and scores that the end user can read.
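
The three steps can be sketched by hand. This is only an approximation of what the pipeline does internally, assuming a PyTorch backend and the default checkpoint named in the warning; `softmax`, `postprocess`, and `classify_manually` are illustrative helpers, not library functions:

```python
import math


def softmax(logits):
    """Turn raw model outputs into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, id2label):
    """Step 3: map the most probable class index to a label and score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


def classify_manually(texts, model_name="distilbert-base-uncased-finetuned-sst-2-english"):
    """Run tokenization, the forward pass, and post-processing explicitly."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    # Step 1: pre-processing, text to padded tensors of token ids.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    # Step 2: inference, one row of logits per input text.
    with torch.no_grad():
        logits = model(**inputs).logits

    # Step 3: post-processing, logits to label/score dictionaries.
    return [postprocess(row.tolist(), model.config.id2label) for row in logits]
```

`classify_manually(data)` should give labels and scores close to those returned by `classifier(data)`, though small numerical differences are possible.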

In [4]:
classifier(data)

[{'label': 'POSITIVE', 'score': 0.9995540976524353},
 {'label': 'NEGATIVE', 'score': 0.9995874762535095},
 {'label': 'POSITIVE', 'score': 0.9979150891304016}]
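
The default checkpoint is fine-tuned on SST-2, which has only positive and negative labels, so even an arguably neutral headline such as the third one receives one of those two labels. To read the predictions next to their inputs, a small helper like the following may be useful. It is a sketch: `label_headlines` is an illustrative name, the `results` list is the output recorded above, and the short strings in `texts` are placeholders standing in for the entries of `data`:

```python
def label_headlines(texts, results, min_score=0.0):
    """Pair each input text with its predicted label and rounded score."""
    rows = []
    for text, result in zip(texts, results):
        if result["score"] >= min_score:
            rows.append((result["label"], round(result["score"], 4), text))
    return rows


# Placeholders standing in for the three headlines in `data`.
texts = ["headline 1", "headline 2", "headline 3"]

# Output of classifier(data) as recorded in the cell above.
results = [
    {"label": "POSITIVE", "score": 0.9995540976524353},
    {"label": "NEGATIVE", "score": 0.9995874762535095},
    {"label": "POSITIVE", "score": 0.9979150891304016},
]

for label, score, text in label_headlines(texts, results):
    print(f"{label:<8} {score:.4f}  {text}")
```

In the notebook itself, `label_headlines(data, classifier(data))` would pair the real headlines with fresh predictions.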

## Text-generation Pipeline

In [this pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TextGenerationPipeline), we provide a prompt, and the model tries to complete it by generating a continuation.

In the previous pipeline, we let the pipeline pick the default model for sentiment analysis. Here, we explicitly select the model, **distilgpt2**, and pass the task through the `task` parameter.

In [5]:
generator = pipeline(task='text-generation', model='distilgpt2')

In [6]:
generator("LTIMindtree Q2FY24: Show of strength. Good revenue growth and resilient margin performance. Thus indicating, investors can")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'LTIMindtree Q2FY24: Show of strength. Good revenue growth and resilient margin performance. Thus indicating, investors can start on as low as possible. -0.50% -00.50% -00.50% -00'}]

In [7]:
generator("LTIMindtree Q2FY24: Show of strength. Good revenue growth and resilient margin performance. Thus indicating, investors can",
         max_length=50,
         num_return_sequences=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'LTIMindtree Q2FY24: Show of strength. Good revenue growth and resilient margin performance. Thus indicating, investors can leverage and generate some real value.\n\n\nIn the face of this, there is also a need to diversify'},
 {'generated_text': 'LTIMindtree Q2FY24: Show of strength. Good revenue growth and resilient margin performance. Thus indicating, investors can now expect to see positive results from investment.\n\n\n\nThe company is now in some form of bankruptcy as it'}]

The generated responses typically differ from run to run because the pipeline samples each next token from a probability distribution rather than always picking the most likely one.
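
The effect of sampling, and of fixing a random seed, can be illustrated with a toy distribution. In this sketch the probabilities in `toy_probs` are made up for illustration and do not come from distilgpt2; `sample_next_tokens` and `generate_reproducibly` are illustrative helpers, while `set_seed` is the seeding utility exported by `transformers`:

```python
import random


def sample_next_tokens(probs, n, seed=None):
    """Draw n tokens from a toy next-token distribution."""
    rng = random.Random(seed)  # private generator; seed=None leaves it unseeded
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=n)


# Hypothetical probabilities for the word following "investors can".
toy_probs = {"expect": 0.4, "leverage": 0.3, "start": 0.2, "now": 0.1}

first = sample_next_tokens(toy_probs, 5, seed=0)
second = sample_next_tokens(toy_probs, 5, seed=0)
print(first == second)  # True: the same seed gives the same draws


def generate_reproducibly(prompt, seed=42):
    """Fix the random seed before calling the text-generation pipeline."""
    # Imported lazily so the toy example above runs without the library.
    from transformers import pipeline, set_seed

    set_seed(seed)
    generator = pipeline(task="text-generation", model="distilgpt2")
    return generator(prompt, max_length=50, num_return_sequences=2)
```

With the same seed, library version, and hardware, repeated calls to `generate_reproducibly` are expected to return the same continuations.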

## Named Entity Recognition

This [pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TokenClassificationPipeline) identifies the named entities (i.e., real-world objects such as people, organizations, and places) in the input text. Examples of NER classes are *Person*, *Organization*, and *Location*. NER is an important information extraction task. The identified entities are often passed on to relation classification, and entities and relations together are used to populate knowledge bases or knowledge graphs, which downstream applications can query.

In [8]:
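# aggregation_strategy='simple' merges sub-word tokens into whole entities;
# it replaces the deprecated grouped_entities=True flag.
ner = pipeline('ner', aggregation_strategy='simple')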

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [9]:
ner(data[2])

[{'entity_group': 'ORG',
  'score': 0.99537367,
  'word': 'Arkam Ventures',
  'start': 0,
  'end': 14},
 {'entity_group': 'ORG',
  'score': 0.9757264,
  'word': 'Jai Kisan',
  'start': 38,
  'end': 47},
 {'entity_group': 'LOC',
  'score': 0.9988807,
  'word': 'India',
  'start': 56,
  'end': 61},
 {'entity_group': 'ORG',
  'score': 0.91945344,
  'word': 'Jumbotail',
  'start': 135,
  'end': 144},
 {'entity_group': 'LOC',
  'score': 0.9992254,
  'word': 'India',
  'start': 146,
  'end': 151}]
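
The `start` and `end` values are character offsets into the input, and the flat list can be regrouped by entity type for downstream use. A minimal sketch over the output recorded above; `group_by_type` and `offsets_match` are illustrative helpers, not pipeline features:

```python
from collections import defaultdict

# The third headline from `data`, i.e. the input passed to ner().
text = (
    "Arkam Ventures is also an investor in Jai Kisan, one of India’s "
    "fastest-growing rural fintech platforms for farmers and retailers, "
    "and Jumbotail, India’s leading B2B food and grocery marketplace and "
    "retail platform"
)

# Output of ner(data[2]) as recorded above.
entities = [
    {"entity_group": "ORG", "score": 0.99537367, "word": "Arkam Ventures", "start": 0, "end": 14},
    {"entity_group": "ORG", "score": 0.9757264, "word": "Jai Kisan", "start": 38, "end": 47},
    {"entity_group": "LOC", "score": 0.9988807, "word": "India", "start": 56, "end": 61},
    {"entity_group": "ORG", "score": 0.91945344, "word": "Jumbotail", "start": 135, "end": 144},
    {"entity_group": "LOC", "score": 0.9992254, "word": "India", "start": 146, "end": 151},
]


def group_by_type(entities, min_score=0.0):
    """Collect unique entity strings per type, keeping first-seen order."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score and ent["word"] not in grouped[ent["entity_group"]]:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


def offsets_match(text, entities):
    """Check that every start/end pair slices the entity out of the text."""
    return all(text[e["start"]:e["end"]] == e["word"] for e in entities)


print(group_by_type(entities))
print(offsets_match(text, entities))
```

Raising `min_score` (for example to 0.95) drops lower-confidence entities such as `Jumbotail` in this output.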

## Question Answering

In this [pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.QuestionAnsweringPipeline), the model answers an input question by locating the answer span in the provided context. This is extractive question answering.

In [10]:
qa = pipeline("question-answering")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [11]:
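# The question spells the name "Akram", while the context spells it "Arkam";
# the output below was recorded with this spelling.
qa(
    question="In which companies Akram Ventures have invested?",
    context=data[2]
)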

{'score': 0.17462730407714844, 'start': 38, 'end': 47, 'answer': 'Jai Kisan'}

**NOTE:** This pipeline extracts the answer span from the provided context; it does not generate new text. The `start` and `end` fields are the character offsets of that span within the context.
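
The recorded answer has a low score (about 0.17) and covers only one of the two companies named in the context. One option is to ask the pipeline for several candidate spans through its `top_k` argument and filter them by score. A minimal sketch; `as_answer_list` and `top_answers` are illustrative helpers, and `recorded` is the output shown above:

```python
def as_answer_list(result, min_score=0.0):
    """Normalise pipeline output to a list and drop low-confidence answers."""
    # The pipeline returns a single dict for one answer and a list for several.
    answers = [result] if isinstance(result, dict) else list(result)
    return [a for a in answers if a["score"] >= min_score]


# Output of the qa(...) call as recorded above.
recorded = {"score": 0.17462730407714844, "start": 38, "end": 47, "answer": "Jai Kisan"}

print(as_answer_list(recorded))
print(as_answer_list(recorded, min_score=0.5))  # filtered out at this threshold


def top_answers(qa, question, context, k=3, min_score=0.0):
    """Ask an existing question-answering pipeline for up to k candidate spans."""
    return as_answer_list(qa(question=question, context=context, top_k=k), min_score)
```

In the notebook, `top_answers(qa, question, data[2])` would list the highest-scoring spans; whether `Jumbotail` appears among them depends on the model.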

## Summarization

This [pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.SummarizationPipeline) condenses the input text into a shorter version that keeps its key points.

In [12]:
input_text = """Thus far, late stage start-ups, including the Unicorns, have faced the brunt of the funding winter. 
Early stage investment activity was relatively unaffected.  In fact, right through 2022, Angel, Seed and Series-A rounds were comparing quite favourably even to the dizzying heights of 2021.
But almost all good things need to come to an end. Nine months into 2023, the funding winter is biting hard in the early stage funding segment. And the latest quarter - ending September 2023 - has been especially harsh.
In the first nine months of 2023, Indian startups (overall) raised US$ 5.9 billion in funding. 
This his a 72 per cent decline compared to the US$21.3 billion they raised during the same period in 2022. 
Of this, Q3 CY223 saw 115 investments worth US$1.9 billion, marking a 61 decline from the 292 investments worth US$2.8 billion in the same period during 2022.
"""

In [13]:
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [14]:
summarizer(input_text, min_length=10, max_length=50)

[{'summary_text': ' In the first nine months of 2023, Indian startups (overall) raised US$ 5.9 billion in funding . This is a 72 per cent decline compared to the US$21.3 billion they raised during the same period'}]
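
The recorded summary stops mid-sentence at "during the same period", most likely because `max_length` caps the number of generated tokens. A possible clean-up step is sketched below; `tidy_summary` is an illustrative helper, not part of the pipeline, and it simply discards the trailing fragment:

```python
import re


def tidy_summary(summary):
    """Drop a trailing sentence fragment and fix spacing before punctuation."""
    text = summary.strip()
    # Sentence-ending punctuation followed by whitespace or end of string;
    # decimal points such as "5.9" are skipped because a digit follows them.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", text)]
    if ends:
        text = text[: ends[-1]]  # keep everything up to the last full sentence
    return re.sub(r"\s+([.,!?])", r"\1", text)


# Summary text as recorded in the output above.
recorded_summary = (
    " In the first nine months of 2023, Indian startups (overall) raised "
    "US$ 5.9 billion in funding . This is a 72 per cent decline compared to "
    "the US$21.3 billion they raised during the same period"
)

print(tidy_summary(recorded_summary))
```

Raising `max_length` in the `summarizer` call is the alternative when the cut-off sentence should be kept rather than dropped.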