# Sentiment Analysis Using the Pipeline

In [1]:
from transformers import pipeline

In [2]:
data = ["LTIMindtree Q2FY24: Show of strength. Good revenue growth and resilient margin performance",
        "The company expects furloughs to be more pronounced in Q3 and it is guiding to a very weak quarter, with revenue decline between 1.5 percent and 3.5 percent",
        "Arkam Ventures is also an investor in Jai Kisan, one of India’s fastest-growing rural fintech platforms for farmers and retailers, and Jumbotail, India’s leading B2B food and grocery marketplace and retail platform",
       ]

In [3]:
model_name = "distilbert-base-uncased-finetuned-sst-2-english"

In [4]:
classifier = pipeline('sentiment-analysis', model=model_name)

In [5]:
classifier(data)

[{'label': 'POSITIVE', 'score': 0.9995540976524353},
 {'label': 'NEGATIVE', 'score': 0.9995874762535095},
 {'label': 'POSITIVE', 'score': 0.9979150891304016}]

The pipeline groups together three steps:
1. **Input Pre-processing:** Takes in raw text and converts it into sequences of input IDs using the **Tokenizer**.
2. **Generating Output from the Model:** Passes the input IDs to the specified **Model**, which maps them to embedding vectors, processes them, and finally outputs logits.
3. **Post-processing the Output:** Converts the logits into the final output. For sentiment analysis, this is a class label with a confidence score for each input.
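To make the three stages concrete, here is a toy, pure-Python sketch of how they compose. Everything in it (the five-word vocabulary, the per-token weights, and the names `preprocess`, `forward`, `postprocess`, `toy_pipeline`) is hypothetical and chosen only for illustration; the real pipeline uses DistilBERT's WordPiece tokenizer and a trained transformer in place of the first two functions.

```python
import math

# Hypothetical toy vocabulary: token -> ID.
TOY_VOCAB = {"[UNK]": 0, "good": 1, "growth": 2, "weak": 3, "decline": 4}
# Hypothetical per-token contributions to the (NEGATIVE, POSITIVE) logits,
# standing in for the transformer; a real model is far richer than a sum.
TOY_WEIGHTS = {0: (0.0, 0.0), 1: (-1.0, 1.0), 2: (-0.5, 0.5),
               3: (1.0, -1.0), 4: (1.0, -1.0)}
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def preprocess(text):
    """Stage 1: split text into tokens and map each one to an integer ID."""
    return [TOY_VOCAB.get(tok, TOY_VOCAB["[UNK]"]) for tok in text.lower().split()]


def forward(input_ids):
    """Stage 2: turn the input IDs into one raw score (logit) per class."""
    neg = sum(TOY_WEIGHTS[i][0] for i in input_ids)
    pos = sum(TOY_WEIGHTS[i][1] for i in input_ids)
    return [neg, pos]


def postprocess(logits):
    """Stage 3: softmax the logits and return the top label with its probability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


def toy_pipeline(texts):
    """Chain the three stages in the same order the real pipeline does."""
    return [postprocess(forward(preprocess(t))) for t in texts]


toy_results = toy_pipeline(["good revenue growth", "a very weak quarter"])
# toy_results[0]["label"] -> 'POSITIVE', toy_results[1]["label"] -> 'NEGATIVE'
```

Keeping the stages as separate functions mirrors the rest of this notebook, where each stage is run by hand.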

# Steps Behind the Pipeline

## Input Pre-processing

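1. Splitting the input text into tokens (words, sub-words, or symbols).
2. Mapping each token to an integer ID from the model's vocabulary.
3. Adding the special tokens the model expects and padding or truncating the sequences to a common length.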
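The splitting and mapping steps can be sketched with the greedy longest-match-first procedure used by WordPiece, the subword scheme of BERT-family models such as DistilBERT. The tiny vocabulary and its IDs are hypothetical, as are the helper names `wordpiece` and `encode`; the real tokenizer loads a vocabulary of 30,522 entries from the checkpoint and also lowercases (this is an *uncased* model), strips accents, and splits off punctuation. It also wraps every sequence in special tokens, which is why each row of the tokenizer output starts with 101 (`[CLS]`) and ends with 102 (`[SEP]`).

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into the longest vocabulary pieces, left to right.

    Continuation pieces carry the '##' prefix, as in BERT-style vocabularies.
    """
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1  # shrink the candidate until it is in the vocabulary
        if match is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary (token -> ID); real IDs come from the checkpoint.
VOCAB = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "the": 3, "fur": 4,
         "##lough": 5, "##s": 6, "company": 7}


def encode(text, vocab):
    """Lowercase, split on whitespace, apply WordPiece, add [CLS]/[SEP], map to IDs."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens, [vocab[t] for t in tokens]


tokens, ids = encode("The company furloughs", VOCAB)
# tokens -> ['[CLS]', 'the', 'company', 'fur', '##lough', '##s', '[SEP]']
# ids    -> [1, 3, 7, 4, 5, 6, 2]
```

A rare word such as "furloughs" is broken into several known pieces instead of becoming `[UNK]`, which keeps the vocabulary small while covering open-ended text.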

In [6]:
from transformers import AutoTokenizer

In [7]:
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [8]:
input = tokenizer(data, padding=True, truncation=True, return_tensors="pt")

In [9]:
input

{'input_ids': tensor([[  101,  8318, 27605, 26379,  9910,  1053,  2475, 12031, 18827,  1024,
          2265,  1997,  3997,  1012,  2204,  6599,  3930,  1998, 24501, 18622,
          4765,  7785,  2836,   102,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0],
        [  101,  1996,  2194, 24273,  6519, 23743,  5603,  2015,  2000,  2022,
          2062,  8793,  1999,  1053,  2509,  1998,  2009,  2003, 14669,  2000,
          1037,  2200,  5410,  4284,  1010,  2007,  6599,  6689,  2090,  1015,
          1012,  1019,  3867,  1998,  1017,  1012,  1019,  3867,   102,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0],
        [  101, 15745,  3286, 13252,  2003,  2036,  2019, 14316,  1999, 17410,
         11382,  8791,  1010,  2028,  1997,  2634,  1521,  1055,  7915,  1011,
          3652,  3541, 10346, 15007, ...
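The zeros at the end of the first two rows are padding. With `padding=True` the tokenizer right-pads every sequence to the longest one in the batch using the `[PAD]` ID (0 in this vocabulary), and the returned dictionary also contains an `attention_mask` (1 for real tokens, 0 for padding) so the model can ignore the padded positions. `truncation=True` cuts any sequence longer than the model's maximum input length. The sketch below mimics that logic in plain Python; `pad_batch` is a hypothetical helper, and the IDs other than 101/102 in the example are placeholders.

```python
def pad_batch(batch_ids, pad_id=0, max_length=None):
    """Optionally truncate, then right-pad a batch of ID lists and build attention masks."""
    if max_length is not None:
        # Keep the closing special token (e.g. [SEP]) when cutting a long sequence.
        batch_ids = [ids if len(ids) <= max_length else ids[:max_length - 1] + [ids[-1]]
                     for ids in batch_ids]
    longest = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[101, 7, 8, 102], [101, 9, 102]])
# batch["input_ids"]      -> [[101, 7, 8, 102], [101, 9, 102, 0]]
# batch["attention_mask"] -> [[1, 1, 1, 1], [1, 1, 1, 0]]
```

Padding to a common length is what allows `return_tensors="pt"` to stack the batch into a single rectangular tensor.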

## Generating output from the model

In [28]:
from transformers import AutoModelForSequenceClassification

In [29]:
model = AutoModelForSequenceClassification.from_pretrained(model_name)

In [30]:
output = model(**input)

In [31]:
output

SequenceClassifierOutput(loss=None, logits=tensor([[-3.7354,  3.9795],
        [ 4.2851, -3.5077],
        [-3.0048,  3.1662]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

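These are the raw, unnormalized scores (logits) produced by the model, one per class for each input. They are not probabilities: they can be negative and do not sum to 1.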
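Softmax is monotonic, so the largest logit in each row already identifies the predicted class; softmax is only needed to turn the logits into a confidence score. A plain-Python check using the logits printed above (rounded to four decimals) and the `id2label` mapping this checkpoint uses; the variable names are chosen so they do not overwrite anything in the notebook.

```python
# Logits as printed above (rounded to 4 decimals).
raw_logits = [[-3.7354, 3.9795],
              [4.2851, -3.5077],
              [-3.0048, 3.1662]]
# Same mapping as model.config.id2label for this checkpoint.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}


def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


logit_labels = [id2label[argmax(row)] for row in raw_logits]
# logit_labels -> ['POSITIVE', 'NEGATIVE', 'POSITIVE']
```

These labels agree with the pipeline output at the top of the notebook.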

## Post-processing output

In [32]:
import torch

In [34]:
probs = torch.nn.functional.softmax(output.logits, dim=-1)
probs

tensor([[4.4594e-04, 9.9955e-01],
        [9.9959e-01, 4.1254e-04],
        [2.0849e-03, 9.9792e-01]], grad_fn=<SoftmaxBackward0>)
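With two classes, softmax reduces to a sigmoid of the logit difference, which makes the first row easy to check by hand. Writing $z_0$ for the NEGATIVE logit and $z_1$ for the POSITIVE logit of the first sentence:

$$
\begin{aligned}
p_k &= \frac{e^{z_k}}{e^{z_0} + e^{z_1}}, \qquad k \in \{0, 1\} \\
p_1 &= \frac{1}{1 + e^{-(z_1 - z_0)}} \\
z_1 - z_0 &= 3.9795 - (-3.7354) = 7.7149 \\
p_1 &= \frac{1}{1 + e^{-7.7149}} \approx \frac{1}{1 + 4.46 \times 10^{-4}} \approx 0.99955
\end{aligned}
$$

and $p_0 = 1 - p_1 \approx 4.46 \times 10^{-4}$, matching the first row of `probs`.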

In [35]:
model.config.id2label

{0: 'NEGATIVE', 1: 'POSITIVE'}

In [46]:
preds = [model.config.id2label[i] for i in torch.argmax(probs, dim=1).tolist()]
preds

['POSITIVE', 'NEGATIVE', 'POSITIVE']

In [47]:
max_probs = torch.max(probs, dim=1).values.tolist()
max_probs

[0.9995540976524353, 0.9995874762535095, 0.9979150891304016]

In [50]:
# Pair each label with its probability, in the same format the pipeline returns
predictions = [{'label': label, 'score': score} for label, score in zip(preds, max_probs)]

In [52]:
predictions

[{'label': 'POSITIVE', 'score': 0.9995540976524353},
 {'label': 'NEGATIVE', 'score': 0.9995874762535095},
 {'label': 'POSITIVE', 'score': 0.9979150891304016}]
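The manually assembled `predictions` match the pipeline output from the start of the notebook exactly. The post-processing cells can be folded into one helper; the pure-Python sketch below (the name `postprocess` is hypothetical) applies it to the rounded logits printed earlier, so its scores agree with the pipeline's only to about four decimal places.

```python
import math


def postprocess(logits, id2label):
    """Turn a batch of logit rows into pipeline-style {'label', 'score'} dicts.

    Mirrors the manual steps: softmax per row, pick the most likely class,
    look up its name, and keep its probability as the score.
    """
    results = []
    for row in logits:
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        row_probs = [e / total for e in exps]
        best = max(range(len(row_probs)), key=row_probs.__getitem__)
        results.append({"label": id2label[best], "score": row_probs[best]})
    return results


raw_logits = [[-3.7354, 3.9795], [4.2851, -3.5077], [-3.0048, 3.1662]]
manual_predictions = postprocess(raw_logits, {0: "NEGATIVE", 1: "POSITIVE"})
```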