We will deploy our model in Streamlit and use it for sentiment analysis.

How to use it: run **app.py** with Streamlit (`streamlit run app.py`) to launch the app.

Before deployment, let's go through our pipelines, manually check how our sentiment analysis models behave, and do a quick comparison.

In [5]:
import torch
from transformers import pipeline

In [6]:
# Check if GPU is available
device = 0 if torch.cuda.is_available() else -1

In [15]:
data = ["Oh my god, the actor is so amazing here", "I would've been able to understand everything in this complicated story - if I could've HEARD half of the dialogue. I watched parts of this film in three different environments, and even in the best one way too many important lines were simply unintelligble. Since this is apparently a conscious choice by the director, I doubt I'll ever watch another Nolan film.", "Some people can say this is too complicated moive and there is no feeling at all, after watching this. But didn't watch any moive that ends connection between two characters like this. So it is the best end I ever watched. Remember the how Vin diesel and Paul walker end their relationship by looking at each other and go different paths at a road junction. In this film it would be like Protagonist and neil meets at a same road in different direction, and neils know this is the end of their relationship and Protagonist knows he will again meet him at some distance.... Not only the end, the plot is superb. It's like I watch a half of a movie with one half my brain and other part of my brain knows what is going to happen in the other pat. In deeper sense..Neil and Protagonist is not the different characters. They are the reflections of a same person. The best moive I ever watched. Thank you Christopher Nolan.. I would love to see a series based on the world of Tenet." ]

1. First, we will try the first version of our model, trained on the 'imdb' dataset.

In [7]:
pipe_1 = pipeline("text-classification", model="tashrifmahmud/sentiment_analysis_model", device=device)

In [16]:
pred_1 = pipe_1(data)
pred_1

[{'label': 'POSITIVE', 'score': 0.9567021131515503},
 {'label': 'NEGATIVE', 'score': 0.9698693752288818},
 {'label': 'POSITIVE', 'score': 0.9903952479362488}]

2. Now we will try our second version, trained on 'imdb' and then fine-tuned on 'rotten_tomatoes'.

In [11]:
pipe_2 = pipeline("text-classification", model="tashrifmahmud/sentiment_analysis_model_v2", device=device)

In [19]:
pred_2 = pipe_2(data)
pred_2

[{'label': 'POSITIVE', 'score': 0.9683966040611267},
 {'label': 'NEGATIVE', 'score': 0.9619046449661255},
 {'label': 'POSITIVE', 'score': 0.9843868613243103}]

3. Now we will try "distilbert/distilbert-base-uncased-finetuned-sst-2-english", a popular pre-trained text classification model fine-tuned for sentiment analysis.

In [18]:
pipe_3 = pipeline("text-classification", model="distilbert/distilbert-base-uncased-finetuned-sst-2-english", device=device)

In [20]:
pred_3 = pipe_3(data)
pred_3

[{'label': 'POSITIVE', 'score': 0.9998786449432373},
 {'label': 'NEGATIVE', 'score': 0.9978074431419373},
 {'label': 'POSITIVE', 'score': 0.9994733929634094}]
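
The three outputs above can be compared programmatically rather than by eye. The following is a minimal sketch: the prediction values are copied from the cell outputs so the block runs on its own (inside the notebook, `pred_1`, `pred_2` and `pred_3` could be passed directly), and the helper names `labels_agree` and `score_rows` are hypothetical, not part of the original code.

```python
# Prediction outputs copied from the three cells above.
predictions = {
    "Our Model 1": [
        {"label": "POSITIVE", "score": 0.9567021131515503},
        {"label": "NEGATIVE", "score": 0.9698693752288818},
        {"label": "POSITIVE", "score": 0.9903952479362488},
    ],
    "Our Model 2": [
        {"label": "POSITIVE", "score": 0.9683966040611267},
        {"label": "NEGATIVE", "score": 0.9619046449661255},
        {"label": "POSITIVE", "score": 0.9843868613243103},
    ],
    "HF Model 3": [
        {"label": "POSITIVE", "score": 0.9998786449432373},
        {"label": "NEGATIVE", "score": 0.9978074431419373},
        {"label": "POSITIVE", "score": 0.9994733929634094},
    ],
}


def labels_agree(preds_by_model):
    """Return one boolean per review: True when every model predicts the same label."""
    per_review = zip(*preds_by_model.values())
    return [len({p["label"] for p in review}) == 1 for review in per_review]


def score_rows(preds_by_model, ndigits=4):
    """Return one row per review holding each model's rounded score."""
    per_review = zip(*preds_by_model.values())
    return [[round(p["score"], ndigits) for p in review] for review in per_review]


agreement = labels_agree(predictions)
rows = score_rows(predictions)

print("models:", list(predictions))
for i, (agree, row) in enumerate(zip(agreement, rows), start=1):
    print(f"review {i}: labels agree={agree}, scores={row}")
```

Separating label agreement from score magnitude is a deliberate choice here: the score is the model's confidence in its own predicted label, so a higher score does not by itself mean a more accurate model.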

Including the results from our Logistic Regression model, the scores for the three reviews (one row per review, in the order they appear in `data`) are:

| Review | Logistic Regression | Our Model 1 | Our Model 2 | HF Model 3 |
| :----- | ------------------: | ----------: | ----------: | ---------: |
| 1      | .7086               | .9567       | .9684       | .9999      |
| 2      | .8844               | .9699       | .9619       | .9978      |
| 3      | .7940               | .9904       | .9844       | .9995      |


On these three reviews, HF's pre-trained model gives the highest scores. Our models, which are based on a pre-trained model, come close. Logistic Regression scores noticeably lower on all three reviews (.71 to .88), so it handles this task less well.

We will use our "tashrifmahmud/sentiment_analysis_model_v2" model in the Streamlit app.
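
For orientation, a Streamlit front end around this model could look roughly like the sketch below. This is not the project's actual **app.py**: the function names (`classify`, `load_pipeline`, `main`), the widget labels and the use of `truncation=True` are all assumptions made for illustration. Streamlit and transformers are imported lazily inside the functions so the small `classify` helper can be exercised without them.

```python
MODEL_ID = "tashrifmahmud/sentiment_analysis_model_v2"


def classify(pipe, text):
    """Run a text-classification pipeline on one review and return (label, score)."""
    # A pipeline called on a single string returns a list with one dict.
    # truncation=True cuts off reviews longer than the model's maximum input length.
    result = pipe(text, truncation=True)[0]
    return result["label"], float(result["score"])


def load_pipeline():
    """Build the pipeline, using the GPU when one is available."""
    import torch
    from transformers import pipeline

    device = 0 if torch.cuda.is_available() else -1
    return pipeline("text-classification", model=MODEL_ID, device=device)


def main():
    """Hypothetical app body; assumes Streamlit is installed."""
    import streamlit as st

    # Cache the loaded model so it is not rebuilt on every rerun of the script.
    get_pipeline = st.cache_resource(load_pipeline)

    st.title("Movie review sentiment")
    text = st.text_area("Paste a review")
    if st.button("Analyze") and text.strip():
        label, score = classify(get_pipeline(), text)
        st.write(f"{label} (confidence {score:.4f})")
```

In a file laid out this way, a final `main()` call at the bottom would render the page when the file is launched with `streamlit run app.py`.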