# Text Sentiment Analysis



## Manually Creating a Model for Text Sentiment Analysis (Bag-of-Words Vectorization)

In [25]:
from sklearn.feature_extraction.text import CountVectorizer
from nltk.tokenize import RegexpTokenizer

In [26]:
import pandas as pd
data = pd.read_csv('data.csv')

In [27]:
data.head()

Unnamed: 0,Sentence,Sentiment
0,Shell's $70 Billion BG Deal Meets Shareholder ...,negative
1,SSH COMMUNICATIONS SECURITY CORP STOCK EXCHANG...,negative
2,Kone 's net sales rose by some 14 % year-on-ye...,positive
3,The Stockmann department store will have a tot...,neutral
4,Circulation revenue has increased by 5 % in Fi...,positive


In [28]:
token = RegexpTokenizer(r'[a-zA-Z0-9]+')
cv = CountVectorizer(stop_words='english',ngram_range = (1,1),tokenizer = token.tokenize)
text_counts = cv.fit_transform(data['Sentence'])
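
To make the vectorization step concrete, the following is a minimal pure-Python sketch of what a bag-of-words count matrix contains. It reuses the same token pattern as the `RegexpTokenizer` above, lowercases the text and sorts the vocabulary (as `CountVectorizer` does by default), but it omits stop-word removal and sparse storage. The helper name `bag_of_words` and the two example sentences are hypothetical and are not taken from `data.csv`.

```python
import re
from collections import Counter

# Same character class as the RegexpTokenizer pattern used in the notebook
TOKEN_RE = re.compile(r"[a-zA-Z0-9]+")


def bag_of_words(sentences):
    """Return (vocabulary, count_rows) for a list of sentences.

    Illustrative only: no stop-word filtering, dense lists instead of a sparse matrix.
    """
    tokenized = [TOKEN_RE.findall(s.lower()) for s in sentences]
    # One column per distinct token, in sorted order
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    column = {tok: i for i, tok in enumerate(vocabulary)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for tok, count in Counter(toks).items():
            row[column[tok]] = count
        rows.append(row)
    return vocabulary, rows


# Hypothetical example sentences
vocab, rows = bag_of_words(["Sales rose 5 %", "Sales fell, profit fell"])
print(vocab)  # ['5', 'fell', 'profit', 'rose', 'sales']
print(rows)   # [[1, 0, 0, 1, 1], [0, 2, 1, 0, 1]]
```

Each row is one sentence and each column is one vocabulary token; word order is discarded, which is why the approach is called a "bag" of words.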

In [29]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(text_counts, data['Sentiment'], test_size=0.25, random_state=5)

In [30]:
from sklearn.naive_bayes import MultinomialNB
MNB = MultinomialNB()
MNB.fit(X_train, Y_train)

MultinomialNB()

In [31]:
from sklearn import metrics
predicted = MNB.predict(X_test)
# accuracy_score expects (y_true, y_pred); a distinct variable name avoids shadowing the function
accuracy = metrics.accuracy_score(Y_test, predicted)
print("Accuracy Score: ", accuracy)

Accuracy Score:  0.6778615490061686


## Using TextBlob

In [32]:
from textblob import TextBlob

Polarity measures the sentiment of the text. Its value lies in [-1, 1], where -1 denotes a highly negative sentiment and 1 denotes a highly positive sentiment.

Subjectivity measures whether the text reads as factual information or as personal opinion. Its value lies in [0, 1], where a value closer to 0 indicates factual information and a value closer to 1 indicates personal opinion.

In [33]:
text_1 = "AIDS is a good subject"
text_2 = "It was not a good idea."

In [34]:
p_1 = TextBlob(text_1).sentiment.polarity
p_2 = TextBlob(text_2).sentiment.polarity

In [35]:
print("Polarity of Text 1 is", p_1)
print("Polarity of Text 2 is", p_2)

Polarity of Text 1 is 0.26666666666666666
Polarity of Text 2 is -0.35


This tells us that the first sentence is mildly positive (about 0.27) and the second is moderately negative (-0.35). Neither score is close to the extremes of -1 or 1.

In [36]:
s_1 = TextBlob(text_1).sentiment.subjectivity
s_2 = TextBlob(text_2).sentiment.subjectivity

In [37]:
print("Subjectivity of Text 1 is", s_1)
print("Subjectivity of Text 2 is", s_2)

Subjectivity of Text 1 is 0.4666666666666667
Subjectivity of Text 2 is 0.6000000000000001


The subjectivity scores (about 0.47 and 0.60) sit near the middle of the range, so TextBlob treats both sentences as partly opinionated rather than purely factual.

In [38]:
text_3 = "Mumbai is in Maharashtra"
blob1 = TextBlob(text_3)
blob1.sentiment

Sentiment(polarity=0.0, subjectivity=0.0)

Here the sentence is neutral, and since it is factual information, the subjectivity is zero.
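
The raw scores can be turned into readable labels with a small helper. This is only a sketch: the function name `describe` and both cutoffs (0.1 for polarity, 0.5 for subjectivity) are assumptions chosen for illustration, not values defined by TextBlob. The inputs are the scores printed in the cells above.

```python
def describe(polarity, subjectivity, polarity_cutoff=0.1, subjectivity_cutoff=0.5):
    """Map TextBlob-style scores to coarse labels.

    Both cutoffs are illustrative assumptions, not TextBlob defaults.
    """
    if polarity > polarity_cutoff:
        tone = "positive"
    elif polarity < -polarity_cutoff:
        tone = "negative"
    else:
        tone = "neutral"
    stance = "leaning opinion" if subjectivity >= subjectivity_cutoff else "leaning factual"
    return tone, stance


# Scores copied from the notebook outputs above
print(describe(0.26666666666666666, 0.4666666666666667))  # ('positive', 'leaning factual')
print(describe(-0.35, 0.6000000000000001))                # ('negative', 'leaning opinion')
print(describe(0.0, 0.0))                                 # ('neutral', 'leaning factual')
```

Where the cutoffs should sit depends on the data; moving them changes how many borderline sentences are labelled neutral or factual.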

## Using VADER (Valence Aware Dictionary and sEntiment Reasoner)

In [39]:
!pip install vaderSentiment --quiet

In [40]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
sentiment = SentimentIntensityAnalyzer()

In [41]:
sent_1 = sentiment.polarity_scores(text_1)
sent_2 = sentiment.polarity_scores(text_2)
print("Sentiment of text 1:", sent_1)
print("Sentiment of text 2:", sent_2)

Sentiment of text 1: {'neg': 0.0, 'neu': 0.58, 'pos': 0.42, 'compound': 0.4404}
Sentiment of text 2: {'neg': 0.325, 'neu': 0.675, 'pos': 0.0, 'compound': -0.3412}
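
The `compound` value is the single normalized score in [-1, 1] that is usually used for classification. The sketch below applies the thresholds suggested in the vaderSentiment README (positive at 0.05 or above, negative at -0.05 or below, neutral otherwise) to the two dictionaries printed above. The helper name `label_from_compound` is hypothetical, and the threshold is a parameter so it can be tuned.

```python
def label_from_compound(scores, threshold=0.05):
    """Classify a VADER score dict by its 'compound' value.

    The default threshold follows the convention suggested in the
    vaderSentiment README; treat it as a tunable assumption.
    """
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Dictionaries copied from the notebook output above
sent_1 = {'neg': 0.0, 'neu': 0.58, 'pos': 0.42, 'compound': 0.4404}
sent_2 = {'neg': 0.325, 'neu': 0.675, 'pos': 0.0, 'compound': -0.3412}

print(label_from_compound(sent_1))  # positive
print(label_from_compound(sent_2))  # negative
```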


## Using Transformer-Based Models

In [42]:
!pip install transformers --quiet

In [43]:
import transformers

In [47]:
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
# Use a separate name so the DataFrame `data` loaded earlier is not overwritten
texts = ["The weather was awesome.", "My head is paining"]
sentiment_pipeline(texts)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9998689889907837},
 {'label': 'NEGATIVE', 'score': 0.9986617565155029}]