# Sentiment Analysis

Sentiment Analysis, or Opinion Mining, is a sub-field of Natural Language Processing (NLP) that tries to identify and extract opinions within a given text. The aim of sentiment analysis is to gauge the attitudes, sentiments, evaluations and emotions of a speaker or writer through the computational treatment of subjectivity in a text.

# Why is sentiment analysis so important?

Businesses today are heavily dependent on data. The majority of this data, however, is unstructured text from sources like emails, chats, social media, surveys, articles, and documents. Micro-blogging content from Twitter and Facebook poses serious challenges. This is partly because of the sheer volume of data involved. It is also because of the kind of language used to express sentiment there, such as short forms, memes and emoticons.

Sentiment Analysis is also useful for practitioners and researchers in fields like sociology, marketing, advertising, psychology, economics, and political science, which rely heavily on human-computer interaction data.

Sentiment Analysis lets companies make sense of this data by automating the entire process. They can then draw vital insights from vast unstructured datasets without having to go through them manually.

# Why is Sentiment Analysis a Hard Task?

Though it may seem easy on paper, Sentiment Analysis is actually a tricky subject. There are various reasons for that:

Understanding emotions through text is not always easy. Sometimes even humans get misled, so expecting 100% accuracy from a computer is like asking for the Moon!

A text may contain multiple sentiments at once. For instance:
“The intent behind the movie was great, but it could have been better.”

The above sentence carries two polarities, positive as well as negative. So how do we decide whether the review was positive or negative?

Computers struggle to comprehend figurative speech. Figurative language uses words in a way that deviates from their conventionally accepted definitions in order to convey a more complicated meaning or a heightened effect. Similes, metaphors and hyperboles are all examples of figurative speech. Consider:
“The best I can say about the movie is that it was interesting.”
Here, the word ‘interesting’ does not necessarily convey positive sentiment and can confuse algorithms.

Heavy use of emoticons and slang with sentiment value in social media texts, such as those on Twitter and Facebook, also makes text analysis difficult. For example, “:)” denotes a smiley and generally signals positive sentiment, while “:(” signals negative sentiment. Likewise, acronyms like “LOL” and “OMG” and common slang like “Nah”, “meh” and “giggly” are strong indicators of some sort of sentiment in a sentence.

# About TextBlob
TextBlob is a Python library that offers a simple API for performing basic NLP tasks.

A nice thing about TextBlob objects is that they behave much like Python strings. You can transform and manipulate them the same way you would a regular string.

# Environment Setup
Installing TextBlob is a simple task. Open an Anaconda prompt (or a terminal on macOS or Ubuntu) and enter the following command:

```bash
pip install -U textblob
```

This installs TextBlob. For the uninitiated: practical work in Natural Language Processing typically uses large bodies of linguistic data, or corpora. To download the corpora TextBlob needs, run:

```bash
python -m textblob.download_corpora
```

# Features
- Noun phrase extraction
- Part-of-speech tagging
- Sentiment analysis
- Classification (Naive Bayes, Decision Tree)
- Language translation and detection powered by Google Translate (deprecated in newer TextBlob releases)
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Parsing
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- Spelling correction
- Add new models or languages through extensions
- WordNet integration

# NLP tasks using TextBlob
# Tokenization
Tokenization refers to dividing text or a sentence into a sequence of tokens, which roughly correspond to “words”. This is one of the basic tasks of NLP. To do this using TextBlob, follow two steps:

1. Create a `TextBlob` object from a string.
2. Access the object's properties or call its methods to perform a specific task.

```python
from textblob import TextBlob

blob = TextBlob("GreyAtom is a great platform to learn data science. \n It helps community through blogs, hackathons, discussions,etc.")

# Split the text into sentences
blob.sentences
```

```
[Sentence("GreyAtom is a great platform to learn data science."),
 Sentence("It helps community through blogs, hackathons, discussions,etc.")]
```
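To make the idea concrete, here is a minimal, library-free sketch of what sentence and word tokenization do. It uses naive regular-expression rules in two hypothetical helpers, `split_sentences` and `split_words`. These are not TextBlob's actual tokenizers, which are built on NLTK's tokenizers and handle cases such as abbreviations that these rules get wrong.

```python
import re


def split_sentences(text):
    # Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def split_words(sentence):
    # Keep runs of letters, digits and apostrophes; punctuation is dropped.
    return re.findall(r"[A-Za-z0-9']+", sentence)


text = ("GreyAtom is a great platform to learn data science. \n "
        "It helps community through blogs, hackathons, discussions,etc.")

sentences = split_sentences(text)
words = [word for sentence in sentences for word in split_words(sentence)]

for sentence in sentences:
    print(sentence)
print(words)
```

On this text the sketch yields the same two sentences as `blob.sentences`. A rule this simple would, however, wrongly split “Dr. Smith arrived.” after “Dr.”.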

# Noun Phrase Extraction
Beyond splitting text into sentences and words, TextBlob can extract just the noun phrases. Noun phrase extraction is particularly useful when you want to analyze the “who” in a sentence. Let's see an example below.

```python
blob = TextBlob("GreyAtom is a great platform to learn data science.")
for phrase in blob.noun_phrases:
    print(phrase)
```

```
greyatom
great platform
data science
```


# Part-of-speech Tagging
Part-of-speech (POS) tagging, or grammatical tagging, marks each word in a text with its part of speech based on its definition and context. In simple words, it tells whether a word is a noun, an adjective, a verb, etc. Noun phrase extraction only pulls out noun chunks, whereas POS tagging labels every word in the sentence. TextBlob's tags follow the Penn Treebank convention. For example, NNP is a proper noun, VBZ a third-person singular present verb, DT a determiner and JJ an adjective.

```python
# Reuses the single-sentence blob from the previous example
for word, tag in blob.tags:
    print(word, tag)
```

```
GreyAtom NNP
is VBZ
a DT
great JJ
platform NN
to TO
learn VB
data NNS
science NN
```
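Noun phrase extraction can be built on top of POS tags. The sketch below is a toy chunker, not TextBlob's own noun phrase extractor. It applies one assumed rule, optional adjectives (JJ*) followed by one or more nouns (NN*), to the (word, tag) pairs printed above. On this sentence it happens to reproduce the noun phrases TextBlob returned. `chunk_noun_phrases` is a hypothetical helper.

```python
# (word, tag) pairs copied from the tagging output above
tagged = [("GreyAtom", "NNP"), ("is", "VBZ"), ("a", "DT"), ("great", "JJ"),
          ("platform", "NN"), ("to", "TO"), ("learn", "VB"),
          ("data", "NNS"), ("science", "NN")]


def chunk_noun_phrases(tagged):
    phrases, current = [], []

    def flush():
        # Keep a chunk only if it ends in a noun (drop dangling adjectives).
        while current and not current[-1][1].startswith("NN"):
            current.pop()
        if current:
            phrases.append(" ".join(word for word, _ in current).lower())
        current.clear()

    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append((word, tag))
        elif tag.startswith("JJ"):
            # An adjective that follows a noun starts a new phrase.
            if current and current[-1][1].startswith("NN"):
                flush()
            current.append((word, tag))
        else:
            # Any other tag closes the current chunk.
            flush()
    flush()
    return phrases


for phrase in chunk_noun_phrases(tagged):
    print(phrase)
```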


# Words Inflection and Lemmatization
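Inflection is a process of word formation in which characters are added to the base form of a word to express grammatical meaning, such as number or tense. Word inflection in TextBlob is very simple: the words tokenized from a TextBlob can easily be changed into their singular or plural forms. Lemmatization goes the other way, reducing a word to its dictionary form (its lemma). TextBlob's `Word.lemmatize()` uses WordNet and accepts an optional part-of-speech argument, such as `"v"` for verb.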

```python
blob = TextBlob("GreyAtom is a great platform to learn data science. \n It helps community through blogs, hackathons, discussions,etc.")
print(blob.sentences[1].words[1])                # second word of the second sentence
print(blob.sentences[1].words[1].singularize())
```

```
helps
help
```


```python
from textblob import Word

w = Word('Platform')
w.pluralize()
```

```
'Platforms'
```
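Regular English plurals follow a handful of suffix rules. The toy `naive_pluralize` function below is a hypothetical helper, not TextBlob's implementation, and applies three of those rules. A full inflection module also needs tables of irregular forms such as child → children, which this sketch does not handle.

```python
import re


def naive_pluralize(word):
    # Sibilant endings take "-es": box -> boxes, church -> churches.
    if re.search(r"(s|x|z|ch|sh)$", word, re.IGNORECASE):
        return word + "es"
    # Consonant + "y" becomes "-ies": city -> cities (but day -> days).
    if re.search(r"[^aeiou]y$", word, re.IGNORECASE):
        return word[:-1] + "ies"
    # Default: append "s".
    return word + "s"


for w in ["Platform", "box", "church", "city", "day", "child"]:
    print(w, "->", naive_pluralize(w))
```

The last case prints `childs`, which shows why rule-based inflection needs an exception list.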

```python
# Lemmatization
w = Word('running')
w.lemmatize("v")  # "v" marks the word as a verb
```

```
'run'
```

# N-grams
An n-gram is a contiguous sequence of n words. N-grams with N > 1 are generally more informative than single words and can be used as features for language modelling. In TextBlob, they are available through the `ngrams()` method, which returns a list of n-grams, each a `WordList` of n successive words.

```python
# Bigrams (n=2) of the two-sentence blob defined above
for ngram in blob.ngrams(2):
    print(ngram)
```

```
['GreyAtom', 'is']
['is', 'a']
['a', 'great']
['great', 'platform']
['platform', 'to']
['to', 'learn']
['learn', 'data']
['data', 'science']
['science', 'It']
['It', 'helps']
['helps', 'community']
['community', 'through']
['through', 'blogs']
['blogs', 'hackathons']
['hackathons', 'discussions']
['discussions', 'etc']
```
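The sliding window behind `ngrams()` is easy to sketch in plain Python. The hypothetical `ngrams` helper below mirrors the output above on the same word list. It returns plain lists rather than TextBlob `WordList` objects.

```python
def ngrams(words, n):
    # Slide a window of width n across the list; an empty list is returned if n > len(words).
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = ["GreyAtom", "is", "a", "great", "platform", "to", "learn", "data",
         "science", "It", "helps", "community", "through", "blogs",
         "hackathons", "discussions", "etc"]

for bigram in ngrams(words, 2):
    print(bigram)
```

Like TextBlob's output, these windows run straight across the sentence boundary, which is where the `['science', 'It']` bigram comes from.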


# Sentiment Analysis
Sentiment analysis is basically the process of determining the attitude or the emotion of the writer, i.e., whether it is positive or negative or neutral.

TextBlob's `sentiment` property returns a named tuple with two fields: polarity and subjectivity.

Polarity is a float in the range [-1, 1], where 1 means a positive statement and -1 a negative one. Subjectivity is also a float, in the range [0, 1], where 0 is very objective and 1 is very subjective. Subjective sentences generally express personal opinion, emotion or judgment, whereas objective ones convey factual information.

```python
print(blob)
blob.sentiment
```

```
GreyAtom is a great platform to learn data science. 
 It helps community through blogs, hackathons, discussions,etc.

Sentiment(polarity=0.8, subjectivity=0.75)
```

# VADER Sentiment Analysis

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. It uses a sentiment lexicon, that is, a list of lexical features (e.g., words) each labelled according to its semantic orientation as positive or negative. It combines this lexicon with a set of rules that adjust the scores based on context.

VADER has been found to be quite successful on social media texts, NY Times editorials, movie reviews, and product reviews. This is because VADER not only reports positivity and negativity scores but also tells us how positive or negative a sentiment is.
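VADER's `compound` score is built by summing the valences of the sentiment-bearing words. The sum is then squashed into [-1, 1] with x / sqrt(x² + α), where α = 15 by default. The sketch below reimplements only that normalization step. The valences in the loop (nice = 1.8, cool = 1.3, terrible = -2.1) are assumed lexicon values, chosen because they reproduce the `compound` scores VADER prints for these three sentences in the example further down.

```python
import math


def normalize(score, alpha=15):
    # Squash an unbounded valence sum into [-1, 1]; alpha=15 mirrors VADER's default.
    norm_score = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm_score))


# Assumed lexicon valences for the sentiment-bearing word in each sentence.
for sentence, valence in [("Machine Learning is nice", 1.8),
                          ("It is cool to learn Data Science", 1.3),
                          ("The movie was terrible", -2.1)]:
    print(sentence, round(normalize(valence), 4))
```

Because α stays fixed, each additional unit of valence moves the score less and less. A long, gushing text therefore approaches 1 but never exceeds it.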

# Advantages of using VADER

VADER has a lot of advantages over traditional methods of Sentiment Analysis, including:

- It works exceedingly well on social media-style text, yet readily generalizes to multiple domains.
- It doesn't require any training data. Instead, it is built on a generalizable, valence-based, human-curated gold-standard sentiment lexicon.
- It is fast enough to be used online with streaming data.
- It does not severely suffer from a speed-performance tradeoff.

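```python
# Requires: pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyser = SentimentIntensityAnalyzer()

def sentiment_analyzer_scores(sentence):
    # polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound'
    score = analyser.polarity_scores(sentence)
    # Pad the sentence with dashes to 40 characters so the scores line up
    print("{:-<40} {}".format(sentence, str(score)))
```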

```python
sentiment_analyzer_scores("Machine Learning is nice")
sentiment_analyzer_scores("It is cool to learn Data Science")
sentiment_analyzer_scores("The movie was terrible")
```

```
Machine Learning is nice---------------- {'neg': 0.0, 'neu': 0.517, 'pos': 0.483, 'compound': 0.4215}
It is cool to learn Data Science-------- {'neg': 0.0, 'neu': 0.723, 'pos': 0.277, 'compound': 0.3182}
The movie was terrible------------------ {'neg': 0.508, 'neu': 0.492, 'pos': 0.0, 'compound': -0.4767}
```
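A common way to turn `compound` into a label is the threshold convention suggested in the VADER project's README. Under that convention, a text is positive when compound ≥ 0.05, negative when compound ≤ -0.05, and neutral in between. The `label` helper below is a hypothetical wrapper around that convention. The threshold is a parameter you may want to tune for your own data.

```python
def label(compound, threshold=0.05):
    # Map VADER's compound score to a coarse sentiment label.
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for text, compound in [("Machine Learning is nice", 0.4215),
                       ("It is cool to learn Data Science", 0.3182),
                       ("The movie was terrible", -0.4767)]:
    print(text, "->", label(compound))
```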


# How VADER scores sentiment
VADER analyses sentiment primarily based on the following key points.

# Punctuation
Using an exclamation mark (!) increases the magnitude of the intensity without modifying the semantic orientation. For example, “The food here is good!” is more intense than “The food here is good.” Each additional exclamation mark increases the magnitude further, and VADER counts at most four.

```python
sentiment_analyzer_scores("The food here is good")
sentiment_analyzer_scores("The food here is good!")
sentiment_analyzer_scores("The food here is good!!")
sentiment_analyzer_scores("The food here is good!!!")
```

```
The food here is good------------------- {'neg': 0.0, 'neu': 0.58, 'pos': 0.42, 'compound': 0.4404}
The food here is good!------------------ {'neg': 0.0, 'neu': 0.556, 'pos': 0.444, 'compound': 0.4926}
The food here is good!!----------------- {'neg': 0.0, 'neu': 0.534, 'pos': 0.466, 'compound': 0.5399}
The food here is good!!!---------------- {'neg': 0.0, 'neu': 0.514, 'pos': 0.486, 'compound': 0.5826}
```
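The scores above can be reproduced with a small sketch of this heuristic. It assumes, as in VADER's reference implementation, that each "!" adds 0.292 to the magnitude of the summed valence, capped at four marks. It also assumes that "good" has a lexicon valence of 1.9. `compound_with_exclamations` is a hypothetical helper, not part of VADER's API.

```python
import math

EXCLAMATION_BOOST = 0.292  # assumed per-"!" amplifier
MAX_EXCLAMATIONS = 4       # assumed cap on counted "!"


def normalize(score, alpha=15):
    # Squash an unbounded valence sum into [-1, 1], as in VADER's compound score.
    return max(-1.0, min(1.0, score / math.sqrt(score * score + alpha)))


def compound_with_exclamations(valence_sum, text):
    # Push the sum further from zero by a fixed amount per "!", up to the cap.
    boost = min(text.count("!"), MAX_EXCLAMATIONS) * EXCLAMATION_BOOST
    if valence_sum > 0:
        valence_sum += boost
    elif valence_sum < 0:
        valence_sum -= boost
    return round(normalize(valence_sum), 4)


for text in ["The food here is good", "The food here is good!",
             "The food here is good!!", "The food here is good!!!"]:
    print(text, compound_with_exclamations(1.9, text))  # 1.9: assumed valence of "good"
```

The boost is applied in the direction of the existing sum, so "!" makes negative text more negative rather than more positive.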


# Capitalization 
Using upper case letters to emphasize a sentiment-relevant word, in the presence of other non-capitalized words, increases the magnitude of the sentiment intensity. For example, “The food here is GREAT!” conveys more intensity than “The food here is great!”.

```python
sentiment_analyzer_scores("The food here is great")
sentiment_analyzer_scores("The food here is GREAT")
```

```
The food here is great------------------ {'neg': 0.0, 'neu': 0.494, 'pos': 0.506, 'compound': 0.6249}
The food here is GREAT------------------ {'neg': 0.0, 'neu': 0.453, 'pos': 0.547, 'compound': 0.7034}
```
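This sketch of the capitalization rule assumes VADER's constants. An ALL-CAPS sentiment word gains 0.733 in magnitude, but only when the text mixes ALL-CAPS and non-capitalized words. The one-word lexicon (great = 3.1) is an assumed valence that reproduces the scores above. `compound` is a hypothetical helper that ignores every other VADER rule.

```python
import math

LEXICON = {"great": 3.1}  # assumed valence
CAPS_BOOST = 0.733        # assumed emphasis increment for ALL-CAPS words


def normalize(score, alpha=15):
    # Squash an unbounded valence sum into [-1, 1], as in VADER's compound score.
    return max(-1.0, min(1.0, score / math.sqrt(score * score + alpha)))


def compound(text):
    words = text.split()
    all_caps = [w.isupper() for w in words]
    # Emphasis only counts when ALL-CAPS words sit among non-capitalized ones.
    mixed_case = any(all_caps) and not all(all_caps)
    total = 0.0
    for word in words:
        valence = LEXICON.get(word.lower(), 0.0)
        if valence and mixed_case and word.isupper():
            valence += CAPS_BOOST if valence > 0 else -CAPS_BOOST
        total += valence
    return round(normalize(total), 4)


print(compound("The food here is great"))
print(compound("The food here is GREAT"))
print(compound("THE FOOD HERE IS GREAT"))  # shouting everything adds no emphasis
```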


# Degree modifiers
Degree modifiers, also called intensifiers, increase or decrease the sentiment intensity of the word they modify. For example, “The service here is extremely good” is more intense than “The service here is good”, whereas “The service here is marginally good” reduces the intensity.

```python
sentiment_analyzer_scores("The service here is extremely good")
sentiment_analyzer_scores("The service here is marginally good")
```

```
The service here is extremely good------ {'neg': 0.0, 'neu': 0.61, 'pos': 0.39, 'compound': 0.4927}
The service here is marginally good----- {'neg': 0.0, 'neu': 0.657, 'pos': 0.343, 'compound': 0.3832}
```
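This sketch of the booster rule assumes a ±0.293 adjustment for intensifiers such as "extremely" and dampeners such as "marginally". It also assumes a valence of 1.9 for "good". For simplicity it only looks at the word immediately before the sentiment word, whereas VADER's reference implementation also looks further back with a decaying weight. `compound` is a hypothetical helper.

```python
import math

LEXICON = {"good": 1.9}                                # assumed valence
BOOSTERS = {"extremely": 0.293, "marginally": -0.293}  # assumed booster scalars


def normalize(score, alpha=15):
    # Squash an unbounded valence sum into [-1, 1], as in VADER's compound score.
    return max(-1.0, min(1.0, score / math.sqrt(score * score + alpha)))


def compound(text):
    words = text.lower().split()
    total = 0.0
    for i, word in enumerate(words):
        valence = LEXICON.get(word, 0.0)
        if valence and i > 0 and words[i - 1] in BOOSTERS:
            scalar = BOOSTERS[words[i - 1]]
            # Apply the scalar in the direction of the word's polarity, so an
            # intensifier always strengthens it and a dampener always weakens it.
            valence += scalar if valence > 0 else -scalar
        total += valence
    return round(normalize(total), 4)


for text in ["The service here is good",
             "The service here is extremely good",
             "The service here is marginally good"]:
    print(text, compound(text))
```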


# Conjunctions
Use of conjunctions like “but” signals a shift in sentiment polarity, with the sentiment of the text following the conjunction being dominant. “The food here is great, but the service is horrible” has mixed sentiment, with the latter half dictating the overall rating.

```python
sentiment_analyzer_scores("The food here is great, but the service is horrible")
```

```
The food here is great, but the service is horrible {'neg': 0.31, 'neu': 0.523, 'pos': 0.167, 'compound': -0.4939}
```
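The "but" rule can be sketched as a re-weighting step. Assuming VADER's factors, valences before the first "but" are halved and those after it are multiplied by 1.5. With assumed valences great = 3.1 and horrible = -2.5, this reproduces the compound score above. `compound` is again a hypothetical helper that ignores VADER's other rules.

```python
import math

LEXICON = {"great": 3.1, "horrible": -2.5}  # assumed valences


def normalize(score, alpha=15):
    # Squash an unbounded valence sum into [-1, 1], as in VADER's compound score.
    return max(-1.0, min(1.0, score / math.sqrt(score * score + alpha)))


def compound(text):
    # Strip surrounding punctuation so "great," matches the lexicon entry "great".
    words = [w.strip(",.!?").lower() for w in text.split()]
    valences = [LEXICON.get(w, 0.0) for w in words]
    if "but" in words:
        pivot = words.index("but")  # only the first "but" is used
        valences = [v * 0.5 if i < pivot else v * 1.5 if i > pivot else v
                    for i, v in enumerate(valences)]
    return round(normalize(sum(valences)), 4)


print(compound("The food here is great, but the service is horrible"))
```

Without the re-weighting, the two halves would sum to 3.1 - 2.5 = 0.6 and the review would score as mildly positive.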


# Handling Emojis, Slang and Emoticons
VADER performs very well with emojis, slang and acronyms in sentences. Let us see each with an example.

```python
# sentiment_analyzer_scores prints its result and returns None,
# so it is called directly rather than wrapped in print()
sentiment_analyzer_scores('I am 😄 today')
sentiment_analyzer_scores('😊')
sentiment_analyzer_scores('😥')
sentiment_analyzer_scores('☹️')
```

```
I am 😄 today---------------------------- {'neg': 0.0, 'neu': 0.476, 'pos': 0.524, 'compound': 0.6705}
😊--------------------------------------- {'neg': 0.0, 'neu': 0.333, 'pos': 0.667, 'compound': 0.7184}
😥--------------------------------------- {'neg': 0.275, 'neu': 0.268, 'pos': 0.456, 'compound': 0.3291}
☹️-------------------------------------- {'neg': 0.706, 'neu': 0.294, 'pos': 0.0, 'compound': -0.34}
```


# Slang


```python
sentiment_analyzer_scores("Today SUX!")
sentiment_analyzer_scores("Today only kinda sux! But I'll get by, lol")
```

```
Today SUX!------------------------------ {'neg': 0.779, 'neu': 0.221, 'pos': 0.0, 'compound': -0.5461}
Today only kinda sux! But I'll get by, lol {'neg': 0.127, 'neu': 0.556, 'pos': 0.317, 'compound': 0.5249}
```


# Emoticons


```python
sentiment_analyzer_scores("Make sure you :) or :D today!")
```

```
Make sure you :) or :D today!----------- {'neg': 0.0, 'neu': 0.294, 'pos': 0.706, 'compound': 0.8633}
```


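# Where TextBlob's scores come from

TextBlob's default sentiment analyzer looks words up in a sentiment lexicon inherited from the pattern library. The word “great” has four entries there, one per WordNet sense:

```xml
<word form="great" cornetto_synset_id="n_a-525317" wordnet_id="a-01123879" pos="JJ" sense="very good" polarity="1.0" subjectivity="1.0" intensity="1.0" confidence="0.9" />
<word form="great" wordnet_id="a-01278818" pos="JJ" sense="of major significance or importance" polarity="1.0" subjectivity="1.0" intensity="1.0" confidence="0.9" />
<word form="great" wordnet_id="a-01386883" pos="JJ" sense="relatively large in size or number or extent" polarity="0.4" subjectivity="0.2" intensity="1.0" confidence="0.9" />
<word form="great" wordnet_id="a-01677433" pos="JJ" sense="remarkable or out of the ordinary in degree or magnitude or effect" polarity="0.8" subjectivity="0.8" intensity="1.0" confidence="0.9" />
```

Averaging the four senses gives a polarity of (1.0 + 1.0 + 0.4 + 0.8) / 4 = 0.8 and a subjectivity of (1.0 + 1.0 + 0.2 + 0.8) / 4 = 0.75. These are exactly the scores TextBlob reported for the GreyAtom text, which is consistent with “great” being the only word in that text that the lexicon scores.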
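The lexicon entries for "great" above can be averaged with the standard library to see where TextBlob's polarity of 0.8 and subjectivity of 0.75 for the GreyAtom text could come from. This sketch wraps the entries in an assumed `<sentiment>` root element so they parse as one XML document. It only checks the arithmetic. TextBlob's analyzer also adjusts scores for context such as negation, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# The four lexicon entries for "great", wrapped in a root element for parsing.
LEXICON_XML = """<sentiment>
<word form="great" cornetto_synset_id="n_a-525317" wordnet_id="a-01123879" pos="JJ" sense="very good" polarity="1.0" subjectivity="1.0" intensity="1.0" confidence="0.9" />
<word form="great" wordnet_id="a-01278818" pos="JJ" sense="of major significance or importance" polarity="1.0" subjectivity="1.0" intensity="1.0" confidence="0.9" />
<word form="great" wordnet_id="a-01386883" pos="JJ" sense="relatively large in size or number or extent" polarity="0.4" subjectivity="0.2" intensity="1.0" confidence="0.9" />
<word form="great" wordnet_id="a-01677433" pos="JJ" sense="remarkable or out of the ordinary in degree or magnitude or effect" polarity="0.8" subjectivity="0.8" intensity="1.0" confidence="0.9" />
</sentiment>"""

root = ET.fromstring(LEXICON_XML)
senses = [w for w in root.iter("word") if w.get("form") == "great"]

# Average each score over all senses of the word.
polarity = sum(float(w.get("polarity")) for w in senses) / len(senses)
subjectivity = sum(float(w.get("subjectivity")) for w in senses) / len(senses)

print(len(senses), round(polarity, 2), round(subjectivity, 2))
```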