## Sentiment Analysis

Sentiment analysis is a natural language processing (NLP) technique used to identify and categorize the emotions or opinions expressed in text. It is commonly applied to determine whether a piece of writing, such as a social media post, a customer review, or other user-generated content, is positive, negative, or neutral. By analyzing linguistic features like word choice, tone, and context, sentiment analysis enables organizations to understand public perception, monitor brand reputation, and gain insights from large volumes of textual data. Techniques range from rule-based models that rely on sentiment lexicons to machine learning and deep learning approaches that can capture more complex patterns and context.
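To make the rule-based end of that range concrete, here is a minimal sketch of a lexicon classifier: it sums hand-assigned word scores and maps the total to positive, negative, or neutral. The `LEXICON` entries and the `classify` helper are hypothetical and exist only for illustration; real lexicons contain thousands of graded entries.

```python
import re

# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"excellent": 1, "love": 1, "great": 1, "awful": -1, "hate": -1, "broken": -1}


def classify(text: str) -> str:
    """Label text positive, negative, or neutral by summing word scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = sum(LEXICON.get(token, 0) for token in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(classify("I love this phone, the camera is excellent"))  # positive
print(classify("Awful service and a broken screen"))           # negative
print(classify("The package arrived on Tuesday"))              # neutral
```

Because scores are simply summed, a text like "great but broken" cancels out to neutral; the tools covered in this document add rules for negation, intensifiers, punctuation, and emoticons to handle such cases better.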

## 1. SentiStrength

## 2. VADER

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a rule-based sentiment analysis tool specifically attuned to sentiment expressed in social media. Rather than being trained on your data, it ships with a ready-made, human-rated lexicon and returns sentiment scores for any given text.

VADER uses a dictionary of words and a set of rules to determine the sentiment of a piece of text. Each word in its lexicon has a valence score that expresses how positive or negative it is, ranging from -4 (most negative) to +4 (most positive). The word scores in a text are combined and normalized into a compound score between -1 and 1.

VADER also takes into account the intensity of the sentiment, which can be signaled by capitalization and punctuation. For example, writing a sentiment word in all caps or adding exclamation marks can indicate a stronger sentiment.
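The scoring idea can be sketched in plain Python. This is not VADER itself: the `LEXICON` values and the `CAPS_BOOST` and `EXCLAIM_BOOST` constants are illustrative assumptions in the spirit of VADER's heuristics, and `compound` and `normalize` are hypothetical helpers. The final squashing step, score / sqrt(score² + α) with α = 15, is the normalization VADER uses to turn a summed valence into its compound score. For real work, NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)` (after `nltk.download('vader_lexicon')`) returns the `neg`, `neu`, `pos`, and `compound` scores directly.

```python
import math
import re

# Illustrative valence lexicon on VADER's -4..+4 scale (values are assumptions).
LEXICON = {"good": 1.9, "great": 3.1, "love": 3.2, "bad": -2.5, "horrible": -2.5}

CAPS_BOOST = 0.733     # assumed extra intensity for a sentiment word in ALL CAPS
EXCLAIM_BOOST = 0.292  # assumed extra intensity per "!" (at most 4 are counted)


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def compound(text: str) -> float:
    """Hypothetical VADER-style compound score for `text`."""
    total = 0.0
    for token in re.findall(r"[A-Za-z']+", text):
        valence = LEXICON.get(token.lower(), 0.0)
        # An all-caps sentiment word pushes its valence further from zero.
        if valence and token.isupper():
            valence += math.copysign(CAPS_BOOST, valence)
        total += valence
    # Exclamation marks intensify whatever direction the text already has.
    bangs = min(text.count("!"), 4)
    if total:
        total += math.copysign(bangs * EXCLAIM_BOOST, total)
    return normalize(total)


for sentence in ["The food was good", "The food was GOOD", "The food was GOOD!!!", "The food was bad"]:
    print(f"{sentence!r}: {compound(sentence):+.3f}")
```

Because the normalization is applied to the sum, adding more or stronger sentiment cues moves the compound score toward -1 or +1 without ever exceeding it.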


## 3. TextBlob

TextBlob is a Python library for natural language processing (NLP) that builds on NLTK (Natural Language Toolkit). When you give it a sentence, its sentiment analysis returns two values: polarity and subjectivity.

The polarity score ranges from -1 to 1. A score of -1 means the text is strongly negative, using words like “disgusting” or “awful.” A score of 1 means it is strongly positive, using words like “excellent” or “best.”

The subjectivity score, on the other hand, ranges from 0 to 1. A value close to 1 means the sentence expresses mostly personal opinion, while a value close to 0 means it is mostly factual.

For my project, I was mostly interested in the polarity score because I wasn’t focusing on facts, so I didn’t use the subjectivity score. TextBlob can do a lot of other things too, such as extracting noun phrases, part-of-speech tagging, and tokenization.
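With the `textblob` package installed, `TextBlob(text).sentiment` returns a named tuple `Sentiment(polarity=..., subjectivity=...)`. The sketch below imitates only the basic idea behind those two numbers, averaging per-word polarity and subjectivity values over the words found in a lexicon. The `LEXICON` entries and the `sentiment` helper are hypothetical, and TextBlob's default analyzer applies further rules (for example for intensifiers and negation), so its results will differ.

```python
from statistics import mean
from typing import NamedTuple


class Sentiment(NamedTuple):
    polarity: float      # -1.0 (negative) .. 1.0 (positive)
    subjectivity: float  # 0.0 (factual) .. 1.0 (opinionated)


# Hypothetical word -> (polarity, subjectivity) entries, for illustration only.
LEXICON = {
    "excellent": (1.0, 1.0),
    "best": (1.0, 0.3),
    "awful": (-1.0, 1.0),
    "disgusting": (-1.0, 1.0),
    "slow": (-0.5, 0.5),
}


def sentiment(text: str) -> Sentiment:
    """Average the lexicon scores of the words that appear in `text`."""
    words = [word.strip(".,!?;:").lower() for word in text.split()]
    hits = [LEXICON[word] for word in words if word in LEXICON]
    if not hits:
        return Sentiment(0.0, 0.0)  # no opinion words: neutral and objective
    return Sentiment(
        polarity=mean(p for p, _ in hits),
        subjectivity=mean(s for _, s in hits),
    )


print(sentiment("An excellent meal, but the service was slow."))
# Sentiment(polarity=0.25, subjectivity=0.75)
```

Only "excellent" and "slow" are lexicon hits here, so each score is the average of those two words' values.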

## 4. SentiWordNet

# Lexical Sentiment Analysis using SentiStrength

Sentiment analysis is a powerful technique that allows us to automatically understand the opinions, emotions, and attitudes expressed in written text. This tutorial will guide you through the process of conducting a robust sentiment analysis on social media data, specifically focusing on Twitter data.


We'll begin by introducing sentiment analysis, its origins, and its importance in today's digital landscape. Sentiment analysis, also known as opinion mining or emotion analysis, is the automated process of interpreting and classifying the underlying sentiments or emotions in text data. This technique has its roots in computer science but has since been widely adopted across various disciplines, including management, social sciences, and linguistics.


The advent of social media has significantly increased the value and relevance of sentiment analysis. Platforms like Twitter have become powerful channels for individuals to express their opinions, emotions, and sentiments on a wide range of topics. Leveraging this wealth of user-generated content requires constant monitoring and analysis, making sentiment analysis an invaluable tool for businesses, governments, and researchers alike.


In this tutorial, we'll follow a structured sentiment analysis process that covers the essential steps from topic identification to data visualization. We'll start by defining our research question and identifying the relevant data source (in this case, Twitter). Next, we'll discuss techniques for data collection, cleaning, and preprocessing to ensure high-quality input for our analysis.


Once our data is ready, we'll introduce a valuable tool called SentiStrength, a widely used and well-established sentiment analysis library. SentiStrength has been employed by researchers across various domains, and its effectiveness has been demonstrated in numerous scholarly publications.


After applying SentiStrength to our Twitter data, we'll explore techniques for visualizing and interpreting the sentiment analysis results. This will involve creating insightful visualizations that can effectively communicate the key findings and insights derived from the analysis.


Throughout the tutorial, we'll address common challenges and considerations in sentiment analysis, such as handling irony, sarcasm, and implicit sentiment cues. We'll also discuss best practices for ensuring the accuracy and reliability of our analysis.


By the end of this tutorial, you'll have a solid understanding of the sentiment analysis process and the skills necessary to conduct high-quality sentiment analysis on social media data, particularly Twitter data. Let's dive in!

## Tweet Sentiment Analysis (SentiStrength)

SentiStrength is a sentiment analysis tool that is freely available for academic research. It can be tried online through a live demo or downloaded from the official website at http://sentistrength.wlv.ac.uk. The Python wrapper used in this tutorial calls the Java version of the program (`SentiStrength.jar`).


At its core, SentiStrength is a lexicon-based sentiment classifier that compares social media text against a predefined lexicon of sentiment-bearing words and phrases. Instead of a single score, it reports two: a positive strength from +1 (not positive) to +5 (extremely positive) and a negative strength from -1 (not negative) to -5 (extremely negative). This dual scoring is inspired by psychological research suggesting that people can experience positive and negative emotions at the same time, commonly known as mixed emotions.


SentiStrength works at the word level: each sentiment-bearing word in a sentence receives its own strength score, and these word scores determine the overall positive and negative strength of the sentence. The program's lexicon comprises 1,125 words and 1,364 word stems, each with an associated positive or negative sentiment score. For example, the word "ailing" has a score of -3 in the lexicon, marking it as moderately negative.
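A minimal sketch of how word-level strengths can be turned into the two scores, assuming (as a simplification) that a text's positive score is its strongest positive word score and its negative score is its strongest negative word score, with +1 and -1 meaning "no sentiment found". Only the "ailing" entry comes from the text above; the other `LEXICON` values and the `dual_score` helper are hypothetical, and the real program applies many more rules.

```python
import re

# Hypothetical lexicon on SentiStrength's scales: +2..+5 positive, -2..-5 negative.
# Only "ailing": -3 is taken from the text; the other values are made up.
LEXICON = {"ailing": -3, "lovely": 3, "happy": 3, "terrible": -4, "nice": 2}


def dual_score(text: str) -> tuple[int, int]:
    """Return (positive, negative) strengths; (1, -1) means no sentiment found."""
    scores = [LEXICON.get(word, 0) for word in re.findall(r"[a-z']+", text.lower())]
    positive = max([s for s in scores if s > 0], default=1)
    negative = min([s for s in scores if s < 0], default=-1)
    return positive, negative


print(dual_score("A lovely walk despite my ailing knee"))  # (3, -3)
print(dual_score("The bus was late"))                       # (1, -1)
```

Reporting both numbers keeps mixed feelings visible: the first sentence is both moderately positive and moderately negative instead of averaging out to neutral.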


SentiStrength employs a range of sophisticated techniques to enhance its sentiment analysis capabilities. It accounts for negation, where positive terms preceded by negating words (e.g., "not," "don't") have their sentiment flipped, and negative terms are neutralized. Additionally, the program considers booster words like "very" and "extremely," which can amplify the sentiment strength of the following word.
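The negation and booster rules can be sketched as a pass over the tokens. This is a simplified illustration rather than SentiStrength's implementation: the `LEXICON`, `BOOSTERS`, and `NEGATORS` lists and their strengths are assumptions, the `word_scores` helper is hypothetical, a negator is assumed to act only when it directly precedes the sentiment word (or its booster), and strengths are capped at 5.

```python
import re

# Hypothetical word lists and strengths, for illustration only.
LEXICON = {"good": 3, "happy": 3, "bad": -3, "sad": -3}
BOOSTERS = {"very": 1, "extremely": 2}
NEGATORS = {"not", "don't", "never"}


def word_scores(text: str) -> list[int]:
    """Score each sentiment word, applying booster and negation rules."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i, token in enumerate(tokens):
        score = LEXICON.get(token, 0)
        if score == 0:
            continue
        j = i - 1
        # A booster right before the word adds strength in the word's direction.
        if j >= 0 and tokens[j] in BOOSTERS:
            score += BOOSTERS[tokens[j]] if score > 0 else -BOOSTERS[tokens[j]]
            score = max(-5, min(5, score))
            j -= 1
        # A negator before the word (or its booster): flip positive, neutralize negative.
        if j >= 0 and tokens[j] in NEGATORS:
            score = -score if score > 0 else 0
        scores.append(score)
    return scores


print(word_scores("not very good, but never sad"))  # [-4, 0]
```

Here "very" lifts "good" from 3 to 4 before "not" flips it to -4, while "never" neutralizes "sad" to 0, matching the flip-positive, neutralize-negative behavior described above.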


The tool also incorporates rules for handling questions, idioms, spelling corrections, and punctuation, as well as rules specific to computer-mediated communication methods of expressing sentiment, such as emoticons. SentiStrength maintains a list of emoticons with associated sentiment strength scores, further enhancing its ability to accurately interpret sentiment in social media text.
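Emoticon handling can be sketched as a lookup that runs on raw whitespace-separated tokens before any punctuation is stripped, so the characters that form an emoticon are not lost. The `EMOTICONS` and `LEXICON` entries and the `dual_score_with_emoticons` helper are hypothetical; SentiStrength's own emoticon list and rules are more extensive. As in the simplified dual scoring above, the strongest positive and strongest negative scores are reported.

```python
# Hypothetical emoticon and word strengths on SentiStrength-style scales.
EMOTICONS = {":)": 2, ":-)": 2, ":D": 3, ":(": -2, ":-(": -2, ":'(": -3}
LEXICON = {"love": 3, "miss": -2}


def dual_score_with_emoticons(text: str) -> tuple[int, int]:
    """Return (positive, negative) strengths from words and emoticons."""
    scores = []
    for token in text.split():
        if token in EMOTICONS:  # check the raw token before stripping punctuation
            scores.append(EMOTICONS[token])
        else:
            scores.append(LEXICON.get(token.strip(".,!?").lower(), 0))
    positive = max([s for s in scores if s > 0], default=1)
    negative = min([s for s in scores if s < 0], default=-1)
    return positive, negative


print(dual_score_with_emoticons("I miss you :'("))  # (1, -3)
print(dual_score_with_emoticons("Exam done :D"))    # (3, -1)
```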


One of the notable advantages of SentiStrength is its speed and transparency. It can process up to 14,000 tweets per second on a standard PC and provides insights into how its scores were calculated. Additionally, SentiStrength supports multiple languages, making it a versatile tool for sentiment analysis across various linguistic contexts.


| Feature                   | **SentiStrength**                                    | **VADER**                                     | **TextBlob**                              | **SentiWordNet**                                |
| ------------------------- | ---------------------------------------------------- | --------------------------------------------- | ----------------------------------------- | ----------------------------------------------- |
| **Type**                  | Lexicon + Rule-based                                 | Lexicon + Rule-based (social-media optimized) | Lexicon + Rule-based                      | Lexicon-based (WordNet sentiment scores)        |
| **Output Format**         | Two scores: +1 to +5 (positive), -1 to -5 (negative) | Compound (-1 to 1), with pos/neu/neg scores   | Polarity (-1 to 1), Subjectivity (0 to 1) | Positive, Negative, Objective scores (0 to 1)   |
| **Language Support**      | English, plus other languages (e.g. German)          | English                                       | English                                   | English (via WordNet)                           |
| **Handles Negation**      | Yes                                                  | Yes                                           | Basic                                     | No (relies on word-level sentiment)             |
| **Handles Emojis/Slang**  | Limited                                              | Very good                                     | Poor                                      | None                                            |
| **Context Awareness**     | Limited                                              | Limited                                       | Very limited                              | None (word-level only)                          |
| **Customizability**       | Yes (custom lexicons)                                | Limited                                       | Limited                                   | Moderate (can modify sentiment scores manually) |
| **Ease of Use in Python** | Requires Java; use a wrapper such as `sentistrength` | Very easy (`nltk.sentiment`)                  | Very easy (`textblob` package)            | Moderate (`nltk.corpus.sentiwordnet`)           |
| **Use Cases**             | Short texts, informal text, social media             | Social media, tweets, short texts             | General-purpose text analysis             | Word-level sentiment scoring, academic research |
| **License**               | Free for academic use                                | MIT                                           | MIT                                       | Open (WordNet-compatible license)               |


```python
# Install the dependencies once (uncomment to run inside a notebook)
# !pip install sentistrength
# !pip install pandas
```

```python
from sentistrength import PySentiStr

senti = PySentiStr()
# Point the wrapper at the SentiStrength Java program and its lexicon folder.
# Note: provide absolute paths here; the relative paths below are placeholders.
senti.setSentiStrengthPath('jar_datei/SentiStrength.jar')
senti.setSentiStrengthLanguageFolderPath('SentiStrengthData/')
```

```python
# 'dual' scoring returns a list with one (positive, negative) tuple per text
s = senti.getSentiment('What a horrible terrible day', score='dual')
print(s)
```

```python
result = senti.getSentiment('What a lovely positive day', score='dual')
print(result)
```

```
[(3, -1)]
```


```python
# Join the (positive, negative) pair of the first result into one string
str(result[0][0]) + ' ' + str(result[0][1])
```

```
'3 -1'
```

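```python
def get_sentiment(text):
    """Return SentiStrength's dual score for one text as 'positive negative', e.g. '3 -1'."""
    result = senti.getSentiment(text, score='dual')
    positive, negative = result[0]
    return f'{positive} {negative}'
```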

Example: scoring a single string. Without a `score` argument, the wrapper returns one overall score per text.

```python
result1 = senti.getSentiment('What a lovely day')
print(result1)
```

```
[1]
```


Example: scoring a list of strings or a pandas Series

```python
from sentistrength import PySentiStr

senti = PySentiStr()
# Note: provide absolute paths here; the relative paths below are placeholders.
senti.setSentiStrengthPath('jar_datei/SentiStrength.jar')
senti.setSentiStrengthLanguageFolderPath('SentiStrengthData/')

str_arr = ['What a lovely day', 'What a bad day']

# Scale scoring: one overall score per text (the sum of its positive and negative strengths)
result = senti.getSentiment(str_arr, score='scale')
print(result)

# OR, dual scoring: a positive strength and a negative strength per text
result2 = senti.getSentiment(str_arr, score='dual')
print(result2)

# OR, binary scoring: 1 for a positive text, -1 for a negative text
result2 = senti.getSentiment(str_arr, score='binary')
print(result2)

# OR, trinary scoring: positive strength, negative strength, and an overall
# label (1 = positive, 0 = neutral, -1 = negative)
result2 = senti.getSentiment(str_arr, score='trinary')
print(result2)
```

```
[1, -1]
[(2, -1), (1, -2)]
[1, -1]
[(2, -1, 1), (1, -2, -1)]
```


```python
# Save the installed package versions for reproducibility
!pip freeze > requirements.txt
```

```python
import pandas as pd

df = pd.read_csv("util/SentiStrengthData_DE/_EmotionLookupTable_v5_SOURCE.csv")
```

```
ParserError: Error tokenizing data. C error: Expected 1 fields in line 728, saw 2
```


```python
print(df.head())
```

```
                                                   ############\t\t\t\t#\t\t\t\t\t
# SentiStrength_DE Version: v5 (Oct.2011) by Ha...                 OFAI \t\t\t\t\t
#\t\t\t\t\t                                                                    NaN
# SentiStrength_DE is a collection of German le...                             NaN
# for sentiment classification together with th...                             NaN
# (Cf. http://sentistrength.wlv.ac.uk/ ).\t\t\t...                             NaN
```
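The `ParserError` above comes from `read_csv` expecting comma-separated input by default: the file starts with `#` comment lines and separates its fields with tabs, so pandas inferred a single column from the first line and then failed at line 728, which contains a comma. One hedged workaround is to parse the table yourself. The sketch below assumes that every non-comment row holds a term followed by its integer strength in the next tab-separated field; that layout, the `load_lookup_table` helper, and the sample rows are assumptions or invented placeholders, so check them against the actual file. In pandas, passing `sep="\t"`, `comment="#"`, and `header=None` to `read_csv` is the analogous starting point, although ragged rows or a non-UTF-8 file may need further options such as `encoding`.

```python
import io
from typing import TextIO


def load_lookup_table(handle: TextIO) -> dict[str, int]:
    """Parse 'term<TAB>strength' rows, skipping '#' comments and blank lines.

    Assumes the layout described above; extra tab-separated fields are ignored.
    """
    table = {}
    for line in handle:
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        term, strength = fields[0].strip(), fields[1].strip()
        try:
            table[term] = int(strength)
        except ValueError:
            continue  # skip rows whose second field is not an integer


    return table


# Invented placeholder rows in the assumed format (not real SentiStrength_DE data).
sample = io.StringIO("# header comment\t\t\t\n#\t\t\t\t\nterm_a*\t-3\t\t\nterm_b\t2\n\n")
print(load_lookup_table(sample))  # {'term_a*': -3, 'term_b': 2}

# For the real file, something like this (the encoding is an assumption):
# with open("util/SentiStrengthData_DE/_EmotionLookupTable_v5_SOURCE.csv", encoding="utf-8") as f:
#     lookup = load_lookup_table(f)
```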
