# Lexical Sentiment Analysis using SentiStrength

Sentiment analysis is a powerful technique that allows us to automatically understand the opinions, emotions, and attitudes expressed in written text. This tutorial will guide you through the process of conducting a robust sentiment analysis on social media data, specifically focusing on Twitter data.


We'll begin by introducing sentiment analysis, its origins, and its importance in today's digital landscape. Sentiment analysis, also known as opinion mining or emotion analysis, is the automated process of interpreting and classifying the underlying sentiments or emotions in text data. This technique has its roots in computer science but has since been widely adopted across various disciplines, including management, social sciences, and linguistics.


The advent of social media has significantly increased the value and relevance of sentiment analysis. Platforms like Twitter have become powerful channels for individuals to express their opinions, emotions, and sentiments on a wide range of topics. Leveraging this wealth of user-generated content requires constant monitoring and analysis, making sentiment analysis an invaluable tool for businesses, governments, and researchers alike.


In this tutorial, we'll follow a structured sentiment analysis process that covers the essential steps from topic identification to data visualization. We'll start by defining our research question and identifying the relevant data source (in this case, Twitter). Next, we'll discuss techniques for data collection, cleaning, and preprocessing to ensure high-quality input for our analysis.


Once our data is ready, we'll introduce SentiStrength, a widely used and well-established sentiment analysis tool. SentiStrength has been employed by researchers across various domains, and its effectiveness has been demonstrated in numerous scholarly publications.


After applying SentiStrength to our Twitter data, we'll explore techniques for visualizing and interpreting the sentiment analysis results. This will involve creating insightful visualizations that can effectively communicate the key findings and insights derived from the analysis.


Throughout the tutorial, we'll address common challenges and considerations in sentiment analysis, such as handling irony, sarcasm, and implicit sentiment cues. We'll also discuss best practices for ensuring the accuracy and reliability of our analysis.


By the end of this tutorial, you'll have a solid understanding of the sentiment analysis process and the skills necessary to conduct high-quality sentiment analysis on social media data, particularly Twitter data. Let's dive in!

## Tweet Sentiment Analysis (SentiStrength)

SentiStrength is a powerful sentiment analysis tool that is freely available for academic research purposes. It can be accessed online through a live demo or downloaded (for Windows only) from the official website at http://sentistrength.wlv.ac.uk.


At its core, SentiStrength is a lexicon-based sentiment classifier that compares social media text against a predefined lexicon of sentiment-bearing words and phrases. For each text, the program reports two scores: a positive strength from 1 (no positive sentiment) to 5 (very strong positive sentiment) and a negative strength from -1 (no negative sentiment) to -5 (very strong negative sentiment). This dual scale is inspired by psychological research suggesting that people can experience positive and negative feelings at the same time, commonly known as mixed emotions.


One of the key strengths of SentiStrength is its ability to provide separate sentiment scores for each word within a sentence, allowing for a more granular analysis of the overall sentiment strength. The program's lexicon comprises 1,125 words and 1,364 word stems, each with an associated positive or negative sentiment score. For example, the word "ailing" has a score of -3 in the lexicon, suggesting a moderate negative sentiment.
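
The lexicon lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not SentiStrength's implementation: the names `TOY_LEXICON`, `TOY_STEMS`, `word_score`, and `dual_score` are hypothetical, and every score except "ailing" (-3) is invented for the example. It assumes that a text's positive and negative scores are the strongest word score of each polarity.

```python
import re

# Hypothetical mini-lexicon for illustration only; apart from "ailing" (-3),
# the scores are invented and are not taken from SentiStrength's own lexicon.
TOY_LEXICON = {"ailing": -3, "lovely": 2, "terrible": -4}
# Stem entries match any word that starts with the stem (also invented).
TOY_STEMS = {"horribl": -4, "delight": 3}


def word_score(word):
    """Return the toy lexicon score for one word, or 0 if it is not listed."""
    if word in TOY_LEXICON:
        return TOY_LEXICON[word]
    for stem, score in TOY_STEMS.items():
        if word.startswith(stem):
            return score
    return 0


def dual_score(text):
    """Return (positive, negative): the strongest score of each polarity."""
    scores = [word_score(w) for w in re.findall(r"[a-z']+", text.lower())]
    # With no matching words, fall back to the neutral values 1 and -1.
    positive = max([s for s in scores if s > 0], default=1)
    negative = min([s for s in scores if s < 0], default=-1)
    return positive, negative


print(dual_score("A lovely but ailing garden"))
```

A text with no lexicon matches falls back to `(1, -1)` here, mirroring the "no sentiment" end of each scale.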


SentiStrength employs a range of sophisticated techniques to enhance its sentiment analysis capabilities. It accounts for negation, where positive terms preceded by negating words (e.g., "not," "don't") have their sentiment flipped, and negative terms are neutralized. Additionally, the program considers booster words like "very" and "extremely," which can amplify the sentiment strength of the following word.
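
As a rough sketch of the negation and booster rules just described, the snippet below applies them to a toy word list. The word lists, scores, and the function name `adjusted_scores` are assumptions made for illustration, and the real tool's rules are likely more detailed than this.

```python
# Hypothetical word lists and scores, chosen only to illustrate the rules.
TOY_LEXICON = {"good": 2, "happy": 3, "bad": -2, "awful": -4}
NEGATORS = {"not", "don't", "never"}
BOOSTERS = {"very": 1, "extremely": 2}


def adjusted_scores(text):
    """Return (word, score) pairs for sentiment words after applying the rules."""
    results = []
    boost = 0
    negated = False
    for word in text.lower().split():
        if word in NEGATORS:
            negated = True
            continue
        if word in BOOSTERS:
            boost += BOOSTERS[word]
            continue
        score = TOY_LEXICON.get(word, 0)
        if score != 0:
            # A booster pushes the score further from zero, capped at +/-5.
            if score > 0:
                score = min(score + boost, 5)
            else:
                score = max(score - boost, -5)
            if negated:
                # Negated positive terms flip; negated negative terms are neutralized.
                score = -score if score > 0 else 0
            results.append((word, score))
        # Modifiers are consumed by the next word that is not itself a modifier.
        boost = 0
        negated = False
    return results


for phrase in ["not good", "not bad", "very happy", "not very good"]:
    print(phrase, "->", adjusted_scores(phrase))
```

In this sketch a modifier only affects the next non-modifier word, so "not a good day" would leave "good" unchanged. That is a simplification chosen for the example.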


The tool also incorporates rules for handling questions, idioms, spelling corrections, and punctuation, as well as rules specific to computer-mediated communication methods of expressing sentiment, such as emoticons. SentiStrength maintains a list of emoticons with associated sentiment strength scores, further enhancing its ability to accurately interpret sentiment in social media text.
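
An emoticon list can be treated like any other lexicon. The following sketch assumes a small, invented table (`TOY_EMOTICONS`) and simple whitespace tokenization. The scores are not taken from SentiStrength's actual emoticon list.

```python
# Hypothetical emoticon scores for illustration; not SentiStrength's actual list.
TOY_EMOTICONS = {":)": 1, ":-)": 1, ":D": 2, ":(": -1, ":'(": -2}


def emoticon_scores(text):
    """Return (emoticon, score) pairs for whitespace-separated emoticons in text."""
    return [(tok, TOY_EMOTICONS[tok]) for tok in text.split() if tok in TOY_EMOTICONS]


print(emoticon_scores("Missed the train :( but made the meeting :D"))
```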


Two notable advantages of SentiStrength are its speed and transparency. It can process up to 14,000 tweets per second on a standard PC and provides insights into how its scores were calculated. Additionally, SentiStrength supports multiple languages, making it a versatile tool for sentiment analysis across various linguistic contexts.


In [7]:
#! pip install sentistrength
#! pip install pandas

In [23]:
from sentistrength import PySentiStr
senti = PySentiStr()

In [31]:
senti.setSentiStrengthPath('jar_datei/SentiStrength.jar') # Placeholder: replace with the absolute path to SentiStrength.jar on your system
senti.setSentiStrengthLanguageFolderPath('SentiStrengthData/') # Placeholder: replace with the absolute path to the language data folder

In [32]:
# Dual scoring returns a list with one (positive, negative) tuple per input text.
s = senti.getSentiment('What a horrible terrible day', score='dual')

In [33]:

result = senti.getSentiment('What a lovely positive day', score='dual')
print(result)

[(3, -1)]


In [34]:
str(result[0][0]) + ' ' + str(result[0][1])

'3 -1'

In [35]:
def get_sentiment(text):
    """Return the dual score of a text as a 'positive negative' string, e.g. '3 -1'."""
    result = senti.getSentiment(text, score='dual')
    positive, negative = result[0]
    return f'{positive} {negative}'
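
Because the helper above returns the two scores as a single string such as `'3 -1'`, it can be handy to split that string back into integers before counting or plotting. A minimal sketch follows, where `parse_dual` is a hypothetical name introduced here:

```python
def parse_dual(score_text):
    """Split a 'positive negative' string such as '3 -1' into two integers."""
    positive, negative = (int(part) for part in score_text.split())
    return positive, negative


print(parse_dual('3 -1'))
```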

Example: scoring a single string

In [36]:
result1 = senti.getSentiment('What a lovely day')
print(result1)


[1]


Example: scoring a list of strings or a pandas Series

In [38]:
from sentistrength import PySentiStr
senti = PySentiStr()
senti.setSentiStrengthPath('jar_datei/SentiStrength.jar') # Placeholder: replace with the absolute path to SentiStrength.jar on your system
senti.setSentiStrengthLanguageFolderPath('SentiStrengthData/') # Placeholder: replace with the absolute path to the language data folder
str_arr = ['What a lovely day', 'What a bad day']
result = senti.getSentiment(str_arr, score='scale')
print(result)

# OR, if you want dual scoring (a score each for positive rating and negative rating)
result2 = senti.getSentiment(str_arr, score='dual')
print(result2)

# OR, if you want binary scoring (1 for positive sentence, -1 for negative sentence)
result2 = senti.getSentiment(str_arr, score='binary')
print(result2)

# OR, if you want trinary scoring (positive score, negative score, and an overall class: 1 positive, 0 neutral, -1 negative)
result2 = senti.getSentiment(str_arr, score='trinary')
print(result2)


[1, -1]
[(2, -1), (1, -2)]
[1, -1]
[(2, -1, 1), (1, -2, -1)]
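
The four outputs above are closely related. In this run, the scale score equals the sum of the dual scores, and the overall class in the trinary output matches the sign of that sum. The sketch below reproduces that relationship for the two example texts. The helper names `to_scale` and `to_trinary` are hypothetical, and SentiStrength's own decision rules (for ties in particular) may differ from this simplified reading.

```python
def to_scale(positive, negative):
    """Single scale score: the sum of the positive and negative strengths."""
    return positive + negative


def to_trinary(positive, negative):
    """Overall polarity: 1 (positive), -1 (negative) or 0 (balanced)."""
    total = positive + negative
    return (total > 0) - (total < 0)


dual_results = [(2, -1), (1, -2)]  # the dual output shown above
print([to_scale(p, n) for p, n in dual_results])
print([to_trinary(p, n) for p, n in dual_results])
```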


In [40]:
! pip freeze > requirements.txt