# Twitter sentiment analysis using NLP techniques
### NLP for sentiment analysis of tweets: demo for using the most popular libraries

#### This is not just any text data: I'll be using a dataset of tweets collected over a period of three weeks around the 2020 US presidential election.
#### Data link: https://www.kaggle.com/datasets/manchunhui/us-election-2020-tweets/data
#### I downloaded the dataset directly from the link above.


# Optimizing Political Campaigns with Twitter Sentiment Analysis

**Step-by-Step Approach**

First, I conduct classic exploratory data analysis to understand basic questions like the number of tweets and the date range. Next, I perform text pre-processing to clean the data by removing stop words and converting words to their base forms (e.g., plurals to singular).

I then carry out sentiment analysis using three libraries: TextBlob, VADER, and Flair, and compare their results to determine which one best suits our dataset.

Given the context of a political campaign, I aim to derive actionable insights from the sentiment analysis results. The goal is to use Twitter sentiment analysis to find strategies to increase our candidate's voter base.

<html><center><h1> Table of contents</h1>

Exploratory Data Analysis

Text pre-processing

Intro to sentiment analysis

Sentiment analysis with TextBlob

Sentiment analysis with VADER

Sentiment analysis with Flair

Which is the best sentiment analysis library?

Actionable insights from sentiment analysis of tweets

</center></html>


In [5]:
# Exploratory Data Analysis

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import time

In [None]:
# stopwords, tokenizer, stemmer
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

nltk.download('averaged_perceptron_tagger')
nltk.download('vader_lexicon')
nltk.download('stopwords')
nltk.download('wordnet')

import re # regular expressions

# !pip install gensim
import gensim
from gensim.parsing.preprocessing import remove_stopwords # we also use gensim for stopwords removal

from textblob import TextBlob



[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!




In [None]:
# !pip install flair
from flair.data import Sentence
from flair.nn import Classifier

# make a sentence
sentence = Sentence('I love Berlin .')

# load the pre-trained sentiment classifier
tagger = Classifier.load('sentiment')

# predict the sentiment of the sentence
tagger.predict(sentence)

# print the sentence with all annotations
print(sentence)
from segtok.segmenter import split_single


In [None]:
# Loading each dataset
trump_df = pd.read_csv('/content/hashtag_donaldtrump .csv', lineterminator='\n')
biden_df = pd.read_csv('/content/hashtag_joebiden .csv', lineterminator='\n')

In [None]:
print('Total number of records in Trump dataset: ', trump_df.shape)
print('Total number of records in Biden dataset: ', biden_df.shape)

In [None]:
trump_df.columns

In [None]:
biden_df.columns

In [None]:
trump_df.sample(5)

In [None]:
# Let's get an overall idea of the data using the profile report feature of ydata-profiling (formerly pandas-profiling).
!pip install ydata-profiling

import numpy as np
import pandas as pd
from ydata_profiling import ProfileReport

In [None]:
trump_profile_report = ProfileReport(trump_df, title="Profiling Report")
biden_profile_report = ProfileReport(biden_df, title="Profiling Report")

In [None]:
trump_profile_report.to_notebook_iframe()

In [None]:
# to view as html file.
biden_profile_report.to_file("biden_profile_report.html")

#### Observation: The profiling tool gives us a detailed overview of the data with just a few lines of code.

In [None]:
# Remove unneeded columns

irrelevant_columns = ['source','user_name','user_screen_name','user_description','user_join_date','collected_at']

# drop all of these unwanted features

trump_df = trump_df.drop(columns=irrelevant_columns)
biden_df = biden_df.drop(columns=irrelevant_columns)

In [None]:
# check for missing values.
trump_df.isnull().mean()

In [None]:
biden_df.isnull().mean()

In [None]:
# lets drop these for now.

trump_df = trump_df.dropna()
biden_df = biden_df.dropna()

In [None]:
biden_df.country.unique()

#### Since tweets from other countries tell us little about voters in this election, we drop them and focus only on the USA.


In [None]:
trump_usa_df = trump_df[trump_df.country == "United States of America"]
biden_usa_df = biden_df[biden_df.country == "United States of America"]

del trump_df
del biden_df

In [None]:
print('Total number of records in Trump USA dataset: ', trump_usa_df.shape)
print('Total number of records in Biden USA dataset: ', biden_usa_df.shape)

#### Duplicate tweets

In [None]:
tids = trump_usa_df.tweet_id
bids = biden_usa_df.tweet_id

ids_tweets_in_common = set(trump_usa_df.tweet_id).intersection(set(biden_usa_df.tweet_id))
len(ids_tweets_in_common)

Around 20,000 tweets show up in both datasets. I think it doesn't make sense for the same tweet to contribute to the sentiment towards both Biden and Trump. If a tweet expresses a negative emotion, we don't know whether it is negative towards both candidates or only towards one of them while merely mentioning the other.
Let's have a look at a few of the 'duplicate' tweets.

In [None]:
pd.options.display.max_colwidth = 1000 # by default, pandas displays only the first 50 characters of a long text

biden_usa_df.tweet.loc[biden_usa_df.tweet_id.isin(list(ids_tweets_in_common))].head(5)

From the tweets above, it looks like the duplicate tweets are not really helpful for a per candidate sentiment analysis, so I'll drop them.
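As a minimal sketch of that filtering step, the toy example below finds the tweet IDs shared by two frames and drops them from both. The IDs and texts are made up, and the `*_toy` names are hypothetical; only the `tweet_id` column name mirrors the dataset.

```python
import pandas as pd

# Toy frames with made-up tweet IDs; tweet 2 appears in both
trump_toy = pd.DataFrame({"tweet_id": [1, 2, 3], "tweet": ["a", "b", "c"]})
biden_toy = pd.DataFrame({"tweet_id": [2, 4], "tweet": ["b", "d"]})

# IDs present in both frames
common_ids = set(trump_toy["tweet_id"]) & set(biden_toy["tweet_id"])

# Keep only rows whose ID is not shared; .copy() gives independent frames
trump_unique = trump_toy[~trump_toy["tweet_id"].isin(common_ids)].copy()
biden_unique = biden_toy[~biden_toy["tweet_id"].isin(common_ids)].copy()

print(len(common_ids), "shared tweet(s)")
print(trump_unique["tweet_id"].tolist(), biden_unique["tweet_id"].tolist())
```

Taking a `.copy()` of each filtered frame also avoids pandas' chained-assignment warnings when new columns are added later.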

In [None]:
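# .copy() gives us independent DataFrames, so adding columns later does not trigger SettingWithCopyWarning
trump_usa_unique_df = trump_usa_df[~trump_usa_df['tweet_id'].isin(ids_tweets_in_common)].copy()
biden_usa_unique_df = biden_usa_df[~biden_usa_df['tweet_id'].isin(ids_tweets_in_common)].copy()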

In [None]:
print('Total number of unique records in Trump USA dataset: ', trump_usa_unique_df.shape)
print('Total number of unique records in Biden USA dataset: ', biden_usa_unique_df.shape)

In [None]:
trump_usa_unique_df

The second observation is that we have tweets in English and in other languages, such as Spanish. Most natural language processing libraries can only handle a single language, so we will keep only the tweets in English.

I tried several of the language detection packages mentioned in this Stack Overflow thread: https://stackoverflow.com/questions/39142778/how-to-determine-the-language-of-a-piece-of-text

In [None]:
!pip install langdetect

# testing
example_tweet = '#Wisconsin podría ser el punto de inflexión en la carrera entre #Trump y #Biden https://t.co/WFf8A1hAn7'

# !pip install langdetect

from langdetect import detect
from textblob import TextBlob

# Example tweet
example_tweet = "Bonjour tout le monde"

# Detect language using langdetect
detected_language = detect(example_tweet)
print(f'Language of text { example_tweet} is: {detected_language}')

# Proceed with TextBlob analysis
b = TextBlob(example_tweet)
print(f'Sentiment: {b.sentiment}')



In [None]:
#try out langdetect on a sample tweet
from langdetect import detect, DetectorFactory
DetectorFactory.seed = 0
detect("#Wisconsin podría ser el punto de inflexión en la carrera entre #Trump y #Biden https://t.co/WFf8A1hAn7")

In [None]:
from langdetect import detect, DetectorFactory
DetectorFactory.seed = 0

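def get_language(tweet):
    try:
        lang = detect(tweet)
    except Exception:
        # for some tweets (e.g. those without any letters), detect will throw an error.
        # we use 'unknown' rather than 'no', because 'no' is the ISO 639-1 code for Norwegian
        lang = 'unknown'
        # uncomment the line below if you want to look further into this behavior
        #print("This tweet throws an error:", tweet)
    return lang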

From my initial tests with TextBlob I saw that language detection can take a long time. Let's first try it on 1,000 records to get an idea of how long the language analysis of our full datasets will take.
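A rough way to turn the timing of a sample into an estimate for the full run is linear extrapolation. The sketch below assumes the cost per tweet is roughly constant; the helper names and the numbers in the usage line are hypothetical, not measurements from this notebook.

```python
import time

def time_on_sample(func, items):
    """Apply func to every item and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for item in items:
        func(item)
    return time.perf_counter() - start

def estimate_total_seconds(sample_seconds, sample_size, total_size):
    """Linearly extrapolate the run time measured on a sample to the full dataset."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return sample_seconds * total_size / sample_size

# Hypothetical numbers: 1,000 tweets took 8 seconds and the full set has 150,000 tweets
print(estimate_total_seconds(8.0, 1_000, 150_000))  # 1200.0 seconds, i.e. 20 minutes
```

In the notebook, `time_on_sample(get_language, sample_of_tweets)` would provide the first argument, where `sample_of_tweets` stands for any list of tweet texts.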

In [None]:
import time

start_time = time.time()

test_df = trump_usa_unique_df.iloc[:1000].copy()
test_df['lang'] = test_df.tweet.apply(lambda x: get_language(x))

stop_time = time.time()

print(f'It took {np.around((time.time() - start_time),decimals=1)} seconds')

In [None]:
import time
start_time = time.time()

trump_usa_unique_df['lang'] = trump_usa_unique_df.tweet.apply(lambda x: get_language(x))

stop_time = time.time()
print(f'It took {np.around((time.time() - start_time), decimals=1)} seconds')

In [None]:
start_time = time.time()

biden_usa_unique_df['lang'] = biden_usa_unique_df.tweet.apply(lambda x: get_language(x))

stop_time = time.time()
print(f'It took {np.around((time.time() - start_time), decimals=1)} seconds')

In [None]:
biden_usa_unique_df.columns

In [None]:
plt.figure(figsize=(20,5))
ax = biden_usa_unique_df.lang.value_counts().plot.bar(rot=0)
plt.setp(ax.get_xticklabels(), fontsize=16)
plt.title('Frequency of languages in Biden tweets')
plt.show()

plt.figure(figsize=(20,5))
ax = trump_usa_unique_df.lang.value_counts().plot.bar(rot=0)
plt.setp(ax.get_xticklabels(), fontsize=16)
plt.title('Frequency of languages in Trump tweets')
plt.show()

#### We keep only the tweets in English.
First, I'm renaming the DataFrames: their names have become so long that they make the code harder to follow rather than easier.

In [None]:
biden_df = biden_usa_unique_df.copy()
del biden_usa_unique_df

trump_df = trump_usa_unique_df.copy()
del trump_usa_unique_df

In [None]:
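# .copy() lets us add columns to the filtered DataFrames later without SettingWithCopyWarning
biden_df = biden_df[biden_df.lang == 'en'].copy()
trump_df = trump_df[trump_df.lang == 'en'].copy()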

In [None]:
biden_df

In [None]:
print('Total number of records in Trump dataset: ', trump_df.shape)
print('Total number of records in Biden dataset: ', biden_df.shape)

In [None]:
trump_df

In [None]:
import pandas as pd
import numpy as np

# Toy data showing how to compute the share of tweets retained after language filtering.
# The demo_ names keep the real trump_df and biden_df from being overwritten.
demo_trump_data = {'tweets': ["Trump tweet 1", "Trump tweet 2", "Trump tweet 3"], 'lang': ['en', 'es', 'en']}
demo_biden_data = {'tweets': ["Biden tweet 1", "Biden tweet 2", "Biden tweet 3"], 'lang': ['en', 'en', 'es']}

# Creating DataFrames
demo_trump_df = pd.DataFrame(demo_trump_data)
demo_biden_df = pd.DataFrame(demo_biden_data)

# Initial counts before filtering
trump_initial_count = demo_trump_df.shape[0]
biden_initial_count = demo_biden_df.shape[0]

# Filtering DataFrames to include only English tweets
demo_trump_df = demo_trump_df[demo_trump_df.lang == 'en']
demo_biden_df = demo_biden_df[demo_biden_df.lang == 'en']

# Print the retained percentage
print(f'We retained {np.around(demo_trump_df.shape[0] * 100 / trump_initial_count, decimals=1)}% of the toy Trump dataset')
print(f'And {np.around(demo_biden_df.shape[0] * 100 / biden_initial_count, decimals=1)}% from Biden')


## Analyze the data

In [None]:
trump_df['ds'] = 'trump'
biden_df['ds'] = 'biden'

# Combine the filtered on United States Trump and Biden Datasets
tweets_df = pd.concat([biden_df, trump_df],ignore_index=True)


In [None]:
tweets_df

In [None]:
start_time = time.time()

plt.figure(figsize=(15,5))

# created_at is read from the CSV as text, so convert it to datetime before extracting the date
tweets_per_day = pd.to_datetime(tweets_df.created_at).dt.date.value_counts().sort_index()
ax = tweets_per_day.plot.bar(rot=90, alpha=0.3, color='green')

plt.setp(ax.get_xticklabels(), fontsize=16)
plt.title('Frequency of tweets per day')
plt.show()

stop_time = time.time()
print(f'It took {stop_time - start_time} seconds')

In [None]:
most_popular_tweet = tweets_df.loc[tweets_df['retweet_count'].idxmax()]
print(f" The tweet:\n'{most_popular_tweet.tweet}'\nwas retweeted the most ({most_popular_tweet.retweet_count} times).")

#### So, our most popular tweet contains irony. I'm curious to see how sentiment analysis libraries will perform on this.

In [None]:
# .loc selects rows by index label, which is what .index returns
tweets_df.loc[tweets_df.retweet_count.sort_values(ascending=False).head(5).index, ['tweet_id','user_id','created_at', 'likes', 'retweet_count', 'tweet', 'ds']]

In [None]:
print(f'Our 2nd most popular tweet was retweeted for number of times equal to {np.around(13500*100/tweets_df.shape[0], decimals=1)}% of our dataset size')

In [None]:
new_var = tweets_df[tweets_df.tweet.str.contains('Are you there')][['created_at', 'tweet', 'user_id']]
new_var

In [None]:
# tweets_df.columns


In [None]:
print(f'There are {tweets_df.retweet_count.nunique()} different amounts of retweets')


#### In the KDE plot below we explore the retweeting behavior. A huge number of tweets are never retweeted, while a tiny number of tweets get retweeted up to ~17,500 times. This was to be expected.

In [None]:
sns.kdeplot(x='retweet_count', data=tweets_df)

#### "Vocal minority" and "silent majority" effect
Mustafaraj et al. 2011 [1] showed evidence that on social media there is a minority of very vocal users, while the majority of users hardly produce any content.

We explore this phenomenon in our dataset by looking at the distribution of the number of tweets per user.
The frequency distribution we obtain confirms that a small number of users produce a large portion of the tweets for both candidates (the trend is stronger for Biden).
This indicates that drawing conclusions about which candidate is preferred based on the number of tweets would be strongly influenced by this small number of very active users.

In the section dedicated to 'predicting' election results from tweets, we will see how we can enforce a policy of 'one vote per person' when analysing tweets.
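One simple way to quantify the "vocal minority" is to measure what share of all tweets comes from the most active users. The sketch below uses only the standard library; `share_of_top_users` is a hypothetical helper and the sample IDs are made up.

```python
from collections import Counter

def share_of_top_users(user_ids, top_fraction=0.1):
    """Return the fraction of all tweets written by the most active `top_fraction` of users."""
    if not user_ids:
        raise ValueError("user_ids must not be empty")
    counts = Counter(user_ids)                      # tweets per user
    ranked = sorted(counts.values(), reverse=True)  # most active users first
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / len(user_ids)

# Made-up sample: user "a" wrote 8 of 12 tweets, four other users wrote one each
sample_user_ids = ["a"] * 8 + ["b", "c", "d", "e"]
print(share_of_top_users(sample_user_ids, top_fraction=0.2))
```

Here the top 20% of users (one user out of five) account for two thirds of the tweets. With the real data, the same function could be called on `trump_df.user_id.tolist()`.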

In [None]:
fig, ax=plt.subplots(1,1, figsize=(12,6))

ax.set_title('Frequency distribution of number of tweets per user', fontsize = 16)
# fill=True replaces the deprecated shade=True argument in recent seaborn versions
sns.kdeplot(trump_df.groupby(['user_id'])['tweet'].count(), fill=True, color='r', label='Trump', ax = ax)
sns.kdeplot(biden_df.groupby(['user_id'])['tweet'].count(), fill=True, color='b', label='Biden', ax = ax)
labels= ["Trump", "Biden"]
ax.legend(labels)
#ax.set_ylim(0, .005)
plt.show()


## Text pre-processing

In [None]:
tweets_df.tweet.head(5)

In [None]:
tweets_df.shape

#### First, we clean our data:

* we convert everything to lowercase
* we remove punctuation, links, @mentions and the # sign of hashtags
* we remove stop words - stop words are a set of commonly used words in any language, for example “the”, “is” and “and” in English. These don't add any meaningful information for our analysis
* lemmatization - reduces inflected words to the root of that word (e.g. 'pursuing' becomes 'pursue')
* tokenization - split each tweet into a list of individual words
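To see what these steps do to a single tweet, here is a dependency-free sketch. It is a simplification and not the cleaning function used in this notebook: the stop-word list is a tiny illustrative assumption, and lemmatization is left out because it needs NLTK or spaCy.

```python
import re

# Tiny illustrative stop-word list (an assumption; NLTK and Gensim ship much longer ones)
TOY_STOPWORDS = {"the", "is", "and", "a", "to", "in", "of", "for"}

def toy_clean(tweet):
    """Lowercase, strip links/mentions/hashtag signs/punctuation, drop stop words, tokenize."""
    text = tweet.lower()
    text = re.sub(r"http\S+|www\.\S+", "", text)  # remove links
    text = re.sub(r"@\w+", "", text)              # remove @mentions
    text = text.replace("#", "")                  # keep the hashtag word, drop the sign
    text = re.sub(r"[^a-z\s]", " ", text)         # remove punctuation and digits
    tokens = text.split()                         # whitespace tokenization
    return [t for t in tokens if t not in TOY_STOPWORDS]

print(toy_clean("The debate is LIVE! @user watch in #Ohio https://t.co/abc"))
# ['debate', 'live', 'watch', 'ohio']
```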

In [None]:
import re
import nltk
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer
import spacy
from gensim.parsing.preprocessing import remove_stopwords

# Ensure necessary NLTK data is downloaded
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('stopwords')

def get_wordnet_pos(word):
    """Map POS tag to first character lemmatize() accepts"""
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}

    return tag_dict.get(tag, wordnet.NOUN)

def clean_text(tweet, lemmatize='nltk'):
    """
    Inputs:
    tweet - a string representing the text we need to clean
    lemmatize - one of two possible values {spacy, nltk}
      two lemmatization methods
      with our dataset, we got the best results with nltk
      but Spacy also did a good job, hence you might
      try both and compare results for your own data

    Output:
    tokenized - the cleaned text, tokenized (a list of string words)
    """
    tweet = tweet.lower() # lowercase
    tweet = re.sub(r"http\S+|www\S+|https\S+", '', tweet, flags=re.MULTILINE) # remove urls
    tweet = re.sub(r'\@\w+|\#', '', tweet) # remove mentions of other usernames and the hashtag character
    tweet = remove_stopwords(tweet) # remove stopwords with Gensim

    if lemmatize == 'spacy':
        # Initialize spacy 'en_core_web_sm' model, keeping only tagger component needed for lemmatization
        nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])
        doc = nlp(tweet)
        tokenized = [token.lemma_ for token in doc if token.lemma_ != '-PRON-']
    elif lemmatize == 'nltk':
        '''
        lemmatization works best when WordNetLemmatizer receives both the text and the part of speech of each word
        the code below assigns POS (part of speech) tag on a per word basis (it does not infer POS from content / sentence), which might not be optimal
        '''
        lemmatizer = WordNetLemmatizer()
        tokenized = [lemmatizer.lemmatize(w, get_wordnet_pos(w)) for w in nltk.word_tokenize(tweet)]

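    # remove leftover stop words with nltk
    # (build the stop-word set once per tweet instead of reloading the list for every token)
    nltk_stopwords = set(stopwords.words("english"))
    tokenized = [token for token in tokenized if token not in nltk_stopwords]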

    # remove non-alpha characters and keep the words of length >2 only
    tokenized = [token for token in tokenized if token.isalpha() and len(token) > 2]

    return tokenized

def combine_tokens(tokenized):
    """Join a list of tokens back into a single space-separated string"""
    return ' '.join(tokenized)


In [None]:
# Cleaning our tweets.

start =  time.time()

tweets_df['tokenized_tweet_nltk'] = tweets_df['tweet'].apply(lambda x: clean_text(x, 'nltk'))
tweets_df['clean_tweet_nltk'] = tweets_df['tokenized_tweet_nltk'].apply(lambda x: combine_tokens(x))

stop = time.time()
print(f'Cleaning all tweets takes ~{round((stop-start)/60, 3)} minutes: ')

In [None]:
# save this clean data file
tweets_df.to_csv('clean_tweets_df.csv', index=False)

In [None]:
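# use .loc so that the assignment modifies tweets_df itself rather than a temporary copy
tweets_df.loc[tweets_df.tweet_id.isin(list(trump_df.tweet_id)), 'ds'] = 'trump'
tweets_df.loc[tweets_df.tweet_id.isin(list(biden_df.tweet_id)), 'ds'] = 'biden'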

In [None]:
tweets_df.head(10)[['tweet', 'clean_tweet_nltk']]

# Intro to sentiment analysis

### The most popular algorithms are:

### Rule-based models
For example, TextBlob and Vader. They use a bag-of-words approach: the text is considered to be the sum of its constituent words, regardless of their order.
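To make the bag-of-words idea concrete, here is a toy scorer. It is a sketch only: the lexicon and its valence values are invented for illustration, and it averages word valences, which is a simplification of what TextBlob and Vader actually do.

```python
# Hypothetical valence lexicon; real libraries ship lexicons with thousands of entries
TOY_LEXICON = {"great": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}

def toy_polarity(text):
    """Average the valence of known words; unknown words are ignored."""
    scores = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(toy_polarity("great speech but terrible answers"))  # 0.0: +1.0 and -1.0 cancel out
print(toy_polarity("good debate"))                        # 0.5
print(toy_polarity("not good"))                           # 0.5: word order and negation are ignored
```

The last line shows the main weakness of a pure bag-of-words model, which is why rule-based libraries add heuristics on top of the lexicon.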

### Word-embedding-based models:
Words are represented as vectors of numbers in an n-dimensional space. This mapping from individual words to a continuous vector space can be generated through various methods: neural networks, dimensionality reduction, co-occurrence matrices.
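The point of such vectors is that words with similar meaning end up close to each other. A minimal sketch with made-up 3-dimensional vectors (real embeddings are learned from data and have hundreds of dimensions) compares words using cosine similarity:

```python
import math

# Made-up vectors for illustration only
TOY_EMBEDDINGS = {
    "good":  [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "awful": [-0.7, 0.1, 0.2],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity(TOY_EMBEDDINGS["good"], TOY_EMBEDDINGS["great"]))  # close to 1
print(cosine_similarity(TOY_EMBEDDINGS["good"], TOY_EMBEDDINGS["awful"]))  # negative
```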


### Popular libraries for sentiment analysis
For this analysis of tweets I tried three of the currently most popular sentiment analysis libraries:
* TextBlob uses a rule-based model.
* Vader uses a rule-based model.
* Flair uses word embeddings.

All three output a continuous number between -1 and 1.
If one needs a classification into categories instead of these numerical values, the common interpretation is that <0 is negative, 0 is neutral and >0 is positive. The cutoff points for the three categories are not set in stone and can be adapted based on the results / visual inspection.
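A minimal sketch of such a classification with an adjustable cutoff is shown below. `label_sentiment` and its `threshold` parameter are hypothetical names; with the default of 0 it reproduces the common interpretation described above.

```python
def label_sentiment(score, threshold=0.0):
    """Map a score in [-1, 1] to a label; scores within +/- threshold count as neutral."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

print(label_sentiment(0.03))                  # positive with the default cutoff of 0
print(label_sentiment(0.03, threshold=0.05))  # neutral with a wider neutral band
print(label_sentiment(-0.4, threshold=0.05))  # negative
```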

## Differences between the libraries:

**TextBlob** is the simplest of them. It does, though, estimate how factual versus opinionated a text is.

**Vader**
* The valence of the words in its dictionary was empirically validated by multiple human judges, and the lexicon is “especially attuned to microblog-like contexts”.
* Uses some heuristics to recognize word negations (“cool” versus “not cool”) and word intensifiers (“a bit sad” versus “really sad”).
* Cannot recognize typos and will consider them out-of-vocabulary words (very relevant for Twitter, where users often misspell words).

**Flair** is a pre-trained character-level LSTM (recurrent neural network) classifier which takes into account:
* the sequence of words
* the sequence of letters, so it can recognize typos
* intensifiers ('so', 'very', 'a bit' etc.)

Flair is trained on the IMDB movie reviews dataset, and retraining is resource intensive.
It is very polarizing: it assigns very positive or very negative scores, but not much in the middle.
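To my knowledge, Flair's sentiment classifier reports a label together with a confidence between 0 and 1, rather than a signed number. One way to bring this onto the same [-1, 1] scale as the other two libraries is sketched below; the `signed_score` helper and the exact label strings are assumptions, so check them against the output of your Flair version.

```python
def signed_score(label, confidence):
    """Turn a (label, confidence) pair into a number in [-1, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # assumed label strings: 'POSITIVE' and 'NEGATIVE'
    return confidence if label.upper() == "POSITIVE" else -confidence

print(signed_score("POSITIVE", 0.98))  # 0.98
print(signed_score("NEGATIVE", 0.75))  # -0.75
```

Because a confident classifier rarely reports a confidence near 0, scores built this way cluster near -1 and 1, which matches the polarizing behavior noted above.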

### 1> Sentiment analysis with TextBlob
According to TextBlob's official website, TextBlob "provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more."

The TextBlob library will output something like this for each snippet of text that it analyzes:
Sentiment(polarity=-0.125, subjectivity=0.5916666666666667)

That is, TextBlob will output:

* a measure of polarity, which can have values in the interval [-1, 1]
* an estimate of subjectivity in the range [0.0, 1.0], where 0.0 is very objective (dealing with facts) and 1.0 is very subjective (opinions).

We will adopt the following approach and label:
* <0 values as 'negative'
* 0 values as 'neutral'
* and >0 values as 'positive'

In [None]:
# Helper Function to assign Label for Sentiment Analysis with TextBlob
def create_sentiment_labels(df, feature,value):
    '''
    in:
        dataframe
        value on which to classify
        feature - column name of the feature that receives the label
    out:
        does not return a value
        modifies the dataframe received as parameter
    '''

    df.loc[df[value] > 0,feature] = 'positive'
    df.loc[df[value] == 0,feature] = 'neutral'
    df.loc[df[value] < 0,feature] = 'negative'

In [None]:
# Polarity and subjectivity
def sentiment_analysis(dataframe):
    dataframe['blob_polarity'] = dataframe['clean_tweet_nltk'].apply(lambda x: TextBlob(x).sentiment.polarity)
    dataframe['blob_subjectivity'] = dataframe['clean_tweet_nltk'].apply(lambda x: TextBlob(x).sentiment.subjectivity)

    create_sentiment_labels(dataframe, 'blob_sentiment','blob_polarity')

    return dataframe[['clean_tweet_nltk','blob_polarity','blob_subjectivity','blob_sentiment']].head()

In [None]:
import time
start =  time.time()

sentiment_analysis(tweets_df)

stop = time.time()
print(f'Sentiment analysis with TextBlob took: {round((stop-start)/60, 3)} minutes')

In [None]:
tweets_df.head(10)[['blob_polarity','blob_subjectivity', 'blob_sentiment']]

### Different Ways to Look at the Same Data

#### Option A: Average Polarity Per Candidate

To analyze sentiment, we can compute the average polarity (sentiment score) for each candidate by averaging the polarity of all tweets about them using TextBlob.

#### Issues with This Approach

Consider this scenario:
- One user tweets 99 times with a polarity of -1 (negative sentiment).
- Another user tweets once with a polarity of 1 (positive sentiment).

If we average the polarity across all tweets, we get a result of -0.98.

#### Problems with the Result
- This result suggests strong opposition to the candidate.
- However, it doesn't accurately reflect that we have one supporter and one opposer in our sample.
- The average only tells us the overall sentiment across tweets, not the true distribution of support and opposition.

In summary, averaging polarity might mislead us about the actual sentiment distribution among users.
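The scenario above can be checked with a few lines of standard-library Python. The sketch compares the tweet-level average with a user-level average, where each user's tweets are averaged first; the user IDs are made up.

```python
from collections import defaultdict
from statistics import mean

# (user_id, polarity) pairs: 99 negative tweets from one user, 1 positive tweet from another
tweets = [("user_a", -1.0)] * 99 + [("user_b", 1.0)]

# One tweet, one sentiment: every tweet counts equally
tweet_level = mean(polarity for _, polarity in tweets)

# One user, one sentiment: average per user first, then across users
per_user = defaultdict(list)
for user_id, polarity in tweets:
    per_user[user_id].append(polarity)
user_level = mean(mean(scores) for scores in per_user.values())

print(tweet_level)  # -0.98
print(user_level)   # 0.0
```

The user-level result of 0.0 reflects one supporter and one opponent, which the tweet-level average hides.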

In [None]:
#update the divided dataset
trump_df = tweets_df[tweets_df.ds=='trump']
biden_df = tweets_df[tweets_df.ds=='biden']

fig, axes = plt.subplots(1, 2, figsize=(8,5))

fig.suptitle('TextBlob analysis: mean polarity (-1.0, 1.0) and mean subjectivity (0.0, 1.0)\nper candidate (one tweet, one sentiment)', fontsize=14)

features = ['blob_polarity', 'blob_subjectivity']
# tweet-level averages: every tweet counts once, regardless of who wrote it
values = [trump_df['blob_polarity'].mean(), trump_df['blob_subjectivity'].mean()]
axes[0].bar(features,values, width=0.2)
axes[0].set_ylim(0, .5)
axes[0].set_title('Trump', fontsize = 14)
axes[0].set_ylabel('Value', fontsize = 12)

values = [biden_df['blob_polarity'].mean(), biden_df['blob_subjectivity'].mean()]
axes[1].bar(features,values, width=0.2)
axes[1].set_ylim(0, .5)
axes[1].set_title('Biden', fontsize = 14)
axes[1].set_ylabel('Value', fontsize = 12)

fig.tight_layout(rect=[0, 0.03, 1, 0.88])
plt.show()

trump_usa_pol_tweet =trump_df['blob_polarity'].mean()
trump_usa_subj_tweet = trump_df['blob_subjectivity'].mean()
biden_usa_pol_tweet = biden_df['blob_polarity'].mean()
biden_usa_subj_tweet = biden_df['blob_subjectivity'].mean()

#### Option B: Average Polarity Per User
Another option is to:

* first average the sentiment expressed through tweets per user id, so that we have one average expressed sentiment per user per candidate
* then average across the whole population for each candidate

In [None]:
# the below gives us a mean per user
# trump_usa_df[['user_id', 'Polarity']].groupby(['user_id'])['Polarity'].mean()

fig, axes = plt.subplots(1, 2, figsize=(8, 5))

fig.suptitle('TextBlob analysis: mean polarity (-1.0, 1.0) and mean subjectivity (0.0, 1.0)\nper candidate (one user, one sentiment)', fontsize=14)

features = ['blob_polarity', 'blob_subjectivity']
values = [trump_df.groupby(['user_id'])['blob_polarity'].mean().mean(), trump_df.groupby(['user_id'])['blob_subjectivity'].mean().mean()]
axes[0].bar(features,values, width=0.2,)
axes[0].set_ylim(0, .5)
axes[0].set_title('Trump', fontsize = 14)
axes[0].set_ylabel('Value', fontsize = 12)

values = [biden_df.groupby(['user_id'])['blob_polarity'].mean().mean(), biden_df.groupby(['user_id'])['blob_subjectivity'].mean().mean()]
axes[1].bar(features,values, width=0.2,)
axes[1].set_ylim(0, .5)
axes[1].set_title('Biden', fontsize = 14)
axes[1].set_ylabel('Value', fontsize = 12)

fig.tight_layout(rect=[0, 0.03, 1, 0.88])
plt.show()

trump_usa_pol_user = trump_df.groupby(['user_id'])['blob_polarity'].mean().mean()
trump_usa_subj_user = trump_df.groupby(['user_id'])['blob_subjectivity'].mean().mean()
biden_usa_pol_user = biden_df.groupby(['user_id'])['blob_polarity'].mean().mean()
biden_usa_subj_user = biden_df.groupby(['user_id'])['blob_subjectivity'].mean().mean()

In [None]:
#how our results are influenced by choosing either of the two options mentioned above

fig, axes = plt.subplots(1, 2, figsize=(10,6))

fig.suptitle('TextBlob analysis: \nmean polarity and mean subjectivity\n (tweet level = one tweet, one sentiment) vs (user level = one user, one sentiment)', fontsize=16)

#features = ['Polarity', 'Subjectivity']
features = np.array([1, 2])
values_tweet = [ trump_usa_pol_tweet, trump_usa_subj_tweet]
values_user = [ trump_usa_pol_user, trump_usa_subj_user]

#values = [[trump_usa_pol_tweet, trump_usa_subj_tweet],
#[trump_usa_pol_user, trump_usa_subj_user]]

axes[0].bar(features-0.2, values_tweet, width=0.2, align = 'center', color = 'y')
axes[0].bar(features, values_user, width=0.2, align = 'center', color = 'g')
#axes[0].bar(features,values)
axes[0].set_ylim(0, .5)
axes[0].set_title('Trump', fontsize = 16)
axes[0].set_xlabel('Feature', fontsize = 14)
axes[0].set_ylabel('Average value', fontsize = 14)
axes[0].set_xticks(features - 0.1)  # center each tick between the two bars of a group
axes[0].set_xticklabels(['Polarity', 'Subjectivity'])
labels= ["tweet level", "user level"]
axes[0].legend(labels)

values_tweet = [ biden_usa_pol_tweet, biden_usa_subj_tweet]
values_user = [ biden_usa_pol_user, biden_usa_subj_user]
axes[1].bar(features-0.2,values_tweet, width=0.2, align = 'center', color = 'y')
axes[1].bar(features,values_user, width=0.2, align = 'center', color = 'g')
axes[1].set_ylim(0, .5)
axes[1].set_title('Biden', fontsize = 16)
axes[1].set_xlabel('Feature', fontsize = 14)
axes[1].set_ylabel('Average value', fontsize = 14)
axes[1].set_xticks(features - 0.1)  # center each tick between the two bars of a group
axes[1].set_xticklabels(['Polarity', 'Subjectivity'])

labels= ["tweet level", "user level"]
axes[1].legend(labels)

fig.tight_layout(rect=[0, 0.03, 1, 0.88])
plt.show()

#### In our data, it doesn't really matter how we average the sentiment: both methods give the same result. Conceptually, "one user, one sentiment" makes much more sense for our analysis, but "one tweet, one sentiment" is easier to code, so we'll use that one.

In [None]:
fig = plt.figure(figsize=(6,5))  # keep a handle so that fig.tight_layout below acts on this figure

ax = plt.gca()
ax.set_title('--Relative--\nTextBlob sentiment analysis - \nrelative frequency per valence type for each candidate', fontsize=16)

features = np.array([1,2,3])
trump = (trump_df['blob_sentiment'].sort_values().value_counts()/trump_df['blob_sentiment'].shape[0])[['negative', 'neutral', 'positive']]
ax.bar(features-0.3, trump.values, width=0.3, align = 'center', color = 'r', alpha= .6)

biden = (biden_df['blob_sentiment'].sort_values().value_counts()/biden_df['blob_sentiment'].shape[0])[['negative', 'neutral', 'positive']]
ax.bar(features, biden.values, width=0.3, align = 'center', color = 'b', alpha= .6)

ax.set_ylim(0, .5)
ax.set_xlabel('Valence', fontsize = 14)
ax.set_ylabel('Relative frequency', fontsize = 14)

ax.set_xticks(features - 0.15)  # center each tick between the two bars of a group
ax.set_xticklabels(['Negative', 'Neutral', 'Positive'])

labels= ["Trump", "Biden"]
ax.legend(labels)

fig.tight_layout(rect=[0, 0.03, 1, 0.88])
plt.show()

### Observations from the above plot:

* The ratio of positive to negative tweets is higher for Biden than for Trump.
* When people tweet about Biden, they tend to be less negative than when they tweet about Trump.

In [None]:
fig = plt.figure(figsize=(6,5))  # keep a handle so that fig.tight_layout below acts on this figure

ax = plt.gca()
ax.set_title('--Absolute--\nTextBlob sentiment analysis - \nabsolute frequency per valence type for each candidate', fontsize=16)

features = np.array([1,2,3])
trump = (trump_df['blob_sentiment'].sort_values().value_counts())[['negative', 'neutral', 'positive']]
ax.bar(features-0.3, trump.values, width=0.3, align = 'center', color = 'r', alpha=0.6)

biden = (biden_df['blob_sentiment'].sort_values().value_counts())[['negative', 'neutral', 'positive']]
ax.bar(features, biden.values, width=0.3, align = 'center', color = 'b', alpha = 0.6)

#ax.set_ylim(0, .5)
ax.set_xlabel('Valence', fontsize = 14)
ax.set_ylabel('Absolute frequency', fontsize = 14)

ax.set_xticks(features - 0.15)  # center each tick between the two bars of a group
ax.set_xticklabels(['Negative', 'Neutral', 'Positive'])

labels= ["Trump", "Biden"]
ax.legend(labels)

fig.tight_layout(rect=[0, 0.03, 1, 0.88])
plt.show()

The absolute frequency plot is relevant because all those negative tweets could potentially be support votes for the other candidate, since in presidential elections people effectively have only two options. If they hate one candidate, that could be enough reason to vote for the other one.


## 2> Sentiment analysis with VADER
VADER (Valence Aware Dictionary and sEntiment Reasoner) was developed in 2014.
You can check VADER's official GitHub repository for details of how the tool was designed and how to use it.

According to VADER's GitHub page: "Empirically validated by multiple independent human judges, VADER incorporates a "gold-standard" sentiment lexicon that is especially attuned to microblog-like contexts."

* Vader is a lexicon and rule-based tool, so it can be used out of the box without training. If you want to read about the model in detail, the official website recommends [2]

* Vader outputs something like this:
{'neg': 0.0, 'neu': 0.436, 'pos': 0.564, 'compound': 0.3802}

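Negative, neutral and positive are scores between 0 and 1 that indicate the proportion of the text falling into each category.
The compound value reflects the overall sentiment of the text. It is computed by summing the valence scores of the individual words, adjusted according to VADER's rules, and normalizing the result. It ranges from -1 (maximum negativity) to 1 (maximum positivity).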
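As far as I can tell from VADER's source code, the compound score is a sum of adjusted word valences that is squashed into [-1, 1] with the formula x / sqrt(x² + alpha), where alpha defaults to 15. The sketch below reproduces only that normalization step; `normalize_compound` is a hypothetical name and the valence sums are made-up inputs.

```python
import math

def normalize_compound(valence_sum, alpha=15):
    """Squash an unbounded valence sum into [-1, 1]."""
    score = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    return max(-1.0, min(1.0, score))  # guard against floating-point overshoot

for total in (0.0, 1.9, -4.0, 20.0):
    print(total, round(normalize_compound(total), 4))
```

The larger the absolute valence sum, the closer the result gets to -1 or 1, without ever exceeding those bounds.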

There is no single mandatory way to interpret the compound value. One can decide that anything above 0 is positive, anything below 0 is negative, and exactly 0 is neutral.
But we can also decide to look only at more extreme values, for example above 0.8 or below -0.8.
It really depends on the kind of data you have.
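
A minimal sketch of threshold-based labelling. The default cut-off of 0.05 is the convention suggested in VADER's README (positive at 0.05 and above, negative at -0.05 and below, neutral in between); 0.8 is the stricter example from the text. `label_from_compound` is a hypothetical helper written for illustration, not the `create_sentiment_labels` function used in this notebook.

```python
def label_from_compound(compound, threshold=0.05):
    """Turn a VADER compound score into a sentiment label.

    Scores within (-threshold, threshold) are treated as neutral.
    """
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

score = 0.3802  # the compound value from the sample output above
print(label_from_compound(score))                 # default cut-off
print(label_from_compound(score, threshold=0.8))  # only extreme values count
```

With the stricter threshold, the same tweet moves from positive to neutral, which shows how much the label distribution depends on this one choice.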

In [None]:
sid = SentimentIntensityAnalyzer()

In [None]:
def sentiment_analysis_vader(df, clean = True):
    """Add VADER scores, compound polarity and sentiment labels to df (in place)."""
    if clean:
        target_col = 'clean_tweet_nltk'
        prefix = 'vader_clean_'
    else:
        target_col = 'tweet'
        prefix = 'vader_'

    scores_col = prefix+'scores'

    # The compound score is stored as '<prefix>polarity' and its label as
    # '<prefix>sentiment', the same naming convention used for TextBlob and Flair.
    compound_col = prefix+'polarity'
    comp_score_col = prefix+'sentiment'

    df[scores_col] = df[target_col].apply(sid.polarity_scores)
    df[compound_col] = df[scores_col].apply(lambda d: d['compound'])
    create_sentiment_labels(df,comp_score_col,compound_col)

In [None]:
start = time.time()

sentiment_analysis_vader(tweets_df)
sentiment_analysis_vader(tweets_df, clean = False)

stop = time.time()
print(f'Sentiment analysis with VADER took: {round((stop-start)/60, 3)} minutes')

#update the divided dataset
trump_df = tweets_df[tweets_df.ds=='trump']
biden_df = tweets_df[tweets_df.ds=='biden']

Does it matter whether we clean the tweets before feeding them to VADER? Does VADER's own handling of raw text do a good enough job? We answer this by classifying the tweets into positive / neutral / negative using both approaches and then using accuracy_score as an agreement rate between the two sets of labels (neither set is ground truth).

In [None]:
from sklearn.metrics import accuracy_score

start = time.time()

print(f"Accuracy score for our cleaning vs vader tweet cleaning for Trump: {accuracy_score(trump_df['vader_sentiment'],trump_df['vader_clean_sentiment']):.4}")
print(f"Accuracy score for our cleaning vs vader tweet cleaning for Biden: {accuracy_score(biden_df['vader_sentiment'],biden_df['vader_clean_sentiment']):.4}")

stop = time.time()
print(f'This took: {round((stop-start)/60, 3)} minutes')

It looks like VADER assigns the same label to the raw and the cleaned version of a tweet in about 84% of cases for Trump and 88% for Biden.
So the label changes for more than one tweet in ten, which means the decision to feed raw or cleaned data should be given some thought.
Since we don't have labelled data, the only way to decide which method is best is by visual inspection.
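
To make that inspection more targeted, it can help to see which label changes are most common between the two runs. The sketch below is a plain-Python illustration on made-up labels; `agreement_report` is a hypothetical helper, and in the notebook the two inputs would be columns such as `vader_sentiment` and `vader_clean_sentiment`.

```python
from collections import Counter

def agreement_report(labels_a, labels_b):
    """Return the agreement rate and a count of each (a, b) disagreement."""
    pairs = list(zip(labels_a, labels_b))
    agreement = sum(a == b for a, b in pairs) / len(pairs)
    disagreements = Counter((a, b) for a, b in pairs if a != b)
    return agreement, disagreements

# Toy labels standing in for the raw-text and cleaned-text predictions
raw_labels = ['positive', 'neutral', 'negative', 'neutral', 'positive']
clean_labels = ['positive', 'neutral', 'neutral', 'negative', 'positive']

rate, diff = agreement_report(raw_labels, clean_labels)
print(f"agreement: {rate:.2f}")
for (raw, clean), n in diff.most_common():
    print(f"raw={raw:<9} clean={clean:<9} count={n}")
```

If most disagreements turn out to be neutral versus non-neutral rather than positive versus negative, the two inputs differ mainly in sensitivity, not in direction.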

In [None]:
def get_valence_relative_freq(df):
    """Share of negative / neutral / positive VADER labels in df, in that order."""
    order = ['negative', 'neutral', 'positive']
    counts = df['vader_sentiment'].value_counts().reindex(order, fill_value=0)
    return counts / counts.sum()

In [None]:
import seaborn as sns
sns.set_theme(style="darkgrid")

trump_tmp = get_valence_relative_freq(trump_df)
biden_tmp = get_valence_relative_freq(biden_df)

#plt.figure(figsize=(8,6))
fig, axes = plt.subplots(1, 2, figsize=(8,5))
fig.suptitle('Vader sentiment analysis - \nrelative frequency per valence type for each candidate', fontsize=16)
#fig.tight_layout()

#sns.barplot(trump_tmp.index, trump_tmp.values, ax=axes[0])
(trump_tmp).plot(kind='bar', ax = axes[0])
axes[0].set_title('Trump', fontsize = 16)
axes[0].set_xlabel('Valence', fontsize = 14)
axes[0].set_ylabel('Relative frequency', fontsize = 14)
axes[0].set_ylim(0, .5)

#ax2 = sns.countplot(x="comp_score", data=biden_tmp)
#sns.barplot(biden_tmp.index, biden_tmp.values,  ax=axes[1])
(biden_tmp).plot(kind='bar', ax = axes[1])
axes[1].set_title('Biden', fontsize = 16)
axes[1].set_xlabel('Valence', fontsize = 14)
axes[1].set_ylabel('Relative frequency', fontsize = 14)
axes[1].set_ylim(0, .5)

plt.tight_layout()
plt.show()

In [None]:
fig = plt.figure(figsize=(6,5))
sns.set_style("white")

ax = plt.gca()
ax.set_title('--Relative--\nVADER sentiment analysis - \nrelative frequency per valence type for each candidate', fontsize=16)

features = np.array([1,2,3])

trump = get_valence_relative_freq(trump_df)
ax.bar(features-0.3, trump.values, width=0.3, align = 'center', color = 'r', alpha= .6)

biden = get_valence_relative_freq(biden_df)
ax.bar(features, biden.values, width=0.3, align = 'center', color = 'b', alpha= .6)

ax.set_ylim(0, .5)
ax.set_xlabel('Valence', fontsize = 14)
ax.set_ylabel('Relative frequency', fontsize = 14)

ax.set_xticks(features - 0.15)  # centre each tick between the paired bars
ax.set_xticklabels(['Negative', 'Neutral', 'Positive'])

labels= ["Trump", "Biden"]
ax.legend(labels)

fig.tight_layout(rect=[0, 0.03, 1, 0.88])
plt.show()

### Some observations we can make based on plot above:
Within candidates:

* Trump has a ratio of 1:1 for positive to negative tweets, while for Biden it's almost 2:1

Between candidates:

* higher % of positive tweets for Biden
* higher % of negative tweets for Trump

### Average sentiment score per candidate

In [None]:
print(trump_df['vader_polarity'].mean())
print(biden_df['vader_polarity'].mean())

fig = plt.figure(figsize=(5,5))

fig.suptitle('Mean VADER compound score (between -1.0 and 1.0)\nfor Trump and Biden', fontsize=16)

features = ['Mean Valence Trump', 'Mean Valence Biden']
values = [trump_df['vader_polarity'].mean(), biden_df['vader_polarity'].mean()]

plt.bar(features,values, width=0.2)

axes = plt.gca()
axes.set_ylim(-.3, .3)
axes.set_xlabel('Feature', fontsize = 14)
axes.set_ylabel('Value', fontsize = 14)

plt.show()

In [None]:
# Let's explore further the differences between sentiment for the two candidates.
# We continue with visual inspection of the distribution of sentiment scores.

bins = 50

fig = plt.figure(figsize=(8,5))
fig.suptitle('Histograms of tweets polarity per candidate (VADER)', fontsize=16)

plt.hist(trump_df['vader_polarity'], bins = bins, alpha = 0.5, color = 'r')
plt.hist(biden_df['vader_polarity'], bins = bins, alpha = 0.5, color = 'b')

axes = plt.gca()
axes.set_ylim(0, 4000)

labels= ["Trump", "Biden"]
axes.legend(labels)

fig.tight_layout(rect=[0, 0.03, 1, 0.88])
plt.show()

## 3> Sentiment analysis with Flair
Flair's sentiment model is a pre-trained character-level LSTM (a recurrent neural network) classifier which takes into account:

* the sequence of words
* the sequence of letters
* intensifiers ('so', 'very' etc.)

Advantage over VADER: by working at character level, it can cope with typos (e.g. it can recognize that 'anoy' means 'annoy'), which for VADER would just be an OOV (Out Of Vocabulary) word (and thus ignored).

#### Pre-trained Flair models
As we lack computing power, we will use a freely available pre-trained Flair model.

In [None]:
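from flair.data import Sentence
from flair.models import TextClassifier
from segtok.segmenter import split_single  # sentence splitter used by the helper functions

classifier = TextClassifier.load('en-sentiment')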

Remember when we removed tweets in languages other than English from our dataset? If we had kept them, the 'en' classifier loaded above wouldn't have been able to interpret them anyway.

Here's an example of how to use the Flair classifier to predict the sentiment of one sentence. The classifier outputs the assigned label and a value between 0 and 1 indicating its confidence in that prediction.

In [None]:
sentence = Sentence('The food was not horrible!')
classifier.predict(sentence)

print('Sentence above is: ', sentence.labels)

In [None]:
# Helper functions for performing the sentiment analysis using Flair

def flair_make_sentences(text):
    """ Break apart text into a list of sentences """
    sentences = [sent for sent in split_single(text)]
    return sentences

def flair_predict_sentences(sentence):
    """ Predict the sentiment of a sentence """
    if sentence == "":
        return 0
    text = Sentence(sentence)
    # stacked_embeddings.embed(text)
    classifier.predict(text)
    value = text.labels[0].to_dict()['value']
    if value == 'POSITIVE':
        result = text.to_dict()['labels'][0]['confidence']
    else:
        result = -(text.to_dict()['labels'][0]['confidence'])
    return round(result, 3)

def flair_get_scores_per_sentences(sentences):
    """ Call predict on every sentence of a text """
    # Score each sentence exactly once
    return [flair_predict_sentences(sentence) for sentence in sentences]

def flair_get_sum(scores):
    result = round(sum(scores), 3)
    return result

def flair_get_avg_from_sentences(scores):
    if len(scores) == 0:
        return 0  # nothing to average: treat as neutral, like the empty-text case
    result = round(np.mean(scores), 3)
    return result

def flair_get_score_tweet(text):
  if not text:
    return 0
  s = Sentence(text)
  classifier.predict(s)
  value = s.labels[0].to_dict()['value']
  if value == 'POSITIVE':
    result = s.to_dict()['labels'][0]['confidence']
  else:
    result = -(s.to_dict()['labels'][0]['confidence'])
  return round(result, 3)

def sentiment_analysis_flair(polarity):
  """ Map a signed Flair score to a sentiment label """
  if polarity > 0:
    return 'positive'
  if polarity < 0:
    return 'negative'
  return 'neutral'

## Let's explore the Flair results in the context of a comparison between all three methods (TextBlob, VADER and Flair).

### Which is the best sentiment analysis library?
* TextBlob
* VADER
* Flair per sentence
* Flair per tweet

### Flair: predict sentiment per sentence versus sentiment per tweet

But first we compare two ways of using Flair, and here is why.
We asked ourselves: what is the best way to predict the sentiment of a tweet?

* option 1: Should we predict on the whole tweet?
* option 2: Should we split it into sentences, predict for each sentence and then take the average?


In [None]:
records = 1000
temp = tweets_df[tweets_df.ds=='trump'][:records].copy()

import time
start = time.time()

#flair sentiment by dividing the tweet into sentences and averaging
temp['sentences'] = temp['clean_tweet_nltk'].apply(flair_make_sentences)
temp['scores'] = temp['sentences'].apply(flair_get_scores_per_sentences)
temp['flair_scores_avg'] = temp.scores.apply(flair_get_avg_from_sentences)

#flair sentiment on the whole tweet
temp['flair_one_score'] = temp['clean_tweet_nltk'].apply(flair_get_score_tweet)

stop = time.time()
print(round((stop-start)/60, 3))

In [None]:
bins = 50
alpha = 0.6
fig = plt.figure(figsize=(6,5))

plt.title('Flair polarity: per sentence versus per tweet')
ax = plt.gca()

ax.hist(temp['flair_scores_avg'], bins = bins, alpha = alpha, color = 'r')
ax.hist(temp['flair_one_score'], bins = bins, alpha = alpha, color = 'g')

ax.set_ylim(0, 100)
labels= ["Flair sentences", "Flair tweet"]
ax.legend(labels)
plt.show()

#### The above plot shows that the two methods for computing the polarity of a tweet with Flair produce exactly the same results. It might sound silly, but it was worth trying out, because this level of detail is not easily accessible from the documentation.
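
Overlapping histograms only show that the two distributions match, not that every tweet gets the same score under both methods. A numeric row-by-row check is a quick complement. The sketch below uses toy values; `fraction_differing` is a hypothetical helper, and in the notebook it could be called on `temp['flair_scores_avg']` and `temp['flair_one_score']`.

```python
import math

def fraction_differing(scores_a, scores_b, tol=1e-3):
    """Share of positions where two score sequences differ by more than tol."""
    if len(scores_a) != len(scores_b) or len(scores_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    differing = sum(
        not math.isclose(a, b, abs_tol=tol) for a, b in zip(scores_a, scores_b)
    )
    return differing / len(scores_a)

# Toy scores standing in for the per-sentence average and the per-tweet score
per_sentence_avg = [0.998, -0.75, 0.0]
per_tweet = [0.998, -0.75, 0.2]

print(f"{fraction_differing(per_sentence_avg, per_tweet):.3f} of rows differ")
```

A result of 0 would confirm that the two approaches are interchangeable on this sample; anything above 0 points to tweets that were split into more than one sentence.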


### Comparison of the three sentiment analysis libraries
We only perform this analysis on the 1000-tweet sample because Flair is very resource intensive. Try it out on the whole dataset if you have more computing power.

In [None]:
bins = 50
alpha = 0.6
fig = plt.figure(figsize=(8,7))

plt.title('Distribution of sentiment scores\nTextBlob vs VADER vs Flair', fontsize=16)

ax = plt.gca()

ax.hist(temp['blob_polarity'], bins = bins, alpha = alpha, color = 'r')
ax.hist(temp['vader_polarity'], bins = bins, alpha = alpha, color = '#ffd343')
ax.hist(temp['flair_one_score'], bins = bins, alpha = alpha, color = 'g')
ax.set_ylim(0, 100)
labels= ["TextBlob", "VADER", "Flair"]
ax.legend(labels)

ax.set_ylabel('Frequency', fontsize = 14)

#fig.tight_layout(rect=[0, 0.03, 1, 0.9])
plt.show()

## Observations from the distribution plot of sentiment produced by TextBlob, VADER and Flair
We notice that TextBlob and VADER:

* tend to classify a lot of the data as neutral
* differ in shape: VADER has a bimodal distribution, while TextBlob is unimodal

Flair:

* has no predilection for neutral
* is extremely polarizing compared to TextBlob or VADER

Regarding the lack of a strong neutral category in Flair (compare it to TextBlob and VADER, for example), Flair co-creator Alan Akbik explains that when the Flair sentiment analysis model was trained on a reviews dataset, there was too much variability in people's attitudes in the middle, which prevented the model from learning something useful for a rating that translates to 'average'. Some people would give an average rating if the product/service had a few shortcomings, while others would punish it with an average rating if it was a complete disappointment.

According to Alan Akbik, they ended up training only on the more extreme reviews, to avoid the middle reviews with a very low signal-to-noise ratio.
That was the best approach for movie reviews. But we don't know if it's also the best for our data, i.e. for tweets.

### Agreement between TextBlob, VADER and Flair predictions
We know the three differ in how extreme their predicted sentiment values are. But do they agree on the direction of that sentiment, regardless of whether they agree on its intensity? That is, if TextBlob predicts a weak sentiment and Flair a strong one, are they both negative or both positive? Or does one predict -.2 (weak negative) and the other .9 (strong positive)?

To answer the question above, let's see the percentage of times these algorithms agree with one another when classifying the sentiment of a tweet.
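
One caveat: Flair almost never returns 'neutral', while TextBlob and VADER do so often, so plain label agreement mixes two effects (direction and willingness to say neutral). A hedged way to isolate direction is to compare only the tweets that both libraries label as non-neutral. The sketch below uses toy labels; `direction_agreement` is a hypothetical helper and not part of the analysis in this notebook.

```python
def direction_agreement(labels_a, labels_b):
    """Agreement on polarity, ignoring items that either side calls neutral.

    Returns (agreement, n_compared); agreement is None if nothing is comparable.
    """
    pairs = [
        (a, b) for a, b in zip(labels_a, labels_b)
        if a != 'neutral' and b != 'neutral'
    ]
    if not pairs:
        return None, 0
    return sum(a == b for a, b in pairs) / len(pairs), len(pairs)

# Toy labels standing in for two libraries' predictions on four tweets
vader_labels = ['positive', 'neutral', 'negative', 'negative']
flair_labels = ['positive', 'negative', 'negative', 'positive']

share, n = direction_agreement(vader_labels, flair_labels)
print(f"same direction in {share:.2f} of {n} comparable tweets")
```

Reporting the number of comparable tweets alongside the share matters, because a high agreement on a handful of tweets says little.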

In [None]:
temp['flair_sentiment'] = temp['flair_one_score'].apply(sentiment_analysis_flair)

print(f"Consensus TextBlob - VADER: {accuracy_score(temp['blob_sentiment'],temp['vader_sentiment']):.4}")
print(f"Consensus TextBlob - Flair: {accuracy_score(temp['blob_sentiment'],temp['flair_sentiment']):.4}")
print(f"Consensus VADER - Flair: {accuracy_score(temp['vader_sentiment'],temp['flair_sentiment']):.4}")


In [None]:
fig = plt.figure(figsize=(6,6))

ax = plt.gca()
ax.set_title('Agreement between TextBlob, VADER and Flair predictions', fontsize=16)

features = np.array([1,2,3])
values = [accuracy_score(temp['blob_sentiment'],temp['vader_sentiment']), accuracy_score(temp['blob_sentiment'],temp['flair_sentiment']), accuracy_score(temp['vader_sentiment'],temp['flair_sentiment'])]

ax.bar(features, values, width=0.3, align = 'center', color = 'g', alpha= .6)

ax.set_ylim(0, .6)
#ax.set_xlabel('Valence', fontsize = 14)
ax.set_ylabel('Fraction of agreement', fontsize = 14)

ax.set_xticks(features)
ax.set_xticklabels(['Consensus TextBlob\n-VADER', 'Consensus TextBlob\n-Flair', 'Consensus VADER\n- Flair'])

#labels= ["Trump", "Biden"]
#ax.legend(labels)

fig.tight_layout(rect=[0, 0.03, 1, 0.80])
plt.show()

The results above indicate two possibilities:

- either all algorithms are wrong much of the time
- or two of them are mostly wrong and one does a good job.

Unfortunately, without labelled data I know of no way to compare their performance automatically, so we will look at a few examples where they all disagree and try to eyeball which one is right.

In [None]:
def consensus(row):
    """Count how many of the three library pairs assign the same label (0, 1 or 3)."""
    count = 0
    count += row['blob_sentiment']==row['vader_sentiment']
    count += row['blob_sentiment']==row['flair_sentiment']
    count += row['vader_sentiment']==row['flair_sentiment']

    return count

temp['consensus'] = temp.apply(consensus, axis=1)

print(temp['consensus'].value_counts())

In [None]:
# Change the number of displayed results in the code below if you want to visually inspect more of them yourself
(temp[temp['consensus']==0])[['tweet', 'clean_tweet_nltk', 'blob_sentiment', 'vader_sentiment', 'flair_sentiment']].tail(5)

## Based on the distribution plots and after inspecting a few tweets where all three algorithms applied different labels, I decided to use the results from VADER in further analyses.

## Actionable insights from sentiment analysis of tweets
There are several ways to use sentiment analysis on tweets related to a political campaign.

One is to try to predict the election results from the sentiments towards the candidates.