# Capstone part 2 - Sentiment Analysis 
Author - Udita Bajaj


In this section, I conduct sentiment analysis. I first use the Vader sentiment analyser and then build a Naive Bayes classifier as a critique of the Vader results.

### Vader Sentiment Intensity Analyser
The Vader sentiment analyser scores a text by summing the sentiment intensity (valence) of each word it recognises. It returns four scores: negative, neutral, positive and compound. The negative, neutral and positive scores are the proportions of the text that fall into each category, while the compound score is the sum of the word-level valence scores, adjusted by Vader's rules and normalised to lie between -1 and +1.
I assign the compound score to a new column, 'Polarity Score', and store the other three scores as they are in 'Negative Score', 'Positive Score', and 'Neutral Score'. The resulting sentiment depends on the value of the Polarity Score: if it is less than 0, the tweet is classified as Negative; if it is more than 0, the tweet is classified as Positive; if it is exactly 0, the tweet is classified as Neutral. The share of tweets in each class, as calculated by the Vader sentiment analyser, is shown in a pie chart.
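
The thresholding rule described above can be sketched as a small helper. The function name `label_from_compound` is hypothetical and is not part of the notebook or of the vaderSentiment package; it only illustrates how a compound score maps to a label.

```python
def label_from_compound(score: float) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


# Example compound scores taken from rows shown in this notebook's outputs
for s in (0.0772, 0.0, -0.5574):
    print(s, label_from_compound(s))
```

Note that the vaderSentiment README suggests a wider neutral band (compound scores between -0.05 and 0.05), whereas this notebook uses a strict threshold at zero, so borderline tweets such as the one scoring 0.0772 are labelled Positive here.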

In [80]:
# install vader sentiment
!pip install vaderSentiment



In [81]:
#import sentiment intensity analyser from vader
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [82]:
#initialise the Sentiment Intensity analyser, used on the 'tweet_without_stopwords' column
SIA = SentimentIntensityAnalyzer()
newdf["tweet_without_stopwords"]= newdf["tweet_without_stopwords"].astype(str)
#Creating all columns of scores by applying the Sentiment Analyser in a lambda function
newdf['Polarity Score']=newdf["tweet_without_stopwords"].apply(lambda x:SIA.polarity_scores(x)['compound'])


In [83]:
newdf['Neutral Score']=newdf["tweet_without_stopwords"].apply(lambda x:SIA.polarity_scores(x)['neu'])
newdf['Negative Score']=newdf["tweet_without_stopwords"].apply(lambda x:SIA.polarity_scores(x)['neg'])
newdf['Positive Score']=newdf["tweet_without_stopwords"].apply(lambda x:SIA.polarity_scores(x)['pos'])
# Converting the compound score (between -1 and 1) to a categorical variable (sentiment) - Positive, Neutral, or Negative
newdf['Sentiment']=''
newdf.loc[newdf['Polarity Score']>0,'Sentiment']='Positive'
newdf.loc[newdf['Polarity Score']==0,'Sentiment']='Neutral'
newdf.loc[newdf['Polarity Score']<0,'Sentiment']='Negative'
#newdf

In [84]:
#create pie chart 
fig_pie = px.pie(newdf, names='Sentiment', title='Tweets Classification', height=250,
                 hole=0.7, color_discrete_sequence=px.colors.qualitative.T10)
fig_pie.update_traces(textfont=dict(color='#fff'))
fig_pie.update_layout(margin=dict(t=50, b=20, l=50, r=25),
                      plot_bgcolor='#2d3035', paper_bgcolor='#2d3035',
                      title_font=dict(size=25, color='#a5a7ab', family="Lato, sans-serif"),
                      font=dict(color='#8a8d93'),
                      legend=dict(orientation="h", yanchor="bottom", y=1, xanchor="right", x=0.8)
                      )

The visualisation shows that Vader classified 46.4% of the tweets as positive, 31.8% as negative, and 21.9% as neutral. This is surprising, as I expected most tweets about Covid-19 to be negative. One possible explanation is that Vader scores tweets containing words such as 'Booster' and 'new' as positive regardless of context, whereas their sentiment depends on the context of the tweet.

### Manual Investigation 
In this section, I draw a random sample of 300 tweets, 100 from each class as classified by Vader. I export the sample to an Excel file, where I manually classify each of the 300 tweets into one of the three sentiments.

In [85]:
#list of Neutral sentiment
dfNeutral = newdf[newdf["Sentiment"] == 'Neutral']

In [86]:
#100 random Neutral tweets
sampleNeutral = dfNeutral.sample(n=100)

In [87]:
#list of Positive sentiment
dfPos = newdf[newdf["Sentiment"] == 'Positive']

In [88]:
#100 random Positive tweets
samplePos = dfPos.sample(n=100)

In [89]:
#list of Negative sentiment
dfNeg = newdf[newdf["Sentiment"] == 'Negative']

In [90]:
#100 random Negative tweets
sampleNeg = dfNeg.sample(n=100)

In [91]:
#join all samples (DataFrame.append was removed in pandas 2.0, so use pd.concat)
import pandas as pd
fullSample = pd.concat([sampleNeutral, samplePos, sampleNeg])

In [92]:
#extract relevant columns of dataframe as list, convert to dataframe, and then join
df1 = fullSample['tweet_without_stopwords']
df1 = df1.to_frame()

In [93]:
df2 = fullSample['Sentiment']
df2 = df2.to_frame()

In [94]:
df3 = fullSample['full_text']
df3 = df3.to_frame()

In [95]:
dfsample = df1.join(df2)

In [96]:
dfSample = df3.join(dfsample)

In [97]:
#save as excel file, ready for manual investigation
dfSample.to_excel("dfSampleScraped.xlsx") 
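
As a hedged alternative to the separate filter, sample and join cells in this section, the same per-class draw can be written in one step with a fixed seed so that the sample is reproducible. The helper name `stratified_sample` and the seed value are assumptions made for illustration, and the toy frame stands in for `newdf`.

```python
import pandas as pd


def stratified_sample(df: pd.DataFrame, label_col: str, n: int, seed: int = 0) -> pd.DataFrame:
    """Draw n rows per label, reproducibly, keeping the original index."""
    return df.groupby(label_col).sample(n=n, random_state=seed)


# Toy stand-in for newdf (hypothetical data, 4 tweets per class)
toy = pd.DataFrame({
    "full_text": [f"tweet {i}" for i in range(12)],
    "Sentiment": ["Positive", "Neutral", "Negative"] * 4,
})

sample = stratified_sample(toy, "Sentiment", n=2)
print(sample["Sentiment"].value_counts())
```

In the notebook this would correspond to `stratified_sample(newdf, 'Sentiment', n=100)`, followed by selecting the `full_text`, `tweet_without_stopwords` and `Sentiment` columns before calling `to_excel`.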

### Importing excel dataset with 'Corrected' sentiment
In this section, I import the manually reviewed dataset and count how many tweets the Vader sentiment analyser classified correctly.

In [98]:
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile

In [99]:
dfedited = pd.read_excel("~/Documents/dfSampleScraped2.xlsx")
dfedited

Unnamed: 0.1,Unnamed: 0,full_text,tweet_without_stopwords,Sentiment,Corrected
0,990,Enjoyment/risk ratios live in my head now. #CO...,Enjoyment/risk ratios live head now. #COVID19,Neutral,Negative
1,904,#SouthKorea to Test #AI-Powered #FacialRecogni...,#SouthKorea Test #AI-Powered #FacialRecognitio...,Neutral,Neutral
2,129,Our 75th KB Friday Recap! \n\nCan you believe ...,Our 75th KB Friday Recap! Can believe it? Tune...,Neutral,Neutral
3,1649,So now the experts have decided that you don’t...,So experts decided don’t wait 15mins post covi...,Neutral,Negative
4,1334,"Our latest #research on #COVID19, published wi...","Our latest #research #COVID19, published @Spri...",Neutral,Neutral
...,...,...,...,...,...
295,1732,BREAKING: NY Ethics Committee ordering Andrew ...,BREAKING: NY Ethics Committee ordering Andrew ...,Negative,Negative
296,135,#COVID19\n@mtosterholm @CovidWatch @ScottGottl...,#COVID19 @mtosterholm @CovidWatch @ScottGottli...,Negative,Neutral
297,119,How to show a negative Covid test:\nTake test ...,How show negative Covid test: Take test packet...,Negative,Negative
298,421,@ARanganathan72 @narendramodi Agree @ARanganat...,@ARanganathan72 @narendramodi Agree @ARanganat...,Negative,Negative


In [100]:
dfSample = fullSample.reset_index()
dfSample

Unnamed: 0,index,screen_name,name,location,description,geo_enabled,created_at,full_text,display_text_range,source,entities,user,tweet_without_stopwords,Polarity Score,Neutral Score,Negative Score,Positive Score,Sentiment
0,158,ncIMPACTsog,ncIMPACT Initiative,"Chapel Hill, NC",Supporting policy innovations through key insi...,True,Tue Dec 14 20:36:08 +0000 2021,"While local governments work to rebuild, rewor...","[0, 278]","<a href=""https://mobile.twitter.com"" rel=""nofo...","{'hashtags': [{'text': 'ncIMPACT', 'indices': ...","{'id': 798639717705273344, 'id_str': '79863971...","While local governments work rebuild, rework, ...",0.0000,1.000,0.000,0.000,Neutral
1,1377,Farquetoo,farquetoo,australia,live life to the fullest. Drink and eat till y...,True,Tue Dec 14 19:21:45 +0000 2021,Does her maths add up. #COVID19 #Omicron #vacc...,"[0, 62]","<a href=""http://twitter.com/download/android"" ...","{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 211105255, 'id_str': '211105255', 'name...",Does maths add up. #COVID19 #Omicron #vaccine ...,0.0000,1.000,0.000,0.000,Neutral
2,1539,rantmashup,Rant Mashup,,You've got to be kidding me!...,False,Tue Dec 14 19:11:52 +0000 2021,To all those working in the #Medical #healthca...,"[0, 220]","<a href=""http://twitter.com/download/iphone"" r...","{'hashtags': [{'text': 'Medical', 'indices': [...","{'id': 235126980, 'id_str': '235126980', 'name...",To working #Medical #healthcare profession thr...,0.0000,1.000,0.000,0.000,Neutral
3,1593,arcticket,Arcticket ✈️ Visa Assistance,Manila Philippines,Arcticket is the leading Visa Assistance Agenc...,True,Tue Dec 14 19:08:56 +0000 2021,"#Africa, #UK Update: British authorities will ...","[0, 116]","<a href=""http://twitter.com/download/android"" ...","{'hashtags': [{'text': 'Africa', 'indices': [0...","{'id': 1117254279763570688, 'id_str': '1117254...","#Africa, #UK Update: British authorities remov...",0.0000,1.000,0.000,0.000,Neutral
4,732,CovidData2,Covid Data,,Posting Covid-19 data from web sources,False,Tue Dec 14 20:00:20 +0000 2021,New COVID-19 Data at 2021-12-14 03:00:00 pm ES...,"[0, 79]","<a href=""https://kojospace.com/"" rel=""nofollow...","{'hashtags': [{'text': 'Coronavirus', 'indices...","{'id': 1246809881790857216, 'id_str': '1246809...",New Data 2021-12-14 03:00:00 pm EST #Coronavir...,0.0000,1.000,0.000,0.000,Neutral
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
295,392,crossky,Craig,"Iowa, USA",Check out my blog! Airline Pilot & Entrepreneu...,True,Tue Dec 14 20:22:13 +0000 2021,@downin121 @1ThessCh5 Sara believes she is bei...,"[22, 238]","<a href=""http://twitter.com/download/iphone"" r...","{'hashtags': [{'text': 'antivax', 'indices': [...","{'id': 15623103, 'id_str': '15623103', 'name':...",@downin121 @1ThessCh5 Sara believes persecuted...,-0.3182,0.897,0.103,0.000,Negative
296,1776,SSweetheartsPod,Slapshot Sweethearts,,Your go-to for NHL and PHF news and banter hos...,True,Tue Dec 14 18:59:31 +0000 2021,With Craig Smith and Brad Marchand entering #C...,"[0, 182]","<a href=""http://twitter.com/download/iphone"" r...","{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 1338637984594333699, 'id_str': '1338637...",With Craig Smith Brad Marchand entering #COVID...,-0.2263,0.913,0.087,0.000,Negative
297,1362,LaoTzu5,Been There. Done That 🖕🏾 2020,"England, UK",🧐 By all means look at this self-declaration. ...,False,Tue Dec 14 19:22:41 +0000 2021,"59,610 new #Covid19 cases today. \nNot sure h...","[0, 82]","<a href=""https://mobile.twitter.com"" rel=""nofo...","{'hashtags': [{'text': 'Covid19', 'indices': [...","{'id': 631852258, 'id_str': '631852258', 'name...","59,610 new #Covid19 cases today. Not sure many...",-0.2411,0.836,0.164,0.000,Negative
298,1767,winder_gill,Dr. Winder Gill,"Ontario, Canada",FRCPC Internal Medicine - Clinical Immunology ...,False,Tue Dec 14 18:59:56 +0000 2021,It’s sad to consider cancelling holiday plans ...,"[0, 281]","<a href=""http://twitter.com/download/iphone"" r...","{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 2875332224, 'id_str': '2875332224', 'na...",It’s sad consider cancelling holiday plans pla...,-0.2960,0.759,0.155,0.085,Negative


In [101]:
dfedited.drop('Unnamed: 0', axis=1, inplace=True)
#dfedited

In [102]:
#standardise the capitalisation of the manually corrected labels
dfedited['Corrected'] = dfedited['Corrected'].str.capitalize()

In [103]:
#new column flagging tweets where the manual label differs from the Vader label
dfedited["Suspicion"] = dfedited["Corrected"] != dfedited["Sentiment"]
dfedited

Unnamed: 0,full_text,tweet_without_stopwords,Sentiment,Corrected,Suspicion
0,Enjoyment/risk ratios live in my head now. #CO...,Enjoyment/risk ratios live head now. #COVID19,Neutral,Negative,True
1,#SouthKorea to Test #AI-Powered #FacialRecogni...,#SouthKorea Test #AI-Powered #FacialRecognitio...,Neutral,Neutral,False
2,Our 75th KB Friday Recap! \n\nCan you believe ...,Our 75th KB Friday Recap! Can believe it? Tune...,Neutral,Neutral,False
3,So now the experts have decided that you don’t...,So experts decided don’t wait 15mins post covi...,Neutral,Negative,True
4,"Our latest #research on #COVID19, published wi...","Our latest #research #COVID19, published @Spri...",Neutral,Neutral,False
...,...,...,...,...,...
295,BREAKING: NY Ethics Committee ordering Andrew ...,BREAKING: NY Ethics Committee ordering Andrew ...,Negative,Negative,False
296,#COVID19\n@mtosterholm @CovidWatch @ScottGottl...,#COVID19 @mtosterholm @CovidWatch @ScottGottli...,Negative,Neutral,True
297,How to show a negative Covid test:\nTake test ...,How show negative Covid test: Take test packet...,Negative,Negative,False
298,@ARanganathan72 @narendramodi Agree @ARanganat...,@ARanganathan72 @narendramodi Agree @ARanganat...,Negative,Negative,False


In [104]:
#see how many don't match
final = dfedited.loc[dfedited["Suspicion"] == True]
final #107/300 do not match

Unnamed: 0,full_text,tweet_without_stopwords,Sentiment,Corrected,Suspicion
0,Enjoyment/risk ratios live in my head now. #CO...,Enjoyment/risk ratios live head now. #COVID19,Neutral,Negative,True
3,So now the experts have decided that you don’t...,So experts decided don’t wait 15mins post covi...,Neutral,Negative,True
6,BREAKING: #Hawaii sees 214 new #coronavirus ca...,BREAKING: #Hawaii sees 214 new #coronavirus ca...,Neutral,Negative,True
8,Because a feature of populist democracy is zer...,Because feature populist democracy zero attent...,Neutral,Negative,True
10,Delta variant slows recovery in metro Denver o...,Delta variant slows recovery metro Denver offi...,Neutral,Negative,True
...,...,...,...,...,...
270,@fiannafailparty @MichealMartinTD @RTEOne that...,@fiannafailparty @MichealMartinTD @RTEOne come...,Negative,Neutral,True
273,We are in an information war.\n\nThe crux is t...,We information war. The crux institutions need...,Negative,Positive,True
279,Brilliant! Viewer gets in touch with @peter_le...,Brilliant! Viewer gets touch @peter_levy he’s ...,Negative,Positive,True
290,#COVID19 vaccines teach your T cells to elimin...,#COVID19 vaccines teach T cells eliminate infe...,Negative,Positive,True
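
Taking the 107 mismatches out of 300 noted in the cell above, the share of sampled tweets on which Vader agrees with the manual label follows directly. A minimal sketch of that arithmetic:

```python
total = 300        # manually reviewed tweets
mismatches = 107   # rows where Vader and the manual label differ (from the cell above)

agreement = (total - mismatches) / total
print(f"Vader agrees with the manual label on {total - mismatches} of {total} tweets "
      f"({agreement:.1%})")
```

Because the sample was stratified on Vader's own labels (100 per class), this figure describes agreement on the sample only and is not weighted to the full set of tweets.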


In [105]:
##DATA CLEANING ##
dfedited["Corrected"].value_counts() 

Negative      94
Positive      67
Neutral       61
Negative      52
Positive      19
Negative       6
Positivr       1
Name: Corrected, dtype: int64

In [106]:
dfedited.loc[dfedited["Corrected"] == "Negative"] 

Unnamed: 0,full_text,tweet_without_stopwords,Sentiment,Corrected,Suspicion
35,Yup I ordered a pack yesterday. F the province...,Yup I ordered pack yesterday. F provinces maki...,Neutral,Negative,True
36,Prime Minister Mark Rutte and Health Minister ...,Prime Minister Mark Rutte Health Minister Hugo...,Neutral,Negative,True
48,164 New Cases Of #Covid19 Today In Manitoba! T...,164 New Cases Of #Covid19 Today In Manitoba! T...,Neutral,Negative,True
149,"@Sillyshib So, the people who wanted #Brexit b...","@Sillyshib So, people wanted #Brexit believed ...",Positive,Negative,True
150,Couldn't give a fuck hoe many voted for the #M...,Couldn't give fuck hoe many voted #MaskMandate...,Positive,Negative,True
...,...,...,...,...,...
294,Take a look at how condescending the tone is. ...,Take look condescending tone is. Have poison i...,Negative,Negative,False
295,BREAKING: NY Ethics Committee ordering Andrew ...,BREAKING: NY Ethics Committee ordering Andrew ...,Negative,Negative,False
297,How to show a negative Covid test:\nTake test ...,How show negative Covid test: Take test packet...,Negative,Negative,False
298,@ARanganathan72 @narendramodi Agree @ARanganat...,@ARanganathan72 @narendramodi Agree @ARanganat...,Negative,Negative,False


In [107]:
dfedited["Corrected"].replace({"Negative ": "Negative", "Positive ": "Positive"}, inplace=True)

In [108]:
dfedited["Corrected"].value_counts()

Negative      146
Positive       86
Neutral        61
Negative        6
Positivr        1
Name: Corrected, dtype: int64

In [109]:
dfedited.index[dfedited['Corrected']== 'Negative  '].tolist()

[58, 59, 62, 63, 64, 65]

In [110]:
dfedited["Corrected"].replace({"Negative  ": "Negative", "Positivr": "Positive"}, inplace=True)

In [111]:
dfedited["Corrected"].value_counts()

Negative    152
Positive     87
Neutral      61
Name: Corrected, dtype: int64
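
The cleaning steps above (capitalising, removing trailing spaces, fixing the 'Positivr' typo) can be folded into one helper. This is a sketch only: `normalise_label` and the `TYPOS` mapping are hypothetical names, and the mapping covers just the one misspelling seen in this dataset.

```python
# Known misspellings observed in the manual labels (extend as needed)
TYPOS = {"Positivr": "Positive"}


def normalise_label(raw: str) -> str:
    """Strip whitespace, fix capitalisation, and correct known typos."""
    label = raw.strip().capitalize()
    return TYPOS.get(label, label)


for raw in ["Negative ", "Negative  ", "positive", "Positivr", "Neutral"]:
    print(repr(raw), "->", normalise_label(raw))
```

With pandas, and assuming the column has no missing labels, this could be applied as `dfedited['Corrected'].map(normalise_label)`.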

This sampled and corrected dataset has 152 Negative tweets, 87 Positive tweets, and 61 Neutral tweets. However, since this is to be the training dataset, the classes need to be balanced. Hence, we must resample and add to this dataset to equalise the number of tweets of each sentiment.

### Re-sampling to achieve balance in sentiment classification 
In this section, I resample once more to balance the tweets classified as Negative, Positive, and Neutral. This ensures that the Naive Bayes classifier is trained on a roughly equal number of tweets for each of the three sentiments.
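
The number of extra manually labelled tweets needed per class follows from the counts reported above (152 Negative, 87 Positive, 61 Neutral). A short sketch of that calculation, taking the largest class as the target:

```python
# Class counts of the first manually corrected sample
counts = {"Negative": 152, "Positive": 87, "Neutral": 61}

target = max(counts.values())
deficit = {label: target - n for label, n in counts.items()}
print(deficit)  # extra tweets needed per class to reach the target
```

The notebook adds 63 Positive and 90 Neutral tweets, slightly fewer than the full deficit, which gives the near-balanced 152/151/150 split reported further down.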

In [112]:
rowstodrop = dfSample["index"].to_list()
#rowstodrop

In [113]:
resampledf = newdf.drop(rowstodrop)
#resampledf

In [114]:
#now that we have excluded previous sample, let's resample

In [115]:
dfNeutral2 = resampledf[resampledf["Sentiment"] == 'Neutral']
sampleNeutral2 = dfNeutral2.sample(n=200)

In [116]:
dfPos2 = resampledf[resampledf["Sentiment"] == 'Positive']
samplePos2 = dfPos2.sample(n=200)

In [117]:
fullsample2 = pd.concat([sampleNeutral2, samplePos2])  # DataFrame.append was removed in pandas 2.0
#fullsample2

In [118]:
redf1 = fullsample2['tweet_without_stopwords']
redf1 = redf1.to_frame()

In [119]:
redf2 = fullsample2['Sentiment']
redf2 = redf2.to_frame()

In [120]:
redf3 = fullsample2['full_text']
redf3 = redf3.to_frame()

In [121]:
redfsample = redf1.join(redf2)

In [122]:
redfSample = redf3.join(redfsample)

In [123]:
redfSample.to_excel("redfScraped2.xlsx") 

### Import resampled data 
In this section, I import the resampled data and join it with the previously sampled (and manually classified) dataset.

In [124]:
from pandas import ExcelWriter
from pandas import ExcelFile

In [125]:
redfedit = pd.read_excel("~/Documents/redfScraped2.xlsx")
redfedit

Unnamed: 0.1,Unnamed: 0,full_text,tweet_without_stopwords,Sentiment,Corrected
0,1499,Wasn't his crew or passengers vaccinated? #cov...,Wasn't crew passengers vaccinated? #covid19 ht...,Neutral,Neutral
1,584,Even if #Vaccinated y'all can get #COVID19 ?!?...,Even #Vaccinated y'all #COVID19 ?!?!?!?!?!?!?,Neutral,Negative
2,1269,"Rising above British politics, here’s the view...","Rising British politics, here’s view @WHO #Omi...",Neutral,Neutral
3,1664,Pharmacy-led sites to stay open for longer in ...,Pharmacy-led sites stay open longer COVID-19 j...,Neutral,Positive
4,1454,Will “B.C” follow suit? #bcpoli #savelives #CO...,Will “B.C” follow suit? #bcpoli #savelives #CO...,Neutral,Neutral
...,...,...,...,...,...
395,1482,For scientists - this article implies that the...,For scientists - article implies Laotian bat c...,Positive,
396,986,JUST IN (#coronavirus) via @CNN's @elizabethst...,JUST IN (#coronavirus) via @CNN's @elizabethst...,Positive,
397,1830,Boosted #boosted #COVID19 #CovidVaccine https:...,Boosted #boosted #COVID19 #CovidVaccine https:...,Positive,
398,966,NORTHERN CAPE COVID-19 STATISTICS AS AT 14 DEC...,NORTHERN CAPE COVID-19 STATISTICS AS AT 14 DEC...,Positive,


In [126]:
redfedit.drop('Unnamed: 0', axis=1, inplace=True)

In [127]:
redfedit["Corrected "].value_counts()

Negative    120
Neutral     106
Positive     89
Name: Corrected , dtype: int64

In [128]:
#create new column indicating whether the Vader sentiment and the manually corrected sentiment differ
redfedit["Suspicion"] = redfedit["Corrected "] != redfedit["Sentiment"]
redfedit

Unnamed: 0,full_text,tweet_without_stopwords,Sentiment,Corrected,Suspicion
0,Wasn't his crew or passengers vaccinated? #cov...,Wasn't crew passengers vaccinated? #covid19 ht...,Neutral,Neutral,False
1,Even if #Vaccinated y'all can get #COVID19 ?!?...,Even #Vaccinated y'all #COVID19 ?!?!?!?!?!?!?,Neutral,Negative,True
2,"Rising above British politics, here’s the view...","Rising British politics, here’s view @WHO #Omi...",Neutral,Neutral,False
3,Pharmacy-led sites to stay open for longer in ...,Pharmacy-led sites stay open longer COVID-19 j...,Neutral,Positive,True
4,Will “B.C” follow suit? #bcpoli #savelives #CO...,Will “B.C” follow suit? #bcpoli #savelives #CO...,Neutral,Neutral,False
...,...,...,...,...,...
395,For scientists - this article implies that the...,For scientists - article implies Laotian bat c...,Positive,,True
396,JUST IN (#coronavirus) via @CNN's @elizabethst...,JUST IN (#coronavirus) via @CNN's @elizabethst...,Positive,,True
397,Boosted #boosted #COVID19 #CovidVaccine https:...,Boosted #boosted #COVID19 #CovidVaccine https:...,Positive,,True
398,NORTHERN CAPE COVID-19 STATISTICS AS AT 14 DEC...,NORTHERN CAPE COVID-19 STATISTICS AS AT 14 DEC...,Positive,,True


In [129]:
extraneutral = redfedit[redfedit["Corrected "] == 'Neutral']
impneutral = extraneutral[:90]
impneutral

Unnamed: 0,full_text,tweet_without_stopwords,Sentiment,Corrected,Suspicion
0,Wasn't his crew or passengers vaccinated? #cov...,Wasn't crew passengers vaccinated? #covid19 ht...,Neutral,Neutral,False
2,"Rising above British politics, here’s the view...","Rising British politics, here’s view @WHO #Omi...",Neutral,Neutral,False
4,Will “B.C” follow suit? #bcpoli #savelives #CO...,Will “B.C” follow suit? #bcpoli #savelives #CO...,Neutral,Neutral,False
5,Leading infectious diseases expert Professor S...,Leading infectious diseases expert Professor S...,Neutral,Neutral,False
6,Immunity system to second dose of #COVID19 va...,Immunity system second dose #COVID19 vaccinati...,Neutral,Neutral,False
...,...,...,...,...,...
185,@CPHO_Canada @JustinTrudeau #CoronaVirus #CoVi...,@CPHO_Canada @JustinTrudeau #CoronaVirus #CoVi...,Neutral,Neutral,False
186,NEWS: Our Lady's Hospital in Navan says all vi...,NEWS: Our Lady's Hospital Navan says visitors ...,Neutral,Neutral,False
187,Is this the normal you thought of when they to...,Is normal thought told ticket back vaccines? #...,Neutral,Neutral,False
188,I only post a few of these that I see. #Fauci ...,I post I see. #Fauci #EnoughIsEnough #COVID19 ...,Neutral,Neutral,False


In [130]:
extrapos = redfedit[redfedit["Corrected "] == 'Positive']
imppos = extrapos[:63]
imppos

Unnamed: 0,full_text,tweet_without_stopwords,Sentiment,Corrected,Suspicion
3,Pharmacy-led sites to stay open for longer in ...,Pharmacy-led sites stay open longer COVID-19 j...,Neutral,Positive,True
15,City of Saskatoon eyes return of staff to work...,City Saskatoon eyes return staff workplace htt...,Neutral,Positive,True
17,The #Topsfield Board of Health will be hosting...,The #Topsfield Board Health hosting #COVID19 v...,Neutral,Positive,True
27,"As of December 13, 2021, 239.9 million people,...","As December 13, 2021, 239.9 million people, 72...",Neutral,Positive,True
31,".@VolvoCEGlobal tests fully autonomous, batter...",".@VolvoCEGlobal tests fully autonomous, batter...",Neutral,Positive,True
...,...,...,...,...,...
247,.@US_FDA authorized Azova #COVID19 Saliva PCR ...,.@US_FDA authorized Azova #COVID19 Saliva PCR ...,Positive,Positive,False
250,"When Profits are so Good, they have to Keep th...","When Profits Good, Keep Up😂 @moderna_tx #COVID...",Positive,Positive,False
252,@TVKev Wee don’t want wars either ! But I gues...,@TVKev Wee don’t want wars either ! But I gues...,Positive,Positive,False
253,How to safely plan a family gathering this hol...,How safely plan family gathering holiday seaso...,Positive,Positive,False


In [131]:
joined1 = pd.concat([impneutral, imppos], sort=False)  # DataFrame.append was removed in pandas 2.0

In [132]:
joined1 = joined1.rename(columns={"Corrected ": "Corrected"})

In [133]:
dfedited

Unnamed: 0,full_text,tweet_without_stopwords,Sentiment,Corrected,Suspicion
0,Enjoyment/risk ratios live in my head now. #CO...,Enjoyment/risk ratios live head now. #COVID19,Neutral,Negative,True
1,#SouthKorea to Test #AI-Powered #FacialRecogni...,#SouthKorea Test #AI-Powered #FacialRecognitio...,Neutral,Neutral,False
2,Our 75th KB Friday Recap! \n\nCan you believe ...,Our 75th KB Friday Recap! Can believe it? Tune...,Neutral,Neutral,False
3,So now the experts have decided that you don’t...,So experts decided don’t wait 15mins post covi...,Neutral,Negative,True
4,"Our latest #research on #COVID19, published wi...","Our latest #research #COVID19, published @Spri...",Neutral,Neutral,False
...,...,...,...,...,...
295,BREAKING: NY Ethics Committee ordering Andrew ...,BREAKING: NY Ethics Committee ordering Andrew ...,Negative,Negative,False
296,#COVID19\n@mtosterholm @CovidWatch @ScottGottl...,#COVID19 @mtosterholm @CovidWatch @ScottGottli...,Negative,Neutral,True
297,How to show a negative Covid test:\nTake test ...,How show negative Covid test: Take test packet...,Negative,Negative,False
298,@ARanganathan72 @narendramodi Agree @ARanganat...,@ARanganathan72 @narendramodi Agree @ARanganat...,Negative,Negative,False


In [134]:
#final training dataframe 
myfinal = pd.concat([dfedited, joined1], axis=0) # 'Corrected ' in joined1 was renamed above, so both frames share a single 'Corrected' column

In [135]:
#balanced on sentiments 
myfinal["Corrected"].value_counts()

Negative    152
Neutral     151
Positive    150
Name: Corrected, dtype: int64

This dataset is ready to be used in the Naive Bayes classifier.

### Text Classification with Naive Bayes
In this section, I fit the Naive Bayes classifier on the training dataset using scikit-learn.
Through cross-validation, I compute the accuracy of the Naive Bayes classifier.

In [158]:
import sklearn

In [139]:
#turning the column of tweet without stopwords to a list - each row is a tweet
X = myfinal["tweet_without_stopwords"].tolist()
len(X)

453

In [140]:
Y = myfinal["Corrected"].tolist()
len(Y)

453

Here, I vectorise the tweets into a matrix of word counts, which serves as the training input for the classifier.

In [156]:
from sklearn.feature_extraction.text import CountVectorizer

In [142]:
vectorizer = CountVectorizer()   
X_train = vectorizer.fit_transform(X)
X_train

<453x3634 sparse matrix of type '<class 'numpy.int64'>'
	with 8394 stored elements in Compressed Sparse Row format>

In [143]:
X_train.toarray()

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

Next, I fit the classifier on the vectorised tweets (X_train) and the labels (Y).

In [144]:
#use sklearn to initialise classifier 
from sklearn.naive_bayes import MultinomialNB
classifier = MultinomialNB()
classifier

MultinomialNB()

In [145]:
classifier.fit(X_train, Y)

MultinomialNB()

In [146]:
classifier.classes_

array(['Negative', 'Neutral', 'Positive'], dtype='<U8')

In [147]:
# number of instances for each class
classifier.class_count_

array([152., 151., 150.])

In [148]:
# number of times each feature (word) was counted in each class

classifier.feature_count_

array([[0., 2., 0., ..., 1., 2., 1.],
       [4., 1., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.]])

In [149]:
#log probability of each feature (word) given each class
classifier.feature_log_prob_

array([[-8.84922702, -7.75061473, -8.84922702, ..., -8.15607984,
        -7.75061473, -8.15607984],
       [-7.10233473, -8.01862547, -8.71177265, ..., -8.71177265,
        -8.71177265, -8.71177265],
       [-8.82453082, -8.82453082, -8.13138364, ..., -8.82453082,
        -8.82453082, -8.82453082]])
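
These values are smoothed log-likelihoods. With scikit-learn's default `alpha=1.0`, `MultinomialNB` estimates log P(word | class) as log((count + alpha) / (total count in class + alpha * vocabulary size)). A minimal sketch on toy counts (the numbers below are made up for illustration and are not the notebook's data):

```python
import math

alpha = 1.0  # Laplace smoothing, the MultinomialNB default

# Toy word counts per class: rows are classes, columns are vocabulary words
feature_count = [
    [0, 2, 1],  # class A
    [4, 1, 0],  # class B
]

feature_log_prob = []
for row in feature_count:
    denom = sum(row) + alpha * len(row)
    feature_log_prob.append([math.log((c + alpha) / denom) for c in row])

for row in feature_log_prob:
    print([round(v, 4) for v in row])
```

A word never seen in a class therefore still gets a small non-zero probability, which is why the zero-count entries above are finite. The printed output is consistent with this: in the first row, the entry for a count of 1 (-8.156) exceeds the entry for a count of 0 (-8.849) by log 2 ≈ 0.693.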

Here, I test on one tweet from the training dataset to check that the classifier runs as expected.

In [150]:
test1 = X[0]
test1 = [test1]
test1

['Enjoyment/risk ratios live head now. #COVID19']

In [151]:
testVector1 = vectorizer.transform(test1)
testVector1

<1x3634 sparse matrix of type '<class 'numpy.int64'>'
	with 7 stored elements in Compressed Sparse Row format>

In [152]:
testVector1.toarray()

array([[0, 0, 0, ..., 0, 0, 0]])
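
The 7 stored elements reported for this tweet can be reproduced by hand. By default, `CountVectorizer` lowercases the text and keeps tokens of two or more word characters (its default `token_pattern` is `(?u)\b\w\w+\b`). The sketch below applies that pattern with the standard library only; it imitates the default tokenisation and is not the library's own code.

```python
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")  # CountVectorizer's default pattern

tweet = "Enjoyment/risk ratios live head now. #COVID19"
tokens = TOKEN_PATTERN.findall(tweet.lower())
counts = Counter(tokens)

print(tokens)
print(len(counts), "distinct tokens")
```

The `/` and `#` characters act as separators and are dropped, so hashtags are counted as ordinary words.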

In [153]:
classifier.predict(testVector1)

array(['Negative'], dtype='<U8')

In [163]:
from sklearn.model_selection import cross_validate
from sklearn.model_selection import cross_val_predict

The classifier predicted the sentiment of the test tweet correctly. Since this tweet is part of the training data, this only confirms that the classifier runs end to end; it does not measure accuracy.

Next, I perform cross-validation to calculate the scores of the model. I use cross-validation to predict the sentiments of all the tweets and then to calculate the average accuracy of the model.

In [161]:
# Perform cross-validation
cv_results_cv = cross_validate(classifier, 
                                X_train,
                                Y,
                                cv=5
                                )
# Show results
cv_results_cv   #three classes, so chance-level accuracy is about 0.33

{'fit_time': array([0.00552797, 0.00194907, 0.00251102, 0.00177813, 0.00166082]),
 'score_time': array([0.00097799, 0.0004189 , 0.00062799, 0.00041604, 0.00039411]),
 'test_score': array([0.58241758, 0.49450549, 0.42857143, 0.52222222, 0.58888889])}
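
One hedged refinement: the vocabulary above is learned from all 453 tweets before cross-validation, so each fold's test tweets have already influenced the feature set. Wrapping the vectoriser and the classifier in a scikit-learn pipeline refits both inside every fold. The sketch below uses a tiny made-up corpus in place of `X` and `Y`; the variable names `texts` and `labels` are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up stand-in for the raw tweets (X) and manual labels (Y)
texts = [
    "cases rising again", "hospital wards full", "another lockdown looms",
    "booster booked today", "recovery going well", "great news on vaccines",
    "daily figures published", "briefing at noon", "report released today",
]
labels = ["Negative"] * 3 + ["Positive"] * 3 + ["Neutral"] * 3

model = make_pipeline(CountVectorizer(), MultinomialNB())

# cv=3 here only because the toy corpus has three examples per class
results = cross_validate(model, texts, labels, cv=3)
print(results["test_score"])
```

In the notebook, the equivalent call would be `cross_validate(model, X, Y, cv=5)`, passing the raw tweet list `X` rather than the pre-vectorised `X_train`.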

In [164]:
cv_results_cvp = cross_val_predict(classifier, 
                                   X_train,
                                   Y,
                                   cv=5
                                   )
# Show results
cv_results_cvp

array(['Negative', 'Neutral', 'Neutral', 'Negative', 'Positive',
       'Neutral', 'Neutral', 'Neutral', 'Negative', 'Negative', 'Neutral',
       'Positive', 'Positive', 'Neutral', 'Neutral', 'Positive',
       'Neutral', 'Neutral', 'Neutral', 'Negative', 'Positive',
       'Negative', 'Negative', 'Negative', 'Negative', 'Positive',
       'Neutral', 'Neutral', 'Neutral', 'Positive', 'Positive', 'Neutral',
       'Positive', 'Negative', 'Negative', 'Positive', 'Neutral',
       'Neutral', 'Positive', 'Neutral', 'Negative', 'Neutral', 'Neutral',
       'Neutral', 'Neutral', 'Negative', 'Neutral', 'Neutral', 'Neutral',
       'Positive', 'Neutral', 'Positive', 'Neutral', 'Neutral', 'Neutral',
       'Neutral', 'Neutral', 'Negative', 'Positive', 'Neutral',
       'Negative', 'Neutral', 'Neutral', 'Neutral', 'Negative',
       'Negative', 'Neutral', 'Neutral', 'Negative', 'Positive',
       'Neutral', 'Negative', 'Positive', 'Neutral', 'Neutral',
       'Positive', 'Neutral', 'Negative', 

In [165]:
#calculate the average accuracy of the model 
average_accuracy = sum(cv_results_cv['test_score'])/len(cv_results_cv['test_score'])
average_accuracy 
#0.5233211233211235

0.5233211233211235

Our classifier has an average accuracy of 0.523. Since this is a three-class classifier trained on nearly balanced classes, the baseline is about 0.33 (⅓), the accuracy expected from random guessing. An accuracy of 0.5233 is therefore clearly better than chance, although it still leaves roughly half of the tweets misclassified.
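
The average can be checked from the five fold scores printed earlier, and the spread across folds gives a sense of how stable the estimate is. A minimal sketch using only the standard library:

```python
from statistics import mean, stdev

# Fold accuracies from the cross_validate output above
fold_scores = [0.58241758, 0.49450549, 0.42857143, 0.52222222, 0.58888889]

avg = mean(fold_scores)
spread = stdev(fold_scores)  # sample standard deviation across folds
chance = 1 / 3               # approximate chance level for three balanced classes

print(f"mean accuracy = {avg:.4f}, std = {spread:.4f}, chance = {chance:.4f}")
```

Every fold scores above the chance level, but the fold-to-fold variation is not negligible for a training set of 453 tweets, so the average is best read as a rough estimate.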

### Predicting sentiments for 2000 tweets 
In this section, I use the Naive Bayes classifier, now trained on the manually labelled tweets, to predict the sentiments of all 2000 tweets. I represent the predictions in a pie chart. 

In [166]:
X1 = newdf["tweet_without_stopwords"].tolist()
len(X1)

2000

In [167]:
testVectorfull = vectorizer.transform(X1)
testVectorfull

<2000x3634 sparse matrix of type '<class 'numpy.int64'>'
	with 29906 stored elements in Compressed Sparse Row format>

In [168]:
testVectorfull.toarray()

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 1, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [169]:
myPredict = classifier.predict(testVectorfull)

In [170]:
type(myPredict)

numpy.ndarray

In [171]:
NaiveB = pd.DataFrame(myPredict, columns = ["Naive Bayes"])
NaiveB

Unnamed: 0,Naive Bayes
0,Negative
1,Negative
2,Positive
3,Negative
4,Negative
...,...
1995,Negative
1996,Negative
1997,Positive
1998,Negative


In [172]:
fulldf = newdf.join(NaiveB)
#fulldf

In [176]:
fulldf["Suspicion"] = fulldf["Sentiment"] != fulldf["Naive Bayes"]
fulldf

Unnamed: 0,screen_name,name,location,description,geo_enabled,created_at,full_text,display_text_range,source,entities,user,tweet_without_stopwords,Polarity Score,Neutral Score,Negative Score,Positive Score,Sentiment,Naive Bayes,Suspicion
0,HollyGreyhound,Bonnie & 🌈Holly Greyhound🌈💔😭,Nottingham U.K.,A sweetheart who loved cuddles OTRB 21/6/20💔Li...,False,Tue Dec 14 20:46:38 +0000 2021,Just how many of your constituents have had th...,"[0, 190]","<a href=""http://twitter.com/download/iphone"" r...",{'hashtags': [{'text': 'ToriesPartiedWhilePeop...,"{'id': 840188443, 'id_str': '840188443', 'name...",Just many constituents virus died @ABridgen on...,-0.5574,0.795,0.205,0.000,Negative,Negative,False
1,PutnamDV,DV Putnam County,"Putnam County, NY",Daily Voice Putnam County covers our friends a...,False,Tue Dec 14 20:46:38 +0000 2021,"* Cuomo ""American Crisis"" * An ethics committe...","[0, 219]","<a href=""http://www.dailyvoice.com"" rel=""nofol...","{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 2582876443, 'id_str': '2582876443', 'na...","* Cuomo ""American Crisis"" * An ethics committe...",-0.6705,0.825,0.175,0.000,Negative,Negative,False
2,VetCanuck,Terence Graham,Canada,#Veterans #MotherNature 🇨🇦❤⚽️👀\n#ClimateEmerg...,True,Tue Dec 14 20:46:37 +0000 2021,@joncoopertweets I'm a vet who had the anthrax...,"[17, 144]","<a href=""http://twitter.com/download/android"" ...","{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 809384904467316736, 'id_str': '80938490...",@joncoopertweets I'm vet anthrax vaccine years...,-0.2263,0.899,0.101,0.000,Negative,Positive,True
3,DVOrangeCounty,DV Orange County,,,False,Tue Dec 14 20:46:37 +0000 2021,"* Cuomo ""American Crisis"" * An ethics committe...","[0, 219]","<a href=""http://www.dailyvoice.com"" rel=""nofol...","{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 879726564388241408, 'id_str': '87972656...","* Cuomo ""American Crisis"" * An ethics committe...",-0.6705,0.825,0.175,0.000,Negative,Negative,False
4,DailyNassau,Daily Voice Nassau County,,,False,Tue Dec 14 20:46:37 +0000 2021,"* Cuomo ""American Crisis"" * An ethics committe...","[0, 219]","<a href=""https://dailyvoice.com/"" rel=""nofollo...","{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 1120343072343298048, 'id_str': '1120343...","* Cuomo ""American Crisis"" * An ethics committe...",-0.6705,0.825,0.175,0.000,Negative,Negative,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,SpiritualCrypt1,Alpha-Wizz the #NFT Pro🅕,Tweets are not Financialadvice,Dad and husband\nPrometheus of Fantom! \n\nI c...,False,Tue Dec 14 18:45:33 +0000 2021,"I think, people are about to wake up from thei...","[0, 279]","<a href=""https://mobile.twitter.com"" rel=""nofo...","{'hashtags': [{'text': 'Covid19', 'indices': [...","{'id': 962273713562537984, 'id_str': '96227371...","I think, people wake sleep #Covid19 If governm...",0.0772,0.655,0.154,0.191,Positive,Negative,True
1996,InfectDisNews,Infectious Disease News,"Thorofare, NJ","The content you need, when you need it.",False,Tue Dec 14 18:45:30 +0000 2021,A final analysis confirmed that Pfizer’s inves...,"[0, 253]","<a href=""https://www.healio.com"" rel=""nofollow...","{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 67008915, 'id_str': '67008915', 'name':...",A final analysis confirmed Pfizer’s investigat...,-0.1027,0.742,0.152,0.106,Negative,Negative,False
1997,UHANDPartners,UHAND,"Houston, TX",@UHouston & @MDAndersonNews collaboration desi...,False,Tue Dec 14 18:45:24 +0000 2021,We are pleased to announce that our paper deta...,"[0, 227]","<a href=""https://mobile.twitter.com"" rel=""nofo...","{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 917490336993370112, 'id_str': '91749033...",We pleased announce paper detailing impacts #C...,0.2481,0.638,0.148,0.215,Positive,Positive,False
1998,Faal_26,Mohamed Falih Ali,Maldives,"Soldier, MBBS, MD Pulmonology, 🇲🇻 first",False,Tue Dec 14 18:45:22 +0000 2021,An analysis by the largest private health insu...,"[0, 273]","<a href=""http://twitter.com/download/iphone"" r...","{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 997946454718144512, 'id_str': '99794645...",An analysis largest private health insurance c...,0.0000,1.000,0.000,0.000,Neutral,Negative,True


In [180]:
#save fulldf to excel and inspect missing values for location 
#for this capstone round submission - use the data as it stands, then move on to choropleth maps 
#relevant columns for the choropleth map are Sentiment and location 

In [181]:
fulldf.to_excel('fulldftouse.xlsx') #saving full dataframe to manually inspect missing values for location

In [178]:
print(fulldf["Naive Bayes"].value_counts())

Negative    812
Positive    731
Neutral     457
Name: Naive Bayes, dtype: int64


In [179]:
print(fulldf["location"].value_counts())

                           442
United Kingdom              29
New York, NY                23
Whitby, Ontario, Canada     21
United States               19
                          ... 
Harare, Zimbabwe             1
Brisbane, Queensland         1
Elk Grove, CA                1
Manchester England           1
Barnstaple, England          1
Name: location, Length: 927, dtype: int64
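
The unlabelled first row in the counts above (442 tweets) appears to be an empty or whitespace-only location string, since `value_counts` drops `NaN` by default. A minimal sketch of how such entries could be counted before the manual inspection, using a small made-up series rather than the real `location` column (`cleaned`, `is_missing`, and `n_missing` are names introduced here):

```python
import numpy as np
import pandas as pd

# Toy stand-in for fulldf["location"]; the values are illustrative only.
locations = pd.Series(["United Kingdom", "", "  ", np.nan, "New York, NY"], name="location")

# Treat NaN, empty, and whitespace-only entries as missing.
cleaned = locations.fillna("").str.strip()
is_missing = cleaned == ""

n_missing = int(is_missing.sum())
print(f"{n_missing} of {len(locations)} locations are missing or blank")
```

The boolean mask `is_missing` could also be used to pull out the affected rows for manual labelling.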


In [182]:
fig_pie = px.pie(fulldf, names='Naive Bayes', title='Tweets Classification', height=250,
                 hole=0.7, color_discrete_sequence=px.colors.qualitative.T10)
fig_pie.update_traces(textfont=dict(color='#fff'))
fig_pie.update_layout(margin=dict(t=80, b=30, l=70, r=40),
                      plot_bgcolor='#2d3035', paper_bgcolor='#2d3035',
                      title_font=dict(size=25, color='#a5a7ab', family="Lato, sans-serif"),
                      font=dict(color='#8a8d93'),
                      legend=dict(orientation="h", yanchor="bottom", y=1, xanchor="right", x=0.8)
                      )

In contrast to the Vader classification, which labelled most tweets as Positive, the Naive Bayes classifier labels about 40.6% of tweets as Negative, 36.6% as Positive, and 22.9% as Neutral (812, 731, and 457 of the 2,000 tweets, per the value counts above). The two classifiers clearly differ, and the `Suspicion` column flags every tweet on which they disagree. This difference highlights a growing need for classifiers, especially sentiment analysis classifiers, that take social context into account when analysing people's opinions and the growing inequalities that social phenomena such as Covid-19 bring to light.
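
One way to make this disagreement concrete is to cross-tabulate the two label columns. The sketch below uses a small hand-made dataframe (the rows are illustrative, not taken from the tweets) with the same `Sentiment` and `Naive Bayes` column names as `fulldf`; `confusion` and `agreement_rate` are names introduced here.

```python
import pandas as pd

# Toy stand-in for fulldf; the labels are made up for illustration.
toy = pd.DataFrame({
    "Sentiment":   ["Negative", "Positive", "Neutral", "Positive", "Negative"],
    "Naive Bayes": ["Negative", "Negative", "Negative", "Positive", "Positive"],
})

# Rows: Vader label, columns: Naive Bayes label.
confusion = pd.crosstab(toy["Sentiment"], toy["Naive Bayes"])

# Share of tweets on which both classifiers agree (the complement of 'Suspicion').
agreement_rate = (toy["Sentiment"] == toy["Naive Bayes"]).mean()

print(confusion)
print(f"Agreement rate: {agreement_rate:.2f}")
```

Off-diagonal cells of the table show which kinds of tweets the two classifiers label differently, which is more informative than comparing the overall percentages alone.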

#### In this next step, we load the dataframe in which each tweet's location was manually mapped to a country (the `actual` column)

In [196]:
mydf = pd.read_excel("fulldftouse.xlsx")

In [197]:
mydf

Unnamed: 0.1,Unnamed: 0,screen_name,name,location,actual,description,geo_enabled,created_at,full_text,display_text_range,...,entities,user,tweet_without_stopwords,Polarity Score,Neutral Score,Negative Score,Positive Score,Sentiment,Naive Bayes,Suspicion
0,0,HollyGreyhound,Bonnie & 🌈Holly Greyhound🌈💔😭,Nottingham U.K.,United Kingdom,A sweetheart who loved cuddles OTRB 21/6/20💔Li...,False,Tue Dec 14 20:46:38 +0000 2021,Just how many of your constituents have had th...,"[0, 190]",...,{'hashtags': [{'text': 'ToriesPartiedWhilePeop...,"{'id': 840188443, 'id_str': '840188443', 'name...",Just many constituents virus died @ABridgen on...,-0.5574,0.795,0.205,0.000,Negative,Negative,False
1,1,PutnamDV,DV Putnam County,"Putnam County, NY",USA,Daily Voice Putnam County covers our friends a...,False,Tue Dec 14 20:46:38 +0000 2021,"* Cuomo ""American Crisis"" * An ethics committe...","[0, 219]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 2582876443, 'id_str': '2582876443', 'na...","* Cuomo ""American Crisis"" * An ethics committe...",-0.6705,0.825,0.175,0.000,Negative,Negative,False
2,2,VetCanuck,Terence Graham,Canada,Canada,#Veterans #MotherNature 🇨🇦❤⚽️👀\n#ClimateEmerg...,True,Tue Dec 14 20:46:37 +0000 2021,@joncoopertweets I'm a vet who had the anthrax...,"[17, 144]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 809384904467316736, 'id_str': '80938490...",@joncoopertweets I'm vet anthrax vaccine years...,-0.2263,0.899,0.101,0.000,Negative,Positive,True
3,3,DVOrangeCounty,DV Orange County,,USA,,False,Tue Dec 14 20:46:37 +0000 2021,"* Cuomo ""American Crisis"" * An ethics committe...","[0, 219]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 879726564388241408, 'id_str': '87972656...","* Cuomo ""American Crisis"" * An ethics committe...",-0.6705,0.825,0.175,0.000,Negative,Negative,False
4,4,DailyNassau,Daily Voice Nassau County,,USA,,False,Tue Dec 14 20:46:37 +0000 2021,"* Cuomo ""American Crisis"" * An ethics committe...","[0, 219]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 1120343072343298048, 'id_str': '1120343...","* Cuomo ""American Crisis"" * An ethics committe...",-0.6705,0.825,0.175,0.000,Negative,Negative,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,1995,SpiritualCrypt1,Alpha-Wizz the #NFT Pro🅕,Tweets are not Financialadvice,,Dad and husband\nPrometheus of Fantom! \n\nI c...,False,Tue Dec 14 18:45:33 +0000 2021,"I think, people are about to wake up from thei...","[0, 279]",...,"{'hashtags': [{'text': 'Covid19', 'indices': [...","{'id': 962273713562537984, 'id_str': '96227371...","I think, people wake sleep #Covid19 If governm...",0.0772,0.655,0.154,0.191,Positive,Negative,True
1996,1996,InfectDisNews,Infectious Disease News,"Thorofare, NJ",,"The content you need, when you need it.",False,Tue Dec 14 18:45:30 +0000 2021,A final analysis confirmed that Pfizer’s inves...,"[0, 253]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 67008915, 'id_str': '67008915', 'name':...",A final analysis confirmed Pfizer’s investigat...,-0.1027,0.742,0.152,0.106,Negative,Negative,False
1997,1997,UHANDPartners,UHAND,"Houston, TX",,@UHouston & @MDAndersonNews collaboration desi...,False,Tue Dec 14 18:45:24 +0000 2021,We are pleased to announce that our paper deta...,"[0, 227]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 917490336993370112, 'id_str': '91749033...",We pleased announce paper detailing impacts #C...,0.2481,0.638,0.148,0.215,Positive,Positive,False
1998,1998,Faal_26,Mohamed Falih Ali,Maldives,,"Soldier, MBBS, MD Pulmonology, 🇲🇻 first",False,Tue Dec 14 18:45:22 +0000 2021,An analysis by the largest private health insu...,"[0, 273]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 997946454718144512, 'id_str': '99794645...",An analysis largest private health insurance c...,0.0000,1.000,0.000,0.000,Neutral,Negative,True


In [199]:
mydf["actual"].value_counts()

USA                      295
United Kingdom           127
Canada                   125
Australia                 29
USA                       15
Germany                   11
Wales                      7
South Africa               7
India                      6
Spain                      6
Kenya                      5
New Zealand                3
Switzerland                3
Switzerland                3
Canada                     3
Trinidad and Tobago        2
Italy                      2
Iran                       2
France                     2
Pakistan                   2
United Arab Emirates       2
France                     2
Netherlands                2
Uganda                     2
South America              2
Ireland                    2
Ireland                    2
Cuba                       1
Cape Town                  1
Dubai                      1
Jamaica                    1
Africa/Nairobi             1
Somalia                    1
Ghana                      1
Austria       

In [201]:
mydf.index[mydf['actual']== 'USA '].tolist() #data cleaning 

[7, 21, 25, 27, 29, 30, 33, 37, 46, 54, 66, 74, 75, 91, 105]

In [204]:
mydf.index[mydf['actual']== 'Canada '].tolist() #data cleaning

[144, 197, 266]

In [206]:
mydf.index[mydf['actual']== 'Ontaria'].tolist() #data cleaning 

[976]

In [207]:
mydf["actual"] = mydf["actual"].replace({"USA ": "USA", "Canada ": "Canada", "Ontaria": "Canada"}) #data cleaning: assign back instead of using inplace on a column

In [208]:
mydf["actual"].value_counts()

USA                      310
Canada                   129
United Kingdom           127
Australia                 29
Germany                   11
Wales                      7
South Africa               7
Spain                      6
India                      6
Kenya                      5
New Zealand                3
Switzerland                3
Switzerland                3
Trinidad and Tobago        2
Italy                      2
Iran                       2
France                     2
Pakistan                   2
France                     2
Netherlands                2
Uganda                     2
South America              2
Ireland                    2
United Arab Emirates       2
Ireland                    2
Cuba                       1
Cape Town                  1
Dubai                      1
Jamaica                    1
Israel                     1
Somalia                    1
Ghana                      1
Africa/Nairobi             1
Austria                    1
Ethiopia      
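
The cleaned counts still list Switzerland, France, and Ireland twice, which suggests that some labels may differ only by surrounding whitespace, as "USA " and "Canada " did. A minimal sketch of a more general clean-up, assuming whitespace is indeed the cause; it runs on a toy series rather than `mydf["actual"]`, and `corrections` and `normalized` are names introduced here:

```python
import pandas as pd

# Toy stand-in for mydf["actual"]; values mimic the kinds of variants seen above.
actual = pd.Series(["USA", "USA ", "Canada ", "Ontaria", "France", "France ", None])

# Strip surrounding whitespace first, then map known misspellings to countries.
corrections = {"Ontaria": "Canada"}  # illustrative mapping
normalized = actual.str.strip().replace(corrections)

print(normalized.value_counts())
```

Stripping every label at once avoids having to look up the row indices of each variant separately.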

In [216]:
#Restricting the analysis to USA, United Kingdom, and Canada for now
mydfUSA = mydf[mydf["actual"] == "USA"]
mydfUSA

Unnamed: 0.1,Unnamed: 0,screen_name,name,location,actual,description,geo_enabled,created_at,full_text,display_text_range,...,entities,user,tweet_without_stopwords,Polarity Score,Neutral Score,Negative Score,Positive Score,Sentiment,Naive Bayes,Suspicion
1,1,PutnamDV,DV Putnam County,"Putnam County, NY",USA,Daily Voice Putnam County covers our friends a...,False,Tue Dec 14 20:46:38 +0000 2021,"* Cuomo ""American Crisis"" * An ethics committe...","[0, 219]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 2582876443, 'id_str': '2582876443', 'na...","* Cuomo ""American Crisis"" * An ethics committe...",-0.6705,0.825,0.175,0.000,Negative,Negative,False
3,3,DVOrangeCounty,DV Orange County,,USA,,False,Tue Dec 14 20:46:37 +0000 2021,"* Cuomo ""American Crisis"" * An ethics committe...","[0, 219]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 879726564388241408, 'id_str': '87972656...","* Cuomo ""American Crisis"" * An ethics committe...",-0.6705,0.825,0.175,0.000,Negative,Negative,False
4,4,DailyNassau,Daily Voice Nassau County,,USA,,False,Tue Dec 14 20:46:37 +0000 2021,"* Cuomo ""American Crisis"" * An ethics committe...","[0, 219]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 1120343072343298048, 'id_str': '1120343...","* Cuomo ""American Crisis"" * An ethics committe...",-0.6705,0.825,0.175,0.000,Negative,Negative,False
5,5,Dayna_Fisher1,Dayna Fisher,USA,USA,"Health, MS, RYT 200, patient-centered marcomms...",False,Tue Dec 14 20:46:36 +0000 2021,@GovTimWalz @PennyWheelerMD How about making C...,"[28, 124]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 1450263086091407365, 'id_str': '1450263...",@GovTimWalz @PennyWheelerMD How making COVID v...,0.0772,0.894,0.000,0.106,Positive,Negative,True
7,7,JennaForMO,"Jenna Roberson, State Senate MO-SD2","Wentzville, MO",USA,Dem Candidate for MO Senate District 2. Tired ...,False,Tue Dec 14 20:46:27 +0000 2021,How does #COVID19 affect us?\n\nMy partner's d...,"[0, 264]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 1415425935583686660, 'id_str': '1415425...",How #COVID19 affect us? My partner's dad stepm...,0.7269,0.612,0.090,0.297,Positive,Negative,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
998,998,9erchic4life,seasoneddiva,Cali,USA,you do better...when you know better 😉,True,Tue Dec 14 19:44:53 +0000 2021,Bravo to @TishJames for advocating for HOME HE...,"[0, 254]",...,"{'hashtags': [{'text': 'TheView', 'indices': [...","{'id': 2740517256, 'id_str': '2740517256', 'na...",Bravo @TishJames advocating HOME HEALTH AIDES ...,0.8402,0.674,0.060,0.267,Positive,Positive,False
1000,1000,chitownrapper,AL Rapp AKA Allosaurus Raptor,Chicago IL / Taos NM,USA,Eats meat in moderation\nIndependent with Prog...,False,Tue Dec 14 19:44:48 +0000 2021,This is dead wrong.....this is how the US star...,"[0, 272]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 829865298685726720, 'id_str': '82986529...",This dead wrong.....this US starts penalizing ...,-0.4767,0.780,0.146,0.075,Negative,Positive,True
1002,1002,tomleykis,Even MORE Fully Vaccinated Tom Leykis,"Santa Barbara Cty, California",USA,Hear our podcast at https://t.co/LC270QtHNj. B...,False,Tue Dec 14 19:44:36 +0000 2021,You thought I was crazy when I predicted what'...,"[0, 92]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 171191577, 'id_str': '171191577', 'name...",You thought I crazy I predicted what's coming ...,-0.4003,0.803,0.197,0.000,Negative,Positive,True
1004,1004,Ken_DowneyJr,Ken Downey Jr.,"Thorofare, NJ",USA,I cover infectious diseases @InfectDisNews 🦠💉M...,True,Tue Dec 14 19:44:29 +0000 2021,@Tim6992 @EliseStefanik @morganfmckay TX #COVI...,"[38, 204]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 828688268409593858, 'id_str': '82868826...",@Tim6992 @EliseStefanik @morganfmckay TX #COVI...,0.4995,0.889,0.000,0.111,Positive,Negative,True


In [228]:
#rename USA to United States of America (as in the geojson) and assign geographical identifier
mydfUSA = mydfUSA.copy()  #work on an independent copy so the assignments below do not trigger SettingWithCopyWarning
mydfUSA["actual"] = mydfUSA["actual"].replace({"USA": "United States of America"})
mydfUSA["id"] = "USA"


In [244]:
mydfCanada = mydf[mydf["actual"] == "Canada"]
mydfCanada

Unnamed: 0.1,Unnamed: 0,screen_name,name,location,actual,description,geo_enabled,created_at,full_text,display_text_range,...,entities,user,tweet_without_stopwords,Polarity Score,Neutral Score,Negative Score,Positive Score,Sentiment,Naive Bayes,Suspicion
2,2,VetCanuck,Terence Graham,Canada,Canada,#Veterans #MotherNature 🇨🇦❤⚽️👀\n#ClimateEmerg...,True,Tue Dec 14 20:46:37 +0000 2021,@joncoopertweets I'm a vet who had the anthrax...,"[17, 144]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 809384904467316736, 'id_str': '80938490...",@joncoopertweets I'm vet anthrax vaccine years...,-0.2263,0.899,0.101,0.000,Negative,Positive,True
35,35,rohin5r,Rohin Minocha-Mckenney,Canada,Canada,Mount Allison University | Biology and Commerc...,True,Tue Dec 14 20:44:30 +0000 2021,I’m really glad to see quick action on part of...,"[0, 250]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 2723702133, 'id_str': '2723702133', 'na...",I’m really glad see quick action part everyone...,0.8586,0.708,0.000,0.292,Positive,Negative,True
36,36,trycom88,Mr Sacha,"Alberta, Canada",Canada,Teacher! All over the place!,False,Tue Dec 14 20:44:30 +0000 2021,"@JustinTrudeau Hey, when will you renew CERB (...","[0, 207]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 76865074, 'id_str': '76865074', 'name':...","@JustinTrudeau Hey, renew CERB (or similar)???...",0.4393,0.707,0.113,0.180,Positive,Negative,True
47,47,Muskoka411,Muskoka411 News,"Muskoka, Ontario",Canada,News organization providing the best in breaki...,True,Tue Dec 14 20:43:46 +0000 2021,"Effective immediately, all visitors must be fu...","[0, 182]",...,"{'hashtags': [{'text': 'News', 'indices': [159...","{'id': 380680185, 'id_str': '380680185', 'name...","Effective immediately, visitors must fully vac...",0.7184,0.769,0.000,0.231,Positive,Positive,False
48,48,MiltonReporter,MiltonReporter,"Milton, Ontario",Canada,"Bringing a piece of #MiltonON to the world, an...",False,Tue Dec 14 20:43:43 +0000 2021,"Meanwhile, new protocols for Long-Term Care ha...","[0, 205]",...,"{'hashtags': [{'text': 'onpoli', 'indices': [1...","{'id': 1055242933098684417, 'id_str': '1055242...","Meanwhile, new protocols Long-Term Care introd...",0.6808,0.708,0.000,0.292,Positive,Negative,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
976,976,ChrisLewisEssex,Chris Lewis MP Essex,"Essex, Ontario",Canada,Chris Lewis: CPC Member of Parliament (FEDERAL...,True,Tue Dec 14 19:46:37 +0000 2021,Just moments ago I stood in the House of Commo...,"[0, 274]",...,"{'hashtags': [{'text': 'Windsor', 'indices': [...","{'id': 1034798285389737984, 'id_str': '1034798...",Just moments ago I stood House Commons bring a...,0.3365,0.931,0.000,0.069,Positive,Negative,True
979,979,OtagoGrad,Brad Rush,"Calgary, AB",Canada,ClimateCrisis skeptic. EX: biz consultant (org...,True,Tue Dec 14 19:46:23 +0000 2021,Disease Mitigation Measures in the Control of ...,"[0, 266]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 101193997, 'id_str': '101193997', 'name...",Disease Mitigation Measures Control Pandemic I...,0.1027,0.779,0.102,0.119,Positive,Positive,False
990,990,NicolasRobidoux,Nicolas Robidoux,Montréal,Canada,Thousandth monkey\nil/lui\nThe early bird gets...,False,Tue Dec 14 19:45:05 +0000 2021,Enjoyment/risk ratios live in my head now. #CO...,"[0, 51]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 565813688, 'id_str': '565813688', 'name...",Enjoyment/risk ratios live head now. #COVID19,0.0000,1.000,0.000,0.000,Neutral,Negative,True
999,999,Darrell_Samuels,Darrell Samuels,"Mississauga, ON",Canada,"Always-learning believer, mental health surviv...",False,Tue Dec 14 19:44:52 +0000 2021,Remind me again.... Is the light at the end of...,"[0, 77]",...,"{'hashtags': [{'text': 'NBA', 'indices': [59, ...","{'id': 35993064, 'id_str': '35993064', 'name':...",Remind again.... Is light end tunnel? #NBA #NH...,0.0000,1.000,0.000,0.000,Neutral,Positive,True


In [246]:
#assign Canada the geographical identifier 
mydfCanada = mydfCanada.copy()  #independent copy, avoids SettingWithCopyWarning
mydfCanada["id"] = "CAN"
mydfCanada

Unnamed: 0.1,Unnamed: 0,screen_name,name,location,actual,description,geo_enabled,created_at,full_text,display_text_range,...,user,tweet_without_stopwords,Polarity Score,Neutral Score,Negative Score,Positive Score,Sentiment,Naive Bayes,Suspicion,id
2,2,VetCanuck,Terence Graham,Canada,Canada,#Veterans #MotherNature 🇨🇦❤⚽️👀\n#ClimateEmerg...,True,Tue Dec 14 20:46:37 +0000 2021,@joncoopertweets I'm a vet who had the anthrax...,"[17, 144]",...,"{'id': 809384904467316736, 'id_str': '80938490...",@joncoopertweets I'm vet anthrax vaccine years...,-0.2263,0.899,0.101,0.000,Negative,Positive,True,CAN
35,35,rohin5r,Rohin Minocha-Mckenney,Canada,Canada,Mount Allison University | Biology and Commerc...,True,Tue Dec 14 20:44:30 +0000 2021,I’m really glad to see quick action on part of...,"[0, 250]",...,"{'id': 2723702133, 'id_str': '2723702133', 'na...",I’m really glad see quick action part everyone...,0.8586,0.708,0.000,0.292,Positive,Negative,True,CAN
36,36,trycom88,Mr Sacha,"Alberta, Canada",Canada,Teacher! All over the place!,False,Tue Dec 14 20:44:30 +0000 2021,"@JustinTrudeau Hey, when will you renew CERB (...","[0, 207]",...,"{'id': 76865074, 'id_str': '76865074', 'name':...","@JustinTrudeau Hey, renew CERB (or similar)???...",0.4393,0.707,0.113,0.180,Positive,Negative,True,CAN
47,47,Muskoka411,Muskoka411 News,"Muskoka, Ontario",Canada,News organization providing the best in breaki...,True,Tue Dec 14 20:43:46 +0000 2021,"Effective immediately, all visitors must be fu...","[0, 182]",...,"{'id': 380680185, 'id_str': '380680185', 'name...","Effective immediately, visitors must fully vac...",0.7184,0.769,0.000,0.231,Positive,Positive,False,CAN
48,48,MiltonReporter,MiltonReporter,"Milton, Ontario",Canada,"Bringing a piece of #MiltonON to the world, an...",False,Tue Dec 14 20:43:43 +0000 2021,"Meanwhile, new protocols for Long-Term Care ha...","[0, 205]",...,"{'id': 1055242933098684417, 'id_str': '1055242...","Meanwhile, new protocols Long-Term Care introd...",0.6808,0.708,0.000,0.292,Positive,Negative,True,CAN
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
976,976,ChrisLewisEssex,Chris Lewis MP Essex,"Essex, Ontario",Canada,Chris Lewis: CPC Member of Parliament (FEDERAL...,True,Tue Dec 14 19:46:37 +0000 2021,Just moments ago I stood in the House of Commo...,"[0, 274]",...,"{'id': 1034798285389737984, 'id_str': '1034798...",Just moments ago I stood House Commons bring a...,0.3365,0.931,0.000,0.069,Positive,Negative,True,CAN
979,979,OtagoGrad,Brad Rush,"Calgary, AB",Canada,ClimateCrisis skeptic. EX: biz consultant (org...,True,Tue Dec 14 19:46:23 +0000 2021,Disease Mitigation Measures in the Control of ...,"[0, 266]",...,"{'id': 101193997, 'id_str': '101193997', 'name...",Disease Mitigation Measures Control Pandemic I...,0.1027,0.779,0.102,0.119,Positive,Positive,False,CAN
990,990,NicolasRobidoux,Nicolas Robidoux,Montréal,Canada,Thousandth monkey\nil/lui\nThe early bird gets...,False,Tue Dec 14 19:45:05 +0000 2021,Enjoyment/risk ratios live in my head now. #CO...,"[0, 51]",...,"{'id': 565813688, 'id_str': '565813688', 'name...",Enjoyment/risk ratios live head now. #COVID19,0.0000,1.000,0.000,0.000,Neutral,Negative,True,CAN
999,999,Darrell_Samuels,Darrell Samuels,"Mississauga, ON",Canada,"Always-learning believer, mental health surviv...",False,Tue Dec 14 19:44:52 +0000 2021,Remind me again.... Is the light at the end of...,"[0, 77]",...,"{'id': 35993064, 'id_str': '35993064', 'name':...",Remind again.... Is light end tunnel? #NBA #NH...,0.0000,1.000,0.000,0.000,Neutral,Positive,True,CAN


In [247]:
mydfUK = mydf[mydf["actual"] == "United Kingdom"]
mydfUK

Unnamed: 0.1,Unnamed: 0,screen_name,name,location,actual,description,geo_enabled,created_at,full_text,display_text_range,...,entities,user,tweet_without_stopwords,Polarity Score,Neutral Score,Negative Score,Positive Score,Sentiment,Naive Bayes,Suspicion
0,0,HollyGreyhound,Bonnie & 🌈Holly Greyhound🌈💔😭,Nottingham U.K.,United Kingdom,A sweetheart who loved cuddles OTRB 21/6/20💔Li...,False,Tue Dec 14 20:46:38 +0000 2021,Just how many of your constituents have had th...,"[0, 190]",...,{'hashtags': [{'text': 'ToriesPartiedWhilePeop...,"{'id': 840188443, 'id_str': '840188443', 'name...",Just many constituents virus died @ABridgen on...,-0.5574,0.795,0.205,0.000,Negative,Negative,False
6,6,brynll,Bryn Llewellyn,West Yorkshire & London,United Kingdom,#moveandlearn advocate * FRSA * @HundrEDorg am...,True,Tue Dec 14 20:46:28 +0000 2021,How low can scammers go?\n#COVID19 #scam https...,"[0, 39]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 164020902, 'id_str': '164020902', 'name...",How low scammers go? #COVID19 #scam https://t....,-0.7003,0.463,0.537,0.000,Negative,Neutral,True
11,11,martingfindlay,Martin Findlay,"Aberdeen, Scotland, UK",United Kingdom,"had two Covid-19 doses, a Covid-19 booster and...",True,Tue Dec 14 20:46:10 +0000 2021,"Good effort, 45.6K #Scotland #booster jabs Mon...","[0, 280]",...,"{'hashtags': [{'text': 'Scotland', 'indices': ...","{'id': 2490813418, 'id_str': '2490813418', 'na...","Good effort, 45.6K #Scotland #booster jabs Mon...",0.2382,0.954,0.000,0.046,Positive,Positive,False
17,17,msleeplessagain,💪,"England, United Kingdom",United Kingdom,"“The cry of the poor is not always just, but i...",True,Tue Dec 14 20:45:40 +0000 2021,We live in a society where you have to show a ...,"[0, 213]",...,"{'hashtags': [{'text': 'Omnicron', 'indices': ...","{'id': 571839701, 'id_str': '571839701', 'name...",We live society show vaccine passport doesn’t ...,0.0000,1.000,0.000,0.000,Neutral,Negative,True
18,18,thehrbooth,The HR Booth,"Dunfermline, Scotland",United Kingdom,"HR Consultancy, providing solutions for SME bu...",False,Tue Dec 14 20:45:39 +0000 2021,The First Minister made an announcement to the...,"[0, 274]",...,"{'hashtags': [{'text': 'covidmeasures', 'indic...","{'id': 383932169, 'id_str': '383932169', 'name...",The First Minister made announcement Scottish ...,0.5994,0.846,0.000,0.154,Positive,Positive,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
975,975,SiDedman,Simon Dedman,Chelmsford-London,United Kingdom,Political Reporter @BBCNews @BBCEssex former @...,True,Tue Dec 14 19:46:48 +0000 2021,96 Conservative MPs voting against the governm...,"[0, 253]",...,"{'hashtags': [{'text': 'Covid19', 'indices': [...","{'id': 40370254, 'id_str': '40370254', 'name':...",96 Conservative MPs voting government means wi...,-0.3612,0.906,0.094,0.000,Negative,Negative,False
985,985,lottyleeming,Charlotte Leeming,Yorkshire/Northern England,United Kingdom,Award-winning Senior Broadcast Journalist & TV...,True,Tue Dec 14 19:45:29 +0000 2021,Tory rebellion doesn’t stop covid passes becom...,"[0, 73]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 270855032, 'id_str': '270855032', 'name...",Tory rebellion doesn’t stop passes becoming la...,-0.4019,0.654,0.346,0.000,Negative,Negative,False
992,992,nanakumi1976,David Ofosu-Appiah,"England, United Kingdom",United Kingdom,"Law Advocate,Consultant",True,Tue Dec 14 19:45:00 +0000 2021,Boris Johnson suffers rebellion by almost 100 ...,"[0, 282]",...,"{'hashtags': [{'text': 'BoJo', 'indices': [131...","{'id': 276483767, 'id_str': '276483767', 'name...",Boris Johnson suffers rebellion almost 100 Tor...,-0.9337,0.549,0.391,0.059,Negative,Negative,False
1007,1007,ahmedaftab68,Sakkaf Ahmed Aftab,"Elsham, England",United Kingdom,"Eye doc,Chair BMA Yorkshire Consultant Committ...",False,Tue Dec 14 19:44:20 +0000 2021,@DRTomlinsonEP @DrAnneMurphy @CareQualityComm ...,"[140, 328]",...,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'id': 708238897688535040, 'id_str': '70823889...",@DRTomlinsonEP @DrAnneMurphy @CareQualityComm ...,0.6192,0.832,0.000,0.168,Positive,Negative,True


In [248]:
#assign geographical identifier 
mydfUK = mydfUK.copy()  #independent copy, avoids SettingWithCopyWarning
mydfUK["id"] = "GBR"


In [255]:
#combine dataframes 
finaldf = pd.concat([mydfUSA, mydfCanada], axis=0)
finaldf2 = pd.concat([finaldf, mydfUK], axis = 0)
finaldf2

Unnamed: 0.1,Unnamed: 0,screen_name,name,location,actual,description,geo_enabled,created_at,full_text,display_text_range,...,user,tweet_without_stopwords,Polarity Score,Neutral Score,Negative Score,Positive Score,Sentiment,Naive Bayes,Suspicion,id
1,1,PutnamDV,DV Putnam County,"Putnam County, NY",United States of America,Daily Voice Putnam County covers our friends a...,False,Tue Dec 14 20:46:38 +0000 2021,"* Cuomo ""American Crisis"" * An ethics committe...","[0, 219]",...,"{'id': 2582876443, 'id_str': '2582876443', 'na...","* Cuomo ""American Crisis"" * An ethics committe...",-0.6705,0.825,0.175,0.000,Negative,Negative,False,USA
3,3,DVOrangeCounty,DV Orange County,,United States of America,,False,Tue Dec 14 20:46:37 +0000 2021,"* Cuomo ""American Crisis"" * An ethics committe...","[0, 219]",...,"{'id': 879726564388241408, 'id_str': '87972656...","* Cuomo ""American Crisis"" * An ethics committe...",-0.6705,0.825,0.175,0.000,Negative,Negative,False,USA
4,4,DailyNassau,Daily Voice Nassau County,,United States of America,,False,Tue Dec 14 20:46:37 +0000 2021,"* Cuomo ""American Crisis"" * An ethics committe...","[0, 219]",...,"{'id': 1120343072343298048, 'id_str': '1120343...","* Cuomo ""American Crisis"" * An ethics committe...",-0.6705,0.825,0.175,0.000,Negative,Negative,False,USA
5,5,Dayna_Fisher1,Dayna Fisher,USA,United States of America,"Health, MS, RYT 200, patient-centered marcomms...",False,Tue Dec 14 20:46:36 +0000 2021,@GovTimWalz @PennyWheelerMD How about making C...,"[28, 124]",...,"{'id': 1450263086091407365, 'id_str': '1450263...",@GovTimWalz @PennyWheelerMD How making COVID v...,0.0772,0.894,0.000,0.106,Positive,Negative,True,USA
7,7,JennaForMO,"Jenna Roberson, State Senate MO-SD2","Wentzville, MO",United States of America,Dem Candidate for MO Senate District 2. Tired ...,False,Tue Dec 14 20:46:27 +0000 2021,How does #COVID19 affect us?\n\nMy partner's d...,"[0, 264]",...,"{'id': 1415425935583686660, 'id_str': '1415425...",How #COVID19 affect us? My partner's dad stepm...,0.7269,0.612,0.090,0.297,Positive,Negative,True,USA
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
975,975,SiDedman,Simon Dedman,Chelmsford-London,United Kingdom,Political Reporter @BBCNews @BBCEssex former @...,True,Tue Dec 14 19:46:48 +0000 2021,96 Conservative MPs voting against the governm...,"[0, 253]",...,"{'id': 40370254, 'id_str': '40370254', 'name':...",96 Conservative MPs voting government means wi...,-0.3612,0.906,0.094,0.000,Negative,Negative,False,GBR
985,985,lottyleeming,Charlotte Leeming,Yorkshire/Northern England,United Kingdom,Award-winning Senior Broadcast Journalist & TV...,True,Tue Dec 14 19:45:29 +0000 2021,Tory rebellion doesn’t stop covid passes becom...,"[0, 73]",...,"{'id': 270855032, 'id_str': '270855032', 'name...",Tory rebellion doesn’t stop passes becoming la...,-0.4019,0.654,0.346,0.000,Negative,Negative,False,GBR
992,992,nanakumi1976,David Ofosu-Appiah,"England, United Kingdom",United Kingdom,"Law Advocate,Consultant",True,Tue Dec 14 19:45:00 +0000 2021,Boris Johnson suffers rebellion by almost 100 ...,"[0, 282]",...,"{'id': 276483767, 'id_str': '276483767', 'name...",Boris Johnson suffers rebellion almost 100 Tor...,-0.9337,0.549,0.391,0.059,Negative,Negative,False,GBR
1007,1007,ahmedaftab68,Sakkaf Ahmed Aftab,"Elsham, England",United Kingdom,"Eye doc,Chair BMA Yorkshire Consultant Committ...",False,Tue Dec 14 19:44:20 +0000 2021,@DRTomlinsonEP @DrAnneMurphy @CareQualityComm ...,"[140, 328]",...,"{'id': 708238897688535040, 'id_str': '70823889...",@DRTomlinsonEP @DrAnneMurphy @CareQualityComm ...,0.6192,0.832,0.000,0.168,Positive,Negative,True,GBR


In [251]:
#create subset with relevant columns for creating choropleth map 
finaldf2_subset = finaldf2[['actual', 'Polarity Score', 'Sentiment', 'id']]

In [252]:
finaldf2_subset["id"].value_counts()

USA    310
CAN    129
GBR    127
Name: id, dtype: int64
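
The three per-country subsets above could also be built in one step with a lookup table from country label to identifier. The following is a sketch under that assumption, run on a toy dataframe with illustrative rows; `iso_codes` and `subset` are names introduced here, and the renaming of "USA" to "United States of America" is left out for brevity.

```python
import pandas as pd

# Toy stand-in for mydf; only the columns needed here, with illustrative rows.
toy = pd.DataFrame({
    "actual": ["USA", "Canada", "United Kingdom", "Germany", "USA"],
    "Polarity Score": [-0.67, 0.44, 0.24, 0.10, 0.73],
})

# Country label -> three-letter identifier used for the map.
iso_codes = {"USA": "USA", "Canada": "CAN", "United Kingdom": "GBR"}

# Keep only the three countries, on an explicit copy so a new column can be added safely.
subset = toy[toy["actual"].isin(list(iso_codes))].copy()
subset["id"] = subset["actual"].map(iso_codes)

print(subset["id"].value_counts())
```

Adding another country then only requires a new entry in `iso_codes`, with no extra filtering or concatenation cells.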

In [254]:
#save file to be imported back for choropleth map 
finaldf2_subset.to_excel("dfsubset2.xlsx") 
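
A choropleth colours each region by a single value, whereas `finaldf2_subset` holds one row per tweet, so some per-country aggregation is likely needed before mapping. A minimal sketch of one possible aggregation, on a toy dataframe with made-up scores; the output column names (`mean_polarity`, `negative_share`, `n_tweets`) are assumptions introduced here, not part of the original analysis.

```python
import pandas as pd

# Toy stand-in for finaldf2_subset; the scores and labels are illustrative.
toy = pd.DataFrame({
    "id": ["USA", "USA", "CAN", "GBR", "GBR"],
    "Polarity Score": [-0.5, 0.5, 0.4, -0.2, -0.6],
    "Sentiment": ["Negative", "Positive", "Positive", "Negative", "Negative"],
})

# One row per country: mean compound score, share of negative tweets, tweet count.
per_country = (
    toy.groupby("id")
       .agg(mean_polarity=("Polarity Score", "mean"),
            negative_share=("Sentiment", lambda s: (s == "Negative").mean()),
            n_tweets=("Sentiment", "count"))
       .reset_index()
)

print(per_country)
```

The resulting frame could then be passed to a plotting call such as Plotly Express's `px.choropleth` with `locations="id"` and `color="mean_polarity"`; that step is not shown here.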