**LOAD LIBRARIES**

In [2]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import re
import json
import requests
from pprint import pprint
from IPython.display import HTML
import tweepy as tp
import twitter_credentials

# 2.0 EDA AND NLP WITH MICROSOFT AZURE

This notebool will help incorporate natural language processing (NLP) features into our classification modeling. This will use an API from Microsoft’s AI Text Analytics service on Azure’s cloud computing platform (https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/)

- We did some EDA on this Notebook to understand which feature would be applicable for modeling.
- The outpout of this Notebook will be used for modeling

**READ FINAL MASTER FROM JSON AND PROCEED TO EDA**

In [3]:
#with open('tweet_master.json') as json_file:  
with open('more_final_tweet_master.json') as json_file:  
    tweet_json = json.load(json_file)

#with open('user_master.json') as json_file:  
with open('more_final_user_master.json') as json_file:  
     user_json = json.load(json_file)

In [4]:
tweet_df = pd.read_json(tweet_json)

user_df = pd.read_json(user_json)

print(user_df.shape)
print(tweet_df.shape)

(112, 14)
(16700, 12)


In [5]:
# Re-index user table to be screen name since id is missing from twitter feed
user_df.set_index('screen_name',drop=False, inplace=True)


**MERGE COLUMNS**

In [6]:
# Here are verified users
bot_mask = user_df['known_bot'] == True
bots = user_df[bot_mask]
bots.shape

(0, 14)

In [7]:
# Here are known bots
verified_mask = user_df['verified'] == True
verifieds = user_df[verified_mask]
verifieds.shape

(107, 14)

In [8]:
# Here are authors of unknown 'bot or not' status
not_bot_mask = user_df['known_bot'] != True
not_verified_mask = user_df['verified'] != True
unknown_mask = np.logical_and(not_bot_mask, not_verified_mask)
unknowns = user_df[unknown_mask].copy()
unknowns.shape

(5, 14)

In [9]:
# Here are tweets for users of verified status
# NOTE:  the join
tweets_verified_users = tweet_df.join(verifieds,'user_screen_name',rsuffix='user',how='right')
tweets_verified_users.shape

(15950, 26)

In [10]:
# Here are tweets for users of unknown status
# NOTE:  the join
tweets_unknown_users = tweet_df.join(unknowns,'user_screen_name',rsuffix='user',how='right')
tweets_unknown_users.shape

(750, 26)

In [11]:
# Here are tweets for users of bot status
# NOTE:  the join
tweets_bot_users = tweet_df.join(bots,'user_screen_name',rsuffix='user',how='right')
tweets_bot_users.shape

(0, 26)

** ENGINEER NLP FEATURES **

- Will do engineering only for Tweets that have valid users, where the join with user table has succeeded. 
- We will only perform NLP for tweets in English
- The code below uses Microsoft Azure NLP APIs to extract sentiment and key phrases from each Tweet. View more at https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/quickstarts/python#extract-key-phrases



In [12]:
tweets_unknown_users["TypeOfUser"] = "Unknown"
tweets_bot_users["TypeOfUser"] = "Bot"
tweets_verified_users["TypeOfUser"] = "Verified"

new_tweet_df = pd.concat([tweets_unknown_users,tweets_bot_users, tweets_verified_users], ignore_index=True)

new_tweet_df["nlp_key_phrases"] = ""
new_tweet_df["nlp_count_key_phrases"] = 0
new_tweet_df["nlp_sentiment_score"] = 0.0

print(new_tweet_df.shape)

(16700, 30)


In [22]:
print(new_tweet_df.groupby("lang").size())

lang
ca         5
cs         5
cy         2
da         9
de        64
en     14977
es       187
et         8
eu         2
fi         3
fr        94
hi         1
ht        16
hu         3
in        21
is         3
it        83
ja         3
lt         4
lv         2
nl        15
no        13
pl         4
pt       155
ro         8
ru         4
sl         2
sr         1
sv         2
tl        34
tr         5
uk       110
und      852
vi         1
zh         2
dtype: int64


In [33]:
english_filter = new_tweet_df["lang"] == "en"
uk_filter = new_tweet_df["lang"] == "uk"

mask = np.logical_or(english_filter, uk_filter)

new_tweet_df_english = new_tweet_df[mask].copy()

new_tweet_df_english.reset_index(drop=True, inplace=True)

print(new_tweet_df_english.shape, len(new_tweet_df_english))

(15087, 30) 15087


In [34]:
## THE CODE below uses Microsoft Azure APIs to detect sentiment and key phrases from each Tweet
#https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/quickstarts/python#extract-key-phrases
text_analytics_base_url = "https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/"
sentiment_api_url = text_analytics_base_url + "sentiment"
key_phrase_api_url = text_analytics_base_url + "keyPhrases"

subscription_key = "CALL EUMAR FOR GUIDANCE - 954 918 4874"

start_pos = 0 
total = len(new_tweet_df_english)

while start_pos <= total:

    try:
        index1 = start_pos
        index2 = start_pos + 1
        index3 = start_pos + 2
        index4 = start_pos + 3
        index5 = start_pos + 4
        index6 = start_pos + 5
        index7 = start_pos + 6
        index8 = start_pos + 7
        index9 = start_pos + 8
        index10 = start_pos + 9        

        print("Started Processing: Range({0}, {1})".format(index1, index10))
            
        documents = {'documents' : [
          {'id': index1, 'language': 'en', 'text': new_tweet_df_english.loc[index1].text},
          {'id': index2, 'language': 'en', 'text': new_tweet_df_english.loc[index2].text},
          {'id': index3, 'language': 'en', 'text': new_tweet_df_english.loc[index3].text},
          {'id': index4, 'language': 'en', 'text': new_tweet_df_english.loc[index4].text},
          {'id': index5, 'language': 'en', 'text': new_tweet_df_english.loc[index5].text},
          {'id': index6, 'language': 'en', 'text': new_tweet_df_english.loc[index6].text},
          {'id': index7, 'language': 'en', 'text': new_tweet_df_english.loc[index7].text},
          {'id': index8, 'language': 'en', 'text': new_tweet_df_english.loc[index8].text},
          {'id': index9, 'language': 'en', 'text': new_tweet_df_english.loc[index9].text},
          {'id': index10, 'language': 'en', 'text': new_tweet_df_english.loc[index10].text}
        ]}


        headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
        response_key_phrase  = requests.post(key_phrase_api_url, headers=headers, json=documents)
        key_phrases = response_key_phrase.json()

        response_sentiment  = requests.post(sentiment_api_url, headers=headers, json=documents)
        sentiment = response_sentiment.json()


        for document in key_phrases["documents"]:
            id_doc = int(document["id"])
            num_phrases = len(document["keyPhrases"])
            phrases_doc = ",".join(document["keyPhrases"])

            new_tweet_df_english.loc[id_doc, "nlp_key_phrases"] = phrases_doc
            new_tweet_df_english.loc[id_doc, "nlp_count_key_phrases"] = num_phrases    

            #print(id_doc,new_tweet_df.loc[id_doc, "nlp_key_phrases"],  new_tweet_df.loc[id_doc, "nlp_count_key_phrases"])

        for document in sentiment["documents"]:
            id_doc = int(document["id"])
            score = document["score"]

            new_tweet_df_english.loc[id_doc, "nlp_sentiment_score"] = float(score)

            #print (id_doc, new_tweet_df.loc[id_doc, "nlp_sentiment_score"])

        print("Finished Processing: Range({0}, {1})".format(index1, index10))
        
        start_pos = start_pos + 10
    except Exception as e:
        print(f'SOME ERROR OCCURRED...PASSING!!!')
        print(e)
        start_pos = start_pos + 10
        pass
        


Started Processing: Range(0, 9)
Finished Processing: Range(0, 9)
Started Processing: Range(10, 19)
Finished Processing: Range(10, 19)
Started Processing: Range(20, 29)
Finished Processing: Range(20, 29)
Started Processing: Range(30, 39)
Finished Processing: Range(30, 39)
Started Processing: Range(40, 49)
Finished Processing: Range(40, 49)
Started Processing: Range(50, 59)
Finished Processing: Range(50, 59)
Started Processing: Range(60, 69)
Finished Processing: Range(60, 69)
Started Processing: Range(70, 79)
Finished Processing: Range(70, 79)
Started Processing: Range(80, 89)
Finished Processing: Range(80, 89)
Started Processing: Range(90, 99)
Finished Processing: Range(90, 99)
Started Processing: Range(100, 109)
Finished Processing: Range(100, 109)
Started Processing: Range(110, 119)
Finished Processing: Range(110, 119)
Started Processing: Range(120, 129)
Finished Processing: Range(120, 129)
Started Processing: Range(130, 139)
Finished Processing: Range(130, 139)
Started Processing: Ra

Finished Processing: Range(1120, 1129)
Started Processing: Range(1130, 1139)
Finished Processing: Range(1130, 1139)
Started Processing: Range(1140, 1149)
Finished Processing: Range(1140, 1149)
Started Processing: Range(1150, 1159)
Finished Processing: Range(1150, 1159)
Started Processing: Range(1160, 1169)
Finished Processing: Range(1160, 1169)
Started Processing: Range(1170, 1179)
Finished Processing: Range(1170, 1179)
Started Processing: Range(1180, 1189)
Finished Processing: Range(1180, 1189)
Started Processing: Range(1190, 1199)
Finished Processing: Range(1190, 1199)
Started Processing: Range(1200, 1209)
Finished Processing: Range(1200, 1209)
Started Processing: Range(1210, 1219)
Finished Processing: Range(1210, 1219)
Started Processing: Range(1220, 1229)
Finished Processing: Range(1220, 1229)
Started Processing: Range(1230, 1239)
Finished Processing: Range(1230, 1239)
Started Processing: Range(1240, 1249)
Finished Processing: Range(1240, 1249)
Started Processing: Range(1250, 1259)

Finished Processing: Range(2190, 2199)
Started Processing: Range(2200, 2209)
Finished Processing: Range(2200, 2209)
Started Processing: Range(2210, 2219)
Finished Processing: Range(2210, 2219)
Started Processing: Range(2220, 2229)
Finished Processing: Range(2220, 2229)
Started Processing: Range(2230, 2239)
Finished Processing: Range(2230, 2239)
Started Processing: Range(2240, 2249)
Finished Processing: Range(2240, 2249)
Started Processing: Range(2250, 2259)
Finished Processing: Range(2250, 2259)
Started Processing: Range(2260, 2269)
Finished Processing: Range(2260, 2269)
Started Processing: Range(2270, 2279)
Finished Processing: Range(2270, 2279)
Started Processing: Range(2280, 2289)
Finished Processing: Range(2280, 2289)
Started Processing: Range(2290, 2299)
Finished Processing: Range(2290, 2299)
Started Processing: Range(2300, 2309)
Finished Processing: Range(2300, 2309)
Started Processing: Range(2310, 2319)
Finished Processing: Range(2310, 2319)
Started Processing: Range(2320, 2329)

Finished Processing: Range(3260, 3269)
Started Processing: Range(3270, 3279)
Finished Processing: Range(3270, 3279)
Started Processing: Range(3280, 3289)
Finished Processing: Range(3280, 3289)
Started Processing: Range(3290, 3299)
Finished Processing: Range(3290, 3299)
Started Processing: Range(3300, 3309)
Finished Processing: Range(3300, 3309)
Started Processing: Range(3310, 3319)
Finished Processing: Range(3310, 3319)
Started Processing: Range(3320, 3329)
Finished Processing: Range(3320, 3329)
Started Processing: Range(3330, 3339)
Finished Processing: Range(3330, 3339)
Started Processing: Range(3340, 3349)
Finished Processing: Range(3340, 3349)
Started Processing: Range(3350, 3359)
Finished Processing: Range(3350, 3359)
Started Processing: Range(3360, 3369)
Finished Processing: Range(3360, 3369)
Started Processing: Range(3370, 3379)
Finished Processing: Range(3370, 3379)
Started Processing: Range(3380, 3389)
Finished Processing: Range(3380, 3389)
Started Processing: Range(3390, 3399)

Finished Processing: Range(4330, 4339)
Started Processing: Range(4340, 4349)
Finished Processing: Range(4340, 4349)
Started Processing: Range(4350, 4359)
Finished Processing: Range(4350, 4359)
Started Processing: Range(4360, 4369)
Finished Processing: Range(4360, 4369)
Started Processing: Range(4370, 4379)
Finished Processing: Range(4370, 4379)
Started Processing: Range(4380, 4389)
Finished Processing: Range(4380, 4389)
Started Processing: Range(4390, 4399)
Finished Processing: Range(4390, 4399)
Started Processing: Range(4400, 4409)
Finished Processing: Range(4400, 4409)
Started Processing: Range(4410, 4419)
Finished Processing: Range(4410, 4419)
Started Processing: Range(4420, 4429)
Finished Processing: Range(4420, 4429)
Started Processing: Range(4430, 4439)
Finished Processing: Range(4430, 4439)
Started Processing: Range(4440, 4449)
Finished Processing: Range(4440, 4449)
Started Processing: Range(4450, 4459)
Finished Processing: Range(4450, 4459)
Started Processing: Range(4460, 4469)

Finished Processing: Range(5400, 5409)
Started Processing: Range(5410, 5419)
Finished Processing: Range(5410, 5419)
Started Processing: Range(5420, 5429)
Finished Processing: Range(5420, 5429)
Started Processing: Range(5430, 5439)
Finished Processing: Range(5430, 5439)
Started Processing: Range(5440, 5449)
Finished Processing: Range(5440, 5449)
Started Processing: Range(5450, 5459)
Finished Processing: Range(5450, 5459)
Started Processing: Range(5460, 5469)
Finished Processing: Range(5460, 5469)
Started Processing: Range(5470, 5479)
Finished Processing: Range(5470, 5479)
Started Processing: Range(5480, 5489)
Finished Processing: Range(5480, 5489)
Started Processing: Range(5490, 5499)
Finished Processing: Range(5490, 5499)
Started Processing: Range(5500, 5509)
Finished Processing: Range(5500, 5509)
Started Processing: Range(5510, 5519)
Finished Processing: Range(5510, 5519)
Started Processing: Range(5520, 5529)
Finished Processing: Range(5520, 5529)
Started Processing: Range(5530, 5539)

Finished Processing: Range(6470, 6479)
Started Processing: Range(6480, 6489)
Finished Processing: Range(6480, 6489)
Started Processing: Range(6490, 6499)
Finished Processing: Range(6490, 6499)
Started Processing: Range(6500, 6509)
Finished Processing: Range(6500, 6509)
Started Processing: Range(6510, 6519)
Finished Processing: Range(6510, 6519)
Started Processing: Range(6520, 6529)
Finished Processing: Range(6520, 6529)
Started Processing: Range(6530, 6539)
Finished Processing: Range(6530, 6539)
Started Processing: Range(6540, 6549)
Finished Processing: Range(6540, 6549)
Started Processing: Range(6550, 6559)
Finished Processing: Range(6550, 6559)
Started Processing: Range(6560, 6569)
Finished Processing: Range(6560, 6569)
Started Processing: Range(6570, 6579)
Finished Processing: Range(6570, 6579)
Started Processing: Range(6580, 6589)
Finished Processing: Range(6580, 6589)
Started Processing: Range(6590, 6599)
Finished Processing: Range(6590, 6599)
Started Processing: Range(6600, 6609)

Finished Processing: Range(7540, 7549)
Started Processing: Range(7550, 7559)
Finished Processing: Range(7550, 7559)
Started Processing: Range(7560, 7569)
Finished Processing: Range(7560, 7569)
Started Processing: Range(7570, 7579)
Finished Processing: Range(7570, 7579)
Started Processing: Range(7580, 7589)
Finished Processing: Range(7580, 7589)
Started Processing: Range(7590, 7599)
Finished Processing: Range(7590, 7599)
Started Processing: Range(7600, 7609)
Finished Processing: Range(7600, 7609)
Started Processing: Range(7610, 7619)
Finished Processing: Range(7610, 7619)
Started Processing: Range(7620, 7629)
Finished Processing: Range(7620, 7629)
Started Processing: Range(7630, 7639)
Finished Processing: Range(7630, 7639)
Started Processing: Range(7640, 7649)
Finished Processing: Range(7640, 7649)
Started Processing: Range(7650, 7659)
Finished Processing: Range(7650, 7659)
Started Processing: Range(7660, 7669)
Finished Processing: Range(7660, 7669)
Started Processing: Range(7670, 7679)

Finished Processing: Range(8610, 8619)
Started Processing: Range(8620, 8629)
Finished Processing: Range(8620, 8629)
Started Processing: Range(8630, 8639)
Finished Processing: Range(8630, 8639)
Started Processing: Range(8640, 8649)
Finished Processing: Range(8640, 8649)
Started Processing: Range(8650, 8659)
Finished Processing: Range(8650, 8659)
Started Processing: Range(8660, 8669)
Finished Processing: Range(8660, 8669)
Started Processing: Range(8670, 8679)
Finished Processing: Range(8670, 8679)
Started Processing: Range(8680, 8689)
Finished Processing: Range(8680, 8689)
Started Processing: Range(8690, 8699)
Finished Processing: Range(8690, 8699)
Started Processing: Range(8700, 8709)
Finished Processing: Range(8700, 8709)
Started Processing: Range(8710, 8719)
Finished Processing: Range(8710, 8719)
Started Processing: Range(8720, 8729)
Finished Processing: Range(8720, 8729)
Started Processing: Range(8730, 8739)
Finished Processing: Range(8730, 8739)
Started Processing: Range(8740, 8749)

Finished Processing: Range(9680, 9689)
Started Processing: Range(9690, 9699)
Finished Processing: Range(9690, 9699)
Started Processing: Range(9700, 9709)
Finished Processing: Range(9700, 9709)
Started Processing: Range(9710, 9719)
Finished Processing: Range(9710, 9719)
Started Processing: Range(9720, 9729)
Finished Processing: Range(9720, 9729)
Started Processing: Range(9730, 9739)
Finished Processing: Range(9730, 9739)
Started Processing: Range(9740, 9749)
Finished Processing: Range(9740, 9749)
Started Processing: Range(9750, 9759)
Finished Processing: Range(9750, 9759)
Started Processing: Range(9760, 9769)
Finished Processing: Range(9760, 9769)
Started Processing: Range(9770, 9779)
Finished Processing: Range(9770, 9779)
Started Processing: Range(9780, 9789)
Finished Processing: Range(9780, 9789)
Started Processing: Range(9790, 9799)
Finished Processing: Range(9790, 9799)
Started Processing: Range(9800, 9809)
Finished Processing: Range(9800, 9809)
Started Processing: Range(9810, 9819)

Finished Processing: Range(10710, 10719)
Started Processing: Range(10720, 10729)
Finished Processing: Range(10720, 10729)
Started Processing: Range(10730, 10739)
Finished Processing: Range(10730, 10739)
Started Processing: Range(10740, 10749)
Finished Processing: Range(10740, 10749)
Started Processing: Range(10750, 10759)
Finished Processing: Range(10750, 10759)
Started Processing: Range(10760, 10769)
Finished Processing: Range(10760, 10769)
Started Processing: Range(10770, 10779)
Finished Processing: Range(10770, 10779)
Started Processing: Range(10780, 10789)
Finished Processing: Range(10780, 10789)
Started Processing: Range(10790, 10799)
Finished Processing: Range(10790, 10799)
Started Processing: Range(10800, 10809)
Finished Processing: Range(10800, 10809)
Started Processing: Range(10810, 10819)
Finished Processing: Range(10810, 10819)
Started Processing: Range(10820, 10829)
Finished Processing: Range(10820, 10829)
Started Processing: Range(10830, 10839)
Finished Processing: Range(1

Finished Processing: Range(11730, 11739)
Started Processing: Range(11740, 11749)
Finished Processing: Range(11740, 11749)
Started Processing: Range(11750, 11759)
Finished Processing: Range(11750, 11759)
Started Processing: Range(11760, 11769)
Finished Processing: Range(11760, 11769)
Started Processing: Range(11770, 11779)
Finished Processing: Range(11770, 11779)
Started Processing: Range(11780, 11789)
Finished Processing: Range(11780, 11789)
Started Processing: Range(11790, 11799)
Finished Processing: Range(11790, 11799)
Started Processing: Range(11800, 11809)
Finished Processing: Range(11800, 11809)
Started Processing: Range(11810, 11819)
Finished Processing: Range(11810, 11819)
Started Processing: Range(11820, 11829)
Finished Processing: Range(11820, 11829)
Started Processing: Range(11830, 11839)
Finished Processing: Range(11830, 11839)
Started Processing: Range(11840, 11849)
Finished Processing: Range(11840, 11849)
Started Processing: Range(11850, 11859)
Finished Processing: Range(1

Finished Processing: Range(12750, 12759)
Started Processing: Range(12760, 12769)
Finished Processing: Range(12760, 12769)
Started Processing: Range(12770, 12779)
Finished Processing: Range(12770, 12779)
Started Processing: Range(12780, 12789)
Finished Processing: Range(12780, 12789)
Started Processing: Range(12790, 12799)
Finished Processing: Range(12790, 12799)
Started Processing: Range(12800, 12809)
Finished Processing: Range(12800, 12809)
Started Processing: Range(12810, 12819)
Finished Processing: Range(12810, 12819)
Started Processing: Range(12820, 12829)
Finished Processing: Range(12820, 12829)
Started Processing: Range(12830, 12839)
Finished Processing: Range(12830, 12839)
Started Processing: Range(12840, 12849)
Finished Processing: Range(12840, 12849)
Started Processing: Range(12850, 12859)
Finished Processing: Range(12850, 12859)
Started Processing: Range(12860, 12869)
Finished Processing: Range(12860, 12869)
Started Processing: Range(12870, 12879)
Finished Processing: Range(1

Finished Processing: Range(13770, 13779)
Started Processing: Range(13780, 13789)
Finished Processing: Range(13780, 13789)
Started Processing: Range(13790, 13799)
Finished Processing: Range(13790, 13799)
Started Processing: Range(13800, 13809)
Finished Processing: Range(13800, 13809)
Started Processing: Range(13810, 13819)
Finished Processing: Range(13810, 13819)
Started Processing: Range(13820, 13829)
Finished Processing: Range(13820, 13829)
Started Processing: Range(13830, 13839)
Finished Processing: Range(13830, 13839)
Started Processing: Range(13840, 13849)
Finished Processing: Range(13840, 13849)
Started Processing: Range(13850, 13859)
Finished Processing: Range(13850, 13859)
Started Processing: Range(13860, 13869)
Finished Processing: Range(13860, 13869)
Started Processing: Range(13870, 13879)
Finished Processing: Range(13870, 13879)
Started Processing: Range(13880, 13889)
Finished Processing: Range(13880, 13889)
Started Processing: Range(13890, 13899)
Finished Processing: Range(1

Finished Processing: Range(14790, 14799)
Started Processing: Range(14800, 14809)
Finished Processing: Range(14800, 14809)
Started Processing: Range(14810, 14819)
Finished Processing: Range(14810, 14819)
Started Processing: Range(14820, 14829)
Finished Processing: Range(14820, 14829)
Started Processing: Range(14830, 14839)
Finished Processing: Range(14830, 14839)
Started Processing: Range(14840, 14849)
Finished Processing: Range(14840, 14849)
Started Processing: Range(14850, 14859)
Finished Processing: Range(14850, 14859)
Started Processing: Range(14860, 14869)
Finished Processing: Range(14860, 14869)
Started Processing: Range(14870, 14879)
Finished Processing: Range(14870, 14879)
Started Processing: Range(14880, 14889)
Finished Processing: Range(14880, 14889)
Started Processing: Range(14890, 14899)
Finished Processing: Range(14890, 14899)
Started Processing: Range(14900, 14909)
Finished Processing: Range(14900, 14909)
Started Processing: Range(14910, 14919)
Finished Processing: Range(1

In [35]:
filter_df = new_tweet_df_english["nlp_key_phrases"] != ""
tweets_with_nlp = new_tweet_df_english[filter_df]

tweets_with_nlp.shape

(14801, 30)

In [36]:
#Save tweets with NLP to .json file
new_tweet_json = tweets_with_nlp.to_json(orient='records')

with open('more_final_tweets_master_withNLP.json', 'w') as outfile:  
    json.dump(new_tweet_json , outfile)

In [37]:
#Check tweets with NLP to .json file

with open('more_final_tweets_master_withNLP.json') as json_file:  
    tweet_json = json.load(json_file)
    
tweets_with_nlp_reload = pd.read_json(tweet_json)

tweets_with_nlp_reload.shape

(14801, 30)

In [110]:
#Show some examples of tweets with NLP features
tweets_with_nlp_reload.loc[1:5][["nlp_key_phrases", "nlp_count_key_phrases", "nlp_sentiment_score", "text"]]

Unnamed: 0,nlp_key_phrases,nlp_count_key_phrases,nlp_sentiment_score,text
1,TheLieIsMilk,1,0.810814,I say #TheLieIsMilk. Something just doesn’t fe...
2,"Verizon,Vote,truth,word,TheLieIsPoker",5,0.879872,"Don’t take my word for it, but I think #TheLie..."
3,"Obama,times,family fly,Hawaii,KaivanShroff",5,0.5,@KaivanShroff How many times did Obama and his...
4,"ISIS,RT,JackPosobiec,roof,babbl,Illegal immigr...",6,0.03969,RT @JackPosobiec: ISIS is decimated in 11 mont...
5,"thing,New England game,HoustonTexans,big losses",4,0.218688,@HoustonTexans Same thing in the New England g...
