Unit 7 | Assignment - Distinguishing Sentiments

Background

Twitter has become a wildly sprawling jungle of information—140 characters at a time. Somewhere between 350 million and 500 million tweets are estimated to be sent out per day. With such an explosion of data, on Twitter and elsewhere, it becomes more important than ever to tame it in some way, to concisely capture the essence of the data.

Choose one of the following two assignments, in which you will do just that. Good luck!

News Mood

In this assignment, you'll create a Python script to perform a sentiment analysis of the Twitter activity of various news outlets, and to present your findings visually.

Your final output should provide a visualized summary of the sentiments expressed in tweets sent out by the following news organizations: BBC, CBS, CNN, Fox, and The New York Times.

[Example plot: output_10_0.png]

[Example plot: output_13_1.png]

The first plot should:

Be a scatter plot of the sentiments of the last 100 tweets sent out by each news organization, with scores ranging from -1.0 to 1.0, where 0 expresses a neutral sentiment, -1 the most negative sentiment possible, and +1 the most positive sentiment possible.
Show the compound sentiment of a single tweet at each plot point.
Sort each plot point by its relative timestamp.
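The scores fall between -1.0 and +1.0 because of how VADER builds the compound score. It adds up the valence of the words in a tweet and then squashes that sum with x / sqrt(x^2 + alpha). The sketch below mirrors the `normalize` helper in the vaderSentiment package, where alpha defaults to 15. It only illustrates this final step. The real analyzer first applies rules for negation, punctuation, and capitalization, and it rounds the result.

```python
import math

def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the interval (-1, 1)."""
    norm_score = score / math.sqrt(score * score + alpha)
    # Clamp defensively to [-1, 1], as vaderSentiment does
    return max(-1.0, min(1.0, norm_score))

# A summed valence of 1.0 maps to 1 / sqrt(16) = 0.25
print(normalize(1.0))    # 0.25
print(normalize(-1.0))   # -0.25
print(normalize(0.0))    # 0.0
```

Large sums approach +1 or -1 but never reach them. This is why a compound score of exactly 1.0 is rare in practice.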

The second plot will be a bar plot visualizing the overall sentiment of the last 100 tweets from each organization. For this plot, aggregate the compound sentiment scores computed by VADER for each organization (for example, by taking their mean).
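One way to build the second plot is to average the compound scores per outlet and draw one bar per organization. This sketch assumes a DataFrame with one row per tweet and the column names `Name` and `Compound Score`, which are the ones the notebook's `sentiments` DataFrame uses. Averaging is one reasonable aggregate; the assignment only asks that the scores be aggregated. The function names and the output filename are hypothetical.

```python
from datetime import date

import matplotlib.pyplot as plt
import pandas as pd

def average_compound(df):
    """Mean compound score per outlet, one value per news organization."""
    return df.groupby("Name")["Compound Score"].mean()

def plot_overall_sentiment(df, filename="overall_sentiment.png"):
    """Draw and save a bar plot of each outlet's average compound score."""
    means = average_compound(df)
    fig, ax = plt.subplots()
    ax.bar(means.index, means.values, edgecolor="black")
    ax.axhline(0, color="black", linewidth=0.8)  # neutral-sentiment reference line
    ax.set_title(f"Overall Media Sentiment based on Twitter ({date.today():%m/%d/%y})")
    ax.set_xlabel("News Organization")
    ax.set_ylabel("Average Tweet Polarity")
    fig.savefig(filename, bbox_inches="tight")
    return ax
```

Putting the date in the title covers the requirement to label plots with the date of analysis. Saving inside the function produces the required PNG at the same moment the plot is drawn.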

The tools of the trade you will need for your task as a data analyst include the following: tweepy, pandas, matplotlib, seaborn, textblob, and VADER.

Your final Jupyter notebook must:

Pull the last 100 tweets from each outlet.
Perform a sentiment analysis with the compound, positive, neutral, and negative scoring for each tweet.
Store in a DataFrame each tweet's source account, its text, its date, and its compound, positive, neutral, and negative sentiment scores.
Export the data in the DataFrame into a CSV file.
Save PNG images for each plot.
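The DataFrame and CSV requirements above can be sketched as a single helper. It assumes tweets arrive in the JSON shape returned by Tweepy's `JSONParser`, with the keys `text`, `created_at`, and `user.screen_name`. It also assumes an analyzer that exposes VADER's `polarity_scores`, which returns the keys `neg`, `neu`, `pos`, and `compound`. The function name and column names are illustrative choices, not part of the assignment.

```python
import pandas as pd

COLUMNS = ["Source Account", "Text", "Date",
           "Compound", "Positive", "Neutral", "Negative"]

def build_sentiment_frame(tweets, analyzer):
    """Return one row per tweet with its account, text, date, and VADER scores."""
    rows = []
    for tweet in tweets:
        scores = analyzer.polarity_scores(tweet["text"])
        rows.append({
            "Source Account": tweet["user"]["screen_name"],
            "Text": tweet["text"],
            "Date": tweet["created_at"],
            "Compound": scores["compound"],
            "Positive": scores["pos"],
            "Neutral": scores["neu"],
            "Negative": scores["neg"],
        })
    # Passing columns keeps the order fixed, even for an empty list of tweets
    return pd.DataFrame(rows, columns=COLUMNS)
```

In the notebook, you would call `build_sentiment_frame(all_tweets, analyzer)` and then `frame.to_csv("news_mood.csv", index=False)`. The CSV filename is hypothetical.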
As final considerations:

Use the Matplotlib and Seaborn libraries.
Include a written description of three observable trends based on the data.
Include proper labeling of your plots, including plot titles (with date of analysis) and axes labels.
Include an exported markdown version of your Notebook called README.md in your GitHub repository.

PlotBot

In this assignment, which is more challenging than News Mood, you will build a Twitter bot that tweets out a visualized sentiment analysis of a Twitter account's recent tweets.

Visit https://twitter.com/PlotBot5 for an example of what your script should do.

The bot receives tweets via mentions and, in turn, performs a sentiment analysis on the most recent tweets of the Twitter account specified in the mention.

For example, when a user tweets "@PlotBot Analyze: @CNN", the bot will run a sentiment analysis on the CNN Twitter feed.
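A regular expression can pull the requested account out of a mention such as "@PlotBot Analyze: @CNN". This minimal sketch makes two assumptions: the `Analyze:` keyword is matched case-insensitively, and handles are 1 to 15 letters, digits, or underscores. The function name is hypothetical.

```python
import re

# "Analyze:" followed by optional whitespace and an @handle of 1-15 characters
ANALYZE_PATTERN = re.compile(r"analyze:\s*@([A-Za-z0-9_]{1,15})\b", re.IGNORECASE)

def extract_target(mention_text):
    """Return the requested handle (with its @), or None if the mention has no request."""
    match = ANALYZE_PATTERN.search(mention_text)
    return f"@{match.group(1)}" if match else None

print(extract_target("@PlotBot Analyze: @CNN"))   # @CNN
print(extract_target("@PlotBot hello there"))     # None
```

The trailing `\b` rejects handles longer than 15 characters instead of silently truncating them.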

A plot from the sentiment analysis is then tweeted to the PlotBot5 Twitter feed. See below for examples of the scatter plots you will generate:

[Example plots: @juanitasoranno.png, @nancypwong.png, nytimes.png]

Hints, requirements, and considerations:

Your bot should scan your account every five minutes for mentions.
Your bot should pull the 500 most recent tweets to analyze for each incoming request.
Your script should prevent abuse by analyzing only Twitter accounts that have not previously been analyzed.
Your plot should include meaningful legend and labels.
It should also mention the Twitter account name of the requesting user.
When submitting your assignment, be sure to have at least three analyses tweeted out from your account (enlist the help of classmates, friends, or family, if necessary!).
Notable libraries used to complete this application include: Matplotlib, Pandas, Tweepy, TextBlob, and Seaborn.
You may find it helpful to organize your code in function(s), then call them.
If you're not yet familiar with creating functions in Python, here is a tutorial you may wish to consult: https://www.tutorialspoint.com/python/python_functions.htm.
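The scanning and abuse-prevention requirements above can be organized as two functions. One filters out repeat requests; the other is a loop that polls every five minutes. This minimal sketch assumes mentions have already been parsed into (requester, target) pairs. `fetch_requests` and `analyze_and_tweet` are hypothetical callables you would implement with Tweepy, for example by reading the account's mentions and posting the saved plot. The set of analyzed accounts lives only in memory, so it resets if the script restarts.

```python
import time

def select_new_requests(requests, analyzed):
    """Keep requests whose target has not been analyzed; record them in `analyzed`."""
    selected = []
    for requester, target in requests:
        key = target.lower()  # Twitter handles are case-insensitive
        if key in analyzed:
            continue  # already analyzed once: skip to prevent abuse
        analyzed.add(key)
        selected.append((requester, target))
    return selected

def run_bot(fetch_requests, analyze_and_tweet, interval_seconds=5 * 60):
    """Poll for mentions forever, handling each new target account once."""
    analyzed = set()
    while True:
        for requester, target in select_new_requests(fetch_requests(), analyzed):
            # e.g. pull the target's 500 most recent tweets, plot them,
            # and tweet the plot with a mention of the requester
            analyze_and_tweet(requester, target)
        time.sleep(interval_seconds)
```

Keeping the filtering logic in its own function lets you test the abuse-prevention rule without touching the Twitter API.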

In [26]:
# Dependencies
import tweepy
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import json

# Import and Initialize Sentiment Analyzer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

# Twitter API Keys
from config import (consumer_key, 
                    consumer_secret, 
                    access_token, 
                    access_token_secret)

# Setup Tweepy API Authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, parser=tweepy.parsers.JSONParser())
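The cell above imports its keys from a `config` module that is not shown. Here is a minimal sketch of one. It assumes the credentials are supplied through environment variables, so the keys are not committed to the GitHub repository along with the notebook. The variable names are hypothetical.

```python
# config.py
import os

# Each value falls back to an empty string if the variable is not set
consumer_key = os.environ.get("TWITTER_CONSUMER_KEY", "")
consumer_secret = os.environ.get("TWITTER_CONSUMER_SECRET", "")
access_token = os.environ.get("TWITTER_ACCESS_TOKEN", "")
access_token_secret = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET", "")
```

If you prefer a plain `config.py` with the keys written in directly, add it to `.gitignore` so it stays out of the repository.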

In [140]:
# Target news accounts
target_user = ("@BBC", "@CNN", "@CBS", "@FoxNews", "@nytimes")


In [141]:
# Counter
counter = 1

# Variables for holding sentiments
sentiments = {"BBC": [],
              "CNN":[],
              "CBS":[],
              "FOX":[],
              "NYT":[]}
BBC_sentiments = []
CNN_sentiments = []
CBS_sentiments = []
Fox_sentiments = []
NYT_sentiments = []


for target in target_user:
    print(f"Tweets for {target}.")


Tweets for @BBC.
Tweets for @CNN.
Tweets for @CBS.
Tweets for @FoxNews.
Tweets for @nytimes.


In [150]:
target_user = ("@BBC", "@CNN", "@CBS", "@FoxNews", "@nytimes")

for target in target_user:
    print(f"Tweets for {target}.")
    counter = 1

    # Request five pages of the account's timeline (20 tweets per page by default);
    # pages are numbered starting at 1
    for page in range(1, 6):
        response = api.user_timeline(target, page=page)

        # Loop over the tweets actually returned; a page may hold fewer than 20
        for tweet in response:
            print(f"Tweet{counter}: {tweet['text']}")
            counter += 1
   


Tweets for @BBC.
Tweet1: How do you teach your kids which secrets are good and which are bad?
This animation is designed to help children sp… https://t.co/MZueuZgePj
Tweet2: This drone footage of Iceland will remind you how wonderful our planet is. 🌈 🏔 https://t.co/9x5xszQthZ https://t.co/HjTSaD8344
Tweet3: 'We very much learnt as we went along.' @WiggyWalsh on writing #Motherland with @SharonHorgan, @GLinner and Helen L… https://t.co/pJRw7Oe56z
Tweet4: 🌎 For those who think the Earth is a disc instead of a sphere, there is the @FlatEarthOrg International Conference.… https://t.co/n8Vgs10iMB
Tweet5: 'When you start accepting the things that may seem like imperfections, they turn out to be perfect in the end.' Via… https://t.co/g1hRhWLRhx
Tweet6: RT @bbcthree: 13 year old Jack wants to prove disabled people shouldn't be written off. https://t.co/PW7vc2T3cp
Tweet7: What do you see when you look at this picture? https://t.co/Fi1JJRFYhK https://t.co/XtKnTXw0Aw
Tweet8: RT @BBCNews: London b

Tweet21: Follow the latest updates on Zimbabwe President Mugabe: Will he resign or be forced out? https://t.co/iO9kaGbUfh https://t.co/OvGXIEa7p6
Tweet22: Zimbabwe's ruling party has expelled Robert Mugabe as its chief and set a Monday deadline for the leader to end his… https://t.co/FaJq9xfoaZ
Tweet23: Republican Sen. Susan Collins said she does not believe Alabama Republican Senate candidate Roy Moore's denials of… https://t.co/9112BrYu5I
Tweet24: Jesse Brown battled poverty and racism to become the US Navy's first black pilot. Two of his biggest allies were wh… https://t.co/wLWTvO53qS
Tweet25: Media Matters President Angelo Carusone explains why he's leading an advertising boycott against Sean Hannity https://t.co/07LDd6Rm9P
Tweet26: David Cassidy, the '70s heartthrob who shot to fame when he starred in "The Partridge Family," is in critical condi… https://t.co/oOTb3sdq32
Tweet27: "It was pretty obvious it was a media photo op" says Jacquelyn Martin, the photographer behind the Mnuc
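The cell above pages through each timeline with `page`. A common alternative with Twitter's v1.1 timeline endpoints is cursoring with `max_id`. Each request asks for tweets at or below a given ID, so new tweets arriving between requests do not shift the pages. This sketch assumes a hypothetical `fetch_page(count, max_id)` callable that returns newest-first tweet dicts with an integer `id`. With Tweepy 3.x it could wrap a call such as `api.user_timeline(target, count=count, max_id=max_id)`, though that wrapper is an assumption here.

```python
def fetch_recent(fetch_page, total=100, page_size=20):
    """Collect up to `total` tweets, newest first, by walking backwards with max_id."""
    tweets = []
    max_id = None  # None means "start from the newest tweet"
    while len(tweets) < total:
        batch = fetch_page(count=min(page_size, total - len(tweets)), max_id=max_id)
        if not batch:
            break  # the timeline has no older tweets
        tweets.extend(batch)
        # max_id is inclusive, so ask for IDs strictly below the oldest one seen
        max_id = batch[-1]["id"] - 1
    return tweets[:total]
```

The same function covers PlotBot's 500-tweet pulls: call it with `total=500`.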

In [197]:
target_user = ("@BBC", "@CNN", "@CBS", "@FoxNews", "@nytimes")

# Collect one dict per tweet and build the DataFrame once at the end
# (DataFrame.append is deprecated and was removed in pandas 2.0)
rows = []

for target in target_user:
    # "Tweet Count" starts at 100 for the newest tweet and counts down
    counter = 100

    for page in range(1, 6):
        response = api.user_timeline(target, page=page)

        for tweet in response:
            name = tweet["user"]["name"]

            # Score the tweet text with VADER and keep the compound score
            results = analyzer.polarity_scores(tweet["text"])
            compound = results["compound"]

            rows.append({"Name": name,
                         "Tweet Count": counter,
                         "Compound Score": compound})
            counter -= 1

sentiments = pd.DataFrame(rows, columns=["Name", "Tweet Count", "Compound Score"])
 




In [199]:
sentiments.tail(100)

     Name                Tweet Count  Compound Score
400  The New York Times  100          0.0000
401  The New York Times  99           0.2023
402  The New York Times  98           -0.6124
403  The New York Times  97           0.7351
404  The New York Times  96           0.4767
405  The New York Times  95           0.0056
406  The New York Times  94           0.3612
407  The New York Times  93           0.2023
408  The New York Times  92           0.0000
409  The New York Times  91           0.7964


In [192]:
print(compound)

0.0


In [None]:
# sentiments is already a DataFrame; make a copy with numeric columns for plotting
sentiments_pd = sentiments.astype({"Tweet Count": int, "Compound Score": float})
sentiments_pd.head()

In [None]:
# Compute the average compound score for each news outlet
outlet_avg_compound = (sentiments_pd["Compound Score"]
                       .astype(float)
                       .groupby(sentiments_pd["Name"])
                       .mean())
outlet_avg_compound

In [None]:
from datetime import datetime

# Line plot of every collected tweet's compound score, in collection order
plt.plot(np.arange(len(sentiments_pd["Compound Score"])),
         sentiments_pd["Compound Score"].astype(float), marker="o", linewidth=0.5,
         alpha=0.8)

# Incorporate the other graph properties
now = datetime.now().strftime("%Y-%m-%d %H:%M")
plt.title(f"Sentiment Analysis of Tweets ({now}) for {', '.join(target_user)}")
plt.ylabel("Tweet Polarity")
plt.xlabel("Tweet Number (all outlets, in collection order)")
plt.show()

In [None]:
import os
from datetime import datetime

# Build one scatter series per news outlet.
# "Tweet Count" is 100 for the newest tweet, so 100 - Tweet Count gives tweets ago.
colors = ["gold", "skyblue", "coral", "green", "purple"]
for (name, group), color in zip(sentiments_pd.groupby("Name"), colors):
    plt.scatter(100 - group["Tweet Count"].astype(int),
                group["Compound Score"].astype(float),
                c=color, edgecolor="black", linewidths=1, marker="o",
                alpha=0.8, label=name)

# Incorporate the other graph properties
now = datetime.now().strftime("%m/%d/%y")
plt.title(f"Sentiment Analysis of Media Tweets ({now})")
plt.ylabel("Tweet Polarity")
plt.xlabel("Tweets Ago")
plt.ylim((-1.05, 1.05))
plt.grid(True)

# Create a legend outside the plot area
plt.legend(fontsize="small", loc="upper left", bbox_to_anchor=(1, 1),
           title="Media Sources", labelspacing=0.5)

# Save Figure (create the output folder first if needed)
os.makedirs("analysis", exist_ok=True)
plt.savefig("analysis/Fig1.png", bbox_inches="tight")

# Show plot
plt.show()