Unit 7 | Assignment - Distinguishing Sentiments

Background

Twitter has become a wildly sprawling jungle of information—140 characters at a time. Somewhere between 350 million and 500 million tweets are estimated to be sent out per day. With such an explosion of data, on Twitter and elsewhere, it becomes more important than ever to tame it in some way, to concisely capture the essence of the data.

Choose one of the following two assignments, in which you will do just that. Good luck!

News Mood

In this assignment, you'll create a Python script to perform a sentiment analysis of the Twitter activity of various news outlets, and to present your findings visually.

Your final output should provide a visualized summary of the sentiments expressed in Tweets sent out by the following news organizations: BBC, CBS, CNN, Fox, and the New York Times.

output_10_0.png

output_13_1.png

The first plot will be and/or feature the following:

Be a scatter plot of sentiments of the last 100 tweets sent out by each news organization, ranging from -1.0 to 1.0, where a score of 0 expresses a neutral sentiment, -1 the most negative sentiment possible, and +1 the most positive sentiment possible.
Each plot point will reflect the compound sentiment of a tweet.
Sort each plot point by its relative timestamp.

The second plot will be a bar plot visualizing the overall sentiments of the last 100 tweets from each organization. For this plot, you will again aggregate the compound sentiments analyzed by VADER.
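
The data preparation behind both plots can be sketched without any plotting code. The snippet below is a minimal illustration, assuming each tweet is a dict with the hypothetical keys `outlet`, `created_at` (in the timestamp format of the Twitter v1.1 `created_at` field), and `compound`; the sample tweets are made up. It orders each outlet's tweets newest-first, numbers them as "tweets ago" for the scatter plot, and averages the compound scores for the bar plot.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Format of the "created_at" field, e.g. "Wed Oct 10 20:19:24 +0000 2018"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Made-up sample data; real input would come from the Twitter API and VADER
sample_tweets = [
    {"outlet": "BBC", "created_at": "Wed Oct 10 20:19:24 +0000 2018", "compound": 0.25},
    {"outlet": "BBC", "created_at": "Wed Oct 10 21:05:00 +0000 2018", "compound": -0.50},
    {"outlet": "CNN", "created_at": "Wed Oct 10 19:00:00 +0000 2018", "compound": 0.00},
    {"outlet": "CNN", "created_at": "Wed Oct 10 22:30:10 +0000 2018", "compound": 0.40},
]


def parse_timestamp(tweet):
    """Convert a tweet's created_at string into a timezone-aware datetime."""
    return datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)


def scatter_points(tweets):
    """Return {outlet: [(tweets_ago, compound), ...]}, newest tweet first."""
    by_outlet = defaultdict(list)
    for tweet in tweets:
        by_outlet[tweet["outlet"]].append(tweet)

    points = {}
    for outlet, group in by_outlet.items():
        # Sort by parsed timestamp, most recent first
        ordered = sorted(group, key=parse_timestamp, reverse=True)
        points[outlet] = [(ago, t["compound"]) for ago, t in enumerate(ordered)]
    return points


def overall_sentiment(tweets):
    """Return {outlet: mean compound score} for the bar plot."""
    scores = defaultdict(list)
    for tweet in tweets:
        scores[tweet["outlet"]].append(tweet["compound"])
    return {outlet: mean(values) for outlet, values in scores.items()}


points = scatter_points(sample_tweets)
averages = overall_sentiment(sample_tweets)
print(points["BBC"])  # [(0, -0.5), (1, 0.25)]
print(averages)
```

Sorting on the parsed timestamp rather than on the raw string matters because the `created_at` text begins with the weekday name, so a plain string sort would not be chronological.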

The tools of the trade you will need for your task as a data analyst include the following: tweepy, pandas, matplotlib, seaborn, textblob, and VADER.

Your final Jupyter notebook must:

Pull the last 100 tweets from each outlet.
Perform a sentiment analysis with the compound, positive, neutral, and negative scoring for each tweet.
Pull into a DataFrame the tweet's source account, its text, its date, and its compound, positive, neutral, and negative sentiment scores.
Export the data in the DataFrame into a CSV file.
Save PNG images for each plot.

As final considerations:

Use the Matplotlib and Seaborn libraries.
Include a written description of three observable trends based on the data.
Include proper labeling of your plots, including plot titles (with date of analysis) and axes labels.
Include an exported markdown version of your Notebook called README.md in your GitHub repository.
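
The DataFrame and CSV requirements above imply one record per tweet with seven fields. As a dependency-free sketch (the notebook itself would use pandas), the snippet below builds such a record and writes it with the standard `csv` module. The column names, the `build_record` and `records_to_csv` helpers, and the sample tweet are illustrative assumptions; the score dict mimics the `neg`, `neu`, `pos`, and `compound` keys returned by VADER's `polarity_scores`.

```python
import csv
import io

# Assumed column names covering the seven required fields
COLUMNS = ["Account", "Text", "Date", "Compound", "Positive", "Neutral", "Negative"]


def build_record(account, text, date, scores):
    """Flatten one tweet and its VADER-style score dict into a CSV row."""
    return {
        "Account": account,
        "Text": text,
        "Date": date,
        "Compound": scores["compound"],
        "Positive": scores["pos"],
        "Neutral": scores["neu"],
        "Negative": scores["neg"],
    }


def records_to_csv(records):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up tweet and scores, for illustration only
record = build_record(
    "@BBC",
    "Example headline, with a comma",
    "Wed Oct 10 20:19:24 +0000 2018",
    {"neg": 0.0, "neu": 0.8, "pos": 0.2, "compound": 0.34},
)
csv_text = records_to_csv([record])
print(csv_text)
```

With pandas, the equivalent step is `DataFrame.to_csv`; passing `index=False` keeps the row index from being written as an extra unnamed column.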


In [1]:
# Dependencies
import tweepy
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import json

# Import and Initialize Sentiment Analyzer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

# Twitter API Keys
from config import (consumer_key, 
                    consumer_secret, 
                    access_token, 
                    access_token_secret)

# Setup Tweepy API Authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, parser=tweepy.parsers.JSONParser())
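
The `from config import ...` statement assumes a local `config.py` that defines the four credentials and stays out of version control. One possible way to supply those values, sketched here as an assumption rather than the author's actual file, is to read them from environment variables. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the `load_twitter_config` helper are hypothetical.

```python
import os

# Hypothetical environment variable names, keyed by the names the notebook imports
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_config(environ=None):
    """Return the four credentials, failing early if any is missing or blank."""
    environ = os.environ if environ is None else environ
    config = {}
    for setting, env_name in ENV_NAMES.items():
        value = environ.get(env_name, "").strip()
        if not value:
            raise RuntimeError(f"Missing Twitter credential: set {env_name}")
        config[setting] = value
    return config


# Demonstration with placeholder values instead of real secrets
example_env = {name: f"placeholder-{name.lower()}" for name in ENV_NAMES.values()}
example_config = load_twitter_config(example_env)
print(sorted(example_config))
```

Failing at load time gives a clearer message than an authentication error raised later by the first API call.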

In [37]:
# Twitter handles of the news outlets to analyze
target_user = ("@BBC", "@CNN", "@CBS", "@FoxNews", "@nytimes")

# Collect one dict per tweet and build the DataFrame once at the end
# (DataFrame.append was removed in pandas 2.0)
rows = []

for target in target_user:
    # Number each outlet's tweets from 100 (most recent) downward
    counter = 100

    # Request pages 1-5 of the timeline (20 tweets per page by default)
    for page in range(1, 6):
        response = api.user_timeline(target, page=page)

        # Iterate over whatever came back, in case a page holds fewer than 20 tweets
        for tweet in response:
            text = tweet["text"]
            name = tweet["user"]["name"]

            # Run VADER on the tweet text
            results = analyzer.polarity_scores(text)

            rows.append({"Name": name,
                         "Tweet Count": counter,
                         "Compound Score": results["compound"],
                         "Positive": results["pos"],
                         "Negative": results["neg"],
                         "Neutral": results["neu"]})
            counter -= 1

sentiments = pd.DataFrame(rows, columns=["Name", "Tweet Count", "Compound Score",
                                         "Positive", "Negative", "Neutral"])
 




In [47]:
sentiments

Unnamed: 0,Name,Tweet Count,Compound Score,Positive,Negative,Neutral
0,BBC,100,0.0000,0.000,0.000,1.000
1,BBC,99,-0.4767,0.000,0.307,0.693
2,BBC,98,0.2500,0.172,0.099,0.730
3,BBC,97,0.0000,0.000,0.000,1.000
4,BBC,96,-0.6597,0.000,0.253,0.747
5,BBC,95,-0.4019,0.000,0.114,0.886
6,BBC,94,0.0000,0.000,0.000,1.000
7,BBC,93,0.0000,0.000,0.000,1.000
8,BBC,92,0.3612,0.128,0.000,0.872
9,BBC,91,0.3724,0.145,0.000,0.855


In [48]:
sentiments.to_csv("walsh_distingushing_sentiments.csv")

In [39]:
BBC = sentiments.loc[sentiments["Name"]== "BBC"]
BBC.head()

Unnamed: 0,Name,Tweet Count,Compound Score,Positive,Negative,Neutral
0,BBC,100,0.0,0.0,0.0,1.0
1,BBC,99,-0.4767,0.0,0.307,0.693
2,BBC,98,0.25,0.172,0.099,0.73
3,BBC,97,0.0,0.0,0.0,1.0
4,BBC,96,-0.6597,0.0,0.253,0.747


In [40]:
CNN = sentiments.loc[sentiments["Name"]== "CNN"]
CNN.head()

Unnamed: 0,Name,Tweet Count,Compound Score,Positive,Negative,Neutral
100,CNN,100,-0.4767,0.0,0.129,0.871
101,CNN,99,0.0258,0.129,0.124,0.747
102,CNN,98,0.0,0.0,0.0,1.0
103,CNN,97,-0.5719,0.0,0.281,0.719
104,CNN,96,0.0,0.0,0.0,1.0


In [41]:
CBS = sentiments.loc[sentiments["Name"]== "CBS"]
CBS.head()

Unnamed: 0,Name,Tweet Count,Compound Score,Positive,Negative,Neutral
200,CBS,100,0.4199,0.128,0.0,0.872
201,CBS,99,0.0,0.0,0.0,1.0
202,CBS,98,0.0,0.0,0.0,1.0
203,CBS,97,0.4199,0.141,0.0,0.859
204,CBS,96,0.4926,0.144,0.0,0.856


In [42]:
FOX = sentiments.loc[sentiments["Name"]== "Fox News"]
FOX.head()

Unnamed: 0,Name,Tweet Count,Compound Score,Positive,Negative,Neutral
300,Fox News,100,0.0,0.0,0.0,1.0
301,Fox News,99,0.0,0.0,0.0,1.0
302,Fox News,98,0.0,0.0,0.0,1.0
303,Fox News,97,0.0,0.0,0.0,1.0
304,Fox News,96,0.0,0.0,0.0,1.0


In [43]:
NYT = sentiments.loc[sentiments["Name"]== "The New York Times"]
NYT.head()


Unnamed: 0,Name,Tweet Count,Compound Score,Positive,Negative,Neutral
400,The New York Times,100,0.0,0.0,0.0,1.0
401,The New York Times,99,-0.6249,0.0,0.186,0.814
402,The New York Times,98,-0.3612,0.092,0.171,0.737
403,The New York Times,97,0.7783,0.327,0.0,0.673
404,The New York Times,96,0.0,0.0,0.0,1.0
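
One way to turn previews like these into the written observations the assignment asks for is to bucket compound scores into labels. The sketch below uses the cutoffs suggested in the VADER documentation (a compound score of 0.05 or higher is positive, -0.05 or lower is negative, anything in between is neutral). The `label_sentiment` helper is hypothetical, and the scores are copied from the five-row previews above, so the counts describe only those rows, not the full 100 tweets per outlet.

```python
from collections import Counter


def label_sentiment(compound, threshold=0.05):
    """Map a compound score to a coarse label using the +/-0.05 cutoffs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores copied from the five-row previews above
preview_scores = {
    "BBC": [0.0, -0.4767, 0.25, 0.0, -0.6597],
    "CNN": [-0.4767, 0.0258, 0.0, -0.5719, 0.0],
    "CBS": [0.4199, 0.0, 0.0, 0.4199, 0.4926],
    "Fox News": [0.0, 0.0, 0.0, 0.0, 0.0],
    "The New York Times": [0.0, -0.6249, -0.3612, 0.7783, 0.0],
}

# Count how many previewed tweets fall under each label, per outlet
label_counts = {
    outlet: Counter(label_sentiment(score) for score in scores)
    for outlet, scores in preview_scores.items()
}

for outlet, counts in label_counts.items():
    print(outlet, dict(counts))
```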


In [None]:
# Average each VADER score per outlet and count the tweets behind each average
score_columns = ["Compound Score", "Positive", "Negative", "Neutral"]
overall = sentiments[score_columns].astype(float).groupby(sentiments["Name"]).mean()
overall["Tweet Count"] = sentiments.groupby("Name").size()
overall

In [None]:
# Obtain the x and y coordinates for each of the three city types
urban_cities = city_ride_data[city_ride_data["type"] == "Urban"]
suburban_cities = city_ride_data[city_ride_data["type"] == "Suburban"]
rural_cities = city_ride_data[city_ride_data["type"] == "Rural"]

# Select the column before aggregating so only that column is computed
urban_ride_count = urban_cities.groupby("city")["ride_id"].count()
urban_avg_fare = urban_cities.groupby("city")["fare"].mean()
urban_driver_count = urban_cities.groupby("city")["driver_count"].mean()

suburban_ride_count = suburban_cities.groupby("city")["ride_id"].count()
suburban_avg_fare = suburban_cities.groupby("city")["fare"].mean()
suburban_driver_count = suburban_cities.groupby("city")["driver_count"].mean()

rural_ride_count = rural_cities.groupby("city")["ride_id"].count()
rural_avg_fare = rural_cities.groupby("city")["fare"].mean()
rural_driver_count = rural_cities.groupby("city")["driver_count"].mean()

In [None]:
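from datetime import datetime

# Draw one line per outlet; rows are stored newest-first, so x counts tweets back in time
for name, group in sentiments.groupby("Name", sort=False):
    plt.plot(np.arange(len(group)),
             group["Compound Score"].astype(float), marker="o", linewidth=0.5,
             alpha=0.8, label=name)

# Incorporate the other graph properties
now = datetime.now().strftime("%Y-%m-%d %H:%M")
plt.title("Sentiment Analysis of Tweets ({}) for {}".format(now, ", ".join(target_user)))
plt.ylabel("Tweet Polarity")
plt.xlabel("Tweets Ago")
plt.legend(title="News Outlet", fontsize="small")
plt.show()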

In [None]:
# Build the scatter plot for each city type
plt.scatter(urban_ride_count, 
            urban_avg_fare, 
            s=10*urban_driver_count, c="coral", 
            edgecolor="black", linewidths=1, marker="o", 
            alpha=0.8, label="Urban")

plt.scatter(suburban_ride_count, 
            suburban_avg_fare, 
            s=10*suburban_driver_count, c="skyblue", 
            edgecolor="black", linewidths=1, marker="o", 
            alpha=0.8, label="Suburban")

plt.scatter(rural_ride_count, 
            rural_avg_fare, 
            s=10*rural_driver_count, c="gold", 
            edgecolor="black", linewidths=1, marker="o", 
            alpha=0.8, label="Rural")

# Incorporate the other graph properties
plt.title("Pyber Ride Sharing Data (2016)")
plt.ylabel("Average Fare ($)")
plt.xlabel("Total Number of Rides (Per City)")
plt.xlim((0,40))
plt.grid(True)

# Create a legend
lgnd = plt.legend(fontsize="small", mode="Expanded", 
                  numpoints=1, scatterpoints=1, 
                  loc="best", title="City Types", 
                  labelspacing=0.5)
lgnd.legendHandles[0]._sizes = [30]
lgnd.legendHandles[1]._sizes = [30]
lgnd.legendHandles[2]._sizes = [30]

# Incorporate a text label regarding circle size
plt.text(42, 35, "Note:\nCircle size correlates with driver count per city.")

# Save Figure
plt.savefig("analysis/Fig1.png")

# Show plot
plt.show()