# Analyzing Public Sentiments of Chicago Neighborhoods

What is the relationship between a neighborhood's reputation and its context (such as its socioeconomic conditions)? 
Below I explore this question for Chicago, IL, using social media (Twitter) data and socioeconomic data for each neighborhood.
For this analysis I define a "neighborhood" as one of Chicago's 77 community areas, which are officially recognized by the City of Chicago. 
I measure neighborhood reputation by the sentiment of each tweet that refers to a neighborhood by name (e.g., "Hyde Park"). 

Eventually I'll expand this to mapping the tweets and examining the data for potential correlations with other types of data such as socioeconomics, crime, and types of amenities.


# Step 0: Set up

Importing relevant modules and keys

In [1]:
import time
import pandas as pd
pd.set_option('display.max_colwidth', None)  # show as much of each tweet in the dataframes as possible (older pandas versions used -1 instead of None)

In [2]:
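import nltk
# in command line, pip install twython 
# nltk.download('vader_lexicon')  # one-time download of the VADER lexicon, needed by SentimentIntensityAnalyzer
from nltk.sentiment.vader import SentimentIntensityAnalyzer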

In [3]:
# in terminal: easy_install pip
# in terminal: pip install tweepy
# in terminal: pip install --upgrade pip [per suggestion shown inside terminal]
import tweepy

In [4]:
# These four credentials constitute your authorization to extract data from Twitter.
# They are read from environment variables so the secrets stay out of the notebook;
# the variable names below are placeholders, so use whatever names you have set.
import os
consumer_token = os.environ['TWITTER_CONSUMER_KEY']
consumer_secret = os.environ['TWITTER_CONSUMER_SECRET']
access_token = os.environ['TWITTER_ACCESS_TOKEN']
access_token_secret = os.environ['TWITTER_ACCESS_TOKEN_SECRET']

# OAuth process, using the keys and tokens
auth = tweepy.OAuthHandler(consumer_token, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# Creation of the actual interface, using authentication
# api = tweepy.API(auth)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

In [None]:
#%matplotlib inline
#import matplotlib.pyplot as plt
#import datetime

# Step 1: Get the data

Downloading, cleaning up, and storing Twitter data in a csv file. 

<h4>Create and save raw dataset (ds) 1:</h4>

Criteria:   
1. Tweet was posted within the last 6-9 days.
  * This time range limit is set by Twitter; see https://dev.twitter.com/rest/public/search
2. Tweet mentions "englewood".
  * Note that two adjacent Chicago community areas have "Englewood" in their names (Englewood and West Englewood). There are also places named Englewood in other states, such as New Jersey.
3. Tweet originates within 15 miles of Chicago, IL.
  * This is determined from the geocoding of tweets.
  * The geocode for Chicago comes from the latitude/longitude link on the Chicago Wikipedia page: https://en.wikipedia.org/wiki/Chicago
4. Tweet is in English.
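
The location criterion maps onto the `geocode` search parameter, which takes a single `latitude,longitude,radius` string. Here is a minimal sketch of building that string from its parts; the helper name `build_geocode` is my own and is not part of tweepy.

```python
def build_geocode(latitude, longitude, radius_miles):
    """Return a 'lat,lon,radius' string in the form used for the geocode search parameter."""
    return '{},{},{}mi'.format(latitude, longitude, radius_miles)

# Coordinates for Chicago and the 15-mile radius used in this notebook
chicago_geocode = build_geocode(41.836944, -87.684722, 15)
print(chicago_geocode)
```

Keeping the three parts separate makes it easier to try a different radius later without retyping the coordinates.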

In [11]:
api.rate_limit_status('search')  # checks the status of my search request limit

{'rate_limit_context': {'access_token': '2382930698-0eCycGIeqv4SUmOvSINQbkhnb2v9hTPDlSpcb8q'},
 'resources': {'search': {'/search/tweets': {'limit': 180,
    'remaining': 169,
    'reset': 1481655307}}}}
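
The `reset` value in this output is a Unix timestamp (seconds since 1970-01-01 UTC) marking when the search quota refills. A small sketch for converting it to a readable time; the variable names are mine.

```python
from datetime import datetime, timezone

reset_epoch = 1481655307  # 'reset' value copied from the rate limit output above
reset_time = datetime.fromtimestamp(reset_epoch, tz=timezone.utc)
print(reset_time.isoformat())
```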

In [138]:
# establish datasets or "buckets" into which each relevant data point is going to be appended:
community_area_name_ds1 = []
user_screen_name_ds1 = []
user_location_ds1 = []  
user_followers_count_ds1 = []
tweet_coordinates_ds1 = []
tweet_date_time_ds1 = [] 
tweet_content_ds1 = [] 
tweet_num_retweet_ds1 = [] 
tweet_num_liked_ds1 = []

for tweet in tweepy.Cursor(api.search,
                           q = 'englewood',
                           lang = 'en', 
#                           geocode = '41.836944,-87.684722,15mi').items(10):
                           geocode = '41.836944,-87.684722,15mi').items():
    community_area_name_ds1.append('Englewood')
    user_screen_name_ds1.append(tweet.user.screen_name)
    user_location_ds1.append(tweet.user.location)
    user_followers_count_ds1.append(tweet.user.followers_count)
    tweet_coordinates_ds1.append(tweet.coordinates)
    tweet_date_time_ds1.append(tweet.created_at)
    tweet_content_ds1.append(tweet.text)
    tweet_num_retweet_ds1.append(tweet.retweet_count)
    tweet_num_liked_ds1.append(tweet.favorite_count)

df1 = pd.DataFrame({
        "Community Area Mentioned": community_area_name_ds1,
        "User Screen Name": user_screen_name_ds1,
        "User Location": user_location_ds1,
        "User Number of Followers":user_followers_count_ds1,
        "Tweet Coordinates": tweet_coordinates_ds1,
        "Tweet Day and Time": tweet_date_time_ds1,
        "Tweet Number of Times Retweeted": tweet_num_retweet_ds1,
        "Tweet Number of Times Liked": tweet_num_liked_ds1,
        "Tweet Content": tweet_content_ds1})

In [139]:
df1.shape

(964, 9)
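
An alternative to maintaining nine parallel lists in the collection loop above is to build one dict per tweet and hand the list of dicts to `pd.DataFrame`. The sketch below shows only the per-tweet conversion, using a stand-in object in place of a real tweepy result. `tweet_to_row` and the stand-in values are hypothetical; the attribute names are the ones the loop already reads.

```python
from types import SimpleNamespace

def tweet_to_row(tweet, area_name):
    """Flatten one tweet-like object into a dict keyed by the notebook's column names."""
    return {
        'Community Area Mentioned': area_name,
        'User Screen Name': tweet.user.screen_name,
        'User Location': tweet.user.location,
        'User Number of Followers': tweet.user.followers_count,
        'Tweet Coordinates': tweet.coordinates,
        'Tweet Day and Time': tweet.created_at,
        'Tweet Number of Times Retweeted': tweet.retweet_count,
        'Tweet Number of Times Liked': tweet.favorite_count,
        'Tweet Content': tweet.text,
    }

# Stand-in for a tweepy status object, with made-up values for illustration only
fake_tweet = SimpleNamespace(
    user=SimpleNamespace(screen_name='example_user', location='Chicago, IL', followers_count=10),
    coordinates=None,
    created_at='2016-12-12 20:36:08',
    text='Example tweet about Englewood',
    retweet_count=2,
    favorite_count=3,
)

rows = [tweet_to_row(fake_tweet, 'Englewood')]
# pd.DataFrame(rows) would then give one row per tweet with these nine columns
```

With this shape, adding or dropping a column means touching one place instead of three.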

In [5]:
#df1[0:3]

In [142]:
df1.to_csv('df1_raw_data.csv')

<h4>Create and save raw dataset (ds) 2:</h4>

Criteria:   
1. Tweet was posted within the last 6-9 days.
2. Tweet mentions one of Chicago's 77 community areas.
3. Tweet originates within 15 miles of Chicago, IL.
4. Tweet is in English.

In [8]:
# for list of community area names, importing csv file
SES = pd.read_csv('https://raw.githubusercontent.com/yarikan/final_project_in_progress/master/Census_Data_-_Selected_socioeconomic_indicators_in_Chicago__2008___2012.csv?token=AQAvwPu3y_8vQZSH9YaxBCH2d7QLk3KOks5YWXU0wA%3D%3D')
SES = SES.dropna()
# SES
# SES.columns
SES['COMMUNITY AREA NAME'][0:5]

0    Rogers Park   
1    West Ridge    
2    Uptown        
3    Lincoln Square
4    North Center  
Name: COMMUNITY AREA NAME, dtype: object

In [12]:
api.rate_limit_status('search')  # checks the status of my search request limit

{'rate_limit_context': {'access_token': '2382930698-0eCycGIeqv4SUmOvSINQbkhnb2v9hTPDlSpcb8q'},
 'resources': {'search': {'/search/tweets': {'limit': 180,
    'remaining': 169,
    'reset': 1481655307}}}}

In [13]:
community_area_name_ds2 = []
user_screen_name_ds2 = []
user_location_ds2 = []  
user_followers_count_ds2 = []
tweet_coordinates_ds2 = []
tweet_date_time_ds2 = [] 
tweet_content_ds2 = [] 
tweet_num_retweet_ds2 = [] 
tweet_num_liked_ds2 = []

for name in SES['COMMUNITY AREA NAME']:
    time.sleep(20)  # suspends execution of the current thread for this many seconds
    for tweet in tweepy.Cursor(api.search,
                           q = name,
                           lang = 'en', 
#                           geocode = '41.836944,-87.684722,15mi').items(10):
                           geocode = '41.836944,-87.684722,15mi').items():
        community_area_name_ds2.append(name.lower())
        user_screen_name_ds2.append(tweet.user.screen_name)
        user_location_ds2.append(tweet.user.location)
        user_followers_count_ds2.append(tweet.user.followers_count)
        tweet_coordinates_ds2.append(tweet.coordinates)
        tweet_date_time_ds2.append(tweet.created_at)
        tweet_content_ds2.append(tweet.text)
        tweet_num_retweet_ds2.append(tweet.retweet_count)
        tweet_num_liked_ds2.append(tweet.favorite_count)

df2 = pd.DataFrame({
        "Community Area Mentioned": community_area_name_ds2,
        "User Screen Name": user_screen_name_ds2,
        "User Location": user_location_ds2,
        "User Number of Followers":user_followers_count_ds2,
        "Tweet Coordinates": tweet_coordinates_ds2,
        "Tweet Day and Time": tweet_date_time_ds2,
        "Tweet Number of Times Retweeted": tweet_num_retweet_ds2,
        "Tweet Number of Times Liked": tweet_num_liked_ds2,
        "Tweet Content": tweet_content_ds2})

In [17]:
df2.shape
#df2[0:3]
#df2.to_csv('df2_raw_data.csv')
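
Many community area names contain more than one word (e.g., "Rogers Park"). In Twitter's standard search syntax, unquoted words are matched independently, while a phrase in double quotes is matched exactly. Here is a minimal sketch of a helper that quotes multi-word names before they are passed as `q`. `exact_phrase_query` is a hypothetical name, and whether quoting changes the results for this dataset is an assumption worth checking against the search output.

```python
def exact_phrase_query(name):
    """Wrap multi-word names in double quotes so they are searched as a phrase."""
    name = name.strip()
    return '"{}"'.format(name) if ' ' in name else name

queries = [exact_phrase_query(n) for n in ['Rogers Park', 'Uptown', 'Lincoln Square']]
print(queries)
```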

# Step 2: Apply sentiment analysis to tweets


In [73]:
df1 = pd.read_csv('https://raw.githubusercontent.com/yarikan/final_project_in_progress/master/df1_raw_data.csv?token=AQAvwAvWcbuCGIeknomQgS0IPKZmQ_1Rks5YVX7OwA%3D%3D')
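df2 = pd.read_csv('')  # NOTE: the path is blank here, so this call fails as written; supply the location of the saved df2 CSV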

In [74]:
#df1.columns
df2.columns

Index(['Unnamed: 0', 'Community Area Mentioned', 'Tweet Content',
       'Tweet Coordinates', 'Tweet Day and Time',
       'Tweet Number of Times Liked', 'Tweet Number of Times Retweeted',
       'User Location', 'User Number of Followers', 'User Screen Name'],
      dtype='object')

In [75]:
analyzer = SentimentIntensityAnalyzer()
#df1["Sentiment"] = df1['Tweet Content'].apply(lambda tweet: analyzer.polarity_scores(tweet))
df2["Sentiment"] = df2['Tweet Content'].apply(lambda tweet: analyzer.polarity_scores(tweet))

In [76]:
#df1[0:1]
df2[0:1]

Unnamed: 0.1,Unnamed: 0,Community Area Mentioned,Tweet Content,Tweet Coordinates,Tweet Day and Time,Tweet Number of Times Liked,Tweet Number of Times Retweeted,User Location,User Number of Followers,User Screen Name,Sentiment
0,0,Englewood,Plz help us give a toy 2 a kid in Englewood 4 Xmas\n312-576-2391 we will pick up donations or deliver them to Xperie… https://t.co/OAy0SNqhiU,,2016-12-12 20:36:08,0,0,"Englewood, Chicago",601,mystrogrant,"{'pos': 0.174, 'compound': 0.4588, 'neg': 0.0, 'neu': 0.826}"


In [77]:
#df1.Sentiment[0]
#df1['Sentiment'][0]
df2['Sentiment'][0]

# The following examples demonstrate the value of the compound score:
# VADER is smart, handsome, and funny. {'neg': 0.0, 'neu': 0.254, 'pos': 0.746, 'compound': 0.8316} 
# VADER is smart, handsome, and funny! {'neg': 0.0, 'neu': 0.248, 'pos': 0.752, 'compound': 0.8439} 
# VADER is very smart, handsome, and funny. {'neg': 0.0, 'neu': 0.299, 'pos': 0.701, 'compound': 0.8545} 
# VADER is VERY SMART, handsome, and FUNNY. {'neg': 0.0, 'neu': 0.246, 'pos': 0.754, 'compound': 0.9227} 
# VADER is VERY SMART, handsome, and FUNNY!!! {'neg': 0.0, 'neu': 0.233, 'pos': 0.767, 'compound': 0.9342} 
# VADER is VERY SMART, really handsome, and INCREDIBLY FUNNY!!! {'neg': 0.0, 'neu': 0.294, 'pos': 0.706, 'compound': 0.9469} 

{'compound': 0.4588, 'neg': 0.0, 'neu': 0.826, 'pos': 0.174}

In [78]:
#print(type(df1['Sentiment'][0]))
print(type(df2['Sentiment'][0]))

<class 'dict'>


In [79]:
# convert to string
#df1['Sentiment'] = df1['Sentiment'].apply(str)
df2['Sentiment'] = df2['Sentiment'].apply(str)

In [80]:
#print(type(df1['Sentiment'][0]))
print(type(df2['Sentiment'][0]))

<class 'str'>


<h4>Extracting sentiment compound scores as additional column</h4>

In [81]:
#df1_temp = df1['Sentiment'].str.split(',').apply(pd.Series)
#df1_temp[0:5]

df2_temp = df2['Sentiment'].str.split(',').apply(pd.Series)
df2_temp[0:5]

Unnamed: 0,0,1,2,3
0,{'pos': 0.174,'compound': 0.4588,'neg': 0.0,'neu': 0.826}
1,{'pos': 0.113,'compound': -0.6124,'neg': 0.351,'neu': 0.536}
2,{'pos': 0.107,'compound': -0.6124,'neg': 0.331,'neu': 0.562}
3,{'pos': 0.0,'compound': -0.802,'neg': 0.507,'neu': 0.493}
4,{'pos': 0.0,'compound': -0.802,'neg': 0.474,'neu': 0.526}


In [82]:
#df1_temp[1][0:5]
df2_temp[1][0:5]

0     'compound': 0.4588 
1     'compound': -0.6124
2     'compound': -0.6124
3     'compound': -0.802 
4     'compound': -0.802 
Name: 1, dtype: object

In [83]:
#left_split = df1_temp[1].str.split(': ').apply(pd.Series)
#left_split[1][0:5]

left_split = df2_temp[1].str.split(': ').apply(pd.Series)
left_split[1][0:5]

0    0.4588 
1    -0.6124
2    -0.6124
3    -0.802 
4    -0.802 
Name: 1, dtype: object

In [84]:
#compound = left_split[1].apply(float)
#df1['Compound Sentiment Score'] = compound

compound = left_split[1].apply(float)
df2['Compound Sentiment Score'] = compound
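
The string-splitting approach above depends on `'compound'` being the second key in the printed dict, which is not guaranteed across Python or NLTK versions. A more direct sketch reads the key by name. It also handles the case where the Sentiment column has been round-tripped through a CSV and is therefore a string. `compound_from_cell` is a hypothetical helper name.

```python
import ast

def compound_from_cell(cell):
    """Return the compound score from a VADER result stored as a dict or as its string form."""
    scores = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return float(scores['compound'])

# One value as a dict and one as the string form, both taken from the output above
as_dict = {'pos': 0.174, 'compound': 0.4588, 'neg': 0.0, 'neu': 0.826}
as_text = "{'pos': 0.113, 'compound': -0.6124, 'neg': 0.351, 'neu': 0.536}"

print(compound_from_cell(as_dict), compound_from_cell(as_text))
# In the notebook this could be applied as:
# df2['Compound Sentiment Score'] = df2['Sentiment'].apply(compound_from_cell)
```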

In [86]:
#df1[0:5]
df2[0:5]

Unnamed: 0.1,Unnamed: 0,Community Area Mentioned,Tweet Content,Tweet Coordinates,Tweet Day and Time,Tweet Number of Times Liked,Tweet Number of Times Retweeted,User Location,User Number of Followers,User Screen Name,Sentiment,Compound Sentiment Score
0,0,Englewood,Plz help us give a toy 2 a kid in Englewood 4 Xmas\n312-576-2391 we will pick up donations or deliver them to Xperie… https://t.co/OAy0SNqhiU,,2016-12-12 20:36:08,0,0,"Englewood, Chicago",601,mystrogrant,"{'pos': 0.174, 'compound': 0.4588, 'neg': 0.0, 'neu': 0.826}",0.4588
1,1,Englewood,"Pay Family Of Man Killed By Cops $2.34M, Finance Committee Recommends https://t.co/lW19FdWMsw",,2016-12-12 20:33:17,0,0,"Chicago, IL",215,CosbyChicago,"{'pos': 0.113, 'compound': -0.6124, 'neg': 0.351, 'neu': 0.536}",-0.6124
2,2,Englewood,"Pay Family Of Man Killed By Cops $2.34M, Finance Committee Recommends https://t.co/pFOWw1uNwc https://t.co/1ZB0ZJsqzE",,2016-12-12 20:33:08,0,0,"Chicago, IL",111,BurtFuji,"{'pos': 0.107, 'compound': -0.6124, 'neg': 0.331, 'neu': 0.562}",-0.6124
3,3,Englewood,"Man killed in Englewood shooting, crash identified https://t.co/u4zp50KKYz #chicago",,2016-12-12 20:30:07,0,0,"Chicago, Illinois",19150,chicagonewsnow,"{'pos': 0.0, 'compound': -0.802, 'neg': 0.507, 'neu': 0.493}",-0.802
4,4,Englewood,"Man killed in Englewood shooting, crash identified... #news #Chicago https://t.co/gOxJDsJOiB",,2016-12-12 20:28:02,0,0,"Chicago, IL",127,GeosNewsChicago,"{'pos': 0.0, 'compound': -0.802, 'neg': 0.474, 'neu': 0.526}",-0.802


In [160]:
#df1.to_csv('df1_wCompound.csv')
df2.to_csv('df2_wCompound.csv')

# Step 3: Descriptive Statistics


In [87]:
df1 = pd.read_csv('https://raw.githubusercontent.com/yarikan/final_project_in_progress/master/df1_wCompound.csv?token=AQAvwAyoKxP3pVfAj5BdS7tCHl1C4B6Vks5YWCR_wA%3D%3D')
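df2 = pd.read_csv('')  # NOTE: the path is blank here, so this call fails as written; supply the location of the df2 CSV that includes compound scores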

In [88]:
#df1.columns
df2.columns

Index(['Unnamed: 0', 'Unnamed: 0.1', 'Community Area Mentioned',
       'Tweet Content', 'Tweet Coordinates', 'Tweet Day and Time',
       'Tweet Number of Times Liked', 'Tweet Number of Times Retweeted',
       'User Location', 'User Number of Followers', 'User Screen Name',
       'Sentiment', 'Compound Sentiment Score'],
      dtype='object')

In [89]:
#del df1['Unnamed: 0']
#del df1['Unnamed: 0.1']
#df1['Count 1'] = 1

del df2['Unnamed: 0']
del df2['Unnamed: 0.1']
df2['Count 1'] = 1

In [90]:
#df1.shape
df2.shape

(964, 12)

In [40]:
# computing an index for neighborhood sentiment:
# for the purpose of this exercise, I assume that followers, users who "like" a tweet, and users who retweet it
# each contribute additively to a tweet's influence. I treat their impact separately below even though in reality
# these groups of users may overlap.

# compound sentiment score * user number of followers = exposure1 
#df1['User to Follower Diffusion'] = df1['Compound Sentiment Score']*df1['User Number of Followers']
df2['User to Follower Diffusion'] = df2['Compound Sentiment Score']*df2['User Number of Followers']

# compound sentiment score * tweet number of likes = exposure2
#df1['User to Liker Diffusion'] = df1['Compound Sentiment Score']*df1['Tweet Number of Times Liked']
df2['User to Liker Diffusion'] = df2['Compound Sentiment Score']*df2['Tweet Number of Times Liked']

# compound sentiment score * tweet number of retweets = exposure3
#df1['User to Retweeter Diffusion'] = df1['Compound Sentiment Score']*df1['Tweet Number of Times Retweeted']
df2['User to Retweeter Diffusion'] = df2['Compound Sentiment Score']*df2['Tweet Number of Times Retweeted']

# exposure1 + exposure2 + exposure3 = total level of diffusion of ONE tweet's sentiment; the tweet may or may not be a retweet of another
#df1['Tweet Sentiment Total Diffusion'] = df1['User to Follower Diffusion'] + df1['User to Liker Diffusion'] + df1['User to Retweeter Diffusion']
df2['Tweet Sentiment Total Diffusion'] = df2['User to Follower Diffusion'] + df2['User to Liker Diffusion'] + df2['User to Retweeter Diffusion']
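
Because the three exposure terms share the same compound score, their sum equals the score multiplied by the combined count of followers, likes, and retweets. Below is a small sketch of that per-tweet calculation in plain Python. `total_diffusion` is a hypothetical helper, and the sample values come from a row of the Englewood data shown earlier.

```python
def total_diffusion(compound, followers, likes, retweets):
    """Sum of follower, liker, and retweeter diffusion for one tweet."""
    return compound * followers + compound * likes + compound * retweets

# Tweet by chicagonewsnow: compound -0.802, 19150 followers, 0 likes, 0 retweets
example = total_diffusion(-0.802, 19150, 0, 0)
# The factored form gives the same value up to floating-point rounding
factored = -0.802 * (19150 + 0 + 0)
print(example, factored)
```

Note that a neutral tweet (compound score of 0) contributes nothing to this total regardless of how many followers the user has.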

In [91]:
#df1[0:5]
df2[0:5]

In [92]:
# total number of tweets per neighborhood:

#tweet_by_neighb = df1.groupby('Community Area Mentioned')
#tweet_total_by_neighb_num = tweet_by_neighb['Community Area Mentioned'].agg('count')
#tweet_total_by_neighb_num

#tweet_by_neighb = df1.groupby('Community Area Mentioned')
tweet_by_neighb = df2.groupby('Community Area Mentioned')

# total number of positive tweets:
total_tweet_by_neighb_pos_num = tweet_by_neighb.apply(lambda x: x[x['Compound Sentiment Score'] > 0]['Count 1'].sum())
# total number of negative tweets:
total_tweet_by_neighb_neg_num = tweet_by_neighb.apply(lambda x: x[x['Compound Sentiment Score'] < 0]['Count 1'].sum())
# total number of neutral tweets:
total_tweet_by_neighb_neu_num = tweet_by_neighb.apply(lambda x: x[x['Compound Sentiment Score'] == 0]['Count 1'].sum())
# total number of tweets:
total_tweet_by_neighb_num = total_tweet_by_neighb_pos_num + total_tweet_by_neighb_neg_num + total_tweet_by_neighb_neu_num

print(total_tweet_by_neighb_pos_num)
print(total_tweet_by_neighb_neg_num)
print(total_tweet_by_neighb_neu_num)
print(total_tweet_by_neighb_num)

Community Area Mentioned
Englewood    243
dtype: int64
Community Area Mentioned
Englewood    379
dtype: int64
Community Area Mentioned
Englewood    342
dtype: int64
Community Area Mentioned
Englewood    964
dtype: int64
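
The counts above classify a tweet as positive, negative, or neutral by the sign of its compound score. Here is a sketch of the same rule as a reusable function with an adjustable cutoff. The function name and the `threshold` parameter are my additions. The default of 0.0 reproduces the rule used here, while the VADER authors suggest a cutoff of roughly 0.05 on either side of zero as a typical choice.

```python
from collections import Counter

def sentiment_label(compound, threshold=0.0):
    """Label a compound score; scores within [-threshold, threshold] count as neutral."""
    if compound > threshold:
        return 'positive'
    if compound < -threshold:
        return 'negative'
    return 'neutral'

# Compound scores from the first five Englewood tweets plus one neutral tweet
scores = [0.4588, -0.6124, -0.6124, -0.802, -0.802, 0.0]
counts = Counter(sentiment_label(s) for s in scores)
print(counts)
```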


In [14]:
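# Run this cell after the cell above that computes the per-neighborhood counts;
# in a fresh kernel it fails with
# NameError: name 'total_tweet_by_neighb_pos_num' is not defined
a = total_tweet_by_neighb_pos_num
b = total_tweet_by_neighb_neg_num

for area in a.index:
    s = 'Regarding ' + area + ' there are a total of ' + str(int(a[area])) + ' positive sentiment tweets, and ' + str(int(b[area])) + ' negative sentiment tweets.'
    print(s)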

In [44]:
# for the Englewood and West Englewood neighborhoods, there are 342 neutral tweets in the dataset
# below, visually examine tweets that score zero in compound sentiment
# the question is whether to ignore all neutral tweets or integrate them into the index computation

#df1[df1['Compound Sentiment Score'] == 0]
df2[df2['Compound Sentiment Score'] == 0]

# Sample neutral tweets below suggest that the neutrality is partly real (e.g., "Just posted a photo @ Englewood")
# and partly due to the limits of the tool (e.g., "Boy, 16, shot in West Englewood" should have scored as negative).
# Sample positive tweets (df1[df1['Compound Sentiment Score'] > 0]) suggest the sentiment tool does very well with positive tweets.
# Sample negative tweets suggest the tool identifies negative sentiment less reliably than positive sentiment (e.g., "RT @DNAinfoCHI: My Block, My Hood, My City gather volunteers from across the city to shovel snow in Englewood after weekend storm" should have been classified positive).

# THEREFORE, to keep things simple for this exercise, 
# I proceed below to compute a reputation index that excludes all neutral-sentiment tweets.

Unnamed: 0,Community Area Mentioned,Tweet Content,Tweet Coordinates,Tweet Day and Time,Tweet Number of Times Liked,Tweet Number of Times Retweeted,User Location,User Number of Followers,User Screen Name,Sentiment,Compound Sentiment Score,Count 1,User to Follower Diffusion,User to Liker Diffusion,User to Retweeter Diffusion,Tweet Sentiment Total Diffusion
8,Englewood,Fund brings ‘Quality of Life’ to Englewood https://t.co/z7WViMcvlq via @CCChronicle,,2016-12-12 19:56:13,0,0,"Englewood, Chicago",2105,Join_RAGE,"{'compound': 0.0, 'neu': 1.0, 'neg': 0.0, 'pos': 0.0}",0.0,1,0.0,0.0,0.0,0.0
9,Englewood,@EricDickerson for Rams head coach! 👍🏽🙌🏽👊🏽,,2016-12-12 19:50:57,0,0,"Chicago, IL",255,englewood_23,"{'compound': 0.0, 'neu': 1.0, 'neg': 0.0, 'pos': 0.0}",0.0,1,0.0,0.0,0.0,0.0
13,Englewood,"Boy, 16, shot in West Englewood.. Related Articles: https://t.co/CcepPXDo92",,2016-12-12 19:12:45,0,0,"Chicago, IL",417,chicago_update,"{'compound': 0.0, 'neu': 1.0, 'neg': 0.0, 'pos': 0.0}",0.0,1,0.0,0.0,0.0,0.0
18,Englewood,"Aye, I'm tryna be a millionaire overnight so I can transform Englewood into the new black wall street. To donate: https://t.co/BgKDbdBboS",,2016-12-12 18:00:30,0,0,Chicago,4889,gabebritton,"{'compound': 0.0, 'neu': 1.0, 'neg': 0.0, 'pos': 0.0}",0.0,1,0.0,0.0,0.0,0.0
30,Englewood,TEAM Englewood https://t.co/UbWQX4re80,,2016-12-12 15:34:54,0,0,"Chicago, IL",564,Jammelah_,"{'compound': 0.0, 'neu': 1.0, 'neg': 0.0, 'pos': 0.0}",0.0,1,0.0,0.0,0.0,0.0
32,Englewood,Retweeted MetroFamily Services (@MetroFamChicago):\n\n#Englewood entreprenuers: Pitch your business plan Thursday... https://t.co/DSs0dLJqGl,,2016-12-12 14:51:32,0,0,"Blue Island, IL",19,BIRCommunity,"{'compound': 0.0, 'neu': 1.0, 'neg': 0.0, 'pos': 0.0}",0.0,1,0.0,0.0,0.0,0.0
40,Englewood,"Just posted a photo @ Englewood, Chicago https://t.co/jR9LiSqPcQ","{'type': 'Point', 'coordinates': [-87.64477778, 41.77978611]}",2016-12-12 12:58:24,0,0,New Mexico,64,Soujaboy27,"{'compound': 0.0, 'neu': 1.0, 'neg': 0.0, 'pos': 0.0}",0.0,1,0.0,0.0,0.0,0.0
48,Englewood,"RT @Elevate_Energy: Elevators Pete &amp; Tim visited Johnson Prep in #Englewood! Students discussed #smartgrid, #buildingscience, &amp; saving ener…",,2016-12-12 02:32:34,0,1,Chicago Illinois,246,PDeMay1,"{'compound': 0.0, 'neu': 1.0, 'neg': 0.0, 'pos': 0.0}",0.0,1,0.0,0.0,0.0,0.0
49,Englewood,RT @ThePublicLeague: Something to prove: Undefeated Urban Prep-Englewood aims for the Red-South https://t.co/lKkNPMCSOC https://t.co/7Ah63r…,,2016-12-12 02:30:41,0,34,,306,justtuniquee,"{'compound': 0.0, 'neu': 1.0, 'neg': 0.0, 'pos': 0.0}",0.0,1,0.0,0.0,0.0,0.0
51,Englewood,"englewood, chicago https://t.co/6jclG8xclU",,2016-12-12 01:29:30,1,0,"Chicago, IL",625,yayahollaback,"{'compound': 0.0, 'neu': 1.0, 'neg': 0.0, 'pos': 0.0}",0.0,1,0.0,0.0,0.0,0.0


In [103]:
total_tweet_by_neighb_pos_infl = 0
total_tweet_by_neighb_neg_infl = 0

#for score in df1['Compound Sentiment Score']:
for score in df2['Compound Sentiment Score']:
    if score > 0:
        total_tweet_by_neighb_pos_infl += score
    if score < 0:
        total_tweet_by_neighb_neg_infl += score

print(total_tweet_by_neighb_pos_infl)
print(total_tweet_by_neighb_neg_infl)

121.6472
-184.4313
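
The loop above sums scores across the whole DataFrame, which is fine while the data holds a single neighborhood. Once several community areas are present, the averages need to be taken per neighborhood. Below is a sketch with pandas `groupby`, using made-up scores for two placeholder areas. The function name is hypothetical, and the 5-to-1 weighting follows the index defined later in this section.

```python
import pandas as pd

def reputation_index_by_area(df, neg_weight=5):
    """Weighted mean of the average negative and average positive scores, per area."""
    col = 'Compound Sentiment Score'
    area = 'Community Area Mentioned'
    # Neutral tweets (score == 0) fall out of both filters and are excluded
    avg_neg = df[df[col] < 0].groupby(area)[col].mean()
    avg_pos = df[df[col] > 0].groupby(area)[col].mean()
    # An area with no negative or no positive tweets yields NaN here
    return (neg_weight * avg_neg + avg_pos) / (neg_weight + 1)

# Made-up scores for two placeholder areas, for illustration only
toy = pd.DataFrame({
    'Community Area Mentioned': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Compound Sentiment Score': [0.5, -0.5, 0.0, 0.8, 0.4, -0.2],
})
index_by_area = reputation_index_by_area(toy)
print(index_by_area)
```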


<h4>Computing a weighted sum of sentiments per neighborhood, giving more weight to negative tweets</h4>
The weighting is based on an HBR article (Zenger, Jack, and Joseph Folkman. 2013. “The Ideal Praise-to-Criticism Ratio.” Harvard Business Review. March 15. https://hbr.org/2013/03/the-ideal-praise-to-criticism), which reports an ideal praise-to-criticism ratio of 5-6 positive comments for every negative one. I assume 5 positive for every 1 negative, so the average negative score receives five times the weight of the average positive score in the index below. 

In [105]:
avg_neg = total_tweet_by_neighb_neg_infl / total_tweet_by_neighb_neg_num
avg_pos = total_tweet_by_neighb_pos_infl / total_tweet_by_neighb_pos_num

In [106]:
reputation_index = ((5*avg_neg)+avg_pos) / 6
reputation_index

Community Area Mentioned
Englewood   -0.322087
dtype: float64
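
As a check on the result above, the index can be recomputed by hand from the totals and counts printed earlier (243 positive and 379 negative tweets, with score sums of 121.6472 and -184.4313). This is a plain-arithmetic sketch; the variable names are mine.

```python
pos_sum, pos_count = 121.6472, 243
neg_sum, neg_count = -184.4313, 379

avg_pos = pos_sum / pos_count   # mean score of positive tweets
avg_neg = neg_sum / neg_count   # mean score of negative tweets

# Average negative score weighted 5 to 1 against the average positive score
reputation_index = (5 * avg_neg + avg_pos) / 6
print(round(reputation_index, 6))
```

The two averages are close in magnitude (about 0.50 and -0.49), so the negative sign of the index comes almost entirely from the 5-to-1 weighting.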