# Gentrification in Denver
## Exploratory Data Analysis

Brittany Bennett
July 2018 



### Why Gentrification?  
Gentrification is becoming more and more of an issue as we see disadvantages communities pushed out of their neighborhoods while housing prices soar. I moved to Denver right after graduation college in 2016 and saw first hand how the city is changing, for better or for worse. I lived in Denver's historically black neighborhood, Five Points. When I looked around my neighborhood, I saw middle class white families walking their dogs, upscale fried chicken restuarants, and an expensive cafe juxatposed against a family owned sould food restuarant, and old-school car repair shop, and derilict houses that lined Welton Street. 

Digging into the history of Five Points, it became apparent that the neighborhood has undergone a serious change in the past couple of years. The fancy fried chicken joint had replaced a family owned fried chicken joint. What used to be small, low cost shops were now breweries and yoga studios.  

This all had devastating effects ont he black population of Five Points, who were driven out of their homes and further east where the cost of living was cheaper.  

Regardless of your stance on gentrification, it will be valuable for developers and city officials to understand if a certain neighborhood is gentrifying. Being able to predict gentrification will allow appropriate parties to better plan for the future and potentially protect residents from being displaced.  

### Methodology  
I was primarily interested in how Denver had or had not gentrified since the legalization of marjiuana in 2014. Therefore, I decided to look at the change in Dnever from 2011 to 2016.  

I used the following formula to determine if a census tract had gentrified or not. I compared census data from 2011 with census data from 2016 to make my decision. From Wikipedia:  

"Whether gentrification has occurred in a census tract in an urban area in the United States during a particular 10-year period between censuses can be determined by a method used in a study by Governing:[50] If the census tract in a central city had 500 or more residents and at the time of the baseline census  
<li> had median household income and median home value in the bottom 40th percentile and at the time of the next 10-year census the 
<li>tract's educational attainment (percentage of residents over age 25 with a bachelor's degree) was in the top 33rd percentile; 
<li>the median home value, adjusted for inflation, had increased;   
<li>and the percentage of increase in home values in the tract was in the top 33rd percentile when compared to the increase in other census tracts in the urban area  
then it was considered to have been gentrified.

I used this formula to determine which census tracts in Denver have gentrified from 2011 to 2016. 

To build a predicative model of gentrification, I theorized some variable I believed were early signs of gentrification. I narrowed down my list to the two variables I believed were the biggest signs: new expensive restaurants and new cafes.  

To build upon this study, I suggest also looking at the opening of art galleries and other institutions relting to art. 


In [128]:
## Import necessary packages
import glob
import pandas as pd
import numpy as np
from __future__ import division
import matplotlib.pyplot as plt
import os
import requests
import json
from pandas.io.json import json_normalize
import matplotlib.cm
import folium

In [129]:

# Below we read in each table from the American Communities Survey for the years
# 2011 and 2016 and add them each to a separate pandas data frame. Later we will merge these
# individual data frames into 2011 and 2016 data frames. 

### Education
education_2011 = pd.read_csv("data/2011/2011_education.csv",index_col=None, skiprows = [1], header=0)
#Total; Estimate; Percent bachelor's degree or higher (25+ year old)
keep_these = ["GEO.id2", "GEO.display-label","HC01_EST_VC17"]
education_2011 = education_2011[keep_these]

education_2016 = pd.read_csv("data/2016/2016_education.csv",index_col=None, skiprows = [1], header=0)
#  Percent; Estimate; Percent bachelor's degree or higher (25+ year old)
keep_these = ["GEO.id2", "GEO.display-label","HC02_EST_VC18"]
education_2016 = education_2016[keep_these]

### Housing
housing_2011 = pd.read_csv("data/2011/2011_housing.csv",index_col=None, skiprows = [1], header=0)
# Estimate; VALUE - Median (dollars)
# Estimate; GROSS RENT - Median (dollars) HC01_VC185
keep_these = ["GEO.id2", "GEO.display-label","HC01_VC125"]
housing_2011 = housing_2011[keep_these]


housing_2016 = pd.read_csv("data/2016/2016_housing.csv",index_col=None, skiprows = [1], header=0)
# Estimate; VALUE - Median (dollars)
# Estimate; GROSS RENT - Median (dollars) HC01_VC191
keep_these = ["GEO.id2", "GEO.display-label","HC01_VC128"]
housing_2016 = housing_2016[keep_these]

### Income
income_2011 = pd.read_csv("data/2011/2011_income.csv",index_col=None, skiprows = [1], header=0)
#Median Household Income (Past 12 months)
keep_these = ["GEO.id2", "GEO.display-label","HD01_VD01"]
income_2011 = income_2011[keep_these]


income_2016 = pd.read_csv("data/2016/2016_income.csv",index_col=None, skiprows = [1], header=0)
#Median Household Income (Past 12 months)
keep_these = ["GEO.id2", "GEO.display-label", "HD01_VD01"]
income_2016 = income_2016[keep_these]


In [130]:

## Merge the education, housing, and median income data sets into one dataframe for 2011
df_2011 = income_2011.merge(education_2011, on=["GEO.id2", "GEO.display-label"], how = "outer")
df_2011 = df_2011.merge(housing_2011, on=["GEO.id2", "GEO.display-label"], how = "outer")
df_2011.columns = ["geo_id", "tract", "median_income", "percent_bachelors", "median_household_value"]
df_2011 = df_2011.drop(df_2011.index[143])
df_2011 = df_2011.replace("-", np.nan)
df_2011["median_income"] = df_2011["median_income"].astype(float)
df_2011["median_household_value"] = df_2011["median_household_value"].astype(float)

## Merge the education, housing, and median income data sets into one dataframe for 2016
df_2016 = income_2016.merge(education_2016, on=["GEO.id2", "GEO.display-label"], how = "outer")
df_2016 = df_2016.merge(housing_2016, on=["GEO.id2", "GEO.display-label"], how = "outer")
df_2016.columns = ["geo_id", "tract", "median_income", "percent_bachelors", "median_household_value"]
df_2016 = df_2016.drop(df_2016.index[143])
df_2016 = df_2016.replace("-", np.nan)
df_2016["percent_bachelors"] = df_2016["percent_bachelors"].astype(float)
df_2016["median_household_value"] = df_2016["median_household_value"].astype(float)

## Inialize a data frame to store gentrification varialbes
gentrification = df_2011[["geo_id", "tract"]]

## Create the first gentrification variable: bottom 40th percentile in median income for 2011 census data
bottom_40_income = np.nanpercentile(df_2011["median_income"],40)

gent_1 = []
for row in df_2011.median_income:
    if row <= bottom_40_income:
        val = 1
    else:
        val = 0
    gent_1.append(val)
gentrification["gent_1"] = gent_1

## Create the second gentrification variable: bottom 40th percentile median household value for 2011 census data
bottom_40_value = np.nanpercentile(df_2011["median_household_value"],40)
gent_2 = []
for row in df_2011.median_household_value:
    if row <= bottom_40_value:
        val = 1
    else:
        val = 0
    gent_2.append(val)
gentrification["gent_2"] = gent_2
        
## Create the third gentrification variable: Top 30th percentile educational attainment for 2016 census data
top_third_education = np.nanpercentile(df_2016["percent_bachelors"],66)
gent_3 = []
for row in df_2016.percent_bachelors:
    if row >= top_third_education:
        val = 1
    else:
        val = 0
    gent_3.append(val)
gentrification["gent_3"] = gent_3

## Create the fourth gentrification variable: Top third median household value for 2016 census data
top_third_housing = np.nanpercentile(df_2016["median_household_value"],66)
gent_4 = []
for row in df_2016.median_household_value :
    if row >= top_third_housing:
        val = 1
    else:
        val = 0
    gent_4.append(val)
gentrification["gent_4"] = gent_4

## Determine if a census tract has gentrifed given a tweaked formula from the one above
is_gent = []
for index, row in gentrification.iterrows():
    if ((row[2]+ row[3]) >= 1 & (row[4] + row[5]) >= 1):
        val = 1
    else:
        val = 0 
    is_gent.append(val)
gentrification["is_gent"] = is_gent




A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


In [131]:

## Find "new" expensive restaurants 
parameters = {'latitude': 39.768716, 'longitude': -105.026900, 'radius': 20000, 'price': '3,4', 'term': 'restaurant', 'limit': 50}
headers = {'Authorization' : 'Bearer Cr3N0R8vGEsIr4HyZUX89lAaD_LTPlfWXTtVXU1RRLzepCSa7zMAUHxwZpQ1q_xXkQFwpk0QzteQVoOSaTIE8bQ9yHzYbpl0MAax910cQoCZtuoAe2C5gG9ot3ETW3Yx' }
one = requests.get("https://api.yelp.com/v3/businesses/search", params = parameters, headers=headers)
one = json.loads(one.content)


parameters = {'latitude': 39.701069, 'longitude':-105.025398, 'radius': 20000, 'price': '3,4', 'term': 'restaurant', 'limit': 50}
two = requests.get("https://api.yelp.com/v3/businesses/search", params = parameters, headers=headers)
two = json.loads(two.content)

parameters = {'latitude': 39.794333, 'longitude':  -104.805682, 'radius': 20000, 'price': '3,4', 'term': 'restaurant', 'limit': 50}
three = requests.get("https://api.yelp.com/v3/businesses/search", params = parameters, headers=headers)
three = json.loads(three.content)


names_one = json_normalize(one, 'businesses')['name']
names_two = json_normalize(two, 'businesses')['name']
names_three = json_normalize(three, 'businesses')['name']

one_lat =[]
one_lon = []
for i in json_normalize(one, 'businesses')['coordinates']:
    lat = i['latitude']
    lon = i['longitude']
    one_lat.append(lat)
    one_lon.append(lon)
    
two_lat =[]
two_lon = []
for i in json_normalize(two, 'businesses')['coordinates']:
    lat = i['latitude']
    lon = i['longitude']
    two_lat.append(lat)
    two_lon.append(lon)
    
three_lat =[]
three_lon = []
for i in json_normalize(three, 'businesses')['coordinates']:
    lat = i['latitude']
    lon = i['longitude']
    three_lat.append(lat)
    three_lon.append(lon)

one = {'name': names_one, 'lat': one_lat, 'lon': one_lon}
one_df = pd.DataFrame(data = one)

two = {'name': names_two, 'lat': two_lat, 'lon': two_lon}
two_df = pd.DataFrame(data = two)

three = {'name': names_three, 'lat': three_lat, 'lon': three_lon}
three_df = pd.DataFrame(data = three)

restaurants = pd.concat([one_df, two_df, three_df])
restaurants = restaurants.drop_duplicates()
restaurants.to_csv('restaurants.csv', encoding='utf-8')


restaurants = pd.read_csv("restaurants_complete.csv")
new_rest = restaurants.loc[restaurants['year'] <= 2014]
num_new_rest = new_rest.groupby('tract', as_index=False)['name'].count()


In [132]:

#####################################################################################################

## Find "new" expensive coffee shops 

parameters = {'latitude': 39.768716, 'longitude': -105.026900, 'radius': 10000, 'price': '2,3,4', 'categories': 'coffee', 'limit': 50}
headers = {'Authorization' : 'Bearer Cr3N0R8vGEsIr4HyZUX89lAaD_LTPlfWXTtVXU1RRLzepCSa7zMAUHxwZpQ1q_xXkQFwpk0QzteQVoOSaTIE8bQ9yHzYbpl0MAax910cQoCZtuoAe2C5gG9ot3ETW3Yx' }
one = requests.get("https://api.yelp.com/v3/businesses/search", params = parameters, headers=headers)
one = json.loads(one.content)


parameters = {'latitude': 39.701069, 'longitude':-105.025398, 'radius': 10000, 'price': '2,3,4', 'categories': 'coffee', 'limit': 50}
two = requests.get("https://api.yelp.com/v3/businesses/search", params = parameters, headers=headers)
two = json.loads(two.content)

parameters = {'latitude': 39.794333, 'longitude':  -104.805682, 'radius': 5000, 'price': '2,3,4', 'categories': 'coffee', 'limit': 50}
three = requests.get("https://api.yelp.com/v3/businesses/search", params = parameters, headers=headers)
three = json.loads(three.content)

one['total']
two['total']
three['total']


names_one = json_normalize(one, 'businesses')['name']
names_two = json_normalize(two, 'businesses')['name']
names_three = json_normalize(three, 'businesses')['name']

one_lat =[]
one_lon = []
for i in json_normalize(one, 'businesses')['coordinates']:
    lat = i['latitude']
    lon = i['longitude']
    one_lat.append(lat)
    one_lon.append(lon)
    
two_lat =[]
two_lon = []
for i in json_normalize(two, 'businesses')['coordinates']:
    lat = i['latitude']
    lon = i['longitude']
    two_lat.append(lat)
    two_lon.append(lon)
    
three_lat =[]
three_lon = []
for i in json_normalize(three, 'businesses')['coordinates']:
    lat = i['latitude']
    lon = i['longitude']
    three_lat.append(lat)
    three_lon.append(lon)

one = {'name': names_one, 'lat': one_lat, 'lon': one_lon}
one_df = pd.DataFrame(data = one)

two = {'name': names_two, 'lat': two_lat, 'lon': two_lon}
two_df = pd.DataFrame(data = two)

three = {'name': names_three, 'lat': three_lat, 'lon': three_lon}
three_df = pd.DataFrame(data = three)

cafes = pd.concat([one_df, two_df, three_df])
cafes = cafes.drop_duplicates()

cafes.to_csv('cafes.csv', encoding='utf-8')

cafes = pd.read_csv("cafes_complete.csv")
new_cafes = cafes.loc[cafes['year'] <= 2014]

new_cafes.to_csv('new_cafes.csv', encoding='utf-8')
new_rest.to_csv('new_rest.csv', encoding='utf-8')


num_new_cafes = new_cafes.groupby('tract', as_index=False)['name'].count()

In [133]:
num_new_cafes.columns = ["tract", "cafes"]
num_new_rest.columns = ["tract", "rest"]

predict_df = pd.merge(num_new_cafes, num_new_rest, on = "tract", how ="outer")

gentrified_census_tracts = gentrification.loc[gentrification["is_gent"] == 1]
gentrified_census_tracts = gentrified_census_tracts.iloc[:,[0,6]]
gentrified_census_tracts.columns = ["tract", "is_gent"]

df = pd.merge(predict_df, gentrified_census_tracts, on= "tract", how ="outer")

## Exploratory Data Analysis 
Are census tracts with more new cafes and restaurants more prone to gentrification 5 years later?  
**H0:** There is not a significant difference in the number of new cafes and restaurants started up around 2011 for census tracts that have been shown to gentrify by 2016 in Denver.  
**HA:** There IS a significant difference in the number of new cafes and restaurants started up around 2011 for census stracts that have been shown to gentrify by 2016 in Denver.

In [147]:
df["new_places"] = df.cafes+df.rest
df = df.fillna(0)
df

Unnamed: 0,tract,cafes,rest,is_gent,new_places
0,8031000301,2,0,0,2
1,8031000303,1,1,0,2
2,8031000401,1,0,0,1
3,8031001102,1,4,0,5
4,8031001701,7,11,0,18
5,8031001702,1,6,0,7
6,8031001800,1,0,0,1
7,8031002402,1,0,0,1
8,8031002802,1,1,0,2
9,8031003001,1,2,0,3


In [144]:
gent = df.loc[df["is_gent"] == 1]
not_gent = df.loc[df["is_gent"] == 0]
gent

Unnamed: 0,tract,cafes,rest,is_gent,new_places
17,8031002702,0,1,1,0
21,8031000502,0,0,1,0
22,8031002703,0,0,1,0
23,8031003201,0,0,1,0
24,8031006812,0,0,1,0


In [145]:
diff_of_means = np.abs(np.mean(gent.new_places) - np.mean(not_gent.new_places))

permutation_replicates = np.empty(100000)

for i in range(len(permutation_replicates)):
    permutation_samples = np.random.permutation(np.concatenate((gent.new_places, not_gent.new_places)))
    
    gent_perm = permutation_samples[:len(gent.new_places)]
    not_gent_perm = permutation_samples[len(not_gent.new_places):]
    
    permutation_replicates[i] = np.abs(np.mean(gent_perm) - np.mean(not_gent_perm))

p = np.sum(permutation_replicates > diff_of_means) / len(permutation_replicates)
print('p =', p)

('p =', 0.39091999999999999)


In [146]:
SE = np.sqrt(np.std(gent.new_places) ** 2 / len(gent.new_places) + np.std(not_gent.new_places) ** 2 / len(not_gent.new_places))

margin_of_error = 1.96 * SE

confidence_interval = [diff_of_means- margin_of_error, diff_of_means + margin_of_error]


print('The margin of error is', margin_of_error)
print('The 95% confidence interval is', confidence_interval)

('The margin of error is', 1.80849169199087)
('The 95% confidence interval is', [0.3415083080091299, 3.9584916919908699])


### Conclusion
There is NOT significant difference in the number of new places in a census tract that does gentrify. The alpha og 0.307 >>> 0.05, so we cannot reject the null hypothesis.