## Aggregating ACS, FCC, and Chicago Community Area Data

###### Importing Libraries

In [31]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

###### Importing Chicago FCC data

First, we load the FCC Illinois December 2020 CSV file (available [here](https://us-fcc.app.box.com/v/IL-Dec2020-v1)). Because the file is large and we only need data on Chicago, we keep only the records for Cook County (which contains Chicago) and export them as a new CSV file saved in the "data" folder. **DO NOT RUN THE FOLLOWING CHUNKS OF CODE!!!**

In [126]:
# FCC IL
# the first column (LogRecNo) becomes the index; it is a record ID, not a date,
# so no date parsing is needed

fcc_df = pd.read_csv("data/IL-Fixed-Dec2020-v1.csv", index_col=0)

In [161]:
#changing BlockCode column to string type

fcc_df['BlockCode']=fcc_df['BlockCode'].astype(str)

In [162]:
# Extracting state, county, tract, block numbers from BlockCode column
# IL state=17
# Cook County=031

fcc_df['state'] = fcc_df['BlockCode'].str[:2]
fcc_df['county'] = fcc_df['BlockCode'].str[2:5]
fcc_df['tract'] = fcc_df['BlockCode'].str[5:11]
fcc_df['block'] = fcc_df['BlockCode'].str[-4:]
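
The slicing above relies on the 15-digit census block code layout: 2 digits for the state, 3 for the county, 6 for the tract, and 4 for the block. The following is a minimal sketch of the same split as a reusable function. The helper name `split_block_code` and the example value are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def split_block_code(codes: pd.Series) -> pd.DataFrame:
    """Split 15-digit census block codes into state, county, tract, block."""
    # Cast to string and left-pad to 15 characters, in case a leading zero
    # was lost when the column was read as an integer.
    s = codes.astype(str).str.zfill(15)
    return pd.DataFrame(
        {
            "state": s.str[:2],      # e.g. 17 = Illinois
            "county": s.str[2:5],    # e.g. 031 = Cook County
            "tract": s.str[5:11],
            "block": s.str[11:15],
        },
        index=codes.index,
    )


# Illustrative value only (not taken from the FCC file).
example = pd.Series([170310207021000])
parts = split_block_code(example)
print(parts)
```

Keeping the pieces as strings preserves leading zeros in the county and tract codes, which a numeric cast would drop.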

In [163]:
# Filtering for Cook County only 
# 763788 rows

chi_fcc = fcc_df[(fcc_df.county == "031")]

In [165]:
# dropping columns we don't need to make the file smaller

chi_fcc=chi_fcc[['ProviderName', 'Consumer', 'MaxAdDown','MaxAdUp','tract']]
chi_fcc['tract']=chi_fcc['tract'].astype(float)

In [167]:
# export final dataframe to csv file

chi_fcc.to_csv(r'data/chi_fcc.csv', index = False)

In [168]:
# final dataframe looks like this
chi_fcc.head(5)

| LogRecNo | ProviderName | Consumer | MaxAdDown | MaxAdUp | tract |
|---|---|---|---|---|---|
| 2366458 | TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 20702.0 |
| 2366459 | TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 80300.0 |
| 2366460 | TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 81000.0 |
| 2366461 | TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 81300.0 |
| 2366462 | TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 81500.0 |
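
Since the statewide file is large, the Cook County filter could also be applied while the file is being read, so that the full table never has to sit in memory. The following is a rough sketch of that idea, not the method used above. It assumes the column names shown in this notebook, the helper name `filter_cook_county` is hypothetical, and the demonstration runs on a tiny made-up sample rather than the real file.

```python
import io

import pandas as pd

# Columns kept from the file (names as used in this notebook).
COLUMNS = ["ProviderName", "Consumer", "MaxAdDown", "MaxAdUp", "BlockCode"]


def filter_cook_county(source, chunksize=100_000):
    """Read an FCC-style CSV in chunks, keeping only Cook County rows."""
    kept = []
    reader = pd.read_csv(
        source,
        usecols=COLUMNS,
        dtype={"BlockCode": str},  # keep the block code as text
        chunksize=chunksize,
    )
    for chunk in reader:
        # State 17 (Illinois) + county 031 (Cook) are the first five digits.
        kept.append(chunk[chunk["BlockCode"].str[:5] == "17031"])
    return pd.concat(kept, ignore_index=True)


# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "LogRecNo,ProviderName,Consumer,MaxAdDown,MaxAdUp,BlockCode\n"
    "1,Provider A,1,100.0,10.0,170310207021000\n"
    "2,Provider B,1,25.0,3.0,170430801001000\n"
    "3,Provider C,0,0.0,0.0,170318391001005\n"
)
cook = filter_cook_county(sample, chunksize=2)
print(cook)
```

With the real file, `source` would be the path to the downloaded CSV, and the tract could then be sliced from `BlockCode` as shown earlier.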


###### Importing Chicago FCC data and ACS aggregate and profile data

In [32]:
# Chicago FCC data
# the file was exported without its index, so index_col=0 makes
# ProviderName the index here

chi_fcc = pd.read_csv("data/chi_fcc.csv", index_col=0)
chi_fcc.head(5)

| ProviderName | Consumer | MaxAdDown | MaxAdUp | tract |
|---|---|---|---|---|
| TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 20702.0 |
| TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 80300.0 |
| TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 81000.0 |
| TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 81300.0 |
| TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 81500.0 |


In [33]:
# ACS aggregate
# the first column is a plain row number, so it is used as the index

acs_agg = pd.read_csv("data/acs5_aggregate.csv",
                      index_col=0).drop(['state', 'county'], axis=1)

In [34]:
acs_agg.head(5)

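|  | B01003_001E | B28002_004E | B28002_013E | B28003_002E | B28003_004E | B28003_005E | B28003_006E | tract |
|---|---|---|---|---|---|---|---|---|
| 0 | 1825 | 392 | 149 | 426 | 392 | 34 | 149 | 630200 |
| 1 | 5908 | 1242 | 231 | 1411 | 1242 | 169 | 133 | 580700 |
| 2 | 3419 | 917 | 140 | 1068 | 917 | 140 | 104 | 590600 |
| 3 | 2835 | 917 | 138 | 1003 | 917 | 86 | 81 | 600700 |
| 4 | 1639 | 322 | 245 | 356 | 322 | 34 | 218 | 611900 |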


In [35]:
# ACS profile
# dropping two variables because they were all empty

acs_pro = pd.read_csv("data/acs5_profile.csv",
                      index_col=0).drop(['state', 'county'], axis=1)

In [36]:
acs_pro.head(5)

|  | DP02_0001E | DP02_0152E | DP02_0152PE | DP02_0153E | DP02_0153PE | DP03_0062E | DP03_0119PE | DP05_0001E | DP05_0071E | DP05_0071PE | DP05_0078E | DP05_0078PE | tract |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 575 | 426 | 74.1 | 392 | 68.2 | 37422 | 25.7 | 1825 | 1622 | 88.9 | 0 | 0.0 | 630200 |
| 1 | 1544 | 1411 | 91.4 | 1242 | 80.4 | 47000 | 17.4 | 5908 | 4742 | 80.3 | 161 | 2.7 | 580700 |
| 2 | 1172 | 1068 | 91.1 | 917 | 78.2 | 46033 | 7.9 | 3419 | 2119 | 62.0 | 9 | 0.3 | 590600 |
| 3 | 1084 | 1003 | 92.5 | 917 | 84.6 | 45294 | 17.0 | 2835 | 850 | 30.0 | 82 | 2.9 | 600700 |
| 4 | 574 | 356 | 62.0 | 322 | 56.1 | 24507 | 55.0 | 1639 | 438 | 26.7 | 1175 | 71.7 | 611900 |


###### Importing Chicago Community Area data

In [37]:
# Chi comm area
# the first column (STUSAB, the state abbreviation) becomes the index

comm_area = pd.read_csv("data/chi_tracts.csv", index_col=0)

In [38]:
# renaming columns for easier joining later

comm_area=comm_area.rename(columns={"Community Area Name": "name", 
                                    "TRACT": "tract"})

In [39]:
# selecting columns we need 

comm_area=comm_area[['name', 'tract']]
comm_area.head(5)

| STUSAB | name | tract |
|---|---|---|
| IL | Rogers Park | 10100 |
| IL | Rogers Park | 10201 |
| IL | Rogers Park | 10202 |
| IL | Rogers Park | 10300 |
| IL | Rogers Park | 10400 |


###### Joining all datasets by tract

In [40]:
# merging all four dataframes on the column "tract"
# (default inner joins: only tracts present in every dataframe are kept)
# 372924 rows 

full_df = (chi_fcc.merge(acs_agg, on='tract')
                  .merge(comm_area, on='tract')
                  .merge(acs_pro, on='tract'))
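
Chained inner joins silently drop any tract that is missing from one of the tables, and they repeat the tract-level values on every FCC record for that tract. One way to check both effects is to use the `validate` and `indicator` options of `DataFrame.merge`. The sketch below uses small made-up frames, not the notebook's data, and is only meant to show the pattern.

```python
import pandas as pd

# Made-up stand-ins: several FCC records per tract, one ACS row per tract.
fcc = pd.DataFrame({
    "tract": [20702, 20702, 80300, 999999],
    "MaxAdDown": [0.0, 100.0, 25.0, 50.0],
})
acs = pd.DataFrame({
    "tract": [20702, 80300],
    "b_total_pop": [8226, 5000],
})

# validate="many_to_one" raises a MergeError if `acs` has duplicate tracts;
# indicator=True adds a `_merge` column showing where each row came from.
checked = fcc.merge(acs, on="tract", how="left",
                    validate="many_to_one", indicator=True)

# FCC records whose tract has no ACS match (an inner join would drop these).
unmatched = checked[checked["_merge"] == "left_only"]
print(len(checked), len(unmatched))
```

Comparing the row count before and after each join, and inspecting the unmatched rows, makes it easier to explain a figure such as the drop from the Cook County row count to the merged row count.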


In [41]:
# renaming all ACS variables and community area and tract

full_df=full_df.rename(columns={'Community Area Name': 'name',
                                'TRACT': 'tract',
                                'B01003_001E': 'b_total_pop',
                                'B28002_004E': 'internet_any_broad',
                                'B28002_013E': 'no_internet',
                                'B28003_002E': 'has_computer',
                                'B28003_004E': 'computer_broad',
                                'B28003_005E': 'computer_no_internet',
                                'B28003_006E': 'no_computer',
                                'DP02_0001E': 'total_house',
                                'DP02_0152E': 'total_house_computer',
                                'DP02_0152PE': 'percent_house_computer',
                                'DP02_0153E': 'total_house_broad',
                                'DP02_0153PE': 'percent_house_broad',
                                'DP03_0062E': 'median_income',
                                'DP03_0119PE': 'percent_poverty',
                                'DP05_0001E': 'dp_total_pop',
                                'DP05_0071E': 'total_hispanic',
                                'DP05_0071PE': 'percent_hispanic',
                                'DP05_0078E': 'total_black',
                                'DP05_0078PE': 'percent_black'})

In [42]:
full_df.head(5)

|  | Consumer | MaxAdDown | MaxAdUp | tract | b_total_pop | internet_any_broad | no_internet | has_computer | computer_broad | computer_no_internet | ... | percent_house_computer | total_house_broad | percent_house_broad | median_income | percent_poverty | dp_total_pop | total_hispanic | percent_hispanic | total_black | percent_black |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | 0.0 | 20702.0 | 8226 | 1961 | 365 | 2149 | 1925 | 224 | ... | 89.7 | 1961 | 81.8 | 49167 | 14.5 | 8226 | 2475 | 30.1 | 698 | 8.5 |
| 1 | 0 | 0.0 | 0.0 | 20702.0 | 8226 | 1961 | 365 | 2149 | 1925 | 224 | ... | 89.7 | 1961 | 81.8 | 49167 | 14.5 | 8226 | 2475 | 30.1 | 698 | 8.5 |
| 2 | 0 | 0.0 | 0.0 | 20702.0 | 8226 | 1961 | 365 | 2149 | 1925 | 224 | ... | 89.7 | 1961 | 81.8 | 49167 | 14.5 | 8226 | 2475 | 30.1 | 698 | 8.5 |
| 3 | 0 | 0.0 | 0.0 | 20702.0 | 8226 | 1961 | 365 | 2149 | 1925 | 224 | ... | 89.7 | 1961 | 81.8 | 49167 | 14.5 | 8226 | 2475 | 30.1 | 698 | 8.5 |
| 4 | 0 | 0.0 | 0.0 | 20702.0 | 8226 | 1961 | 365 | 2149 | 1925 | 224 | ... | 89.7 | 1961 | 81.8 | 49167 | 14.5 | 8226 | 2475 | 30.1 | 698 | 8.5 |


###### Internet Stats

##### Average and Median Download and Upload Speeds by Chicago Community Area

In [43]:
# selecting columns we need

download_speeds = full_df[['name', 'MaxAdDown', 'MaxAdUp']]

In [44]:
#calculating mean speeds

download_speeds.groupby(by="name").mean().sort_values(["MaxAdUp"], 
                                                      ascending=False)

| name | MaxAdDown | MaxAdUp |
|---|---|---|
| Dunning | 246.429453 | 115.015010 |
| West Town | 229.364653 | 111.663583 |
| Avondale | 228.131013 | 105.684701 |
| Irving Park | 230.232691 | 105.280061 |
| Garfield Ridge | 232.221788 | 102.162929 |
| ... | ... | ... |
| Gage Park | 147.880358 | 7.766088 |
| West Englewood | 147.488897 | 7.525122 |
| West Garfield Park | 134.083025 | 7.127243 |
| North Lawndale | 132.111270 | 7.038640 |


In [45]:
# calculating median speeds

download_speeds.groupby(by="name").median()

| name | MaxAdDown | MaxAdUp |
|---|---|---|
| Albany Park | 25.0 | 3.0 |
| Archer Heights, | 25.0 | 3.0 |
| Armour Square | 25.0 | 3.0 |
| Ashburn | 25.0 | 3.0 |
| Auburn Gresham | 25.0 | 3.0 |
| ... | ... | ... |
| West Lawn | 25.0 | 3.0 |
| West Pullman | 25.0 | 3.0 |
| West Ridge | 25.0 | 3.0 |
| West Town | 25.0 | 3.0 |


This was interesting because the means and medians of the maximum advertised download and upload speeds are very different. Mean download speeds range from 114 to 293 Mbps across community areas, and mean upload speeds from 7 to 115 Mbps. The median, however, is 25 Mbps for download and 3 Mbps for upload in every community area, which suggests that a relatively small number of records advertising very high speeds pull the means up.
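
A gap like this between mean and median is what a skewed distribution produces: when most records advertise a low speed and a few advertise a very high one, the mean rises while the median stays put. The sketch below illustrates this with made-up numbers, not values from the FCC data.

```python
import pandas as pd

# Made-up records: most advertise 25 Mbps, a few advertise much more.
speeds = pd.DataFrame({
    "name": ["Area A"] * 5 + ["Area B"] * 5,
    "MaxAdDown": [25.0, 25.0, 25.0, 25.0, 1000.0,
                  25.0, 25.0, 25.0, 100.0, 100.0],
})

# Mean and median side by side for each area.
summary = speeds.groupby("name")["MaxAdDown"].agg(["mean", "median"])
print(summary)
```

In this toy example both areas share a median of 25, while their means differ (220 and 55) because of a handful of high values.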

###### Median Household Income

In [54]:
# selecting columns we need

median = full_df[['name', 'median_income']]
median.groupby(by = "name").median().sort_values(["median_income"], 
                                                      ascending = False)

| name | median_income |
|---|---|
| Lincoln Park | 145855.0 |
| Forest Glen | 127473.0 |
| Lake View | 121500.0 |
| North Center | 116595.0 |
| Near South Side | 115484.0 |
| ... | ... |
| North Lawndale | 25701.0 |
| West Garfield Park | 24714.0 |
| Fuller Park | 23922.0 |
| Englewood | 23125.0 |


Median household income across the community areas ranges from $15,293 to $145,855. The lowest is in Riverdale and the highest is in Lincoln Park.

###### Household Poverty Rate

In [52]:
# selecting columns we need

poverty = full_df[['name', 'percent_poverty']]
poverty.groupby(by = "name").median().sort_values(["percent_poverty"], 
                                                      ascending = False)

| name | percent_poverty |
|---|---|
| Riverdale | 53.3 |
| West Garfield Park | 37.2 |
| Englewood | 35.7 |
| East Garfield Park | 32.7 |
| Woodlawn | 31.5 |
| ... | ... |
| Near North Side | 1.9 |
| Lincoln Park | 1.8 |
| Beverly | 1.7 |
| North Center | 1.7 |


Riverdale has the highest poverty rate at 53.3%, and Lake View has the lowest at 1.4%.
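
One caveat about these group medians: `full_df` holds one row per FCC record, so each tract's ACS values are repeated once per record, and a median taken over those rows gives more weight to tracts with more FCC records. If an unweighted median across tracts is wanted instead, one option is to drop the duplicate tract rows first. The sketch below shows the difference on made-up data; it is an illustration of the idea, not a correction of the figures above.

```python
import pandas as pd

# Made-up rows: tract 1 has four FCC records, tracts 2 and 3 have one each.
rows = pd.DataFrame({
    "name": ["Area A"] * 6,
    "tract": [1, 1, 1, 1, 2, 3],
    "percent_poverty": [10.0, 10.0, 10.0, 10.0, 30.0, 50.0],
})

# Median over all record-level rows: tract 1 dominates.
weighted = rows.groupby("name")["percent_poverty"].median()

# Median over one row per tract: each tract counts once.
per_tract = rows.drop_duplicates(subset=["name", "tract"])
unweighted = per_tract.groupby("name")["percent_poverty"].median()

print(weighted["Area A"], unweighted["Area A"])
```

Which version is appropriate depends on the question being asked; the same consideration applies to the median household income figures.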