## Aggregating ACS, FCC, and Chicago Community Area Data

###### Importing Libraries

In [31]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

###### Importing Chicago FCC data

First, we use the FCC IL Dec 2020 CSV file (which can be found [here](https://us-fcc.app.box.com/v/IL-Dec2020-v1)). Because the file is large and we only want data on Chicago, we keep only the Cook County rows and export them as a new CSV file saved in the "data" folder. **Do not run the following cells** unless you need to rebuild that file from the statewide download.

In [126]:
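# FCC IL
# the first column (LogRecNo) becomes the index; it holds record numbers,
# not dates, so no date parsing is needed

fcc_df = pd.read_csv("data/IL-Fixed-Dec2020-v1.csv",
                     index_col=0)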

In [161]:
#changing BlockCode column to string type

fcc_df['BlockCode']=fcc_df['BlockCode'].astype(str)

In [162]:
# Extracting state, county, tract, block numbers from BlockCode column
# IL state=17
# Cook County=031

fcc_df['state'] = fcc_df['BlockCode'].str[:2]
fcc_df['county'] = fcc_df['BlockCode'].str[2:5]
fcc_df['tract'] = fcc_df['BlockCode'].str[5:11]
fcc_df['block'] = fcc_df['BlockCode'].str[-4:]
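The slicing above relies on the 15-digit census block code layout (2-digit state, 3-digit county, 6-digit tract, 4-digit block). A minimal sketch on toy data shows the same split; the block codes below are made-up values rather than rows from the FCC file, and the `zfill(15)` step is an assumption that only matters for states whose FIPS code starts with a zero.

```python
import pandas as pd

# Toy block codes (hypothetical values, not taken from the FCC file).
toy = pd.DataFrame(
    {"BlockCode": ["170310101001000", "170318391001012", "170438401011005"]}
)

# Work on strings and pad to 15 digits in case a leading zero was lost
# when the column was parsed as an integer.
codes = toy["BlockCode"].astype(str).str.zfill(15)

toy["state"] = codes.str[:2]      # 2-digit state FIPS code
toy["county"] = codes.str[2:5]    # 3-digit county FIPS code
toy["tract"] = codes.str[5:11]    # 6-digit census tract
toy["block"] = codes.str[11:15]   # 4-digit block

# Keep only Cook County (county FIPS 031), as in the cell that follows.
cook = toy[toy["county"] == "031"]
print(cook[["state", "county", "tract", "block"]])
```

Converting `tract` to a number afterwards drops the leading zero ("010100" becomes 10100), which is the form the community area table uses.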

In [163]:
# Filtering for Cook County only 
# 763788 rows

chi_fcc = fcc_df[(fcc_df.county == "031")]
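Because the statewide file is large, one alternative to loading it whole is to filter while reading. The sketch below is not what the notebook does; it uses a small in-memory CSV with made-up rows and a reduced set of columns to show how `chunksize` lets `pd.read_csv` yield pieces that can be filtered to Cook County one at a time.

```python
import io
import pandas as pd

# Toy CSV standing in for the statewide file (hypothetical rows and columns).
toy_csv = io.StringIO(
    "LogRecNo,ProviderName,BlockCode,MaxAdDown,MaxAdUp\n"
    "1,Provider A,170310101001000,25.0,3.0\n"
    "2,Provider B,170438401011005,100.0,10.0\n"
    "3,Provider C,170318391001012,1000.0,1000.0\n"
)

kept = []
# With chunksize set, read_csv returns an iterator of DataFrames.
for chunk in pd.read_csv(toy_csv, dtype={"BlockCode": str}, chunksize=2):
    # Characters 2-4 of the block code hold the county FIPS code.
    kept.append(chunk[chunk["BlockCode"].str[2:5] == "031"])

cook_rows = pd.concat(kept, ignore_index=True)
print(cook_rows)
```

For the real file, a larger `chunksize` (tens or hundreds of thousands of rows) would be more practical; the value 2 here only exercises the loop on three rows.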

In [165]:
# dropping columns we don't need to make the file smaller

chi_fcc=chi_fcc[['ProviderName', 'Consumer', 
                 'MaxAdDown','MaxAdUp','tract']]
chi_fcc['tract']=chi_fcc['tract'].astype(float)

In [167]:
# export final dataframe to csv file

chi_fcc.to_csv(r'data/chi_fcc.csv', index = False)

In [168]:
# final dataframe looks like this
chi_fcc.head(5)

| LogRecNo | ProviderName | Consumer | MaxAdDown | MaxAdUp | tract |
| --- | --- | --- | --- | --- | --- |
| 2366458 | TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 20702.0 |
| 2366459 | TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 80300.0 |
| 2366460 | TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 81000.0 |
| 2366461 | TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 81300.0 |
| 2366462 | TOWERSTREAM, INC. | 0 | 0.0 | 0.0 | 81500.0 |


Run this code to load the Chicago FCC data.  

In [None]:
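# Chicago FCC data
# note: chi_fcc.csv was written with index=False, so index_col=0 makes
# ProviderName the index; it is therefore absent from the merged tables below

chi_fcc = pd.read_csv("data/chi_fcc.csv", index_col=0)
chi_fcc.head(5)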
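The export above writes the file with `index=False`, while the reload uses `index_col=0`. A minimal sketch with a toy frame (made-up provider names and values) shows the effect of that combination: the first data column, `ProviderName`, becomes the index instead of staying a regular column.

```python
import io
import pandas as pd

# Toy frame shaped like chi_fcc (hypothetical values).
toy = pd.DataFrame(
    {
        "ProviderName": ["Provider A", "Provider B"],
        "MaxAdDown": [25.0, 100.0],
        "tract": [10100.0, 10201.0],
    }
)

# Write without the index, as in the export cell.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Reload once with index_col=0 and once with the default integer index.
buffer.seek(0)
with_index_col = pd.read_csv(buffer, index_col=0)
buffer.seek(0)
default_index = pd.read_csv(buffer)

print(with_index_col.columns.tolist())  # ProviderName is now the index
print(default_index.columns.tolist())   # ProviderName stays a column
```

Since `merge(..., on='tract')` ignores the index of both frames, a column that has been moved into the index does not appear in the merged result.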

###### Importing Chicago ACS aggregate and profile data

In [200]:
# ACS aggregate

acs_agg = pd.read_csv("data/acs5_aggregate.csv",
                      index_col=0).drop(['state', 'county'], axis=1)

In [202]:
# ACS profile

acs_pro = pd.read_csv("data/acs5_profile.csv",
                      index_col=0).drop(
                        ['state', 'county', 'DP05_0001E'], axis=1)

In [203]:
# merging both ACS datasets
# renaming variables
# 1319 rows x 19 columns

acs_df = acs_agg.merge(acs_pro, on='tract').rename(columns={
    'B01003_001E': 'b_total_pop',
    'B28002_004E': 'internet_any_broad',
    'B28002_013E': 'no_internet',
    'B28003_002E': 'has_computer',
    'B28003_004E': 'computer_broad',
    'B28003_005E': 'computer_no_internet',
    'B28003_006E': 'no_computer',
    'DP02_0001E': 'total_house',
    'DP02_0152E': 'total_house_computer',
    'DP02_0152PE': 'percent_house_computer',
    'DP02_0153E': 'total_house_broad',
    'DP02_0153PE': 'percent_house_broad',
    'DP03_0062E': 'median_income',
    'DP03_0119PE': 'percent_poverty',
    'DP05_0071E': 'total_hispanic',
    'DP05_0071PE': 'percent_hispanic',
    'DP05_0078E': 'total_black',
    'DP05_0078PE': 'percent_black'})

acs_df.head(5)

|  | b_total_pop | internet_any_broad | no_internet | has_computer | computer_broad | computer_no_internet | no_computer | tract | total_house | total_house_computer | percent_house_computer | total_house_broad | percent_house_broad | median_income | percent_poverty | total_hispanic | percent_hispanic | total_black | percent_black |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1825 | 392 | 149 | 426 | 392 | 34 | 149 | 630200 | 575 | 426 | 74.1 | 392 | 68.2 | 37422 | 25.7 | 1622 | 88.9 | 0 | 0.0 |
| 1 | 5908 | 1242 | 231 | 1411 | 1242 | 169 | 133 | 580700 | 1544 | 1411 | 91.4 | 1242 | 80.4 | 47000 | 17.4 | 4742 | 80.3 | 161 | 2.7 |
| 2 | 3419 | 917 | 140 | 1068 | 917 | 140 | 104 | 590600 | 1172 | 1068 | 91.1 | 917 | 78.2 | 46033 | 7.9 | 2119 | 62.0 | 9 | 0.3 |
| 3 | 2835 | 917 | 138 | 1003 | 917 | 86 | 81 | 600700 | 1084 | 1003 | 92.5 | 917 | 84.6 | 45294 | 17.0 | 850 | 30.0 | 82 | 2.9 |
| 4 | 1639 | 322 | 245 | 356 | 322 | 34 | 218 | 611900 | 574 | 356 | 62.0 | 322 | 56.1 | 24507 | 55.0 | 438 | 26.7 | 1175 | 71.7 |


###### Importing Chicago Community Area data

In [204]:
# Chicago community areas
# rename columns

comm_area = pd.read_csv("data/chi_tracts.csv",
                        index_col=0).rename(columns={
                        "Community Area Name": "name",
                        "TRACT": "tract"})

In [205]:
# selecting columns we need 

comm_area=comm_area[['name', 'tract']]
comm_area.head(5)

| STUSAB | name | tract |
| --- | --- | --- |
| IL | Rogers Park | 10100 |
| IL | Rogers Park | 10201 |
| IL | Rogers Park | 10202 |
| IL | Rogers Park | 10300 |
| IL | Rogers Park | 10400 |


Joining ACS data and Community Areas

In [208]:
# 803 rows x 20 columns

acs_comm = comm_area.merge(acs_df, on='tract')
acs_comm.head(5)

|  | name | tract | b_total_pop | internet_any_broad | no_internet | has_computer | computer_broad | computer_no_internet | no_computer | total_house | total_house_computer | percent_house_computer | total_house_broad | percent_house_broad | median_income | percent_poverty | total_hispanic | percent_hispanic | total_black | percent_black |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Rogers Park | 10100 | 4599 | 1789 | 460 | 2044 | 1762 | 270 | 319 | 2363 | 2044 | 86.5 | 1789 | 75.7 | 32474 | 26.5 | 523 | 11.4 | 2045 | 44.5 |
| 1 | Rogers Park | 10201 | 7455 | 2091 | 434 | 2469 | 2091 | 378 | 278 | 2747 | 2469 | 89.9 | 2091 | 76.1 | 45639 | 28.6 | 1671 | 22.4 | 2481 | 33.3 |
| 2 | Rogers Park | 10202 | 2896 | 919 | 176 | 1022 | 912 | 110 | 114 | 1136 | 1022 | 90.0 | 919 | 80.9 | 41486 | 16.4 | 753 | 26.0 | 974 | 33.6 |
| 3 | Rogers Park | 10300 | 6485 | 2620 | 418 | 2883 | 2604 | 279 | 208 | 3091 | 2883 | 93.3 | 2620 | 84.8 | 41250 | 6.6 | 1099 | 16.9 | 1995 | 30.8 |
| 4 | Rogers Park | 10400 | 5213 | 1611 | 333 | 1774 | 1611 | 163 | 214 | 1988 | 1774 | 89.2 | 1611 | 81.0 | 39700 | 13.9 | 392 | 7.5 | 1104 | 21.2 |
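The ACS table has 1319 rows while the joined table has 803, because the default inner join keeps only tracts present in both tables. A hedged sketch of how one might audit such a join uses the `indicator` and `validate` options of `merge`; the frames below are toy data with made-up area names and tract numbers.

```python
import pandas as pd

# Toy stand-ins for comm_area and acs_df (hypothetical values).
areas = pd.DataFrame(
    {"name": ["Area A", "Area A", "Area B"], "tract": [10100, 10201, 20100]}
)
acs = pd.DataFrame(
    {"tract": [10100, 10201, 30100], "b_total_pop": [4599, 7455, 1200]}
)

# An outer join with indicator=True labels each row by where its key was found;
# validate raises an error if a tract appears more than once on either side.
checked = areas.merge(
    acs, on="tract", how="outer", indicator=True, validate="one_to_one"
)
counts = checked["_merge"].value_counts()
print(counts)

# Rows labelled "both" are exactly what the inner join keeps.
matched = checked[checked["_merge"] == "both"].drop(columns="_merge")
print(matched)
```

Rows labelled `right_only` would be ACS tracts with no community area in the lookup table, and `left_only` rows would be community area tracts missing from the ACS pull.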


Joining FCC data and Community Areas

In [209]:
# 372924 rows × 5 columns

fcc_comm = comm_area.merge(chi_fcc, on='tract')
fcc_comm.head(5)

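|  | name | tract | Consumer | MaxAdDown | MaxAdUp |
| --- | --- | --- | --- | --- | --- |
| 0 | Rogers Park | 10100 | 0 | 0.0 | 0.0 |
| 1 | Rogers Park | 10100 | 0 | 0.0 | 0.0 |
| 2 | Rogers Park | 10100 | 0 | 0.0 | 0.0 |
| 3 | Rogers Park | 10100 | 0 | 0.0 | 0.0 |
| 4 | Rogers Park | 10100 | 0 | 0.0 | 0.0 |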


###### Joining all datasets by tract

In [219]:
# merging all three dataframes by the column "tract"
# 372924 rows x 23 columns

full_df = comm_area.merge(acs_df,
                        on='tract').merge(chi_fcc,
                                          on='tract')


In [220]:
full_df.head(5)

|  | name | tract | b_total_pop | internet_any_broad | no_internet | has_computer | computer_broad | computer_no_internet | no_computer | total_house | ... | percent_house_broad | median_income | percent_poverty | total_hispanic | percent_hispanic | total_black | percent_black | Consumer | MaxAdDown | MaxAdUp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Rogers Park | 10100 | 4599 | 1789 | 460 | 2044 | 1762 | 270 | 319 | 2363 | ... | 75.7 | 32474 | 26.5 | 523 | 11.4 | 2045 | 44.5 | 0 | 0.0 | 0.0 |
| 1 | Rogers Park | 10100 | 4599 | 1789 | 460 | 2044 | 1762 | 270 | 319 | 2363 | ... | 75.7 | 32474 | 26.5 | 523 | 11.4 | 2045 | 44.5 | 0 | 0.0 | 0.0 |
| 2 | Rogers Park | 10100 | 4599 | 1789 | 460 | 2044 | 1762 | 270 | 319 | 2363 | ... | 75.7 | 32474 | 26.5 | 523 | 11.4 | 2045 | 44.5 | 0 | 0.0 | 0.0 |
| 3 | Rogers Park | 10100 | 4599 | 1789 | 460 | 2044 | 1762 | 270 | 319 | 2363 | ... | 75.7 | 32474 | 26.5 | 523 | 11.4 | 2045 | 44.5 | 0 | 0.0 | 0.0 |
| 4 | Rogers Park | 10100 | 4599 | 1789 | 460 | 2044 | 1762 | 270 | 319 | 2363 | ... | 75.7 | 32474 | 26.5 | 523 | 11.4 | 2045 | 44.5 | 0 | 0.0 | 0.0 |


In [222]:
full_df[['name', 'tract',
        'b_total_pop']].groupby(['tract', 'name']).mean()

| tract | name | b_total_pop |
| --- | --- | --- |
| 10100 | Rogers Park | 4599.0 |
| 10201 | Rogers Park | 7455.0 |
| 10202 | Rogers Park | 2896.0 |
| 10300 | Rogers Park | 6485.0 |
| 10400 | Rogers Park | 5213.0 |
| ... | ... | ... |
| 843700 | North Center | 2527.0 |
| 843800 | New City | 1520.0 |
| 843900 | Woodlawn | 3430.0 |
| 980000 | Ohare | 0.0 |


###### Internet and Computer Access

This section looks at the community areas at the household level to see who has, and who lacks, basic access to the internet and/or a computer.

In [240]:
# who has a computer? who has internet access? 
# selecting columns we need

internet_stats = acs_comm[['name', 
                        'b_total_pop',
                        'total_house',
                        'no_internet', 
                        'no_computer', 
                        'has_computer', 
                        'internet_any_broad']].groupby(by="name").sum()

In [241]:
internet_stats

| name | b_total_pop | total_house | no_internet | no_computer | has_computer | internet_any_broad |
| --- | --- | --- | --- | --- | --- | --- |
| Albany Park | 49806 | 16909 | 2674 | 1805 | 15104 | 13477 |
| Archer Heights, | 13726 | 3919 | 772 | 712 | 3207 | 2886 |
| Armour Square | 13538 | 5396 | 1488 | 1332 | 4064 | 3674 |
| Ashburn | 43356 | 13124 | 1840 | 1277 | 11847 | 10442 |
| Auburn Gresham | 45909 | 17161 | 5282 | 3437 | 13724 | 10291 |
| ... | ... | ... | ... | ... | ... | ... |
| West Lawn | 31886 | 9272 | 1752 | 1469 | 7803 | 7020 |
| West Pullman | 27028 | 9129 | 1880 | 1378 | 7751 | 6903 |
| West Ridge | 78466 | 25714 | 3676 | 2199 | 23515 | 20780 |
| West Town | 83757 | 37819 | 3187 | 2530 | 35289 | 33553 |


In [232]:
# calculating percentages per neighborhood for those 
# who have and don't have a computer
# who have and don't have any internet access




The neighborhoods with the lowest percentages of households without internet or a computer are the Loop, Lake View, and the Near South and Near North Sides. The neighborhoods with the highest percentages are Burnside, Fuller Park, and Englewood.

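The percentages do not add up for internet access. In Albany Park, for example, `no_internet` plus `internet_any_broad` accounts for 16,151 of 16,909 households, whereas `no_computer` plus `has_computer` equals the household total exactly. This may mean people are underreporting, or that some households fall into categories not pulled here, such as dial-up-only subscriptions.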
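The percentage cell above contains only comments in this export. A minimal sketch of one way to compute those shares divides each count by `total_house`; it uses two rows copied from the `internet_stats` output, and the `pct_` prefix and the `internet_other` column are names introduced here for illustration, not the author's.

```python
import pandas as pd

# Two rows copied from the internet_stats output above.
stats = pd.DataFrame(
    {
        "total_house": [16909, 17161],
        "no_internet": [2674, 5282],
        "no_computer": [1805, 3437],
        "has_computer": [15104, 13724],
        "internet_any_broad": [13477, 10291],
    },
    index=pd.Index(["Albany Park", "Auburn Gresham"], name="name"),
)

count_cols = ["no_internet", "no_computer", "has_computer", "internet_any_broad"]

# Share of households in each category, as a percentage of total_house.
pct = (
    stats[count_cols]
    .div(stats["total_house"], axis=0)
    .mul(100)
    .round(1)
    .add_prefix("pct_")
)

# Households not covered by either internet column (hypothetical helper column).
stats["internet_other"] = (
    stats["total_house"] - stats["no_internet"] - stats["internet_any_broad"]
)

print(pct)
print(stats["internet_other"])
```

The computer columns sum to `total_house` in both rows, while the internet columns leave a remainder, which is the gap described in the paragraph above.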

###### Broadband Access

This section looks at broadband access in each community area.

##### Average and Median Download and Upload Speeds by Chicago Community Area

In [90]:
# selecting columns we need

download_speeds = full_df[['name', 'MaxAdDown', 'MaxAdUp']]

In [44]:
#calculating mean speeds

download_speeds.groupby(by="name").mean().sort_values(["MaxAdUp"], 
                                                      ascending=False)

| name | MaxAdDown | MaxAdUp |
| --- | --- | --- |
| Dunning | 246.429453 | 115.015010 |
| West Town | 229.364653 | 111.663583 |
| Avondale | 228.131013 | 105.684701 |
| Irving Park | 230.232691 | 105.280061 |
| Garfield Ridge | 232.221788 | 102.162929 |
| ... | ... | ... |
| Gage Park | 147.880358 | 7.766088 |
| West Englewood | 147.488897 | 7.525122 |
| West Garfield Park | 134.083025 | 7.127243 |
| North Lawndale | 132.111270 | 7.038640 |


In [45]:
# calculating median speeds

download_speeds.groupby(by="name").median()

| name | MaxAdDown | MaxAdUp |
| --- | --- | --- |
| Albany Park | 25.0 | 3.0 |
| Archer Heights, | 25.0 | 3.0 |
| Armour Square | 25.0 | 3.0 |
| Ashburn | 25.0 | 3.0 |
| Auburn Gresham | 25.0 | 3.0 |
| ... | ... | ... |
| West Lawn | 25.0 | 3.0 |
| West Pullman | 25.0 | 3.0 |
| West Ridge | 25.0 | 3.0 |
| West Town | 25.0 | 3.0 |


This was interesting because the means and medians of the advertised download and upload speeds are very different. Mean download speeds range from 114 to 293 Mbps, and mean upload speeds range from 7 to 115 Mbps. The median download speed is 25 Mbps and the median upload speed is 3 Mbps in every community area, which suggests that a relatively small number of very fast offerings pull the means well above the typical record.
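A gap like this between mean and median is what a right-skewed distribution produces. The sketch below uses made-up speeds, not FCC records, to show how a single very fast offering moves the mean while leaving the median unchanged.

```python
import pandas as pd

# Hypothetical advertised download speeds (Mbps) for one community area:
# mostly 25 Mbps records, one zero, and one gigabit offering.
speeds = pd.Series([0.0, 25.0, 25.0, 25.0, 25.0, 25.0, 1000.0], name="MaxAdDown")

mean_speed = speeds.mean()      # pulled upward by the 1000 Mbps record
median_speed = speeds.median()  # the middle value, unaffected by the outlier

print(f"mean:   {mean_speed:.1f} Mbps")
print(f"median: {median_speed:.1f} Mbps")
```

For this reason the median, or a set of quantiles, may describe the typical advertised speed in an area better than the mean does.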

###### Median Household Income

In [54]:
# selecting columns we need

median = full_df[['name', 'median_income']]
median.groupby(by = "name").median().sort_values(["median_income"], 
                                                      ascending = False)

| name | median_income |
| --- | --- |
| Lincoln Park | 145855.0 |
| Forest Glen | 127473.0 |
| Lake View | 121500.0 |
| North Center | 116595.0 |
| Near South Side | 115484.0 |
| ... | ... |
| North Lawndale | 25701.0 |
| West Garfield Park | 24714.0 |
| Fuller Park | 23922.0 |
| Englewood | 23125.0 |


The median household income across the community areas ranges from $15,293 to $145,855. The lowest is in Riverdale and the highest is in Lincoln Park.
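Because `full_df` repeats each tract's ACS values once per FCC record, a median taken over `full_df` gives more weight to tracts with more FCC rows. A minimal sketch with toy data (the area name, tract ids, and incomes are made up) compares that row-weighted median with a per-tract median computed after `drop_duplicates`; which one is preferable for this analysis is a judgment call.

```python
import pandas as pd

# Toy version of full_df: tract 1 has four FCC rows, tracts 2 and 3 have one each.
toy_full = pd.DataFrame(
    {
        "name": ["Area A"] * 6,
        "tract": [1, 1, 1, 1, 2, 3],
        "median_income": [30000, 30000, 30000, 30000, 60000, 90000],
    }
)

# Median over every row: tract 1 counts four times.
row_weighted = toy_full.groupby("name")["median_income"].median()

# Median over one row per tract: each tract counts once.
per_tract = (
    toy_full.drop_duplicates(subset=["name", "tract"])
    .groupby("name")["median_income"]
    .median()
)

print(row_weighted)
print(per_tract)
```

Aggregating from `acs_comm`, which already holds one row per tract, would give the per-tract version directly.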

###### Household Poverty Rate

In [52]:
# selecting columns we need

poverty = full_df[['name', 'percent_poverty']]
poverty.groupby(by = "name").median().sort_values(["percent_poverty"], 
                                                      ascending = False)

| name | percent_poverty |
| --- | --- |
| Riverdale | 53.3 |
| West Garfield Park | 37.2 |
| Englewood | 35.7 |
| East Garfield Park | 32.7 |
| Woodlawn | 31.5 |
| ... | ... |
| Near North Side | 1.9 |
| Lincoln Park | 1.8 |
| Beverly | 1.7 |
| North Center | 1.7 |


Riverdale has the highest poverty rate at 53.3%, and Lake View has the lowest at 1.4%.