## Aggregating ACS, FCC, Chicago Community Area Data

###### Importing Libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

###### Importing Chicago FCC data

First, we use the FCC IL Dec 2020 csv file (which can be found [here](https://us-fcc.app.box.com/v/IL-Dec2020-v1)). Since it is a large file and we only want data on Chicago, we filter for Cook County data only and then export it as a new csv file which is saved in the "data" folder. **DO NOT RUN THE FOLLOWING CHUNKS OF CODE!!!**

In [None]:
# FCC IL

fcc_df = pd.read_csv("data/IL-Fixed-Dec2020-v1.csv",
                     index_col=0,parse_dates=[0])

In [None]:
#changing BlockCode column to string type

fcc_df['BlockCode']=fcc_df['BlockCode'].astype(str)

In [None]:
# Extracting state, county, tract, block numbers from BlockCode column
# IL state=17
# Cook County=031

fcc_df['state'] = fcc_df['BlockCode'].str[:2]
fcc_df['county'] = fcc_df['BlockCode'].str[2:5]
fcc_df['tract'] = fcc_df['BlockCode'].str[5:11]
fcc_df['block'] = fcc_df['BlockCode'].str[-4:]
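
The slicing above relies on the census block code being a 15-digit identifier: 2 digits for state, 3 for county, 6 for tract, and 4 for block. A minimal sketch of the same split as a reusable function is below. The helper name `split_block_code` and the example code are hypothetical, and the `zfill` step is an added safeguard for states whose codes start with a zero (not needed for Illinois, which is 17).

```python
import pandas as pd

def split_block_code(codes: pd.Series) -> pd.DataFrame:
    """Split 15-digit census block codes into state, county, tract, block."""
    # zfill restores a leading zero that is lost if the code was read as an integer
    s = codes.astype(str).str.zfill(15)
    return pd.DataFrame({
        "state": s.str[:2],
        "county": s.str[2:5],
        "tract": s.str[5:11],
        "block": s.str[11:],
    }, index=codes.index)

# hypothetical block code, for illustration only
example = pd.Series([170318391001000])
parts = split_block_code(example)
print(parts.iloc[0].to_dict())
```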

In [None]:
# Filtering for Cook County only 
# 763788 rows

chi_fcc = fcc_df[(fcc_df.county == "031")]

In [None]:
# dropping columns we don't need to make the file smaller

chi_fcc=chi_fcc[['ProviderName', 'Consumer', 
                 'MaxAdDown','MaxAdUp','tract']]
chi_fcc['tract']=chi_fcc['tract'].astype(float)

In [None]:
# export final dataframe to csv file

chi_fcc.to_csv(r'data/chi_fcc.csv', index = False)

In [None]:
# final dataframe looks like this
chi_fcc.head(5)
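
Since the statewide file is large, one alternative to loading it all at once is to read it in chunks and keep only the Cook County rows from each chunk. This is a sketch under assumptions, not what the cells above do: the in-memory CSV and its provider rows are made up, and only the column names come from the notebook.

```python
import io
import pandas as pd

# tiny stand-in for the large FCC file (hypothetical rows)
sample_csv = io.StringIO(
    "ProviderName,BlockCode,Consumer,MaxAdDown,MaxAdUp\n"
    "Provider A,170318391001000,1,100.0,10.0\n"
    "Provider B,170438400001000,1,50.0,5.0\n"
    "Provider C,170310207021000,0,0.0,0.0\n"
)

cook_chunks = []
# dtype=str keeps BlockCode as text so it can be sliced directly
for chunk in pd.read_csv(sample_csv, dtype={"BlockCode": str}, chunksize=2):
    is_cook = chunk["BlockCode"].str[2:5] == "031"
    cook_chunks.append(chunk[is_cook])

cook = pd.concat(cook_chunks, ignore_index=True)
print(len(cook))
```

With the real file, `sample_csv` would be replaced by the file path and `chunksize` by something much larger.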

Run this code to load the Chicago FCC data.  

In [73]:
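# Chicago FCC data
# note: the file was saved without an index, so index_col=0 
# makes ProviderName the index of this dataframe

chi_fcc = pd.read_csv("data/chi_fcc.csv",index_col=0,parse_dates=[0])
chi_fcc.head(5)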

ProviderName,Consumer,MaxAdDown,MaxAdUp,tract
"TOWERSTREAM, INC.",0,0.0,0.0,20702.0
"TOWERSTREAM, INC.",0,0.0,0.0,80300.0
"TOWERSTREAM, INC.",0,0.0,0.0,81000.0
"TOWERSTREAM, INC.",0,0.0,0.0,81300.0
"TOWERSTREAM, INC.",0,0.0,0.0,81500.0


###### Importing Chicago ACS aggregate and profile data

In [17]:
# ACS aggregate

acs_agg = pd.read_csv("data/acs5_aggregate.csv",index_col=0,
                      parse_dates=[0]).drop(['state', 'county'], axis=1)

In [18]:
# ACS profile

acs_pro = pd.read_csv("data/acs5_profile.csv",
                      index_col=0,
                      parse_dates=[0]).drop(
                        ['state', 'county','DP05_0001E'], axis=1)

In [19]:
# merging both ACS datasets
# renaming variables
# 1319 rows x 19 columns

acs_df = acs_agg.merge(acs_pro, on='tract').rename(columns={
                                'B01003_001E': 'total_pop',
                                'B28002_002E': 'has_internet',
                               'B28002_013E': 'no_internet',
                               'B28003_002E': 'has_computer',
                               'B28003_006E': 'no_computer',
                               'DP02_0001E': 'total_households',
                               'DP02_0152E': 'total_house_computer',
                               'DP02_0153E': 'has_broadband',
                               'DP03_0062E': 'median_income',
                               'DP03_0119PE': 'percent_poverty',
                               'DP05_0071E': 'total_hispanic',
                                'DP05_0078E': 'total_black'})

acs_df.head(5)

,total_pop,has_internet,no_internet,has_computer,no_computer,tract,total_households,total_house_computer,has_broadband,median_income,percent_poverty,total_hispanic,total_black
0,1825,392,149,426,149,630200,575,426,392,37422,25.7,1622,0
1,5908,1242,231,1411,133,580700,1544,1411,1242,47000,17.4,4742,161
2,3419,928,140,1068,104,590600,1172,1068,917,46033,7.9,2119,9
3,2835,917,138,1003,81,600700,1084,1003,917,45294,17.0,850,82
4,1639,322,245,356,218,611900,574,356,322,24507,55.0,438,1175
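
The merge above assumes each tract appears exactly once in both ACS files. A small sketch of how that assumption could be checked with the `validate` argument of `merge` is below. The two tracts and their values are taken from the preview above; the duplicated row is added only to show the failure case.

```python
import pandas as pd

agg = pd.DataFrame({"tract": [630200, 580700], "total_pop": [1825, 5908]})
pro = pd.DataFrame({"tract": [630200, 580700], "median_income": [37422, 47000]})

# passes silently when every tract is unique on both sides
merged = agg.merge(pro, on="tract", validate="one_to_one")

# a duplicated tract (added here for illustration) raises MergeError
dup = pd.concat([pro, pro.iloc[[0]]], ignore_index=True)
try:
    agg.merge(dup, on="tract", validate="one_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True

print(merged.shape, raised)
```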


###### Importing Chicago Community Area data

In [6]:
# importing Census tracts mapped to community area number 
# renaming columns

tracts = pd.read_csv("data/tracts_comm_areas.csv",
                        index_col=0,
                        parse_dates=[0]).rename(columns={
                        "COMMAREA": "comm_num",          
                        "TRACTCE10": "tract"})

In [10]:
# importing community area numbers and names
# renaming columns

comm_area = pd.read_csv("data/comm_areas.csv",
                        index_col=0,
                        parse_dates=[0]).rename(columns={
                        "AREA_NUMBE": "comm_num",
                        "COMMUNITY": "comm_name"})

In [12]:
# merging both dataframes above to map tract with community area name

tract_area= comm_area.merge(tracts, on='comm_num')

In [23]:
# selecting columns we need and renaming them

tract_area=tract_area[['comm_num', 'comm_name', 'tract']].rename(columns={
                        "comm_name": "name"})

In [24]:
# 801 rows x 3 columns
# final dataframe

tract_area.head(5)

,comm_num,name,tract
0,35,DOUGLAS,842000
1,35,DOUGLAS,351500
2,35,DOUGLAS,839500
3,35,DOUGLAS,839200
4,35,DOUGLAS,839600
...,...,...,...
796,77,EDGEWATER,30800
797,77,EDGEWATER,30900
798,9,EDISON PARK,90100
799,9,EDISON PARK,90200


###### Joining ACS data and Community Areas

In [61]:
# after dropping tracts with no information, we get 798 rows x 20 columns

acs_comm = tract_area.merge(acs_df, on='tract')
acs_comm=acs_comm[acs_comm['total_pop']!=0]
acs_comm.head(5)

,comm_num,name,tract,total_pop,has_internet,no_internet,has_computer,no_computer,total_households,total_house_computer,has_broadband,median_income,percent_poverty,total_hispanic,total_black
0,35,DOUGLAS,842000,3164,340,16,347,15,362,347,340,99583,0.0,221,963
1,35,DOUGLAS,351500,731,281,56,314,30,344,314,281,16250,63.4,31,388
2,35,DOUGLAS,839500,1461,474,214,512,212,724,512,474,19950,0.0,53,992
3,35,DOUGLAS,839200,2405,1197,218,1246,179,1425,1246,1178,35707,11.5,98,1382
4,35,DOUGLAS,839600,1653,504,244,568,205,773,568,499,32401,6.2,122,1466
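
Because the default inner merge silently drops tracts that have no match in the ACS data, it can help to see which ones are lost. Here is a sketch using a left merge with `indicator=True`. The frames are small stand-ins built from tracts shown in the previews above, and the missing ACS row for tract 30800 is a made-up case for illustration.

```python
import pandas as pd

tract_area_demo = pd.DataFrame({
    "comm_num": [35, 35, 77],
    "name": ["DOUGLAS", "DOUGLAS", "EDGEWATER"],
    "tract": [842000, 351500, 30800],
})
# tract 30800 is deliberately left out (hypothetical gap)
acs_demo = pd.DataFrame({"tract": [842000, 351500], "total_pop": [3164, 731]})

# how="left" keeps every tract; _merge records where each row came from
check = tract_area_demo.merge(acs_demo, on="tract", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "tract"].tolist()
print(unmatched)
```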


###### Joining FCC data and Community Areas

In [None]:
# 372924 rows × 5 columns
# using tract_area, since comm_area has no 'tract' column to merge on

fcc_comm = tract_area.merge(chi_fcc, on='tract')
fcc_comm.head(5)

###### Joining all datasets by tract

In [80]:
# merging all three dataframes by the column "tract"
# 372924 rows x 23 columns
# using tract_area, since comm_area has no 'tract' column to merge on

full_df = tract_area.merge(acs_df,
                        on='tract').merge(chi_fcc,
                                          on='tract')


In [None]:
full_df.head(5)

###### Computer, Internet, Broadband Access

This section looks at the community areas at the household level to see which households have, and which lack, basic access to the internet and/or a computer. We also look at households that have a broadband internet subscription. 

In [26]:
# who has a computer? who has internet access? 
# selecting columns we need

internet_df = acs_comm[['name', 
                        'total_households',
                        'no_internet', 
                        'has_internet', 
                        'has_computer', 
                        'no_computer',
                       'has_broadband']].groupby(by="name").sum()

In [27]:
# calculating who does not have a broadband internet subscription

internet_df['no_broadband']=internet_df['total_households']-internet_df['has_broadband']

In [28]:
# calculating percentages of households for each variable

# percentage of households with/out internet access

internet_df['no_internet']=internet_df['no_internet']/internet_df['total_households']*100
internet_df['has_internet']=internet_df['has_internet']/internet_df['total_households']*100

# percentage of households with/out a computer

internet_df['no_computer']=internet_df['no_computer']/internet_df['total_households']*100
internet_df['has_computer']=internet_df['has_computer']/internet_df['total_households']*100

# percentage of households with/out broadband 

internet_df['no_broadband']=internet_df['no_broadband']/internet_df['total_households']*100
internet_df['has_broadband']=internet_df['has_broadband']/internet_df['total_households']*100
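
Note that the counts are summed across tracts first and only then divided by total households. This differs from averaging tract-level percentages, which would give small tracts the same weight as large ones. The sketch below shows the difference with two made-up tracts in a hypothetical community area "A".

```python
import pandas as pd

# two hypothetical tracts of very different size
tracts_demo = pd.DataFrame({
    "name": ["A", "A"],
    "total_households": [100, 900],
    "has_broadband": [50, 810],
})

# approach used in the notebook: sum counts, then divide
summed = tracts_demo.groupby("name")[["total_households", "has_broadband"]].sum()
pooled_pct = summed["has_broadband"] / summed["total_households"] * 100

# alternative for comparison: average the tract-level percentages
tract_pct = tracts_demo["has_broadband"] / tracts_demo["total_households"] * 100
mean_of_pcts = tract_pct.groupby(tracts_demo["name"]).mean()

print(pooled_pct["A"], mean_of_pcts["A"])
```

Here the pooled figure is 86% while the unweighted average of 50% and 90% is 70%.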

In [63]:
# resulting dataframe

internet_df.sort_values(["has_broadband"],
                        ascending=True)

name,total_households,no_internet,has_internet,has_computer,no_computer,has_broadband,no_broadband
BURNSIDE,888,42.229730,53.716216,62.725225,37.274775,52.252252,47.747748
ENGLEWOOD,8983,34.598686,56.584660,74.897028,25.102972,56.306356,43.693644
WEST ENGLEWOOD,9483,38.089212,57.239270,70.336391,29.663609,57.239270,42.760730
FULLER PARK,1128,34.663121,59.042553,69.414894,30.585106,59.042553,40.957447
NORTH LAWNDALE,11075,29.986456,59.873589,76.000000,24.000000,59.620767,40.379233
...,...,...,...,...,...,...,...
LINCOLN SQUARE,18347,7.216439,89.960211,95.187224,4.812776,89.867553,10.132447
NORTH CENTER,14093,8.238132,90.562691,95.217484,4.782516,90.392393,9.607607
LINCOLN PARK,32395,6.254052,91.106652,94.786232,5.213768,91.029480,8.970520
LAKE VIEW,53480,5.731114,91.316380,96.394914,3.605086,91.071429,8.928571


Based on 2015-2019 ACS data, the table above shows the percentage of households in each community area with a computer, internet access, and a broadband internet subscription. The neighborhoods of Burnside, Englewood, West Englewood, and Fuller Park have the lowest percentages of both broadband subscription and internet access. The neighborhoods of Near South Side, Lake View, Lincoln Park, and North Center have the highest. 

Broadband subscription and internet access numbers are extremely close, suggesting that the overwhelming majority of households that have access to the internet do so via a broadband subscription. 
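
To put a number on how close the two measures are, the gap can be computed directly. The sketch below uses three rows copied from the table above rather than the full `internet_df`.

```python
import pandas as pd

# values copied from the table above
rows = pd.DataFrame(
    {
        "has_internet": [53.716216, 56.584660, 91.316380],
        "has_broadband": [52.252252, 56.306356, 91.071429],
    },
    index=["BURNSIDE", "ENGLEWOOD", "LAKE VIEW"],
)

# gap in percentage points between internet access and broadband subscription
rows["gap"] = rows["has_internet"] - rows["has_broadband"]
print(rows["gap"].round(2))
```

For these three areas the gap stays under 1.5 percentage points.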

###### Median Household Income & Poverty

In [64]:
# median household income and poverty rates by community area
# taking the median of median household incomes  

income_df = acs_comm[['name', 
                      'median_income', 
                      'percent_poverty']].groupby(by = "name").median().sort_values(["percent_poverty"], 
                                                                                  ascending = False)

income_df

name,median_income,percent_poverty
RIVERDALE,15408.0,49.20
ENGLEWOOD,23125.0,36.60
WEST GARFIELD PARK,24001.5,36.40
EAST GARFIELD PARK,26500.5,33.05
GREATER GRAND CROSSING,29197.5,31.05
...,...,...
LOOP,105094.0,2.30
LINCOLN PARK,127177.5,1.85
NORTH CENTER,119904.5,1.80
BEVERLY,111667.0,1.70


Median income and poverty rates are based on household-level data. Each community area's value in the table above is the median, across its tracts, of the tract-level median household income and poverty rate. The median household incomes in Chicago community areas range from Riverdale's \\$15,408 all the way to Lincoln Park's \\$127,177. Riverdale also has the highest percentage of its households in poverty at 49.2%. 
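
A median of tract medians treats every tract equally regardless of how many households it contains, so it is an approximation rather than the true community area median. The sketch below reproduces the calculation for the five Douglas tracts shown in the `acs_comm` preview (Douglas may contain more tracts than these) and compares it with a household-weighted mean of the tract medians. The weighted version is an assumption added for comparison, not the notebook's method, and it is not a true median either.

```python
import pandas as pd

# the five DOUGLAS tracts shown in the acs_comm preview
douglas = pd.DataFrame({
    "name": ["DOUGLAS"] * 5,
    "total_households": [362, 344, 724, 1425, 773],
    "median_income": [99583, 16250, 19950, 35707, 32401],
})

# what the notebook does: unweighted median of tract medians
median_of_medians = douglas.groupby("name")["median_income"].median()

# alternative for comparison: weight each tract median by its households
weighted = (
    (douglas["median_income"] * douglas["total_households"]).sum()
    / douglas["total_households"].sum()
)

print(median_of_medians["DOUGLAS"], round(weighted))
```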

###### Race & Ethnicity

In [65]:
# race and ethnicity by community area

race_df = acs_comm[['name', 
                        'total_pop',
                        'total_hispanic', 
                        'total_black']].groupby(by = "name").sum().sort_values(["name"], 
                                                                      ascending = True)

In [66]:
# percentage of population hispanic

race_df['total_hispanic']=race_df['total_hispanic']/race_df['total_pop']*100

# percentage of population black non-hispanic

race_df['total_black']=race_df['total_black']/race_df['total_pop']*100

In [67]:
# final race and ethnicity dataframe

race_df.sort_values(["total_black"], ascending = False)

name,total_pop,total_hispanic,total_black
AVALON PARK,9713,0.092659,96.581901
BURNSIDE,2006,1.944167,96.261216
WASHINGTON HEIGHTS,26742,1.140528,96.096029
CHATHAM,30967,0.846062,95.666355
GREATER GRAND CROSSING,30149,1.738034,95.654914
...,...,...,...
FOREST GLEN,19384,15.131036,1.191704
JEFFERSON PARK,27503,24.746391,1.065338
ARCHER HEIGHTS,13726,77.415125,0.954393
NORWOOD PARK,43405,13.498445,0.799447


Race and ethnicity data are based on total population numbers. Hispanic/Latino ethnicity was based on all races. Black/African-American race was non-Hispanic/Latino. Chicago community areas vary vastly in their racial and ethnic compositions. Gage Park, South Lawndale, West Elsdon, and Hermosa have the highest percentages of Hispanics/Latinos of all races. Per the table above, Avalon Park, Burnside, Washington Heights, and Chatham have the highest percentages of non-Hispanic/Latino African-Americans/Blacks. 
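
Rankings like these can be read off with `nlargest` instead of scanning a sorted table. The sketch below uses five rows copied from the table above, so the results hold only within this small sample, not across all community areas.

```python
import pandas as pd

# rows copied from the race and ethnicity table above (percentages)
race_demo = pd.DataFrame(
    {
        "total_hispanic": [0.092659, 1.944167, 1.140528, 77.415125, 13.498445],
        "total_black": [96.581901, 96.261216, 96.096029, 0.954393, 0.799447],
    },
    index=["AVALON PARK", "BURNSIDE", "WASHINGTON HEIGHTS",
           "ARCHER HEIGHTS", "NORWOOD PARK"],
)

# top areas by each share, within this sample only
top_black = race_demo.nlargest(2, "total_black").index.tolist()
top_hispanic = race_demo.nlargest(1, "total_hispanic").index.tolist()
print(top_black, top_hispanic)
```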

##### Internet Access & Demographics Combined

The table below shows all of the variables above.

In [68]:
# merging all tables by community area

internet_demographics = internet_df.merge(income_df,
                        on='name').merge(race_df,
                                          on='name')
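
All three tables share the community area name as their index, so the merge above lines them up by that index. An equivalent sketch using `join` over a list of frames is below. The frame names ending in `_demo` are hypothetical, and the values are copied from the combined table shown further down.

```python
from functools import reduce
import pandas as pd

idx = pd.Index(["ALBANY PARK", "WEST TOWN"], name="name")
internet_demo = pd.DataFrame({"has_broadband": [79.703117, 88.719956]}, index=idx)
income_demo = pd.DataFrame({"median_income": [66818.0, 102083.0]}, index=idx)
race_demo = pd.DataFrame({"total_black": [4.941172, 6.837637]}, index=idx)

# join aligns on the shared index; inner keeps areas present in every table
combined = reduce(
    lambda left, right: left.join(right, how="inner"),
    [internet_demo, income_demo, race_demo],
)
print(combined)
```

This form extends to more tables by adding them to the list.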

In [69]:
# reordering columns and showing final dataframe
internet_demographics = internet_demographics[['total_pop',
                      'total_households',
                      'has_computer',
                      'has_internet',
                      'has_broadband',
                      'median_income',
                      'percent_poverty',
                     'total_hispanic',
                     'total_black']].sort_values(["name"], 
                                                 ascending = True)


In [71]:
internet_demographics

name,total_pop,total_households,has_computer,has_internet,has_broadband,median_income,percent_poverty,total_hispanic,total_black
ALBANY PARK,49806,16909,89.325211,79.768171,79.703117,66818.0,12.10,44.972493,4.941172
ARCHER HEIGHTS,13726,3919,81.832100,73.641235,73.641235,48629.0,10.10,77.415125,0.954393
ARMOUR SQUARE,13538,5396,75.315048,68.291327,68.087472,33333.0,25.80,4.321170,8.383809
ASHBURN,43356,13124,90.269735,79.617495,79.564157,69261.0,8.75,41.327613,45.871390
AUBURN GRESHAM,45909,17161,79.972030,60.567566,59.967368,35568.0,23.60,2.180400,95.386526
...,...,...,...,...,...,...,...,...,...
WEST LAWN,31886,9272,84.156601,76.509922,75.711821,52992.5,12.55,83.971022,2.650066
WEST PULLMAN,30020,10598,84.657483,76.438951,76.203057,43143.0,15.50,5.203198,91.868754
WEST RIDGE,78466,25714,91.448238,81.181458,80.812009,53153.0,14.30,18.906278,11.579538
WEST TOWN,83757,37819,93.310241,88.817790,88.719956,102083.0,4.50,22.167699,6.837637


In [27]:
# export to csv file in the data folder 

internet_demographics.to_csv("data/access_comm_area.csv") 

All variables except total_pop, total_households, and median_income are in percentages. All internet variables are on the household level. All demographic variables are on the community area population level. 
