# Calculating Equity of Station Access
In this notebook, we will use information about the areas surrounding stations and their demographics to analyse equity of station access. For demonstration purposes, we will perform calculations for the following:
* Visible Minorities
* Low Income Measure (LIM) Individuals

However, the process outlined in this notebook can be adapted to any census variable of your choosing.

Let's start by importing the packages we'll be using to calculate and visualize the information, and setting a few parameters that are useful for data access and visualisation later on. For this notebook to run, your data folders should match those provided in the repository and described in the workbook.

In [28]:
import os
import pandas as pd
import altair as alt

# Replace with your data folder paths
counts_folder = r"data/counts"
# This is useful for our plots
station_order = ['Matheson', 'Renforth', 'Eglinton/Kipling', 'Scarlett', 'Mount Dennis GO', 'St. Clair GO', 'Bloor GO', 'Lansdowne GO', 'Liberty Village GO', 'Spadina GO', 'Union Station', 'East Harbour GO', 'Queen Street GO', 'Gerrard GO', 'Danforth GO', 'Scarborough GO', 'Kennedy GO', 'Lawrence East GO', 'Ellesmere GO', 'Agincourt GO', 'Finch GO', 'Milliken GO', '14th Avenue GO', 'Unionville GO']

## Data Loading and Joining
We start by reading both the demographic data at the dissemination area (DA) level and the station area-DA link file produced by the areal apportionment. Since we are interested in the low-income and visible minority categories, we keep only those columns plus the dissemination area ID, station, and fractional area values.

In [16]:
da_demo = pd.read_csv(os.path.join(counts_folder, "input", "da_census_profile.csv"))
# Let's fill in blanks with zeros
da_demo = da_demo.fillna(0)
sa_link = pd.read_csv(os.path.join(counts_folder, "interim", "station_area_da_link.csv"))
da_sa = pd.merge(sa_link, da_demo, on='DAUID')[['DAUID', 'station', 'frac_area', 'income_lico', 'income_total', 'vm_minority', 'vm_total']]
da_sa

Unnamed: 0,DAUID,station,frac_area,income_lico,income_total,vm_minority,vm_total
0,35201804,Eglinton/Kipling,0.031045,15.0,450.0,50.0,485.0
1,35201197,St. Clair GO,0.144030,25.0,315.0,80.0,325.0
2,35201198,St. Clair GO,0.762387,85.0,975.0,450.0,945.0
3,35201321,Lansdowne GO,0.031652,40.0,535.0,45.0,525.0
4,35201323,Lansdowne GO,0.651031,40.0,445.0,60.0,425.0
...,...,...,...,...,...,...,...
386,35204217,Milliken GO,0.034885,85.0,300.0,300.0,310.0
387,35204736,St. Clair GO,0.183712,120.0,1000.0,665.0,945.0
388,35211696,Renforth,0.000484,100.0,385.0,255.0,340.0
389,35212021,Renforth,0.043124,0.0,0.0,0.0,0.0
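One thing to keep in mind is that `pd.merge` performs an inner join by default, so any link row whose `DAUID` is missing from the census profile is dropped silently. The following is a minimal sketch of how such rows could be surfaced, using small made-up tables (`toy_link` and `toy_demo` are hypothetical stand-ins, not the real files):

```python
import pandas as pd

# Toy stand-ins for the link file and the census profile (made-up values)
toy_link = pd.DataFrame({
    "DAUID": [1, 2, 3],
    "station": ["A", "A", "B"],
    "frac_area": [0.2, 0.9, 0.5],
})
toy_demo = pd.DataFrame({"DAUID": [1, 2], "vm_total": [100.0, 250.0]})

# how="left" keeps every link row; indicator=True adds a "_merge" column
# that records whether each row found a match in the census table
checked = pd.merge(toy_link, toy_demo, on="DAUID", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["DAUID", "station"]])

# Fractional areas are shares of a DA, so they should lie between 0 and 1
assert checked["frac_area"].between(0, 1).all()
```

If `unmatched` is empty on the real data, the default inner join loses nothing.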


## Proportional Population Calculation
We want to calculate the population of each group of interest living near a station, and divide it by the total population for that census question or category living near the same station. To do this, we use the following steps:
1. Multiply each zone piece's group population (e.g. Visible Minority) by the fraction of the zone's area that is inside the buffer (i.e. `frac_area`)
2. Multiply the total population for that category (e.g. Visible Minority - Total) by the same fractional area
3. Sum each of these resulting values by station to get the station-level amounts for the group and for the total
4. Divide the group sum by the total sum (and multiply by 100) to get a percentage.
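To make the arithmetic concrete, here is a small worked sketch of the four steps on made-up numbers. The `toy` table and its values are purely illustrative and do not come from the census files; only the column naming follows the notebook.

```python
import pandas as pd

# Made-up example: station A is covered by two DA pieces, station B by one
toy = pd.DataFrame({
    "station": ["A", "A", "B"],
    "frac_area": [0.5, 0.5, 1.0],
    "vm_minority": [20.0, 40.0, 30.0],
    "vm_total": [100.0, 50.0, 120.0],
})

# Steps 1 and 2: scale the group count and the total by the fractional area
toy["vm_minority_frac"] = toy["vm_minority"] * toy["frac_area"]
toy["vm_total_frac"] = toy["vm_total"] * toy["frac_area"]

# Step 3: sum the scaled values for each station
toy_by_station = toy.groupby("station", as_index=False)[
    ["vm_minority_frac", "vm_total_frac"]
].sum()

# Step 4: divide the group sum by the total sum and convert to a percentage
toy_by_station["vm_pct"] = (
    100 * toy_by_station["vm_minority_frac"] / toy_by_station["vm_total_frac"]
)
print(toy_by_station)
```

For station A this gives (10 + 20) / (50 + 25) = 40%. Note that this is a ratio of sums: simply averaging the two DA-level percentages (20% and 80%) would give 50% instead, because it ignores how many people live in each piece.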

First, we find the fractional populations for our two groups and their total categories.

In [17]:
da_sa['income_lico_frac'] = da_sa['income_lico'] * da_sa['frac_area']
da_sa['income_total_frac'] = da_sa['income_total'] * da_sa['frac_area']
da_sa['vm_minority_frac'] = da_sa['vm_minority'] * da_sa['frac_area']
da_sa['vm_total_frac'] = da_sa['vm_total'] * da_sa['frac_area']
da_sa

Unnamed: 0,DAUID,station,frac_area,income_lico,income_total,vm_minority,vm_total,income_lico_frac,income_total_frac,vm_minority_frac,vm_total_frac
0,35201804,Eglinton/Kipling,0.031045,15.0,450.0,50.0,485.0,0.465676,13.970289,1.552254,15.056867
1,35201197,St. Clair GO,0.144030,25.0,315.0,80.0,325.0,3.600742,45.369351,11.522375,46.809648
2,35201198,St. Clair GO,0.762387,85.0,975.0,450.0,945.0,64.802898,743.327365,343.074168,720.455754
3,35201321,Lansdowne GO,0.031652,40.0,535.0,45.0,525.0,1.266081,16.933833,1.424341,16.617313
4,35201323,Lansdowne GO,0.651031,40.0,445.0,60.0,425.0,26.041247,289.708874,39.061871,276.688250
...,...,...,...,...,...,...,...,...,...,...,...
386,35204217,Milliken GO,0.034885,85.0,300.0,300.0,310.0,2.965209,10.465445,10.465445,10.814293
387,35204736,St. Clair GO,0.183712,120.0,1000.0,665.0,945.0,22.045480,183.712334,122.168702,173.608156
388,35211696,Renforth,0.000484,100.0,385.0,255.0,340.0,0.048420,0.186419,0.123472,0.164630
389,35212021,Renforth,0.043124,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.000000


Then we group the rows by station and sum them to get the group and total populations around each station area. Finally, we divide each group by its total to calculate a percentage.

In [38]:
by_station = da_sa[['station', 'income_lico_frac', 'income_total_frac', 'vm_minority_frac', 'vm_total_frac']].groupby('station', as_index=False).sum()
by_station['income_lico_pct'] = 100 * by_station['income_lico_frac'] / by_station['income_total_frac']
by_station['income_vm_pct'] = 100 * by_station['vm_minority_frac'] / by_station['vm_total_frac']
by_station

Unnamed: 0,station,income_lico_frac,income_total_frac,vm_minority_frac,vm_total_frac,income_lico_pct,income_vm_pct
0,14th Avenue GO,95.240573,392.910033,337.609055,398.371858,24.239791,84.747215
1,Agincourt GO,1388.694735,6054.675988,5121.61334,6144.03583,22.935905,83.359106
2,Bloor GO,2635.65486,13222.778214,4933.694279,13240.625621,19.932686,37.261791
3,Danforth GO,1682.383369,8949.027672,3462.55832,8803.786474,18.799622,39.330331
4,East Harbour GO,508.284321,2996.803754,1000.858405,3065.537452,16.960881,32.648709
5,Eglinton/Kipling,627.262701,6042.913014,1486.090955,5999.262214,10.380138,24.771229
6,Ellesmere GO,163.23607,1128.036745,901.598834,1141.4076,14.470811,78.990085
7,Finch GO,393.992372,1800.286517,1581.200104,1788.061301,21.884982,88.430978
8,Gerrard GO,2159.864735,11138.903476,4765.685369,11075.083126,19.390281,43.030696
9,Kennedy GO,974.773647,4704.382637,3439.511215,4658.242624,20.720543,73.837099
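Because blanks in the census data were filled with zeros earlier, a station whose surrounding DAs all lack data would end up with a zero denominator. A minimal sketch of one way to guard against that, using a hypothetical `toy_totals` table with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up station totals; station B has no recorded population at all
toy_totals = pd.DataFrame({
    "station": ["A", "B"],
    "vm_minority_frac": [30.0, 0.0],
    "vm_total_frac": [75.0, 0.0],
})

# Treat a zero denominator as missing so the percentage comes out as NaN
# rather than inf (or a misleading value)
denominator = toy_totals["vm_total_frac"].replace(0, np.nan)
toy_totals["vm_pct"] = 100 * toy_totals["vm_minority_frac"] / denominator
print(toy_totals)
```

Stations with a missing percentage can then be dropped or flagged before plotting, depending on what the analysis needs.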


## Visualizing Results
There are many ways to report and visualize results. In this case, what is typically of interest is the differences between stations, and the ability to identify trends or sections of a line that are near higher concentrations of the groups.

Here we will construct a "lollipop" chart, which acts as a bar chart with a little less chart junk. The height of each lollipop shows the percentage of low-income individuals. To capture some of the intersection or correlation between the two groups we are calculating, we color the bars and circles by the percentage of visible minorities, and place the exact values of that percentage inside the circles of the lollipops.

In [37]:
# Now let's plot
# Plot the circles
point = alt.Chart(by_station).mark_point(size=300, opacity=1, strokeWidth=2, fill='white').encode(
    alt.Y('income_lico_pct:Q', title='% Low Income (LICO)', scale=alt.Scale(domain=[0, 30])),
    alt.X('station:N',  title="", sort=station_order),
    alt.Color('income_vm_pct:Q', title='% Visible Minority', scale=alt.Scale(scheme='oranges', domain=[0, 100]))
)

# Plot the bar lines
line = alt.Chart(by_station).mark_bar(width=3, dy=-12, opacity=0.7).encode(
    alt.Y('income_lico_pct:Q'),
    alt.X('station:N', sort=station_order),
    alt.Color('income_vm_pct:Q', scale=alt.Scale(scheme='oranges'))
)

# Place the text
text = alt.Chart(by_station).mark_text(size=8, baseline='middle').encode(
    alt.Y('income_lico_pct:Q'),
    alt.X('station:N', sort=station_order),
    # Quantitative type so the numeric format string is applied to the label
    alt.Text('income_vm_pct:Q', format=',.0f')
)

# Put them together and clean up the chart
(line+point+text).properties(
    title="Low Income and Visible Minorities near SmartTrack Station Areas",
    width=600,
    height=300
).configure(font='Roboto').configure_axis(grid=False).configure_axisX(labelAngle=-30).configure_view(strokeWidth=0)

## Write Results
We can write our results out to a CSV file for later analysis as needed.

In [30]:
by_station.to_csv(os.path.join(counts_folder, "output", 'by_station_percentages.csv'), index=False)