# Computing and Visualizing Access and Equity

In this workbook we will combine the travel times we calculated in the previous notebook with the demographics visualized in the first to understand how transit access to two kinds of opportunities (hospitals and childcare) is distributed across Calgary.

Our basic workflow will be:
1. Using the travel time matrices and destination data, calculate an access metric for each dissemination area (DA).
2. Compute a population-weighted average for different population groups to see how these benefits are distributed.

## What is Access to Opportunities?

Access (or accessibility) is a metric for quantitatively assessing a transportation system that captures both the quality of the transit supply and the land use patterns in an area. Access focuses on the ability of a transportation system to connect people (at origins) to where they want to go (at destinations).

There are two broad ways of measuring access. One focuses on the total amount of possibility or opportunity that is reachable. We call this a **cumulative** measure. The other focuses on the time taken to reach the nearest (or $n$th nearest) opportunity. We call this a **travel time** measure. If you're a fan of mathematical nomenclature, we can think of these as *primal* and *dual* measures of access, respectively.

Schematically, here's how you can think of these two measures:

![title](img/access_schematic.png)

Opportunities are shown with blue squares and green diamonds. A cumulative measure asks: "From an origin, how many blue squares can we reach within a given amount of difficulty or effort (usually travel time)?", while a travel time measure asks: "How long does it take to reach the closest (or 3rd closest) green diamond from my origin?".

Importantly:
- With cumulative (primal) measures, **higher is better**.
- With travel time (dual) measures, **lower is better**.

### A Little Mathematics

Since we are computing these measures, it's worth having a mathematical definition to refer to if we'd like. We can generalize the computation of the two measures with:

![title](img/primal_dual.png)

Below are some more specific mathematical definitions.

### Cumulative Measures
Cumulative measures are the more common of the two. For a set of origins $i \in I$ and a set of destinations $j \in J$, we can calculate the access to opportunities with:

\begin{equation}
    A_i = \sum_{j\in J} O_j \  f(\cdot)
\end{equation}

Where $f(\cdot)$ is a function of the cost (usually travel time) to get from $i$ to $j$, and $O_j$ is the count or value of opportunities at the destination $j$. The properties and shape of $f(\cdot)$ depend on the model of access used. Here are some examples of common shapes of admittance (often called impedance or decay) functions.

![title](img/admittances.png)

In this workbook we are going to use a **rectangular** or threshold-based cutoff. These are the easiest to explain intuitively, as they ask "how many opportunities are reachable in X minutes?". Every opportunity within that X-minute isochrone is counted fully, and every opportunity outside it is ignored:

\begin{equation}
       f(t_{ij}; \tau) =
    \begin{cases} 
      1 & t_{ij}\leq \tau\\
      0 & \mbox{otherwise}
   \end{cases}
\end{equation}

Here $t_{ij}$ is the travel time from $i$ to $j$. Note the drawback of a hard cutoff: a destination just under $\tau$ minutes away counts the same as one with a travel time of zero, while a destination just over $\tau$ is ignored entirely, even though it is nearly as easy to reach as one just under $\tau$.
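
As a minimal sketch of the threshold-based measure above, the following toy example computes $A_i$ for a single origin. The travel times, opportunity counts, and function names are made up for illustration and are not part of the workshop data.

```python
def rectangular(t_ij, tau):
    """Threshold admittance: 1 if the trip takes at most tau minutes, else 0."""
    return 1 if t_ij <= tau else 0


def cumulative_access(travel_times, opportunities, tau):
    """A_i = sum over j of O_j * f(t_ij; tau) for a single origin i."""
    return sum(
        opportunities[j] * rectangular(t, tau)
        for j, t in travel_times.items()
    )


# Hypothetical travel times (minutes) from one origin to four destinations
travel_times = {"a": 12.0, "b": 29.5, "c": 30.5, "d": 55.0}
# Hypothetical opportunity counts O_j at each destination
opportunities = {"a": 3, "b": 1, "c": 4, "d": 10}

access_30 = cumulative_access(travel_times, opportunities, tau=30)
access_45 = cumulative_access(travel_times, opportunities, tau=45)
print(access_30, access_45)  # 4 8
```

Notice that destination `c`, at 30.5 minutes, contributes nothing at a 30-minute threshold but counts fully at 45 minutes. This is the cutoff sensitivity described above.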

### Travel Time Measures
Minimal cost (or travel time) measures find the $n$th closest (or least costly) destination and report the travel time to that destination.

\begin{equation}
        A'_i = \max\{\min_n\{f(\cdot)X_j \ \forall j \in J\}\}
\end{equation}

where

\begin{equation}
       X_j =
    \begin{cases} 
      1 & \mbox{opportunity located at } j\\
      \infty & \mbox{otherwise}
   \end{cases}
\end{equation}

Travel time measures are almost always a measure of the minimum travel time to the closest, or $n$th closest, opportunity, meaning:

\begin{equation}
f(t_{ij}) = t_{ij}
\end{equation}

However, it is certainly possible to adjust $f(\cdot)$ to incorporate a generalized cost, or even an admittance (or impedance!) function.
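
To make the "max of the $n$ smallest" idea concrete, here is a small sketch for a single origin with $f(t_{ij}) = t_{ij}$. The function name and the travel times are hypothetical; destinations that cannot be reached are assumed to be represented by infinity.

```python
import heapq
import math


def nth_closest_time(travel_times, n=1):
    """Travel time to the n-th closest destination from one origin.

    Takes the n smallest travel times and returns the largest of them.
    Returns infinity if fewer than n destinations are listed.
    """
    smallest = heapq.nsmallest(n, travel_times)
    if len(smallest) < n:
        return math.inf
    return smallest[-1]  # nsmallest returns values in ascending order


# Hypothetical travel times (minutes); math.inf marks an unreachable destination
times = [42.0, 18.5, math.inf, 27.0]

first = nth_closest_time(times, n=1)
third = nth_closest_time(times, n=3)
fourth = nth_closest_time(times, n=4)
print(first, third, fourth)  # 18.5 42.0 inf
```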

With that definition out of the way, let's get into calculating some of these measures.


## Import and Read Demographics
We have already visualized these demographics in `1 - Data Exploration.ipynb`, so let's load them in. We'll also do some helpful stuff up front like defining what columns we want to use for our demographic analysis and making a dictionary that lets us map our machine-friendly names to human-friendly names.

In [None]:
import pandas as pd
import geopandas as gpd
import altair as alt
demographic_columns = ["pop_total", "vismin_vismin", "lico_lico", "fam_onemother"]
demographic_names = {
    "pop_total": "Everyone",
    "vismin_vismin": "Visible Minority",
    "lico_lico": "Low Income",
    "fam_onemother": "Single Mother Households"
}
demographics = pd.read_csv("data/demographics.csv", dtype={"dauid":str})
demographics.head()

## Access to the Nearest Hospital
Next, let's load in our travel time matrix for the AM peak with access to hospitals. To perform a weighted summary over different demographic groups we need to do the following:

1. Decide what exactly we should do with the `NaN` values. For now, let's fill them in with a value that's 30 minutes above the maximum possible travel time (180 minutes in the code below).
2. Since we are looking for *minimum travel time*, we simply group by each origin and take the minimum value.
3. Join in the demographics data so that we have everything nicely together.
4. Calculate the weighted average travel time to hospitals for different demographic groups.

In [None]:
hosp_am = pd.read_csv("data/mx_hospitals_am.csv", dtype={"from_id":str})
# Step 1
hosp_am["travel_time"] = hosp_am["travel_time"].fillna(180)
# Step 2
hosp_am = hosp_am[["from_id", "travel_time"]].groupby("from_id", as_index=False).min()
# Step 3
hosp_am = pd.merge(hosp_am, demographics, left_on="from_id", right_on="dauid")
# Step 4
# Let's keep only the totals columns and the travel time that we need
hosp_am_avg = hosp_am[["travel_time"] + demographic_columns].copy()
# Now we normalize the demographic columns so we can do our weighting properly
for c in demographic_columns:
    hosp_am_avg[c] = hosp_am_avg[c]/hosp_am_avg[c].sum()
# Finally we multiply our travel time by these fractional amounts and sum to get a weighted average
hosp_am_avg = hosp_am_avg[demographic_columns].multiply(hosp_am_avg["travel_time"], axis="index").sum().to_frame().reset_index()
# Rename our columns to be something prettier
hosp_am_avg.columns = ["demographic", "avg_travel_time"]
# Finally we do some pretty names for our plots
hosp_am_avg["demo_name"] = hosp_am_avg["demographic"].map(demographic_names)
hosp_am_avg
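
To see why normalizing and then multiplying gives a weighted average, here is a small worked example using the same pandas operations as the cell above. The three zones and their population counts are invented for illustration.

```python
import pandas as pd

# Hypothetical zones: minimum travel time plus two made-up population counts
zones = pd.DataFrame({
    "travel_time": [10.0, 20.0, 40.0],
    "pop_total": [100, 100, 200],
    "lico_lico": [10, 10, 80],
})
groups = ["pop_total", "lico_lico"]

# Normalize each group so that its weights sum to 1
weights = zones[groups] / zones[groups].sum()
# Multiply each zone's travel time by its share of the group, then sum
weighted_avg = weights.multiply(zones["travel_time"], axis="index").sum()

# The same number computed directly as sum(t * w) / sum(w)
direct_lico = (zones["travel_time"] * zones["lico_lico"]).sum() / zones["lico_lico"].sum()

print(weighted_avg)
print(direct_lico)
```

In this toy data most of the low-income population lives in the zone with the longest travel time, so its weighted average (35 minutes) is higher than the average for everyone (27.5 minutes).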

Now that we have our weighted averages, let's make a plot to show them. A bar chart covers most of what we want here, so let's use a slightly fancier version: a lollipop chart.

In [None]:
sticks = alt.Chart(hosp_am_avg).mark_bar(color="lightgrey", height=4).encode(
    alt.X("avg_travel_time:Q", title="Average Travel Time (min)"),
    alt.Y("demo_name:N", title="", sort=["Everyone"])
)

lollipop = alt.Chart(hosp_am_avg).mark_circle(color="#823BA0", size=250, opacity=1).encode(
    alt.X("avg_travel_time:Q", title="Average Travel Time (min)"),
    alt.Y("demo_name:N", title="", sort=["Everyone"])
)

(sticks+lollipop).properties(
    title="Average Travel Time to Hospitals (Mornings)",
    width=400,
    height=100
).configure(
    font="Arial"
).configure_view(
    strokeWidth=0
).configure_axis(
    grid=False
).configure_axisY(
    labelFontWeight="bold"
)

## Comparing Two Travel Times
We can also compare two travel time situations. For example, how big is the disparity between AM peak and evening service, and who is affected by this disparity the most?

Let's build another set of access measures for the evening and compare.

In [None]:
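hosp_pm = pd.read_csv("data/mx_hospitals_pm.csv", dtype={"from_id":str})
# Step 1: fill missing travel times with the same placeholder as the AM matrix
hosp_pm["travel_time"] = hosp_pm["travel_time"].fillna(180)
# Step 2: minimum travel time from each origin
hosp_pm = hosp_pm[["from_id", "travel_time"]].groupby("from_id", as_index=False).min()
# Step 3: join in the demographics
hosp_pm = pd.merge(hosp_pm, demographics, left_on="from_id", right_on="dauid")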
# Let's keep only the totals columns and the travel time that we need
hosp_pm_avg = hosp_pm[["travel_time"] + demographic_columns].copy()
# Now we normalize the demographic columns so we can do our weighting properly
for c in demographic_columns:
    hosp_pm_avg[c] = hosp_pm_avg[c]/hosp_pm_avg[c].sum()
# Finally we multiply our travel time by these fractional amounts and sum to get a weighted average
hosp_pm_avg = hosp_pm_avg[demographic_columns].multiply(hosp_pm_avg["travel_time"], axis="index").sum().to_frame().reset_index()
# Rename our columns to be something prettier
hosp_pm_avg.columns = ["demographic", "avg_travel_time"]
# Finally we do some pretty names for our plots
hosp_pm_avg["demo_name"] = hosp_pm_avg["demographic"].map(demographic_names)
hosp_pm_avg
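
The AM and PM cells repeat the same four steps, so one option is to wrap them in a helper. The sketch below is one possible way to do that and is not part of the original workshop code. The function name `weighted_travel_times` and its parameters are hypothetical, and it assumes the matrix has `from_id` and `travel_time` columns and the demographics table has a `dauid` column, as in the files used above. The toy data only demonstrates the call.

```python
import pandas as pd


def weighted_travel_times(matrix, demographics, columns, names, fill_value=180):
    """Population-weighted average of each origin's minimum travel time."""
    mx = matrix[["from_id", "travel_time"]].copy()
    # Step 1: replace missing travel times with a placeholder value
    mx["travel_time"] = mx["travel_time"].fillna(fill_value)
    # Step 2: minimum travel time from each origin
    nearest = mx.groupby("from_id", as_index=False).min()
    # Step 3: join in the demographics
    merged = pd.merge(nearest, demographics, left_on="from_id", right_on="dauid")
    # Step 4: normalize each group to sum to 1, weight the travel times, and sum
    weights = merged[columns] / merged[columns].sum()
    avg = weights.multiply(merged["travel_time"], axis="index").sum()
    out = avg.rename_axis("demographic").reset_index(name="avg_travel_time")
    out["demo_name"] = out["demographic"].map(names)
    return out


# Toy inputs (made up): origin "2" has no valid travel time to any hospital
toy_matrix = pd.DataFrame({
    "from_id": ["1", "1", "2", "2"],
    "to_id": ["h1", "h2", "h1", "h2"],
    "travel_time": [15.0, 25.0, None, None],
})
toy_demographics = pd.DataFrame({"dauid": ["1", "2"], "pop_total": [300, 100]})

result = weighted_travel_times(
    toy_matrix, toy_demographics, ["pop_total"], {"pop_total": "Everyone"}
)
print(result)
```

With the real data, the call could look like `weighted_travel_times(pd.read_csv("data/mx_hospitals_pm.csv", dtype={"from_id": str}), demographics, demographic_columns, demographic_names)`.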

In [None]:
sticks = alt.Chart(hosp_pm_avg).mark_bar(color="lightgrey", height=4).encode(
    alt.X("avg_travel_time:Q", title="Average Travel Time (min)"),
    alt.Y("demo_name:N", title="", sort=["Everyone"])
)

lollipop = alt.Chart(hosp_pm_avg).mark_circle(color="#823BA0", size=250, opacity=1).encode(
    alt.X("avg_travel_time:Q", title="Average Travel Time (min)"),
    alt.Y("demo_name:N", title="", sort=["Everyone"])
)

(sticks+lollipop).properties(
    title="Average Travel Time to Hospitals (Evenings)",
    width=400,
    height=100
).configure(
    font="Lato"
).configure_view(
    strokeWidth=0
).configure_axis(
    grid=False
).configure_axisY(
    labelFontWeight="bold"
)

Now we can take the difference between the two (evening minus AM peak), so a positive value means a longer trip in the evening.

In [None]:
hosp_am_pm = pd.merge(
    hosp_am_avg[["demographic", "avg_travel_time"]], 
    hosp_pm_avg[["demographic", "avg_travel_time"]], 
    on="demographic", 
    suffixes=["_am", "_pm"]
)
hosp_am_pm["delta"] = hosp_am_pm["avg_travel_time_pm"] - hosp_am_pm["avg_travel_time_am"]
hosp_am_pm["demo_name"] = hosp_am_pm["demographic"].map(demographic_names)
hosp_am_pm

And then we can plot this difference much as we did above.

In [None]:
sticks = alt.Chart(hosp_am_pm).mark_bar(color="lightgrey", height=4).encode(
    alt.X("delta:Q", title="Travel Time Increase (min)"),
    alt.Y("demo_name:N", title="", sort=["Everyone"])
)

lollipop = alt.Chart(hosp_am_pm).mark_circle(color="#559613", size=250, opacity=1).encode(
    alt.X("delta:Q", title="Travel Time Increase (min)"),
    alt.Y("demo_name:N", title="", sort=["Everyone"])
)

(sticks+lollipop).properties(
    title="Additional Evening Travel Time to Hospitals",
    width=400,
    height=100
).configure(
    font="Lato",
).configure_view(
    strokeWidth=0
).configure_axis(
    grid=False,
    labelFontSize=12,
    titleFontSize=14
).configure_axisY(
    labelFontWeight="bold"
).configure_title(
    fontSize=16,
    anchor="start"
)