# Access to Childcare Seats in Calgary
This workbook follows the steps of the previous three workbooks but focuses on a different measure: how many childcare seats are reachable within a reasonable transit time, and how is that access distributed among populations?

This notebook walks briefly through data exploration, access analysis, and equity analysis using the travel time matrices generated in the notebook `2 - Travel Times.ipynb`.

## Data Exploration
Childcare locations are more abundant, and each one also carries a value: its number of seats. To get a good sense of how seats are distributed, let's size the dots on the map by the number of seats so we can see whether there are clusters, both in the number of options and in their scale.

In [None]:
import altair as alt
import geopandas as gpd
import pandas as pd
PROJECTION = "mercator"

da = gpd.read_file("data/da_with_locations.geojson")
da.head()

In [None]:
daycares = gpd.read_file("data/daycare_locations.geojson")

basemap = alt.Chart(da).mark_geoshape(fill="lightgrey", stroke="lightgrey")

dayc = alt.Chart(daycares).mark_circle().encode(
    latitude='latitude:Q',
    longitude='longitude:Q',
    size=alt.Size("capacity:Q", title="Capacity"),
    tooltip='name:N'
).project(
    PROJECTION
).properties(
    width=500,
    height=400
)

basemap + dayc

## Import and Read Demographics
We have already visualized these demographics in `1 - Data Exploration.ipynb`, so let's load them in. We'll also do some helpful setup up front: defining which columns we want to use for our demographic analysis and making a dictionary that maps our machine-friendly names to human-friendly names.

In [None]:
demographic_columns = ["pop_total", "vismin_vismin", "lico_lico", "fam_onemother"]
demographic_names = {
    "pop_total": "Everyone",
    "vismin_vismin": "Visible Minority",
    "lico_lico": "Low Income",
    "fam_onemother": "Single Mother Households"
}
demographics = pd.read_csv("data/demographics.csv", dtype={"dauid":str})
demographics.head()

## Calculate the Cumulative Measure
Here's the recipe for computing our metric:
1. Load in our matrix and keep only the trips with a travel time under our specified cutoff.
2. Load in our daycare capacity data (using the data encoded in `daycare_locations.geojson`) and merge it on the *destination* column.
3. Group by our origin zone and sum all of the daycare spots reachable from that zone within the cutoff.

In [None]:
cutoff = 30

# Step 1
daycare_am = pd.read_csv("data/mx_daycares_am.csv", dtype={"from_id":str})
daycare_am = daycare_am[daycare_am["travel_time"] < cutoff]

# Step 2
# We only need the data, not the geospatial stuff so we can drop that as we read it
daycare_seats = pd.DataFrame(gpd.read_file("data/daycare_locations.geojson").drop(columns="geometry"))
daycare_am_sum = pd.merge(daycare_am[["from_id", "to_id"]], daycare_seats[["id", "capacity"]], left_on="to_id", right_on="id")

# Step 3
daycare_am_sum = daycare_am_sum[["from_id", "capacity"]].groupby("from_id", as_index=False).sum()

daycare_am_sum.head()
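
To check the recipe by hand, here is a minimal sketch that runs the same three steps on a tiny made-up matrix. The zone IDs, daycare IDs, travel times, and capacities below are all hypothetical values chosen for illustration; they are not drawn from the Calgary data.

```python
import pandas as pd

cutoff = 30  # minutes, same cutoff as in the cell above

# Hypothetical travel time matrix: two origin zones, three daycares
toy_matrix = pd.DataFrame({
    "from_id": ["A", "A", "A", "B", "B", "B"],
    "to_id": [1, 2, 3, 1, 2, 3],
    "travel_time": [10, 25, 45, 50, 20, 35],
})

# Hypothetical daycare capacities
toy_seats = pd.DataFrame({"id": [1, 2, 3], "capacity": [40, 60, 100]})

# Step 1: keep only the trips under the cutoff
toy_reachable = toy_matrix[toy_matrix["travel_time"] < cutoff]

# Step 2: attach each destination's capacity
toy_reachable = pd.merge(
    toy_reachable[["from_id", "to_id"]],
    toy_seats[["id", "capacity"]],
    left_on="to_id",
    right_on="id",
)

# Step 3: sum the reachable seats for each origin zone
toy_sum = toy_reachable[["from_id", "capacity"]].groupby("from_id", as_index=False).sum()
print(toy_sum)
```

Zone A reaches daycares 1 and 2 (40 + 60 seats), while zone B reaches only daycare 2 (60 seats), so the totals are easy to confirm by eye.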

Before we do our weighted summary, let's put these values on a map to see what they look like:

In [None]:
import altair as alt

da_areas = gpd.read_file("data/da_with_locations.geojson")
da_areas = pd.merge(da_areas, daycare_am_sum, left_on="dauid", right_on="from_id")
seats = alt.Chart(da_areas).mark_geoshape().encode(
    color=alt.Color("capacity:Q", title="Seats Reachable")
).project(PROJECTION)

seats.properties(
    title={
        "text": f"Daycare Seats Reachable in {cutoff} minutes by Transit",
        "subtitle": "Estimates for 7-9am on Wednesday, March 15"
    },
    width=700,
    height=900
)
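
One thing to keep in mind when reading this map: `pd.merge` defaults to an inner join, so any zone with no daycare reachable under the cutoff has no row in `daycare_am_sum` and is left out of `da_areas` entirely. The sketch below uses small hypothetical frames (plain pandas, made-up IDs and values) to show the difference between the default join and a left join that keeps those zones with zero seats. Whether to show them on the map is a judgment call; this is only an illustration of the behaviour.

```python
import pandas as pd

# Hypothetical zones; zone "C" reaches no daycare within the cutoff
toy_zones = pd.DataFrame({"dauid": ["A", "B", "C"]})
toy_totals = pd.DataFrame({"from_id": ["A", "B"], "capacity": [100, 60]})

# Default inner join: zone "C" disappears
toy_inner = pd.merge(toy_zones, toy_totals, left_on="dauid", right_on="from_id")

# Left join: every zone is kept, and missing totals become 0 seats
toy_left = pd.merge(toy_zones, toy_totals, left_on="dauid", right_on="from_id", how="left")
toy_left["capacity"] = toy_left["capacity"].fillna(0)

print(len(toy_inner), len(toy_left))
```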

And now let's do a weighted summary of these values:

In [None]:
daycare_am_demo = pd.merge(daycare_am_sum, demographics, left_on="from_id", right_on="dauid", how="right")
daycare_am_demo["capacity"] = daycare_am_demo["capacity"].fillna(0)
# Fill missing population counts with 0 so the normalization below works
daycare_am_demo[demographic_columns] = daycare_am_demo[demographic_columns].fillna(0)
# Convert each count into that zone's share of the group's total population
for c in demographic_columns:
    daycare_am_demo[c] = daycare_am_demo[c]/daycare_am_demo[c].sum()

# Finally we multiply our reachable seats by these fractional amounts and sum to get a weighted average
daycare_am_demo = daycare_am_demo[demographic_columns].multiply(daycare_am_demo["capacity"], axis="index").sum().to_frame().reset_index()
# Rename our columns to be something prettier
daycare_am_demo.columns = ["demographic", "average_seats"]
# Finally we do some pretty names for our plots
daycare_am_demo["demo_name"] = daycare_am_demo["demographic"].map(demographic_names)
daycare_am_demo
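
To make the weighting concrete, here is a small sketch of the same calculation on three hypothetical zones. For each group, the result is the sum over zones of (zone population / total group population) × seats reachable from that zone. All numbers below are invented for illustration.

```python
import pandas as pd

# Hypothetical zones with reachable seats and population counts
toy = pd.DataFrame({
    "dauid": ["A", "B", "C"],
    "capacity": [100, 60, 0],       # seats reachable from each zone
    "pop_total": [100, 300, 100],   # everyone
    "lico_lico": [10, 60, 30],      # low-income residents
})
toy_groups = ["pop_total", "lico_lico"]

# Each zone's share of the group's total population
toy_shares = toy[toy_groups] / toy[toy_groups].sum()

# Weight the reachable seats by those shares and sum over zones
toy_average = toy_shares.multiply(toy["capacity"], axis="index").sum()
print(toy_average)
```

In this toy data, the overall population averages 0.2 × 100 + 0.6 × 60 + 0.2 × 0 = 56 seats, while the low-income group averages 0.1 × 100 + 0.6 × 60 + 0.3 × 0 = 46 seats, because more of that group lives in the zone with no reachable seats.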

Finally, we can plot these averages for each demographic group:

In [None]:
sticks = alt.Chart(daycare_am_demo).mark_bar(color="lightgrey", height=4).encode(
    alt.X("average_seats:Q", title="Average Number of Seats"),
    alt.Y("demo_name:N", title="", sort=["Everyone"])
)

lollipop = alt.Chart(daycare_am_demo).mark_circle(color="#559613", size=250, opacity=1).encode(
    alt.X("average_seats:Q", title="Average Number of Seats"),
    alt.Y("demo_name:N", title="", sort=["Everyone"])
)

(sticks+lollipop).properties(
    title=f"Daycare Seats Reachable in {cutoff} Minutes",
    width=400,
    height=100
).configure(
    font="Lato",
).configure_view(
    strokeWidth=0
).configure_axis(
    grid=False,
    labelFontSize=12,
    titleFontSize=14
).configure_axisY(
    labelFontWeight="bold"
).configure_title(
    fontSize=16,
    anchor="start"
)