# Explore: Amtrak State Supported Michigan Service

## Intercity Passenger Rail Service Station Performance Metrics

The Amtrak [network](https://www.amtrak.com/content/dam/projects/dotcom/english/public/documents/Maps/Amtrak-System-Map-020923.pdf)
provides intercity passenger rail service in the
contiguous United States and to select Canadian cities. The network is operated by the
[National Railroad Passenger Corporation](https://railroads.dot.gov/passenger-rail/amtrak/amtrak),
a federally chartered for-profit corporation that receives federal and state funding in addition
to revenue from ticket sales and other services.

This notebook commences exploration of the augmented quarterly
[Amtrak](https://www.amtrak.com/home.html) station performance metrics for trains supported by
the State of Michigan. The goal is to better understand individual Amtrak Michigan Service
performance and identify potential areas for further analysis.

### Variable names

A number of variable names in this project leverage the following abbreviations. The naming
strategy is to strike a balance between brevity and readability:

* `amtk`: Amtrak (reporting mark)
* `chrt`: chart
* `cols`: columns
* `const`: constant
* `cwd`: current working directory
* `eb`: eastbound direction of travel
* `lm`: linear model
* `mi`: miles
* `mm`: minutes (ISO 8601)
* `nb`: northbound direction of travel
* `psgr`: passenger
* `qtr`: quarter
* `rte`: route
* `sb`: southbound direction of travel
* `stats`: summary statistics
* `stn`: station
* `stns`: stations
* `svc`: service
* `trn`: train
* `wb`: westbound direction of travel

In [None]:
import json
import numpy as np
import pandas as pd
import pathlib as pl
import tomllib as tl

import fra_amtrak.amtk_detrain as detrn
import fra_amtrak.amtk_frame as frm
import fra_amtrak.amtk_network as ntwk
import fra_amtrak.chart_box_preagg as boxp
import fra_amtrak.chart_hist as hst
import fra_amtrak.chart_hist_layer as hstl
import fra_amtrak.chart_line as lne
import fra_amtrak.chart_title as ttl

## 1.0 Read files

### 1.1 Resolve paths

In [None]:
parent_path = pl.Path.cwd()  # current working directory
parent_path

### 1.2 Load constants

Load a companion [TOML](https://toml.io/en/) file named `notebook.toml` containing constants.

In [None]:
filepath = parent_path.joinpath("notebook.toml")
with open(filepath, "rb") as file_obj:
    const = tl.load(file_obj)

# Access constants
AGG = const["agg"]
CHRT_BAR = const["chart"]["bar"]
COLORS = const["colors"]
COLS = const["columns"]
DIRECTION = const["train"]["direction"]
SVC = const["services"]
SUB_SVC = const["train"]["sub_service"]
TRN = const["train"]

### 1.3 Retrieve sub service route information

The file `amtk_sub_services.json` contains miscellaneous information about Amtrak sub services (i.e., named trains).

In [None]:
filepath = parent_path.joinpath("data", "processed", "amtk_sub_services.json")
with open(filepath, "r") as file:
    amtk_sub_svcs = json.load(file)
len(amtk_sub_svcs)

### 1.4 Retrieve station details

The file `amtk_stations.csv` contains location-related information for all Amtrak stations.
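
As a small illustration of why a `dtype` argument is useful when reading station data, the sketch below parses a two-row CSV held in memory. The rows are toy values rather than records from `amtk_stations.csv`, and the column layout is an assumption.

```python
import io

import pandas as pd

# Toy CSV text (assumed layout; not taken from amtk_stations.csv)
csv_text = "Station Code,ZIP Code\nPVD,02903\nCHI,60661\n"

# Default parsing infers integers and drops the leading zero
as_int = pd.read_csv(io.StringIO(csv_text))

# Forcing a string dtype keeps each ZIP Code exactly as written
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"ZIP Code": "str"})

print(as_int["ZIP Code"].tolist())
print(as_str["ZIP Code"].tolist())
```

Reading ZIP Codes as strings preserves leading zeros, which matters for stations in the northeastern United States.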

In [None]:
filepath = parent_path.joinpath("data", "processed", "amtk_stations.csv")
stations = pd.read_csv(filepath, dtype={"ZIP Code": "str"}, low_memory=False)
stations.shape

### 1.5 Retrieve performance data

In [None]:
filepath = parent_path.joinpath("data", "processed", "station_performance_metrics-v1p2.csv")
trains = pd.read_csv(
    filepath, dtype={"Address 02": "str", "ZIP Code": "str"}, low_memory=False
)  # avoid DtypeWarning
trains.shape

### 1.6 Retrieve late time predictions

In [None]:
filepath = parent_path.joinpath("data", "student", "stu-amtk-avg_min_late_predict.csv")
predictions = pd.read_csv(filepath, low_memory=False)
predictions.shape

## 2.0 State Supported Michigan Service [1 pt]

Amtrak's state-supported trains are funded by state governments. These services are typically
shorter in length and operate within a single state or across multiple states. Amtrak's
[Michigan](https://www.amtrak.com/michigan-services-train) service includes the _Pere Marquette_,
_Blue Water_, and _Wolverine_ trains with routes between Chicago and Grand Rapids, Chicago and Port
Huron, and Chicago and greater Detroit.

Retrieve the Michigan Service performance data by calling the appropriate `amtk_network`
function. Assign the return value of the function call to a variable named `mich`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

### 2.1 Michigan service: on-time performance metrics (entire period) [1 pt]

Michigan service performance data is a compilation of quarterly metrics that focus on late
detraining passengers. Detraining passengers are considered on-time if they arrive at their
destination no later than fifteen (`15`) minutes after their scheduled arrival time. All other
detraining passengers are considered late.
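
The relationship between total, late, and on-time detraining passengers can be sketched with a toy `DataFrame`. The counts are invented for illustration, and the column names are assumptions that may differ from those in the performance data.

```python
import pandas as pd

# Toy quarterly counts; values and column names are illustrative assumptions
toy = pd.DataFrame(
    {
        "Total Detraining Customers": [1200, 950, 1100],
        "Late Detraining Customers": [300, 190, 440],
    }
)

total = toy["Total Detraining Customers"].sum()
late = toy["Late Detraining Customers"].sum()
on_time = total - late  # every passenger who is not late is on time

print(f"On-time share: {on_time / total:.1%}")
```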

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

### 2.2 Michigan service: mean late arrival times [1 pt]

Review the central tendency, dispersion, and shape for the mean late arrival times of _Wolverine_ trains.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

The skewness and kurtosis values returned suggest that the distribution of mean late arrival times of _Wolverine_ trains is positively skewed and features a sharper peak and heavier right tail than a normal distribution. Let's confirm this visually by generating a histogram.
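
For reference, a minimal sketch of how skewness and kurtosis can be computed with pandas follows. The values are toy data, and `frm.describe_numeric_column()` may compute these measures differently.

```python
import pandas as pd

# Toy mean-late values in minutes (illustrative only)
toy = pd.Series([16, 18, 20, 22, 25, 30, 45, 90])

skewness = toy.skew()  # greater than zero indicates a longer right tail
kurtosis = toy.kurt()  # excess kurtosis; greater than zero indicates heavier tails than normal

print(f"mean={toy.mean():.2f}, median={toy.median():.2f}")
print(f"skewness={skewness:.2f}, excess kurtosis={kurtosis:.2f}")
```

In a positively skewed sample the mean is pulled above the median by the large values in the right tail.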

In [None]:
#hidden tests are within this cell

### 2.3 Michigan service: visualize distribution of mean late arrival times [1 pt]

Visualize mean late arrival times for the entire period. The data is binned prior to plotting.
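
Binning prior to plotting can be sketched with `numpy.histogram`. The values and the bin width below are assumptions chosen for illustration; the column names `bin_center` and `count` mirror the shorthands used in the histogram cell.

```python
import math

import numpy as np
import pandas as pd

# Toy mean-late values in minutes (illustrative only)
values = np.array([17, 22, 23, 31, 38, 44, 61])
bin_width = 5  # assumed bin width

# Bin edges run from zero to the first multiple of bin_width at or above the maximum
upper = math.ceil(values.max() / bin_width) * bin_width
edges = np.arange(0, upper + bin_width, bin_width)

counts, edges = np.histogram(values, bins=edges)

chart_data = pd.DataFrame(
    {"bin_center": edges[:-1] + bin_width / 2, "count": counts}
)
print(chart_data[chart_data["count"] > 0])
```

Pre-binned data keeps the chart specification small because only one row per bin is passed to the charting library.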

#### 2.3.1 Create the chart data

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

#### 2.3.2 Generate the histogram

In [None]:
# Chart title
title_txt = f"Amtrak {SVC['mich']} Service Late Detraining Passengers"
title = ttl.format_title(mich_stats, title_txt)

# Tooltips
tooltip_config = [
    {"shorthand": "bin_center:Q", "title": "Average Minutes Late", "format": None},
    {"shorthand": "count:Q", "title": "Late Arrivals Count", "format": None},
]

# Create and display the histogram
chart = hst.create_histogram(
    frame=chrt_data,
    x_shorthand="bin_center:Q",
    x_title="Average Minutes Late",
    y_shorthand="count:Q",
    y_title="Late Arrivals Count",
    y_stack=False,
    line_shorthand="Avg Min Late:Q",
    mu=mich_mu,
    sigma=mich_sigma,
    num_bins=mich_num_bins,
    bin_width=mich_bin_width,
    x_tick_count_max=mich_max_val_ceil,
    bar_color=COLORS["amtk_blue"],
    mu_color=COLORS["amtk_red"],
    sigma_color=COLORS["anth_gray"],
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

## 3.0 Michigan sub services: on-time performance metrics (entire period) [1 pt]

Call the appropriate `amtk_detrain` function and pass it the arguments required to return Michigan
service summary statistics grouped by sub service. Assign the return value of the function call to a
variable named `mich_sub_svcs_stats`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

### 3.1 Michigan sub services: visualize distribution of mean late arrival times

Visualize mean late arrival times for the entire period. The data is binned prior to plotting.

#### 3.1.1 Retrieve each sub service [3 pts]

Call the appropriate `amtk_network` function to retrieve the performance data for each
Michigan sub service (_Blue Water_, _Pere Marquette_, and _Wolverine_). Assign the return value of
each function call to the following variables:

1. _Blue Water_: `blwtr`
2. _Pere Marquette_: `prmrq`
3. _Wolverine_: `wolv`

In [None]:
# Assign Blue Water here
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

In [None]:
# Assign Pere Marquette here
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

In [None]:
# Assign Wolverine here
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell


#### 3.1.2 Create chart data

In [None]:
# List of sub-services and their mappings
sub_svcs = [
    {"sub_svc": SUB_SVC["blwtr"], "frame": blwtr, "color": COLORS["blue"], "order": 2},
    {"sub_svc": SUB_SVC["prmrq"], "frame": prmrq, "color": COLORS["amtk_red"], "order": 3},
    {"sub_svc": SUB_SVC["wolv"], "frame": wolv, "color": COLORS["amtk_blue"], "order": 1},
]

# Create a three-column DataFrame comprising average late times for each sub-service
mich_sub_svcs = pd.DataFrame({
    sub_svc["sub_svc"]: sub_svc["frame"][COLS["late_detrn_avg_mm_late"]]
    .dropna()
    .reset_index(drop=True)
    for sub_svc in sub_svcs
})

# print(mich_sub_svcs.head())

# Melt the DataFrame for charting purposes
chrt_data = pd.melt(
    mich_sub_svcs,
    var_name="Sub Service",
    value_name="Average Minutes Late",
)

# print(chrt_data.head())

# Histogram color and order mappings
hst_colors = {sub_svc["sub_svc"]: sub_svc["color"] for sub_svc in sub_svcs}
hst_order = {sub_svc["sub_svc"]: sub_svc["order"] for sub_svc in sub_svcs}

# Enforce the layering order
chrt_data["order"] = chrt_data[COLS["sub_svc"]].map(hst_order)
# chrt_data

#### 3.1.3 Generate histogram

In [None]:
# Chart title
title = ttl.format_title(mich_stats, f"Amtrak {SVC['mich']} Service Late Detraining Passengers")

# Tooltip configuration
tooltip_config = [
    {"shorthand": "Sub Service:N", "title": "Sub Service", "format": None},
    {"shorthand": "bin_range:N", "title": "Average Minutes Late (range)", "format": None},
    {"shorthand": "mean_late:Q", "title": "Average Minutes Late (mean)", "format": ".3f"},
    {"shorthand": "count:Q", "title": "Late Arrivals Count", "format": None},
]

chart = hstl.create_layered_histogram(
    frame=chrt_data,
    x_shorthand="bin_start:Q",
    x_title="Average Minutes Late",
    x_tick_count_max=mich_max_val_ceil,
    x2_shorthand="bin_end:Q",
    y_shorthand="count:Q",
    y_title="Late Arrivals Count",
    y_stack=False,
    line_shorthand="Avg Min Late:Q",
    mu=mich_mu,
    sigma=mich_sigma,
    max_bins=mich_num_bins,
    bin_step=5,
    hst_order_shorthand="order:O",
    hst_color_shorthand="Sub Service:N",
    hst_colors=hst_colors,
    mu_color=COLORS["amtk_red"],
    sigma_color=COLORS["anth_gray"],
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

## 4.0 Michigan _Blue Water_ service (Chicago, IL - Port Huron, MI)

The _Blue Water_ operates between [Chicago Union Station](https://www.amtrak.com/stations/chi), Chicago, IL ([CHI](https://www.amtrak.com/stations/chi))
and Port Huron, MI ([PTH](https://www.amtrak.com/stations/pth)). Intermediate stops include
Kalamazoo, MI ([KAL](https://www.amtrak.com/stations/kal)),
Battle Creek, MI ([BTL](https://www.amtrak.com/stations/btl)),
East Lansing, MI ([LNS](https://www.amtrak.com/stations/lns)),
and Flint, MI ([FLN](https://www.amtrak.com/stations/fln)), among other towns and cities.

### 4.1 _Blue Water_: on-time performance metrics (entire period) [2 pts]

_Blue Water_ performance data is a compilation of quarterly metrics that focus on late
detraining passengers. Detraining passengers are considered on-time if they arrive at their
destination no later than fifteen (`15`) minutes after their scheduled arrival time. All other
detraining passengers are considered late.

Retrieve the _Blue Water_ row from the `mich_sub_svcs_stats` DataFrame. Call the appropriate
`DataFrame` method to convert the row to a `Series`. Assign the return value to a variable named
`blwtr_stats`.
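
A one-row `DataFrame` can be converted to a `Series` in more than one way. The sketch below uses `DataFrame.squeeze()` on toy data; whether this is the method the exercise expects is not stated here, and the values and column names are illustrative.

```python
import pandas as pd

# Toy summary statistics (illustrative values and column names)
toy_stats = pd.DataFrame(
    {
        "Sub Service": ["Blue Water", "Pere Marquette", "Wolverine"],
        "Train Arrivals": [100, 80, 300],
    }
)

# A boolean mask returns a one-row DataFrame ...
row = toy_stats[toy_stats["Sub Service"] == "Blue Water"]

# ... and squeeze() collapses that single row into a Series
row_as_series = row.squeeze()

print(type(row).__name__, "->", type(row_as_series).__name__)
print(row_as_series["Train Arrivals"])
```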

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

In [None]:
# Total train arrivals
blwtr_trn_arrivals = blwtr_stats["Train Arrivals"]
# blwtr_trn_arrivals

# Detraining totals
blwtr_detrn = blwtr_stats[f"{COLS['total_detrn']} sum"]
blwtr_detrn_late = blwtr_stats[f"{COLS['late_detrn']} sum"]
blwtr_detrn_on_time = blwtr_detrn - blwtr_detrn_late

print(
    f"Train Arrivals: {blwtr_trn_arrivals}",
    f"Total Detraining Customers: {blwtr_detrn}",
    f"Late Detraining Customers: {blwtr_detrn_late}",
    f"On-Time Detraining Customers: {blwtr_detrn_on_time}",
    sep="\n",
)

In [None]:
#hidden tests are within this cell

### 4.2 _Blue Water_ trains [1 pt]

Each _Blue Water_ train is identified by a unique train number.

Create a `DataFrame` named `blwtr_trns` that contains one row for each train comprising the
_Blue Water_ service. Include the following columns in the `DataFrame` in the order specified:

1. "Service Line"
2. "Service"
3. "Sub Service"
4. "Route Miles"
5. "Train Number"

Reset the index (set `drop=True`) when creating the new `DataFrame`.
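
Reducing many performance rows to one row per train can be sketched as follows. The data is a toy example with assumed column names, so treat it as one possible approach rather than the expected solution.

```python
import pandas as pd

# Toy performance rows: several records per train (illustrative values)
toy = pd.DataFrame(
    {
        "Sub Service": ["Blue Water"] * 4,
        "Train Number": [364, 364, 365, 365],
        "Late Detraining Customers": [12, 7, 9, 15],
    }
)

cols = ["Sub Service", "Train Number"]

# Keep the identifying columns, drop repeated rows, and renumber from zero
toy_trns = toy[cols].drop_duplicates().reset_index(drop=True)
print(toy_trns)
```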

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

### 4.3 _Blue Water_: mean late arrival times [2 pts]

In an earlier notebook, a _simple_ least-squares linear regression was formulated to estimate the
linear relationship between mean late arrival times and distance traveled (e.g., route miles). The
model suggests that with every additional route mile, the average minutes late for late detraining
passengers increases by approximately `0.0338` minutes. The _R_&sup2; value indicated that around
`25.5%` of the variability in the average minutes late could be explained by the number of route
miles traveled, with the remaining variability due to other factors or random noise.
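
The slope can be read as a rate of change. The sketch below computes only the change in the predicted value, because the intercept of the model is not stated in this notebook; the 100-mile distance is a hypothetical input.

```python
SLOPE_MIN_PER_MILE = 0.0338  # slope quoted in the text above


def added_minutes_late(extra_route_miles: float) -> float:
    """Return the change in predicted mean minutes late for a change in route miles."""
    return SLOPE_MIN_PER_MILE * extra_route_miles


# Example: 100 additional route miles (a hypothetical distance)
print(f"{added_minutes_late(100):.2f} additional minutes")
```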

For _Blue Water_ late detraining passengers traveling the entire route, the model predicts a mean
late arrival time of approximately `39.29` minutes.

Retrieve the _Blue Water_ row from the `predictions` DataFrame. Call the appropriate `DataFrame`
method to convert the row to a `Series`. Assign the return value to a variable named
`blwtr_predicted`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

Contrast this prediction with the actual mean late arrival times experienced by _Blue Water_ late detraining passengers.

In [None]:
# Drop missing values
blwtr_avg_mm_late = blwtr[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Call the custom frm.describe_numeric_column() function
blwtr_avg_mm_late_describe = frm.describe_numeric_column(blwtr_avg_mm_late)
blwtr_avg_mm_late_describe

In [None]:
#hidden tests are within this cell

### 4.4 _Blue Water_: eastbound and westbound routes [1 pt]

Stations served by eastbound and westbound _Blue Water_ trains.

In [None]:
# Retrieve the sub service from the Amtrak sub services list
blwtr_sub_svc = next(
    (sub_svc for sub_svc in amtk_sub_svcs if sub_svc["sub service"] == SUB_SVC["blwtr"])
)
blwtr_stn_codes = blwtr_sub_svc["station codes"]
blwtr_stns = stations[stations[COLS["station_code"]].isin(blwtr_stn_codes)].reset_index(drop=True)
blwtr_stns.sort_values(by=COLS["lon"], inplace=True)
blwtr_stns

Station order for eastbound and westbound _Blue Water_ trains.

In [None]:
blwtr_stn_order_eb = blwtr_sub_svc["station order"]["eastbound"]
blwtr_stn_order_wb = blwtr_sub_svc["station order"]["westbound"]
# blwtr_stn_order_eb, blwtr_stn_order_wb

In [None]:
#hidden tests are within this cell

### 4.5 _Blue Water_: eastbound detraining passengers summary statistics

#### 4.5.1 _Blue Water_ Train 364 [1 pt]

In [None]:
# Base columns for routes
rte_cols = [
    COLS["trn"],
    COLS["station"],
    COLS["station_code"],
    COLS["state"],
    COLS["lat"],
    COLS["lon"],
]

# Train 364 eastbound
amtk_364 = ntwk.by_train_number(trains, 364)
amtk_364_rte = ntwk.create_route(amtk_364, TRN["364"]["direction"], blwtr_stn_order_eb)
amtk_364_rte_stats = detrn.get_route_sum_stats(
    amtk_364_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_364_rte_stats

In [None]:
#hidden tests are within this cell

##### 4.5.1.1 Write to file [1 pt]

Write `amtk_364_rte_stats` to a CSV file named `stu-amtk_364_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_364_rte_stats.csv` file.
It must match line for line, character for character.
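
One way to write a `DataFrame` to CSV and confirm that two files are identical is sketched below. It uses a temporary directory and toy data; the file names are stand-ins, and whether the index should be written (`index=False` here) is an assumption that depends on the fixture file.

```python
import filecmp
import pathlib as pl
import tempfile

import pandas as pd

# Toy route statistics (illustrative values)
toy_stats = pd.DataFrame(
    {"Station Code": ["CHI", "PTH"], "Avg Min Late mean": [31.5, 42.25]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    student_path = pl.Path(tmp_dir, "stu-toy_stats.csv")
    fixture_path = pl.Path(tmp_dir, "fxt-toy_stats.csv")

    # Stand-in fixture: written with the same settings as the student file
    toy_stats.to_csv(fixture_path, index=False, encoding="utf-8")
    toy_stats.to_csv(student_path, index=False, encoding="utf-8")

    # shallow=False compares file contents byte for byte
    files_match = filecmp.cmp(student_path, fixture_path, shallow=False)

print(files_match)
```

A byte-for-byte comparison is stricter than comparing parsed `DataFrame` objects because it also catches differences in column order, float formatting, and line endings.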

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

### 4.6 _Blue Water_ eastbound mean late arrival times

#### 4.6.1 _Blue Water_ Train 364 [1 pt]

Review the central tendency, dispersion, and shape for the mean late arrival times of train 364.

In [None]:
# Drop missing values
amtk_364_avg_mm_late = amtk_364[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_364_avg_mm_late_describe = frm.describe_numeric_column(amtk_364_avg_mm_late)
amtk_364_avg_mm_late_describe

In [None]:
#hidden tests are within this cell

#### 4.6.2 Generate box plots

##### 4.6.2.1 Assemble the chart data

In [None]:
# Base columns for average minutes late
cols = [COLS["year"], COLS["quarter"], COLS["late_detrn_avg_mm_late"]]

# Chart data
chrt_data = detrn.get_qtr_avg_min_late(
    amtk_364_rte, cols, COLS["year_quarter"], [COLORS["amtk_blue"], COLORS["amtk_red"]]
)
chrt_data

##### 4.6.2.2 Preaggregate the data
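
The internals of `frm.aggregate_data()` are not shown in this notebook. As a rough sketch of what pre-aggregating box plot data can involve, the example below computes quartiles, whisker ends, and outliers per group using the common 1.5 x IQR rule. The rule, the helper name `box_stats`, the column names, and the values are all assumptions.

```python
import pandas as pd


def box_stats(values: pd.Series) -> dict:
    """Return quartiles, whisker ends, and outliers using 1.5 x IQR fences."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outside = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower": inside.min(),  # lower whisker end
        "upper": inside.max(),  # upper whisker end
        "outliers": outside.tolist(),
    }


# Toy data (illustrative values; column names are assumptions)
toy = pd.DataFrame(
    {
        "Fiscal Year Quarter": ["2023 Q1"] * 5 + ["2023 Q2"] * 5,
        "Avg Min Late": [20, 22, 24, 26, 80, 30, 32, 34, 36, 38],
    }
)

# One summary row per quarter
rows = [
    {"Fiscal Year Quarter": quarter, **box_stats(group["Avg Min Late"])}
    for quarter, group in toy.groupby("Fiscal Year Quarter")
]
summary = pd.DataFrame(rows)
print(summary)
```

Computing the summary once per group means the chart only needs a handful of values per box instead of every observation.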

In [None]:
# Base columns for aggregation statistics
cols = [COLS["year_quarter"], COLS["late_detrn_avg_mm_late"]]

# Pre-aggregate the data
chrt_data = frm.aggregate_data(chrt_data, cols)

##### 4.6.2.3 Generate chart

In [None]:
# Create chart title
txt = TRN["364"]
title_txt = (
    f"Amtrak {txt['name']} Train {txt['number']} Late Detraining Passengers\n"
    f"{txt['route']} ({txt['direction']})"
)
title = ttl.format_title(amtk_364_rte_stats, title_txt)

# Create and display the vertical boxplot
chart_vertical = boxp.create_boxplot(
    data=chrt_data,
    x_shorthand="Fiscal Year Quarter:N",
    x_title="Period",
    y_shorthand="Late Detraining Customers Avg Min Late:Q",
    y_title="Average Minutes Late",
    box_size=20,
    outlier_shorthand="outliers:Q",
    color_shorthand="Color:N",
    chart_title=title,
    orient=boxp.Orient.VERTICAL,
)
chart_vertical.display()

### 4.7 _Blue Water_: visualize eastbound mean late arrival times by station

In [None]:
# Chart title
title_txt = f"Amtrak {SUB_SVC['blwtr']} Service Late Detraining Passengers (2022 Q1 - 2024 Q3)"
title = ttl.format_title(amtk_364_rte_stats, title_txt)

# Arrange stations by direction of travel
x_sort_order = amtk_364_rte_stats.index.tolist()

# Custom line colors
line_colors = {364: COLORS["amtk_blue"]}

# Tooltips
tooltip_config = [
    {"shorthand": f"{COLS['trn']}:N", "title": "Train", "format": None},
    {"shorthand": f"{COLS['station']}:N", "title": "Arrival Station", "format": None},
    {
        "shorthand": f"{COLS['late_detrn_avg_mm_late']} mean",
        "title": "Average Minutes Late",
        "format": None,
    },
]

chart = lne.create_line_chart(
    frame=amtk_364_rte_stats,
    x_shorthand=f"{COLS['station']}:N",
    x_title=f"{COLS['station']}",
    x_sort_order=x_sort_order,
    y_shorthand=f"{COLS['late_detrn_avg_mm_late']} mean:Q",
    y_title="Average Minutes Late",
    y_tick_count_max=75,
    point=True,
    # point={"filled": False, "fill": "white"},
    color_shorthand=f"{COLS['trn']}:N",
    colors=line_colors,
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

### 4.8 _Blue Water_: westbound detraining passengers summary statistics

#### 4.8.1 _Blue Water_ Train 365 [1 pt]

Review previous code employed to generate summary statistics for an Amtrak train. Then leverage
functions available in the `amtk_network` and `amtk_detrain` modules to create three new
`DataFrame` objects named `amtk_365`, `amtk_365_rte`, and `amtk_365_rte_stats`, respectively.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

##### 4.8.1.1 Write to file [1 pt]

Write `amtk_365_rte_stats` to a CSV file named `stu-amtk_365_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_365_rte_stats.csv` file.
It must match line for line, character for character.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

### 4.9 _Blue Water_: westbound mean late arrival times

#### 4.9.1 _Blue Water_ Train 365 [1 pt]

Review the central tendency, dispersion, and shape for the mean late arrival times of
train 365.

In [None]:
# Drop missing values
amtk_365_avg_mm_late = amtk_365[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_365_avg_mm_late_describe = frm.describe_numeric_column(amtk_365_avg_mm_late)
amtk_365_avg_mm_late_describe

In [None]:
#hidden tests are within this cell

#### 4.9.2 Generate box plots

##### 4.9.2.1 Assemble the chart data

In [None]:
# Base columns for average minutes late
cols = [COLS["year"], COLS["quarter"], COLS["late_detrn_avg_mm_late"]]

# Chart data
chrt_data = detrn.get_qtr_avg_min_late(
    amtk_365_rte, cols, COLS["year_quarter"], [COLORS["amtk_blue"], COLORS["amtk_red"]]
)
chrt_data

##### 4.9.2.2 Preaggregate the data

In [None]:
# Base columns for aggregation statistics
cols = [COLS["year_quarter"], COLS["late_detrn_avg_mm_late"]]

# Pre-aggregate the data
chrt_data = frm.aggregate_data(chrt_data, cols)

##### 4.9.2.3 Generate chart

In [None]:
# Create chart title
txt = TRN["365"]
title_txt = (
    f"Amtrak {txt['name']} Train {txt['number']} Late Detraining Passengers\n"
    f"{txt['route']} ({txt['direction']})"
)
title = ttl.format_title(amtk_365_rte_stats, title_txt)

# Create and display the vertical boxplot
chart_vertical = boxp.create_boxplot(
    data=chrt_data,
    x_shorthand="Fiscal Year Quarter:N",
    x_title="Period",
    y_shorthand="Late Detraining Customers Avg Min Late:Q",
    y_title="Average Minutes Late",
    box_size=20,
    outlier_shorthand="outliers:Q",
    color_shorthand="Color:N",
    chart_title=title,
    orient=boxp.Orient.VERTICAL,
)
chart_vertical.display()

### 4.10 _Blue Water_: visualize westbound mean late arrival times by station

In [None]:
# Chart title
title_txt = f"Amtrak {SUB_SVC['blwtr']} Service Late Detraining Passengers (2022 Q1 - 2024 Q3)"
title = ttl.format_title(amtk_365_rte_stats, title_txt)

# Arrange stations by direction of travel
x_sort_order = amtk_365_rte_stats.index.tolist()

# Custom line colors
line_colors = {365: COLORS["amtk_blue"]}

# Tooltips
tooltip_config = [
    {"shorthand": f"{COLS['trn']}:N", "title": "Train", "format": None},
    {"shorthand": f"{COLS['station']}:N", "title": "Arrival Station", "format": None},
    {
        "shorthand": f"{COLS['late_detrn_avg_mm_late']} mean",
        "title": "Average Minutes Late",
        "format": None,
    },
]

chart = lne.create_line_chart(
    frame=amtk_365_rte_stats,
    x_shorthand=f"{COLS['station']}:N",
    x_title=f"{COLS['station']}",
    x_sort_order=x_sort_order,
    y_shorthand=f"{COLS['late_detrn_avg_mm_late']} mean:Q",
    y_title="Average Minutes Late",
    y_tick_count_max=75,
    point=True,
    # point={"filled": False, "fill": "white"},
    color_shorthand=f"{COLS['trn']}:N",
    colors=line_colors,
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

## 5.0 Michigan _Pere Marquette_ service (Chicago, IL - Grand Rapids, MI)

The _Pere Marquette_ operates daily between
[Chicago Union Station](https://www.amtrak.com/stations/chi), Chicago, IL
([CHI](https://www.amtrak.com/stations/chi)) and
Grand Rapids, MI ([GRR](https://www.amtrak.com/stations/grr)). Intermediate stops include
St. Joseph-Benton Harbor, MI ([SJM](https://www.amtrak.com/stations/sjm.html)),
Bangor, MI ([BAM](https://www.amtrak.com/stations/bam.html)), and
Holland, MI ([HOM](https://www.amtrak.com/stations/hom.html)).

### 5.1 _Pere Marquette_: on-time performance metrics (entire period) [2 pts]

_Pere Marquette_ performance data is a compilation of quarterly metrics that focus on late
detraining passengers. Detraining passengers are considered on-time if they arrive at their
destination no later than fifteen (`15`) minutes after their scheduled arrival time. All other
detraining passengers are considered late.

Retrieve the _Pere Marquette_ row from the `mich_sub_svcs_stats` DataFrame. Call the appropriate
`DataFrame` method to convert the row to a `Series`. Assign the return value to a variable named
`prmrq_stats`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

In [None]:
# Total train arrivals
prmrq_trn_arrivals = prmrq_stats["Train Arrivals"]
# prmrq_trn_arrivals

# Detraining totals
prmrq_detrn = prmrq_stats[f"{COLS['total_detrn']} sum"]
prmrq_detrn_late = prmrq_stats[f"{COLS['late_detrn']} sum"]
prmrq_detrn_on_time = prmrq_detrn - prmrq_detrn_late

print(
    f"Train Arrivals: {prmrq_trn_arrivals}",
    f"Total Detraining Customers: {prmrq_detrn}",
    f"Late Detraining Customers: {prmrq_detrn_late}",
    f"On-Time Detraining Customers: {prmrq_detrn_on_time}",
    sep="\n",
)

In [None]:
#hidden tests are within this cell

### 5.2 _Pere Marquette_ trains [1 pt]

Each _Pere Marquette_ train is identified by a unique train number.

Create a `DataFrame` named `prmrq_trns` that contains one row for each train comprising the
_Pere Marquette_ service. Include the following columns in the `DataFrame` in the order specified:

1. "Service Line"
2. "Service"
3. "Sub Service"
4. "Route Miles"
5. "Train Number"

Reset the index (set `drop=True`) when creating the new `DataFrame`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

### 5.3 _Pere Marquette_: mean late arrival times [2 pts]

In an earlier notebook, a _simple_ least-squares linear regression was formulated to estimate the
linear relationship between mean late arrival times and distance traveled (e.g., route miles). The
model suggests that with every additional route mile, the average minutes late for late detraining
passengers increases by approximately `0.0338` minutes. The _R_&sup2; value indicated that around
`25.5%` of the variability in the average minutes late could be explained by the number of route
miles traveled, with the remaining variability due to other factors or random noise.

For _Pere Marquette_ late detraining passengers traveling the entire route, the model predicts a mean
late arrival time of approximately `34.38` minutes.

Retrieve the _Pere Marquette_ row from the `predictions` DataFrame. Call the appropriate `DataFrame`
method to convert the row to a `Series`. Assign the return value to a variable named
`prmrq_predicted`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

Contrast this prediction with the actual mean late arrival times experienced by _Pere Marquette_ late detraining passengers.

In [None]:
# Drop missing values
prmrq_avg_mm_late = prmrq[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Call the custom frm.describe_numeric_column() function
prmrq_avg_mm_late_describe = frm.describe_numeric_column(prmrq_avg_mm_late)
prmrq_avg_mm_late_describe

In [None]:
#hidden tests are within this cell

### 5.4 _Pere Marquette_: eastbound and westbound routes [1 pt]

Stations served by eastbound and westbound _Pere Marquette_ trains.

In [None]:
# Retrieve the sub service from the Amtrak sub services list
prmrq_sub_svc = next(
    (sub_svc for sub_svc in amtk_sub_svcs if sub_svc["sub service"] == SUB_SVC["prmrq"])
)
prmrq_stn_codes = prmrq_sub_svc["station codes"]
prmrq_stns = stations[stations[COLS["station_code"]].isin(prmrq_stn_codes)].reset_index(drop=True)
prmrq_stns.sort_values(by=COLS["lon"], inplace=True)
prmrq_stns

In [None]:
prmrq_stn_order_eb = prmrq_sub_svc["station order"]["eastbound"]
prmrq_stn_order_wb = prmrq_sub_svc["station order"]["westbound"]
# prmrq_stn_order_eb, prmrq_stn_order_wb

In [None]:
#hidden tests are within this cell

### 5.5 _Pere Marquette_: eastbound detraining passengers summary statistics

#### 5.5.1 _Pere Marquette_ Train 370 [1 pt]

In [None]:
# Base columns for routes
rte_cols = [
    COLS["trn"],
    COLS["station"],
    COLS["station_code"],
    COLS["state"],
    COLS["lat"],
    COLS["lon"],
]

# Train 370 eastbound
amtk_370 = ntwk.by_train_number(trains, 370)
amtk_370_rte = ntwk.create_route(amtk_370, TRN["370"]["direction"], prmrq_stn_order_eb)
amtk_370_rte_stats = detrn.get_route_sum_stats(
    amtk_370_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_370_rte_stats

In [None]:
#hidden tests are within this cell

##### 5.5.1.1 Write to file [1 pt]

Write `amtk_370_rte_stats` to a CSV file named `stu-amtk_370_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_370_rte_stats.csv` file.
It must match line for line, character for character.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

### 5.6 _Pere Marquette_: eastbound mean late arrival times

#### 5.6.1 _Pere Marquette_ Train 370 [1 pt]

Review the central tendency, dispersion, and shape for the mean late arrival times of train 370.

In [None]:
# Drop missing values
amtk_370_avg_mm_late = amtk_370[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_370_avg_mm_late_describe = frm.describe_numeric_column(amtk_370_avg_mm_late)
amtk_370_avg_mm_late_describe

In [None]:
#hidden tests are within this cell

#### 5.6.2 Generate box plots

##### 5.6.2.1 Assemble the chart data

In [None]:
# Base columns for average minutes late
cols = [COLS["year"], COLS["quarter"], COLS["late_detrn_avg_mm_late"]]

# Chart data
chrt_data = detrn.get_qtr_avg_min_late(
    amtk_370_rte, cols, COLS["year_quarter"], [COLORS["amtk_blue"], COLORS["amtk_red"]]
)
# chrt_data

##### 5.6.2.2 Preaggregate the data

In [None]:
# Base columns for aggregation statistics
cols = [COLS["year_quarter"], COLS["late_detrn_avg_mm_late"]]

# Pre-aggregate the data
chrt_data = frm.aggregate_data(chrt_data, cols)

##### 5.6.2.3 Generate chart

In [None]:
# Create chart title
txt = TRN["370"]
title_txt = (
    f"Amtrak {txt['name']} Train {txt['number']} Late Detraining Passengers\n"
    f"{txt['route']} ({txt['direction']})"
)
title = ttl.format_title(amtk_370_rte_stats, title_txt)

# Create and display the vertical boxplot
chart_vertical = boxp.create_boxplot(
    data=chrt_data,
    x_shorthand="Fiscal Year Quarter:N",
    x_title="Period",
    y_shorthand="Late Detraining Customers Avg Min Late:Q",
    y_title="Average Minutes Late",
    box_size=20,
    outlier_shorthand="outliers:Q",
    color_shorthand="Color:N",
    chart_title=title,
    orient=boxp.Orient.VERTICAL,
)
chart_vertical.display()

### 5.7 _Pere Marquette_: visualize eastbound mean late arrival times by station

In [None]:
# Chart title
title_txt = f"Amtrak {SUB_SVC['prmrq']} Service Late Detraining Passengers (2022 Q1 - 2024 Q3)"
title = ttl.format_title(amtk_370_rte_stats, title_txt)

# Arrange stations by direction of travel
x_sort_order = amtk_370_rte_stats.index.tolist()

# Custom line colors
line_colors = {370: COLORS["amtk_blue"]}

# Tooltips
tooltip_config = [
    {"shorthand": f"{COLS['trn']}:N", "title": "Train", "format": None},
    {"shorthand": f"{COLS['station']}:N", "title": "Arrival Station", "format": None},
    {
        "shorthand": f"{COLS['late_detrn_avg_mm_late']} mean",
        "title": "Average Minutes Late",
        "format": None,
    },
]

chart = lne.create_line_chart(
    frame=amtk_370_rte_stats,
    x_shorthand=f"{COLS['station']}:N",
    x_title=f"{COLS['station']}",
    x_sort_order=x_sort_order,
    y_shorthand=f"{COLS['late_detrn_avg_mm_late']} mean:Q",
    y_title="Average Minutes Late",
    y_tick_count_max=75,
    point=True,
    # point={"filled": False, "fill": "white"},
    color_shorthand=f"{COLS['trn']}:N",
    colors=line_colors,
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

### 5.8 _Pere Marquette_: westbound detraining passengers summary statistics

#### 5.8.1 _Pere Marquette_ Train 371 [1 pt]

Review previous code employed to generate summary statistics for an Amtrak train. Then leverage
functions available in the `amtk_network` and `amtk_detrain` modules to create three new
`DataFrame` objects named `amtk_371`, `amtk_371_rte`, and `amtk_371_rte_stats`, respectively.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

##### 5.8.1.1 Write to file [1 pt]

Write `amtk_371_rte_stats` to a CSV file named `stu-amtk_371_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_371_rte_stats.csv` file.
It must match line for line, character for character.
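
The write-then-compare step can be sketched with a toy `DataFrame` and a temporary directory in place of the project's `data/student` folder. The frame contents, the file names, and the `files_match()` helper are illustrative assumptions and are not part of the `fra_amtrak` package.

```python
import pathlib as pl
import tempfile

import pandas as pd


def files_match(path_a: pl.Path, path_b: pl.Path) -> bool:
    """Return True if both text files hold identical lines (hypothetical helper)."""
    lines_a = path_a.read_text(encoding="utf-8").splitlines()
    lines_b = path_b.read_text(encoding="utf-8").splitlines()
    return lines_a == lines_b


# Toy stand-in for a route statistics DataFrame (values are made up)
toy_stats = pd.DataFrame({"Station Code": ["AAA", "BBB"], "Mean Minutes Late": [41.5, 38.25]})

with tempfile.TemporaryDirectory() as tmp_dir:
    stu_path = pl.Path(tmp_dir, "stu-toy_stats.csv")
    fxt_path = pl.Path(tmp_dir, "fxt-toy_stats.csv")

    # Write the same frame twice so the two files are expected to match
    toy_stats.to_csv(stu_path, index=False, encoding="utf-8")
    toy_stats.to_csv(fxt_path, index=False, encoding="utf-8")

    is_match = files_match(stu_path, fxt_path)

print(is_match)
```

Whether the index belongs in the file (`index=True` or `index=False`) depends on how the fixture file was written, so inspect the fixture's first column before choosing.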

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

### 5.9 _Pere Marquette_: westbound mean late arrival times

#### 5.9.1 _Pere Marquette_ Train 371 [1 pt]

Review the central tendency, dispersion, and shape for the mean late arrival times of train 371.
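
As a rough guide to what "central tendency, dispersion, and shape" cover, the sketch below computes comparable measures with plain pandas on a made-up series. It is not the implementation of `frm.describe_numeric_column()`, whose exact output is not shown here; the variable names are illustrative.

```python
import pandas as pd

# Toy series of average minutes late with one missing value (values are made up)
toy_avg_mm_late = pd.Series([20.0, 30.0, None, 40.0, 50.0, 60.0])

# Drop missing values before summarizing
clean = toy_avg_mm_late.dropna().reset_index(drop=True)

summary = {
    "count": int(clean.count()),
    "mean": clean.mean(),  # central tendency
    "median": clean.median(),  # central tendency
    "std": clean.std(),  # dispersion (sample standard deviation)
    "iqr": clean.quantile(0.75) - clean.quantile(0.25),  # dispersion
    "skew": clean.skew(),  # shape: asymmetry
    "kurtosis": clean.kurt(),  # shape: tail weight (excess kurtosis)
}
print(summary)
```

A skew near zero suggests a roughly symmetric distribution, while a large positive skew points to a long right tail of unusually late arrivals.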

In [None]:
# Drop missing values
amtk_371_avg_mm_late = amtk_371[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_371_avg_mm_late_describe = frm.describe_numeric_column(amtk_371_avg_mm_late)
amtk_371_avg_mm_late_describe

In [None]:
#hidden tests are within this cell

#### 5.9.2 Generate box plots

##### 5.9.2.1 Assemble the chart data

In [None]:
# Base columns for average minutes late
cols = [COLS["year"], COLS["quarter"], COLS["late_detrn_avg_mm_late"]]

# Chart data
chrt_data = detrn.get_qtr_avg_min_late(
    amtk_371_rte, cols, COLS["year_quarter"], [COLORS["amtk_blue"], COLORS["amtk_red"]]
)
# chrt_data

##### 5.9.2.2 Preaggregate the data

In [None]:
# Base columns for aggregation statistics
cols = [COLS["year_quarter"], COLS["late_detrn_avg_mm_late"]]

# Pre-aggregate the data
chrt_data = frm.aggregate_data(chrt_data, cols)

##### 5.9.2.3 Generate chart

In [None]:
# Create chart title
txt = TRN["371"]
title_txt = (
    f"Amtrak {txt['name']} Train {txt['number']} Late Detraining Passengers\n"
    f"{txt['route']} ({txt['direction']})"
)
title = ttl.format_title(amtk_371_rte_stats, title_txt)

# Create and display the vertical boxplot
chart_vertical = boxp.create_boxplot(
    data=chrt_data,
    x_shorthand="Fiscal Year Quarter:N",
    x_title="Period",
    y_shorthand="Late Detraining Customers Avg Min Late:Q",
    y_title="Average Minutes Late",
    box_size=20,
    outlier_shorthand="outliers:Q",
    color_shorthand="Color:N",
    chart_title=title,
    orient=boxp.Orient.VERTICAL,
)
chart_vertical.display()

### 5.10 _Pere Marquette_: visualize westbound mean late arrival times by station

In [None]:
# Chart title
title_txt = f"Amtrak {SUB_SVC['prmrq']} Service Late Detraining Passengers (2022 Q1 - 2024 Q3)"
title = ttl.format_title(amtk_371_rte_stats, title_txt)

# Arrange stations by direction of travel
x_sort_order = amtk_371_rte_stats.index.tolist()

# Custom line colors
line_colors = {371: COLORS["amtk_blue"]}

# Tooltips
tooltip_config = [
    {"shorthand": f"{COLS['trn']}:N", "title": "Train", "format": None},
    {"shorthand": f"{COLS['station']}:N", "title": "Arrival Station", "format": None},
    {
        "shorthand": f"{COLS['late_detrn_avg_mm_late']} mean",
        "title": "Average Minutes Late",
        "format": None,
    },
]

chart = lne.create_line_chart(
    frame=amtk_371_rte_stats,
    x_shorthand=f"{COLS['station']}:N",
    x_title=f"{COLS['station']}",
    x_sort_order=x_sort_order,
    y_shorthand=f"{COLS['late_detrn_avg_mm_late']} mean:Q",
    y_title="Average Minutes Late",
    y_tick_count_max=75,
    point=True,
    # point={"filled": False, "fill": "white"},
    color_shorthand=f"{COLS['trn']}:N",
    colors=line_colors,
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

## 6.0 Michigan _Wolverine_ service (Chicago, IL - Pontiac, MI)

_Wolverine_ trains operate between
[Chicago Union Station](https://www.amtrak.com/stations/chi),
Chicago, IL ([CHI](https://www.amtrak.com/stations/chi)) and
Pontiac, MI ([PNT](https://www.amtrak.com/stations/pnt)). Intermediate stops include
Kalamazoo, MI ([KAL](https://www.amtrak.com/stations/kal)),
Battle Creek, MI ([BTL](https://www.amtrak.com/stations/btl)),
Jackson, MI ([JXN](https://www.amtrak.com/stations/jxn)),
Ann Arbor, MI ([ARB](https://www.amtrak.com/stations/arb)), and
Detroit, MI ([DET](https://www.amtrak.com/stations/det)), among other towns and cities.

### 6.1 _Wolverine_: on-time performance metrics (entire period) [2 pts]

_Wolverine_ performance data is a compilation of quarterly metrics that focus on late
detraining passengers. Detraining passengers are considered on-time if they arrive at their
destination no later than fifteen (`15`) minutes after their scheduled arrival time. All other
detraining passengers are considered late.

Retrieve the _Wolverine_ row from the `mich_sub_svcs_stats` DataFrame. Call the appropriate
`DataFrame` method to convert the row to a `Series`. Assign the return value to a variable named
`wolv_stats`.
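
One common way to turn a single-row selection into a `Series` is sketched below on a toy table. The column names and values are made up, and `DataFrame.squeeze()` is only one of several pandas approaches that would work.

```python
import pandas as pd

# Toy summary table with one row per sub service (values are made up)
toy_stats = pd.DataFrame(
    {
        "Sub Service": ["Alpha", "Beta"],
        "Train Arrivals": [120, 340],
        "Late Detraining Customers sum": [1500, 4200],
    }
)

# A boolean mask selects a one-row DataFrame; squeeze() collapses it to a Series
beta_row = toy_stats[toy_stats["Sub Service"] == "Beta"]
beta_stats = beta_row.squeeze()

print(type(beta_stats).__name__)
print(beta_stats["Train Arrivals"])
```

The resulting `Series` is indexed by the original column labels, so individual statistics can be read by name.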

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

In [None]:
# Total train arrivals
wolv_trn_arrivals = wolv_stats["Train Arrivals"]
# wolv_trn_arrivals

# Detraining totals
wolv_detrn = wolv_stats[f"{COLS['total_detrn']} sum"]
wolv_detrn_late = wolv_stats[f"{COLS['late_detrn']} sum"]
wolv_detrn_on_time = wolv_detrn - wolv_detrn_late

print(
    f"Train Arrivals: {wolv_trn_arrivals}",
    f"Total Detraining Customers: {wolv_detrn}",
    f"Late Detraining Customers: {wolv_detrn_late}",
    f"On-Time Detraining Customers: {wolv_detrn_on_time}",
    sep="\n",
)

In [None]:
#hidden tests are within this cell

### 6.2 _Wolverine_ trains [1 pt]

Each _Wolverine_ train is identified by a unique train number.

Create a `DataFrame` named `wolv_trns` that contains one row for each train comprising the
_Wolverine_ service. Include the following columns in the `DataFrame` in the order specified:

1. "Service Line"
2. "Service"
3. "Sub Service"
4. "Route Miles"
5. "Train Number"

Reset the index (set `drop=True`) when creating the new `DataFrame`.
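
The general pattern of reducing station-level records to one row per train can be sketched on toy data as follows. The column values are made up, and the approach shown (column selection, `drop_duplicates()`, `reset_index(drop=True)`) is an assumption about one workable route rather than the expected solution.

```python
import pandas as pd

# Toy station-level records: several rows per train (values are made up)
toy = pd.DataFrame(
    {
        "Train Number": [10, 10, 11, 11, 11],
        "Sub Service": ["Alpha"] * 5,
        "Route Miles": [100, 100, 100, 100, 100],
        "Arrival Station Code": ["AAA", "BBB", "BBB", "AAA", "CCC"],
    }
)

# Keep the identifying columns in the desired order, then one row per train
cols = ["Sub Service", "Route Miles", "Train Number"]
toy_trns = toy[cols].drop_duplicates().sort_values("Train Number").reset_index(drop=True)

print(toy_trns)
```

Passing `drop=True` discards the old row labels instead of inserting them as a new column.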

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

### 6.3 _Wolverine_: mean late arrival times [2 pts]

In an earlier notebook, a _simple_ least-squares linear regression was fitted to estimate the
linear relationship between mean late arrival times and distance traveled (i.e., route miles). The
model suggests that with every additional route mile, the average minutes late for late detraining
passengers increases by approximately `0.0338` minutes. The _R_&sup2; value indicates that around
`25.5%` of the variability in the average minutes late can be explained by the number of route
miles traveled, with the remaining variability due to other factors or random noise.

For _Wolverine_ late detraining passengers traveling the entire route, the model predicts a mean
late arrival time of approximately `38.61` minutes.
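
To make the roles of the slope, the intercept, and _R_&sup2; concrete, the sketch below fits a line to synthetic data with NumPy. The data are made up and perfectly linear, so the fit recovers the generating values and _R_&sup2; equals 1; the earlier notebook's model and its coefficients are not reproduced here.

```python
import numpy as np

# Synthetic data: minutes late grows linearly with route miles (values are made up)
route_miles = np.array([100.0, 200.0, 300.0, 400.0])
avg_min_late = 2.0 + 0.05 * route_miles

# Fit y = intercept + slope * x by ordinary least squares
slope, intercept = np.polyfit(route_miles, avg_min_late, deg=1)

# R-squared: share of the variance in y explained by the fitted line
fitted = intercept + slope * route_miles
ss_res = np.sum((avg_min_late - fitted) ** 2)
ss_tot = np.sum((avg_min_late - avg_min_late.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Predicted value for a hypothetical 250-mile route
predicted = intercept + slope * 250.0

print(round(slope, 4), round(intercept, 4), round(r_squared, 4), round(predicted, 4))
```

With real station data the residual sum of squares is nonzero, which is why an _R_&sup2; such as `25.5%` leaves most of the variability unexplained by distance alone.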

Retrieve the _Wolverine_ row from the `predictions` DataFrame. Call the appropriate `DataFrame`
method to convert the row to a `Series`. Assign the return value to a variable named
`wolv_predicted`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

Contrast this prediction with the actual mean late arrival times experienced by _Wolverine_ late detraining passengers.

In [None]:
# Drop missing values
wolv_avg_mm_late = wolv[COLS["avg_mm_late"]].dropna().reset_index(drop=True)

# Call the custom frm.describe_numeric_column() function again
wolv_avg_mm_late_describe = frm.describe_numeric_column(wolv_avg_mm_late)
wolv_avg_mm_late_describe

In [None]:
#hidden tests are within this cell

### 6.4 _Wolverine_: eastbound and westbound routes [1 pt]

Stations served by eastbound and westbound _Wolverine_ trains.

In [None]:
# Retrieve the sub service from the Amtrak sub services list
wolv_sub_svc = next(
    (sub_svc for sub_svc in amtk_sub_svcs if sub_svc["sub service"] == SUB_SVC["wolv"])
)
wolv_stn_codes = wolv_sub_svc["station codes"]
wolv_stns = stations[stations[COLS["station_code"]].isin(wolv_stn_codes)].reset_index(drop=True)
# WARN: longitude sort does not guarantee correct station order: ROY, TRM, PNT last/first 3 stops
wolv_stns.sort_values(by=COLS["lon"], inplace=True)
wolv_stns

Station order for eastbound and westbound _Wolverine_ trains.
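
Sorting by longitude can misorder stations wherever the line doubles back, which is why an explicit station order is kept. The sketch below shows the difference on toy data; the station codes, coordinates, and running order are hypothetical.

```python
import pandas as pd

# Toy stations: DDD lies slightly west of CCC but is reached after it (made up)
toy_stns = pd.DataFrame(
    {
        "Station Code": ["AAA", "BBB", "CCC", "DDD"],
        "Longitude": [-87.6, -85.6, -83.2, -83.3],
    }
)

# Hypothetical eastbound running order
order_eb = ["AAA", "BBB", "CCC", "DDD"]

# Longitude sort places DDD before CCC
by_lon = toy_stns.sort_values(by="Longitude")["Station Code"].tolist()

# An ordered categorical makes sort_values follow the explicit list instead
toy_stns["Station Code"] = pd.Categorical(
    toy_stns["Station Code"], categories=order_eb, ordered=True
)
by_route = toy_stns.sort_values(by="Station Code")["Station Code"].tolist()

print(by_lon)
print(by_route)
```

The same explicit list can later be handed to a chart as its axis sort order, so the plotted stations follow the direction of travel.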

In [None]:
# Correct order of stations for eastbound and westbound trains
wolv_stn_order_eb = wolv_sub_svc["station order"]["eb"]
wolv_stn_order_wb = wolv_sub_svc["station order"]["wb"]
# wolv_stn_order_eb, wolv_stn_order_wb

In [None]:
#hidden tests are within this cell

### 6.5 _Wolverine_: eastbound detraining passengers summary statistics [2 pts]

Call the appropriate `DataFrame` method to retrieve the station data for trains 350, 352, and 354
referenced in `wolv`. Assign the new `DataFrame` returned by the method call to a variable named
`wolv_eb`.
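
Filtering rows by a set of train numbers can be sketched on toy data as follows. The train numbers and column names are made up, and a boolean mask built with `Series.isin()` is only one possible approach; `DataFrame.query()` would be another.

```python
import pandas as pd

# Toy station-level records for four trains (values are made up)
toy = pd.DataFrame(
    {
        "Train Number": [10, 11, 12, 13, 10, 12],
        "Arrival Station Code": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC"],
    }
)

# Keep only the rows whose train number appears in the given collection
toy_eb = toy[toy["Train Number"].isin((10, 12))]

print(toy_eb)
```

The filtered frame keeps the original row labels unless the index is reset afterwards.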

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

Compute the summary statistics for eastbound _Wolverine_ trains. The `amtk_detrain` module includes
a function that you can call to perform the computation. Pass to it `wolv_eb` along with other
arguments required by the function. Assign the return value of the function call to a variable named
`wolv_eb_stats`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

#### 6.5.1 _Wolverine_ Train 350 [1 pt]

In [None]:
rte_cols = [
    COLS["station_code"],
    COLS["station"],
    COLS["state"],
    COLS["lat"],
    COLS["lon"],
]

# Train 350 eastbound
amtk_350 = ntwk.by_train_number(wolv, 350)
amtk_350_rte = ntwk.create_route(amtk_350, TRN["350"]["direction"], wolv_stn_order_eb)
amtk_350_rte_stats = detrn.get_route_sum_stats(
    amtk_350_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_350_rte_stats

In [None]:
#hidden tests are within this cell

##### 6.5.1.1 Write to file [1 pt]

Write `amtk_350_rte_stats` to a CSV file named `stu-amtk_350_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_350_rte_stats.csv` file.
It must match line for line, character for character.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

#### 6.5.2 _Wolverine_ Train 352 [1 pt]

In [None]:
# Train 352 eastbound
amtk_352 = ntwk.by_train_number(wolv, 352)
amtk_352_rte = ntwk.create_route(amtk_352, TRN["352"]["direction"], wolv_stn_order_eb)
amtk_352_rte_stats = detrn.get_route_sum_stats(
    amtk_352_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_352_rte_stats

In [None]:
#hidden tests are within this cell

##### 6.5.2.1 Write to file [1 pt]

Write `amtk_352_rte_stats` to a CSV file named `stu-amtk_352_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_352_rte_stats.csv` file.
It must match line for line, character for character.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

#### 6.5.3 _Wolverine_ Train 354 [1 pt]

In [None]:
# Train 354 eastbound
amtk_354 = ntwk.by_train_number(wolv, 354)
amtk_354_rte = ntwk.create_route(amtk_354, TRN["354"]["direction"], wolv_stn_order_eb)
amtk_354_rte_stats = detrn.get_route_sum_stats(
    amtk_354_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_354_rte_stats

In [None]:
#hidden tests are within this cell

##### 6.5.3.1 Write to file [1 pt]

Write `amtk_354_rte_stats` to a CSV file named `stu-amtk_354_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_354_rte_stats.csv` file.
It must match line for line, character for character.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

### 6.6 _Wolverine_: eastbound mean late arrival times

Review the central tendency, dispersion, and shape for the mean late arrival times of trains 350, 352, and 354.

#### 6.6.1 _Wolverine_ Train 350 [1 pt]

In [None]:
# Drop missing values
amtk_350_avg_mm_late = amtk_350[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_350_avg_mm_late_describe = frm.describe_numeric_column(amtk_350_avg_mm_late)
amtk_350_avg_mm_late_describe

In [None]:
#hidden tests are within this cell

#### 6.6.2 _Wolverine_ Train 352 [1 pt]

In [None]:
# Drop missing values
amtk_352_avg_mm_late = amtk_352[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_352_avg_mm_late_describe = frm.describe_numeric_column(amtk_352_avg_mm_late)
amtk_352_avg_mm_late_describe

In [None]:
#hidden tests are within this cell

#### 6.6.3 _Wolverine_ Train 354 [1 pt]

In [None]:
# Drop missing values
amtk_354_avg_mm_late = amtk_354[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_354_avg_mm_late_describe = frm.describe_numeric_column(amtk_354_avg_mm_late)
amtk_354_avg_mm_late_describe

In [None]:
#hidden tests are within this cell

#### 6.6.4 Generate box plots

Retrieve the chart data, preaggregate it, and generate the box plots.
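
Preaggregation means computing the box plot statistics per period up front, so the chart only draws summary rows. The sketch below does this with Tukey's 1.5 &times; IQR rule on toy data. It is an assumption about the general technique, not a description of what `frm.aggregate_data()` does internally; the `box_stats()` helper and column names are hypothetical.

```python
import pandas as pd

# Toy observations: average minutes late grouped by period (values are made up)
toy = pd.DataFrame(
    {
        "Period": ["2022 Q1"] * 5 + ["2022 Q2"] * 5,
        "Avg Min Late": [10.0, 20.0, 30.0, 40.0, 200.0, 15.0, 25.0, 35.0, 45.0, 55.0],
    }
)


def box_stats(values: pd.Series) -> dict:
    """Quartiles, whisker ends, and outliers using 1.5 * IQR fences (hypothetical helper)."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    outliers = values[(values < low_fence) | (values > high_fence)]
    return {
        "lower": inside.min(),  # lower whisker end
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper": inside.max(),  # upper whisker end
        "outliers": outliers.tolist(),
    }


# Build one summary row per period
rows = [
    {"Period": period, **box_stats(grp)}
    for period, grp in toy.groupby("Period")["Avg Min Late"]
]
toy_agg = pd.DataFrame(rows)

print(toy_agg)
```

Here the value `200.0` falls above the upper fence for the first period and is reported as an outlier, while the second period has none.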

In [None]:
# Base columns for average minutes late
cols = [COLS["year"], COLS["quarter"], COLS["late_detrn_avg_mm_late"]]

# Data for boxplots
wolv_eb_trns = [
    {"number": 350, "route": amtk_350_rte, "stats": amtk_350_rte_stats},
    {"number": 352, "route": amtk_352_rte, "stats": amtk_352_rte_stats},
    {"number": 354, "route": amtk_354_rte, "stats": amtk_354_rte_stats},
]

# Assemble charts
for trn in wolv_eb_trns:
    chrt_data = detrn.get_qtr_avg_min_late(
        trn["route"], cols, COLS["year_quarter"], [COLORS["amtk_blue"], COLORS["amtk_red"]]
    )

    # Pre-aggregate the data
    chrt_data = frm.aggregate_data(
        chrt_data, [COLS["year_quarter"], COLS["late_detrn_avg_mm_late"]]
    )

    # Create chart title
    txt = TRN[str(trn["number"])]
    title_txt = (
        f"Amtrak {txt['name']} Train {txt['number']} Late Detraining Passengers\n"
        f"{txt['route']} ({txt['direction']})"
    )
    title = ttl.format_title(trn["stats"], title_txt)

    # Create and display the vertical boxplot
    chart_vertical = boxp.create_boxplot(
        data=chrt_data,
        x_shorthand="Fiscal Year Quarter:N",
        x_title="Period",
        y_shorthand="Late Detraining Customers Avg Min Late:Q",
        y_title="Average Minutes Late",
        box_size=20,
        outlier_shorthand="outliers:Q",
        color_shorthand="Color:N",
        chart_title=title,
        orient=boxp.Orient.VERTICAL,
    )
    chart_vertical.display()

### 6.7 _Wolverine_: visualize eastbound mean late arrival times by station

Visualizing mean late arrival times by station for each train in a single chart requires aligning
the station order for trains 350, 352, and 354. This requires adding "placeholder" rows to each
train `DataFrame` that represent stations not visited by all trains.

| Train | "Placeholder" Station(s) | Notes |
| :---- | :----------------------- | :---- |
| 350   | Albion, MI ([ALI](https://www.amtrak.com/stations/ali)) | &nbsp; |
| 352   | Albion, MI ([ALI](https://www.amtrak.com/stations/ali)), Dowagiac, MI ([DOA](https://www.amtrak.com/stations/doa)), Hammond-Whiting, IN ([HMI](https://www.amtrak.com/stations/hmi)), Michigan City, IN ([MCI](https://en.wikipedia.org/wiki/Michigan_City_station)) | MCI closed 4 April 2022 |
| 354   | Dowagiac, MI ([DOA](https://www.amtrak.com/stations/doa)), Hammond-Whiting, IN ([HMI](https://www.amtrak.com/stations/hmi)) | &nbsp; |
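
The idea behind placeholder rows can be sketched with plain pandas: reindexing a train's statistics on a shared station order leaves rows of missing values for skipped stations. The station codes and values below are made up, and this is not the implementation of `ntwk.add_stations_to_route()`.

```python
import pandas as pd

# Shared station order for the chart's x-axis (hypothetical codes)
order = ["AAA", "BBB", "CCC", "DDD"]

# Toy per-station means for a train that skips BBB (values are made up)
toy_rte_stats = pd.DataFrame(
    {
        "Station Code": ["AAA", "CCC", "DDD"],
        "Train Number": [10, 10, 10],
        "mean": [30.0, 35.0, 42.0],
    }
)

# Reindex on the shared order: the skipped station becomes a row of missing values
aligned = (
    toy_rte_stats.set_index("Station Code")
    .reindex(order)
    .rename_axis("Station Code")
    .reset_index()
)

# Restore the train number so the placeholder row still belongs to this train
aligned["Train Number"] = aligned["Train Number"].fillna(10).astype(int)

print(aligned)
```

The placeholder row keeps a missing `mean`, so a line chart can either leave a gap at that station or interpolate across it.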

#### 6.7.1 Reindex `wolv_stns`

In [None]:
# Reindex DataFrame based on amtk_350_rte_stats columns.
wolv_stns = wolv_stns.reindex(columns=amtk_350_rte_stats.columns).reset_index(drop=True)
# wolv_stns

#### 6.7.2 Assemble chart data [1 pt]

The three `amtk_*_chrt_data` `DataFrame` objects below are created by calling
`ntwk.add_stations_to_route()`. Combine them into a single `DataFrame` object named `chrt_data`,
ignoring their current indices.
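
Stacking several frames while discarding their existing row labels can be sketched with `pd.concat()` on toy data. The frames and values are made up; only the `ignore_index=True` behavior is the point.

```python
import pandas as pd

# Two toy per-train frames, each with its own 0..1 index (values are made up)
toy_a = pd.DataFrame(
    {"Train Number": [10, 10], "Station Code": ["AAA", "BBB"], "mean": [30.0, 33.0]}
)
toy_b = pd.DataFrame(
    {"Train Number": [12, 12], "Station Code": ["AAA", "BBB"], "mean": [28.0, None]}
)

# Stack the frames row-wise; ignore_index=True replaces the old labels
# with a fresh 0..n-1 index
toy_chrt_data = pd.concat([toy_a, toy_b], ignore_index=True)

print(toy_chrt_data)
```

Without `ignore_index=True` the combined frame would carry duplicate labels (`0, 1, 0, 1`), which makes label-based lookups ambiguous.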

In [None]:
amtk_350_chrt_data = ntwk.add_stations_to_route(
    amtk_350_rte_stats.copy(),
    wolv_stns[wolv_stns[COLS["station_code"]] == "ALI"],
    wolv_stn_order_eb,
)
# amtk_350_chrt_data

In [None]:
amtk_352_chrt_data = ntwk.add_stations_to_route(
    amtk_352_rte_stats.copy(),
    wolv_stns[wolv_stns[COLS["station_code"]].isin(("ALI", "DOA", "HMI", "MCI"))],
    wolv_stn_order_eb,
)
# amtk_352_chrt_data

In [None]:
amtk_354_chrt_data = ntwk.add_stations_to_route(
    amtk_354_rte_stats.copy(),
    wolv_stns[wolv_stns[COLS["station_code"]].isin(("DOA", "HMI"))],
    wolv_stn_order_eb,
)
# amtk_354_chrt_data

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

#### 6.7.3 Generate line chart

In [None]:
# Chart title
title_txt = f"Amtrak {SUB_SVC['wolv']} Service Late Detraining Passengers"
title = ttl.format_title(wolv_eb_stats, title_txt)

# Arrange stations by direction of travel
x_sort_order = chrt_data.index.tolist()

# Custom line colors
line_colors = {350: COLORS["amtk_blue"], 352: COLORS["amtk_red"], 354: COLORS["blue"]}

# Tooltips
tooltip_config = [
    {"shorthand": f"{COLS['trn']}:N", "title": "Train", "format": None},
    {"shorthand": f"{COLS['station']}:N", "title": "Arrival Station", "format": None},
    {
        "shorthand": f"{COLS['late_detrn_avg_mm_late']} mean",
        "title": "Average Minutes Late",
        "format": None,
    },
]

chart = lne.create_line_chart_interp(
    frame=chrt_data,
    x_shorthand=f"{COLS['station']}:N",
    x_title=f"{COLS['station']}",
    x_sort_order=x_sort_order,
    y_shorthand=f"{COLS['late_detrn_avg_mm_late']} mean:Q",
    y_title="Average Minutes Late",
    y_tick_count_max=85,
    point=True,
    # point={"filled": False, "fill": "white"},
    color_shorthand=f"{COLS['trn']}:N",
    colors=line_colors,
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

### 6.8 _Wolverine_: westbound detraining passengers summary statistics [2 pts]

Call the appropriate `DataFrame` method to retrieve the station data for trains 351, 353, and 355
referenced in `wolv`. Assign the new `DataFrame` returned by the method call to a variable named
`wolv_wb`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

Compute the summary statistics for westbound _Wolverine_ trains. The `amtk_detrain` module includes
a function that you can call to perform the computation. Pass to it `wolv_wb` along with other
arguments required by the function. Assign the return value of the function call to a variable named
`wolv_wb_stats`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

#### 6.8.1 _Wolverine_ Train 351 [1 pt]

In [None]:
# Train 351 westbound
amtk_351 = ntwk.by_train_number(wolv, 351)
amtk_351_rte = ntwk.create_route(amtk_351, TRN["351"]["direction"], wolv_stn_order_wb)
amtk_351_rte_stats = detrn.get_route_sum_stats(
    amtk_351_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_351_rte_stats

In [None]:
#hidden tests are within this cell

##### 6.8.1.1 Write to file [1 pt]

Write `amtk_351_rte_stats` to a CSV file named `stu-amtk_351_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_351_rte_stats.csv` file.
It must match line for line, character for character.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

#### 6.8.2 _Wolverine_ Train 353 [1 pt]

In [None]:
# Train 353 westbound
amtk_353 = ntwk.by_train_number(wolv, 353)
amtk_353_rte = ntwk.create_route(amtk_353, TRN["353"]["direction"], wolv_stn_order_wb)
amtk_353_rte_stats = detrn.get_route_sum_stats(
    amtk_353_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_353_rte_stats

In [None]:
#hidden tests are within this cell

##### 6.8.2.1 Write to file [1 pt]

Write `amtk_353_rte_stats` to a CSV file named `stu-amtk_353_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_353_rte_stats.csv` file.
It must match line for line, character for character.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

#### 6.8.3 _Wolverine_ Train 355 [1 pt]

In [None]:
# Train 355 westbound
amtk_355 = ntwk.by_train_number(wolv, 355)
amtk_355_rte = ntwk.create_route(amtk_355, TRN["355"]["direction"], wolv_stn_order_wb)
amtk_355_rte_stats = detrn.get_route_sum_stats(
    amtk_355_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_355_rte_stats

In [None]:
#hidden tests are within this cell

##### 6.8.3.1 Write to file [1 pt]

Write `amtk_355_rte_stats` to a CSV file named `stu-amtk_355_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_355_rte_stats.csv` file.
It must match line for line, character for character.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

### 6.9 _Wolverine_: westbound mean late arrival times

Review the central tendency, dispersion, and shape for the mean late arrival times of trains 351, 353, and 355.

#### 6.9.1 _Wolverine_ Train 351 [1 pt]

In [None]:
# Drop missing values
amtk_351_avg_mm_late = amtk_351[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_351_avg_mm_late_describe = frm.describe_numeric_column(amtk_351_avg_mm_late)
amtk_351_avg_mm_late_describe

In [None]:
#hidden tests are within this cell

#### 6.9.2 _Wolverine_ Train 353 [1 pt]

In [None]:
# Drop missing values
amtk_353_avg_mm_late = amtk_353[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_353_avg_mm_late_describe = frm.describe_numeric_column(amtk_353_avg_mm_late)
amtk_353_avg_mm_late_describe

In [None]:
#hidden tests are within this cell

#### 6.9.3 _Wolverine_ Train 355 [1 pt]

In [None]:
# Drop missing values
amtk_355_avg_mm_late = amtk_355[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_355_avg_mm_late_describe = frm.describe_numeric_column(amtk_355_avg_mm_late)
amtk_355_avg_mm_late_describe

In [None]:
#hidden tests are within this cell

#### 6.9.4 Generate box plots

Retrieve the chart data, preaggregate it, and generate the box plots.

In [None]:
# Base columns for average minutes late
cols = [COLS["year"], COLS["quarter"], COLS["late_detrn_avg_mm_late"]]

# Data for boxplots
wolv_wb_trns = [
    {"number": 351, "route": amtk_351_rte, "stats": amtk_351_rte_stats},
    {"number": 353, "route": amtk_353_rte, "stats": amtk_353_rte_stats},
    {"number": 355, "route": amtk_355_rte, "stats": amtk_355_rte_stats},
]

# Assemble charts
for trn in wolv_wb_trns:
    chrt_data = detrn.get_qtr_avg_min_late(
        trn["route"], cols, COLS["year_quarter"], [COLORS["amtk_blue"], COLORS["amtk_red"]]
    )

    # Pre-aggregate the data
    chrt_data = frm.aggregate_data(
        chrt_data, [COLS["year_quarter"], COLS["late_detrn_avg_mm_late"]]
    )

    # Create chart title
    txt = TRN[str(trn["number"])]
    title_txt = (
        f"Amtrak {txt['name']} Train {txt['number']} Late Detraining Passengers\n"
        f"{txt['route']} ({txt['direction']})"
    )
    title = ttl.format_title(trn["stats"], title_txt)

    # Create and display the vertical boxplot
    chart_vertical = boxp.create_boxplot(
        data=chrt_data,
        x_shorthand="Fiscal Year Quarter:N",
        x_title="Period",
        y_shorthand="Late Detraining Customers Avg Min Late:Q",
        y_title="Average Minutes Late",
        box_size=20,
        outlier_shorthand="outliers:Q",
        color_shorthand="Color:N",
        chart_title=title,
        orient=boxp.Orient.VERTICAL,
    )
    chart_vertical.display()


### 6.10 _Wolverine_: visualize westbound mean late arrival times by station

Visualizing mean late arrival times by station for each train in a single chart requires aligning
the station order for trains 351, 353, and 355. This requires adding "placeholder" rows to each
train `DataFrame` that represent stations not visited by all trains.

| Train | "Placeholder" Station(s) | Notes |
| :---- | :----------------------- | :---- |
| 351   | Dowagiac, MI ([DOA](https://www.amtrak.com/stations/doa)), Hammond-Whiting, IN ([HMI](https://www.amtrak.com/stations/hmi)), Michigan City, IN ([MCI](https://en.wikipedia.org/wiki/Michigan_City_station)), New Buffalo, MI ([NBU](https://www.amtrak.com/stations/nbu)), Niles, MI ([NLS](https://www.amtrak.com/stations/nls)) | MCI closed 4 April 2022 |
| 353   | Albion, MI ([ALI](https://www.amtrak.com/stations/ali)), Dowagiac, MI ([DOA](https://www.amtrak.com/stations/doa)), Michigan City, IN ([MCI](https://en.wikipedia.org/wiki/Michigan_City_station)) | MCI closed 4 April 2022 |
| 355   | Albion, MI ([ALI](https://www.amtrak.com/stations/ali)) | &nbsp; |

#### 6.10.1 Assemble chart data [1 pt]

The three `amtk_*_chrt_data` `DataFrame` objects below are created by calling
`ntwk.add_stations_to_route()`. Combine them into a single `DataFrame` object named `chrt_data`,
ignoring their current indices.

In [None]:
amtk_351_chrt_data = ntwk.add_stations_to_route(
    amtk_351_rte_stats.copy(),
    wolv_stns[wolv_stns[COLS["station_code"]].isin(("DOA", "HMI", "MCI", "NBU", "NLS"))],
    wolv_stn_order_wb,
)
# amtk_351_chrt_data

In [None]:
amtk_353_chrt_data = ntwk.add_stations_to_route(
    amtk_353_rte_stats.copy(),
    wolv_stns[wolv_stns[COLS["station_code"]].isin(("ALI", "DOA", "MCI"))],
    wolv_stn_order_wb,
)
# amtk_353_chrt_data

In [None]:
# Add ALI to Train 355 route
amtk_355_chrt_data = ntwk.add_stations_to_route(
    amtk_355_rte_stats.copy(),
    wolv_stns[wolv_stns[COLS["station_code"]] == "ALI"],
    wolv_stn_order_wb,
)
# amtk_355_chrt_data

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
#hidden tests are within this cell

#### 6.10.2 Generate line chart

In [None]:
# Chart title
title_txt = f"Amtrak {SUB_SVC['wolv']} Service Late Detraining Passengers"
title = ttl.format_title(wolv_wb_stats, title_txt)

# Arrange stations by direction of travel
x_sort_order = chrt_data.index.tolist()

# Custom line colors
line_colors = {351: COLORS["amtk_blue"], 353: COLORS["amtk_red"], 355: COLORS["blue"]}

# Tooltips
tooltip_config = [
    {"shorthand": f"{COLS['trn']}:N", "title": "Train", "format": None},
    {"shorthand": f"{COLS['station']}:N", "title": "Arrival Station", "format": None},
    {
        "shorthand": f"{COLS['late_detrn_avg_mm_late']} mean",
        "title": "Average Minutes Late",
        "format": None,
    },
]

chart = lne.create_line_chart_interp(
    frame=chrt_data,
    x_shorthand=f"{COLS['station']}:N",
    x_title=f"{COLS['station']}",
    x_sort_order=x_sort_order,
    y_shorthand=f"{COLS['late_detrn_avg_mm_late']} mean:Q",
    y_title="Average Minutes Late",
    y_tick_count_max=85,
    point=True,
    # point={"filled": False, "fill": "white"},
    color_shorthand=f"{COLS['trn']}:N",
    colors=line_colors,
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

## 7.0 Watermark

In [None]:
%load_ext watermark
%watermark -h -i -iv -m -v