# Metrolink

## Summary
`trips` can fill in the missing `shape_id` values for Metrolink because `route_id` and `direction_id` are present. [See `gtfs_utils_v2` here](https://github.com/cal-itp/data-analyses/blob/main/_shared_utils/shared_utils/gtfs_utils_v2.py#L202-L232)

`shapes` has no equivalent fix, and `shape_array_key` is `None` in the table.

Either way, Metrolink is dropped, whether through `trips` directly or through a join between `trips` and `shapes`.

In [1]:
import os
os.environ["CALITP_BQ_MAX_BYTES"] = str(200_000_000_000)

import geopandas as gpd
import pandas as pd

from calitp.tables import tbls
from siuba import *

GCS_FILE_PATH = ("gs://calitp-analytics-data/data-analyses/"
                 "rt_delay/compiled_cached_views/"
                )

analysis_date = "2023-01-18"



In [2]:
trips = pd.read_parquet(
    f"{GCS_FILE_PATH}trips_{analysis_date}.parquet", 
    filters = [[("name", "==", "Metrolink Schedule")]], 
    columns = ["feed_key", "name", 
               "trip_id", "direction_id",
               "shape_id", "shape_array_key",
               "route_id"]
)

In [3]:
trips.head()

Unnamed: 0,feed_key,name,trip_id,direction_id,shape_id,shape_array_key,route_id
0,90e78003416c5b09f77a9de8f266c2be,Metrolink Schedule,233003800,0.0,SBout,,San Bernardino Line
1,90e78003416c5b09f77a9de8f266c2be,Metrolink Schedule,233003812,0.0,SBout,,San Bernardino Line
2,90e78003416c5b09f77a9de8f266c2be,Metrolink Schedule,200090123,0.0,VTout,,Ventura County Line
3,90e78003416c5b09f77a9de8f266c2be,Metrolink Schedule,233003821,1.0,SBin,,San Bernardino Line
4,90e78003416c5b09f77a9de8f266c2be,Metrolink Schedule,294100120,1.0,VTin,,Ventura County Line
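
The rows above suggest why `route_id` and `direction_id` are enough to recover a missing `shape_id`: each `shape_id` looks like a route prefix plus `out` or `in`. Below is a minimal sketch of that idea, not the actual `gtfs_utils_v2` implementation. The function name `fill_missing_shape_id` and both mapping dictionaries are hypothetical, inferred only from the five sample rows shown here.

```python
import pandas as pd

# Hypothetical mapping, inferred only from the sample rows in this notebook.
ROUTE_TO_SHAPE_PREFIX = {
    "San Bernardino Line": "SB",
    "Ventura County Line": "VT",
}
# Assumption from the sample rows: direction 0.0 is "out", 1.0 is "in".
DIRECTION_TO_SUFFIX = {0.0: "out", 1.0: "in"}


def fill_missing_shape_id(df: pd.DataFrame) -> pd.DataFrame:
    """Fill null shape_id values using route_id and direction_id."""
    prefix = df["route_id"].map(ROUTE_TO_SHAPE_PREFIX)
    suffix = df["direction_id"].map(DIRECTION_TO_SUFFIX)
    # str.cat yields NaN when either part is missing, so unmapped rows stay null.
    derived = prefix.str.cat(suffix)
    # Only rows whose shape_id is null are replaced; existing values are kept.
    return df.assign(shape_id=df["shape_id"].fillna(derived))


# Toy input: two rows with shape_id missing, one already populated.
sample = pd.DataFrame({
    "route_id": ["San Bernardino Line", "Ventura County Line", "Ventura County Line"],
    "direction_id": [0.0, 1.0, 0.0],
    "shape_id": [None, None, "VTout"],
})
filled = fill_missing_shape_id(sample)
print(filled)
```

Note that this only repairs `shape_id`; it does nothing for `shape_array_key`, which is still empty in the output above.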


In [4]:
metrolink_feed = trips.feed_key.iloc[0]

In [5]:
'''
metrolink_shapes = (
        tbls.mart_gtfs.fct_daily_scheduled_shapes()
        >> filter(_.service_date == analysis_date, 
                  _.feed_key == metrolink_feed
                 )
        >> select(_.feed_key, _.shape_id, 
                  _.shape_array_key, _.pt_array)
    )
'''

In [6]:
shapes = gpd.read_parquet(
    f"{GCS_FILE_PATH}routelines_{analysis_date}.parquet",
    filters = [[("feed_key", "==", metrolink_feed)]]
)

shapes

Unnamed: 0,feed_key,shape_id,shape_array_key,n_trips,geometry
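
The empty result above, together with the null `shape_array_key` in `trips`, is what makes Metrolink drop out of a join. The sketch below reproduces the effect on toy data, assuming the join is an inner merge on `shape_array_key`; the second operator and the key `abc123` are made up for illustration. A left merge with `indicator=True` is one way to list the trips that would be lost.

```python
import pandas as pd

# Toy stand-ins for the cached tables; "Other Operator" and "abc123" are invented.
trips_toy = pd.DataFrame({
    "name": ["Metrolink Schedule", "Metrolink Schedule", "Other Operator"],
    "trip_id": ["233003800", "233003821", "t1"],
    "shape_array_key": [None, None, "abc123"],
})
shapes_toy = pd.DataFrame({
    "shape_array_key": ["abc123"],
    "n_trips": [1],
})

# An inner merge keeps only trips whose key has a match in shapes.
inner = trips_toy.merge(shapes_toy, on="shape_array_key", how="inner")

# A left merge with indicator=True flags the trips that found no shape.
audit = trips_toy.merge(
    shapes_toy[["shape_array_key"]],
    on="shape_array_key",
    how="left",
    indicator=True,
)
dropped = audit[audit["_merge"] == "left_only"]

print(inner["name"].tolist())
print(dropped[["name", "trip_id"]])
```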
