<div style="text-align: center;">
  <img src="../Images/UTA_logo.png" alt="UTA LOGO" height="133" width="500">
</div>

# Project 1: Fixed Route Traffic Stop Predictor
***

##### Phase 1: Data Cleaning and Wrangling


In this notebook, we focus primarily on cleaning the data available from the _UTA Open Data_ database. To the best of our ability, we will ensure that the data is:
- Complete
- Consistent
- Accurate (where reasonably possible)
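Before any cleaning, it helps to measure the "complete" and "consistent" criteria per table. The sketch below is one way to do that with plain pandas. `quality_report` is a hypothetical helper (not part of pandas or of this project), and the toy frame only mimics two of the bus-stop columns.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise completeness per column (hypothetical helper, not part of pandas)."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "non_null": df.notna().sum(),
            # Share of missing values, as a percentage rounded to 2 decimals.
            "pct_missing": (df.isna().mean() * 100).round(2),
        }
    )


# Toy frame standing in for one of the UTA tables.
sample = pd.DataFrame(
    {
        "stopabbr": ["135009", "135009", None],
        "avgboardings": [0.0, 0.0, None],
    }
)

report = quality_report(sample)
# Fully duplicated rows are a quick consistency check.
n_duplicates = int(sample.duplicated().sum())
print(report)
print("duplicate rows:", n_duplicates)
```

Running the same report on each of the four DataFrames gives a side-by-side view of where the gaps are before deciding how to handle them.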


This project series is intended as a proof of concept toward our desired outcome: a _Dynamic Routing Generative AI_, built primarily for the planning department, that could also support quick rerouting and better detour prediction during construction periods.


In [1]:
# Import Modules
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Extra modules as needed


In [2]:
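# Import Data .csv files into Pandas DataFrames
# low_memory=False makes pandas infer each column's dtype from the whole file
# instead of chunk by chunk, which avoids the mixed-type DtypeWarning
bus_stop_ridership_df = pd.read_csv("../Data_RAW/bus_stop_ridership_table.csv", low_memory=False)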
uta_mode_level_boardings_weekday_averages_df = pd.read_csv("../Data_RAW/uta_mode_level_boardings_weekday_averages.csv")
uta_route_level_ridership_monthly_counts_df = pd.read_csv("../Data_RAW/uta_route_level_ridership_table__monthly_counts.csv")
uta_routes_and_most_recent_ridership_df = pd.read_csv("../Data_RAW/uta_routes_and_most_recent_ridership.csv")

In [3]:
bus_stop_ridership_df.head()

|   | objectid | servicetype | month_ | year_ | stopabbr | stopname | city | county | avgboardings | avgalight | routes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | SAT | January | 2020.0 | 135009 | Constitution Blvd @ 3662 S | West Valley City | Salt Lake | 0.0 | 0.269040524 | 39227.0 |
| 1 | 2 | WKD | January | 2020.0 | 101088 | Gentile St @ 357 E (Layton) | Layton | Davis | 13.53700138 | 11.23065101 | 628470.0 |
| 2 | 3 | WKD | January | 2020.0 | 629172 | Harrison Blvd @ 4605 S (Ogden) | Ogden | Weber | 2.104692794 | 9.510210858 |  |
| 3 | 4 | SUN | January | 2020.0 | 135009 | Constitution Blvd @ 3662 S | West Valley City | Salt Lake | 0.0 | 0.0 | 39227.0 |
| 4 | 5 | WKD | January | 2020.0 | 135009 | Constitution Blvd @ 3662 S | West Valley City | Salt Lake | 1.195848178 | 1.003590091 | 39227.0 |


In [4]:
bus_stop_ridership_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 634393 entries, 0 to 634392
Data columns (total 11 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   objectid      634393 non-null  int64  
 1   servicetype   634391 non-null  object 
 2   month_        634391 non-null  object 
 3   year_         634391 non-null  float64
 4   stopabbr      634391 non-null  object 
 5   stopname      634391 non-null  object 
 6   city          634248 non-null  object 
 7   county        634379 non-null  object 
 8   avgboardings  634391 non-null  object 
 9   avgalight     634391 non-null  object 
 10  routes        630104 non-null  object 
dtypes: float64(1), int64(1), object(9)
memory usage: 53.2+ MB
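The `.info()` output above points to three concrete problems in this table: two rows have only an `objectid` (634393 vs. 634391 non-null), `avgboardings` and `avgalight` are stored as `object` rather than numbers, and `year_` is `float64` only because of those missing rows. A minimal sketch of fixing all three follows. `clean_bus_stop_ridership` is a hypothetical helper, and the `"n/a"` value in the toy data is an invented placeholder: `.info()` only tells us the columns are `object` dtype, not which strings they contain.

```python
import pandas as pd


def clean_bus_stop_ridership(df: pd.DataFrame) -> pd.DataFrame:
    """Minimal cleaning sketch for the stop-level table (hypothetical helper)."""
    # Drop rows where every column except the surrogate key objectid is missing.
    value_cols = [c for c in df.columns if c != "objectid"]
    out = df.dropna(subset=value_cols, how="all").copy()
    # Coerce the boarding/alighting averages to floats; any non-numeric
    # string becomes NaN so it can be inspected instead of silently kept.
    for col in ["avgboardings", "avgalight"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # With the empty rows gone, year_ can use pandas' nullable integer type.
    out["year_"] = out["year_"].astype("Int64")
    return out


# Toy data: the first two rows come from .head(); "n/a" is a made-up bad value.
sample = pd.DataFrame(
    {
        "objectid": [1, 2, 3],
        "servicetype": ["SAT", "WKD", None],
        "month_": ["January", "January", None],
        "year_": [2020.0, 2020.0, None],
        "stopabbr": ["135009", "101088", None],
        "avgboardings": ["0.0", "13.53700138", None],
        "avgalight": ["0.269040524", "n/a", None],
    }
)

cleaned = clean_bus_stop_ridership(sample)
print(cleaned.dtypes)
```

Counting the NaNs that `to_numeric` introduces (compared with the original non-null counts) shows how many entries were actually unparseable in the real file.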


In [5]:
uta_mode_level_boardings_weekday_averages_df.head()

|   | objectid | objectid_1 | uta_mode | month_ | wkd2017 | wkd2018 | wkd2019 | wkd2020 | wkd2021 | wkd2022 | wkd2023 | wkd2024 | wkd2025 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9 | 9 | Systemwide Ridership | September | 171252.0 | 177946.0 | 174634.0 | 70866.0 | 101143.0 | 124206 | 138883 |  |  |
| 1 | 10 | 10 | Systemwide Ridership | October | 160425.0 | 162238.0 | 164327.0 | 67425.0 | 93835.0 | 117089 | 132174 |  |  |
| 2 | 11 | 11 | Systemwide Ridership | November | 163537.0 | 163148.0 | 159709.0 | 63213.0 | 95564.0 | 111165 | 128066 |  |  |
| 3 | 12 | 12 | Systemwide Ridership | December | 147735.0 | 142484.0 | 142114.0 | 60100.0 | 87840.0 | 97366 | 115264 |  |  |
| 4 | 21 | 21 | Commuter Rail | September | 19155.0 | 21800.0 | 21428.0 | 5383.0 | 10262.0 | 14208 | 15832 |  |  |


In [6]:
uta_mode_level_boardings_weekday_averages_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180 entries, 0 to 179
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   objectid    180 non-null    int64  
 1   objectid_1  180 non-null    int64  
 2   uta_mode    180 non-null    object 
 3   month_      180 non-null    object 
 4   wkd2017     156 non-null    float64
 5   wkd2018     161 non-null    float64
 6   wkd2019     168 non-null    float64
 7   wkd2020     168 non-null    float64
 8   wkd2021     173 non-null    float64
 9   wkd2022     180 non-null    int64  
 10  wkd2023     180 non-null    int64  
 11  wkd2024     120 non-null    float64
 12  wkd2025     0 non-null      float64
dtypes: float64(7), int64(4), object(2)
memory usage: 18.4+ KB
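This table stores one column per year (`wkd2017` through `wkd2025`), and `wkd2025` has 0 non-null values. A long layout (one row per mode, month and year) is usually easier to join and plot. The sketch below drops fully empty columns and reshapes with `DataFrame.melt`. `tidy_mode_level` and the output column name `avg_weekday_boardings` are hypothetical; `objectid_1` is left out on the assumption that it duplicates `objectid`, as it does in every row shown by `.head()`. The toy values are copied from that output.

```python
import pandas as pd


def tidy_mode_level(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape the wide weekday-average table to long form (hypothetical helper)."""
    # Drop columns with no values at all, e.g. wkd2025 (0 non-null in .info()).
    out = df.dropna(axis=1, how="all")
    # Every remaining wkdYYYY column becomes a (year_, value) pair.
    year_cols = [c for c in out.columns if c.startswith("wkd")]
    long = out.melt(
        id_vars=["objectid", "uta_mode", "month_"],
        value_vars=year_cols,
        var_name="year_",
        value_name="avg_weekday_boardings",
    )
    # "wkd2023" -> 2023
    long["year_"] = long["year_"].str.slice(3).astype(int)
    # Mode/month/year combinations with no reported value carry no information.
    return long.dropna(subset=["avg_weekday_boardings"]).reset_index(drop=True)


sample = pd.DataFrame(
    {
        "objectid": [9, 21],
        "objectid_1": [9, 21],
        "uta_mode": ["Systemwide Ridership", "Commuter Rail"],
        "month_": ["September", "September"],
        "wkd2022": [124206, 14208],
        "wkd2023": [138883, 15832],
        "wkd2025": [float("nan"), float("nan")],  # empty, as in the full table
    }
)

tidy = tidy_mode_level(sample)
print(tidy)
```

Before relying on the `objectid_1` assumption for the full table, `(df["objectid"] == df["objectid_1"]).all()` is a one-line check.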


In [7]:
uta_route_level_ridership_monthly_counts_df.head()

|   | objectid | mode | lineabbr | month_ | year_ | servicetype | avgboardings | city | county |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Fixed Route Bus - Regular | 1 | February | 2024 | SAT | 1190.376667 | Salt Lake City | Salt Lake |
| 1 | 2 | Fixed Route Bus - Regular | 1 | January | 2024 | SUN | 532.083333 | Salt Lake City | Salt Lake |
| 2 | 3 | Fixed Route Bus - Regular | 1 | February | 2024 | WKD | 2677.4719 | Salt Lake City | Salt Lake |
| 3 | 4 | Fixed Route Bus - Regular | 1 | February | 2024 | SUN | 622.216667 | Salt Lake City | Salt Lake |
| 4 | 5 | Fixed Route Bus - Regular | 1 | January | 2024 | SAT | 1031.4 | Salt Lake City | Salt Lake |


In [8]:
uta_route_level_ridership_monthly_counts_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18234 entries, 0 to 18233
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   objectid      18234 non-null  int64  
 1   mode          18234 non-null  object 
 2   lineabbr      18234 non-null  object 
 3   month_        18234 non-null  object 
 4   year_         18234 non-null  int64  
 5   servicetype   18234 non-null  object 
 6   avgboardings  18234 non-null  float64
 7   city          18234 non-null  object 
 8   county        18228 non-null  object 
dtypes: float64(1), int64(2), object(6)
memory usage: 1.3+ MB
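This table is nearly complete: only `county` has gaps (18228 of 18234 non-null), and the date is split across the text column `month_` and the integer column `year_`. The sketch below fills `county` from other rows with the same `city` and builds a real timestamp. Filling by city is an assumption, not something the source data guarantees (a city can straddle county lines, and some `city` values may be multi-city strings); rows whose city never appears with a county stay missing. `clean_route_monthly` and `month_start` are hypothetical names, and the toy rows are adapted from `.head()` with one county blanked out.

```python
import pandas as pd


def clean_route_monthly(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing counties and add a month-start date (hypothetical helper)."""
    out = df.copy()
    # Assumption: a city maps to the county it is most often paired with
    # elsewhere in the table.
    city_to_county = (
        out.dropna(subset=["county"])
        .groupby("city")["county"]
        .agg(lambda s: s.mode().iloc[0])
    )
    out["county"] = out["county"].fillna(out["city"].map(city_to_county))
    # Combine year_ and month_ into a timestamp for the first day of the month.
    out["month_start"] = pd.to_datetime(
        out["year_"].astype(str) + "-" + out["month_"], format="%Y-%B"
    )
    return out


sample = pd.DataFrame(
    {
        "lineabbr": ["1", "1", "1"],
        "month_": ["February", "January", "February"],
        "year_": [2024, 2024, 2024],
        "servicetype": ["SAT", "SUN", "WKD"],
        "avgboardings": [1190.376667, 532.083333, 2677.4719],
        "city": ["Salt Lake City"] * 3,
        "county": ["Salt Lake", "Salt Lake", None],  # third county blanked for the demo
    }
)

cleaned = clean_route_monthly(sample)
print(cleaned[["city", "county", "month_start"]])
```

With only six missing counties in the real table, it may be just as reasonable to inspect those rows by hand and decide case by case.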


In [9]:
uta_routes_and_most_recent_ridership_df.head()

|   | fid | lineabbr | linename | frequency | routetype | city | avgbrd | county | lineabbr1 | shape__len | st_length_shape_ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | SOUTH TEMPLE | 15 | Frequent | Salt Lake City | 2665.0 | Salt Lake | 1 | 0.216543 |  |
| 1 | 11 | 47 | 4700 SOUTH | 30 | Regular | Taylorsville, Murray, West Valley City | 1468.0 | Salt Lake | 47 | 0.289048 |  |
| 2 | 3 | 4 | 400 SOUTH | 30 | Regular | Salt Lake City, Millcreek | 1127.0 | Salt Lake | 4 | 0.289614 |  |
| 3 | 4 | 9 | 900 SOUTH | 15 | Frequent | Salt Lake City | 1992.0 | Salt Lake | 9 | 0.208646 |  |
| 4 | 72 | F202 | SANDY PARKWAY FLEX | 30 | Regular | Midvale, Sandy, Murray, South Jordan | 193.0 | Salt Lake | 0 | 0.102216 |  |


In [10]:
uta_routes_and_most_recent_ridership_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 81 entries, 0 to 80
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   fid               81 non-null     int64  
 1   lineabbr          81 non-null     object 
 2   linename          81 non-null     object 
 3   frequency         81 non-null     object 
 4   routetype         81 non-null     object 
 5   city              81 non-null     object 
 6   avgbrd            80 non-null     float64
 7   county            81 non-null     object 
 8   lineabbr1         81 non-null     int64  
 9   shape__len        78 non-null     float64
 10  st_length_shape_  0 non-null      float64
dtypes: float64(3), int64(2), object(6)
memory usage: 7.1+ KB
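Three issues stand out in the route table: `st_length_shape_` has 0 non-null values, `lineabbr1` is an integer column that does not always match `lineabbr` (the F202 flex route shows `lineabbr1` = 0 in `.head()`), and `city` packs several cities into one comma-separated string. A minimal sketch handling all three is below. `clean_routes` and the `lineabbr1_mismatch` flag are hypothetical names; the toy rows are taken from the `.head()` output. Keeping `lineabbr` as the join key rather than `lineabbr1` is a judgment call based on that F202 row.

```python
import pandas as pd


def clean_routes(df: pd.DataFrame) -> pd.DataFrame:
    """Drop empty columns and normalise the city list (hypothetical helper)."""
    # st_length_shape_ has 0 non-null values in .info(), so it carries no data.
    out = df.dropna(axis=1, how="all").copy()
    # Compare as strings so the text lineabbr and the integer lineabbr1 line up;
    # True marks rows where the two disagree.
    out["lineabbr1_mismatch"] = out["lineabbr"].astype(str) != out["lineabbr1"].astype(str)
    # Turn "Midvale, Sandy, Murray, South Jordan" into a list of trimmed names.
    out["city"] = out["city"].apply(
        lambda v: [c.strip() for c in v.split(",")] if isinstance(v, str) else v
    )
    return out


sample = pd.DataFrame(
    {
        "lineabbr": ["1", "47", "F202"],
        "linename": ["SOUTH TEMPLE", "4700 SOUTH", "SANDY PARKWAY FLEX"],
        "city": [
            "Salt Lake City",
            "Taylorsville, Murray, West Valley City",
            "Midvale, Sandy, Murray, South Jordan",
        ],
        "avgbrd": [2665.0, 1468.0, 193.0],
        "lineabbr1": [1, 47, 0],
        "st_length_shape_": [float("nan")] * 3,
    }
)

routes = clean_routes(sample)
# One row per (route, city) pair, handy for joining against city-level tables.
route_cities = routes.explode("city")
print(route_cities[["lineabbr", "city"]])
```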
