In [2]:
%%html
<style>
table {float:left}
</style>

In [11]:
import os, pandas as pd, inflection

# 03 Compare rawnav data to the field data

This notebook provides a comparison between the rawnav data and the field data. The aim of this notebook is to validate the information in rawnav data and the processed data obtained from the `wamatarawnav` package. `wamatarawnav`package is developed for the WMATA Analysis of Fine-Grained Bus AVL to Evaluate Queue Jump Effectiveness study (Queue Jump Effectiveness study). Refer to *Introduction to parsing rawnav data with `wmatarawnav`* and *02 - Relating rawnav data to other data sources with `wmatarawnav`* for a basic overview of the `wmatarawnav` package. 

The field data was collected for the following queue jump locations on $7^{th}$, $8^{th}$, $9^{th}$, $14^{th}$, $15^{th}$, and $16^{th}$ July during the morning and evening peak hours (morning peak for some days and locations and evening peak for the others):
* 16th and U
* Georgia & Columbia
* Georgia & Irving
* Georgia & Piney Branch
* 11th & NY
* Irving & 16th

Rawnav data was obtained and processed for the dates on which field observation was carries out. We carried out high-level comparisons such as verifying the Bus IDs of the buses observed at the queue jump location match the Bus IDs observed in the rawnav data or the time when the rear door opens in the field matches the time from rawnav data. We also compared the processed data obtained from the `wmatarawnav` package on measures like "Dwell Time" with the field observed value. In addition to these, we categorized the bus trajectories observed in the field based on whether the bus stopped at the bus-stop and if the bus stopped at the signal downstream of the bus-stop. Note: all the queue jump location in this study are near-side thus the queue jump bus-stop is immediately followed by an intersection.

The following sections discusses the field data collection details, rawnav data processing, and comparison between the two.



<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Field-data" data-toc-modified-id="Field-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Field data</a></span></li><li><span><a href="#Rawnav-data" data-toc-modified-id="Rawnav-data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Rawnav data</a></span></li><li><span><a href="#Comparison-between-rawnav-and-field-data" data-toc-modified-id="Comparison-between-rawnav-and-field-data-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Comparison between rawnav and field data</a></span><ul class="toc-item"><li><span><a href="#High-level-comparison" data-toc-modified-id="High-level-comparison-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>High-level comparison</a></span></li><li><span><a href="#Summary-measure-comparison" data-toc-modified-id="Summary-measure-comparison-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Summary measure comparison</a></span></li><li><span><a href="#Trajectory-variation-when-bus-stops,-bus-doesn't-stop,-bus-uses-queue-jump" data-toc-modified-id="Trajectory-variation-when-bus-stops,-bus-doesn't-stop,-bus-uses-queue-jump-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Trajectory variation when bus stops, bus doesn't stop, bus uses queue jump</a></span></li></ul></li><li><span><a href="#Conclusion" data-toc-modified-id="Conclusion-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Conclusion</a></span></li></ul></div>

## Field data

Following table shows the data collection schedule for different queue jump locations. 

|Queue Jump Location   |Date (Day of week)                |Time of Day  |Bus Routes                       |Direction
|:--------------------:|:--------------------------------:|:-----------:|:-------------------------------:|:-------:|
|16th & U              |7/7/2020 (Tue) and 7/14/2020 (Tue)|4 PM to 6 PM |S1, S2, S4, and S9            |Northbound
|Georgia & Columbia    |7/7/2020 (Tue) and 7/14/2020 (Tue)|7 AM to 9 AM |70 and 79                        |Southbound
|Georgia & Irving      |7/8/2020 (Wed) and 7/15/2020 (Wed)|7 AM to 9 AM |70                               |Southbound
|Georgia & Piney Branch|7/9/2020 (Thu) and 7/16/2020 (Thu)|7 AM to 9 AM |70 and 79                        |Southbound
|11th & NY             |7/8/2020 (Wed) and 7/15/2020 (Wed)|4 PM to 6 PM |64 and G8                        |Northbound
|Irving & 16th         |7/9/2020 (Thu) and 7/16/2020 (Thu)|4 PM to 6 PM |D32, H1, H2, H3,<br>H4, H8, and W47 |Eastbound


In [None]:
import numpy as np
def time_to_sec(time_mess):
    hr_min_sec = str(time_mess).split(":")
    if len(hr_min_sec) == 3:
        tot_sec = int(hr_min_sec[0])*3600+int(hr_min_sec[1])*60+int(hr_min_sec[2])
        return(tot_sec)
    if len(hr_min_sec)==2:
        if hr_min_sec[0].strip() == '':
            tot_sec = hr_min_sec[1]
            return(int(tot_sec))
    elif len(hr_min_sec) == 1:
        tot_sec = hr_min_sec[0]
        if tot_sec.isnumeric():
            return(np.int(tot_sec))
        else: return(np.nan)
    else: ""
    raise("Can't handle argument")
    
time_to_sec(1)
time_to_sec("0:2:30")


In [117]:
def correct_data_types(dat):
    dat = dat.assign(metrobus_route=lambda x: x.metrobus_route.astype("str"),
                              bus_id =lambda x: x.bus_id.astype("Int32"),
                              data_field_obs = lambda x: x.date_field_obs.astype("str"),
                              signal_phase = lambda x: x.signal_phase.astype('str'),
                              time_entered_stop_zone = lambda x: pd.to_datetime(x.data_field_obs+" "+x.time_entered_stop_zone.astype(str),format="%Y-%m-%d %H:%M:%S",errors='coerce'),
                              time_left_stop_zone = lambda x: pd.to_datetime(x.data_field_obs+" "+x.time_left_stop_zone.astype(str),format="%Y-%m-%d %H:%M:%S",errors='coerce'),                              
                              front_door_open_time = lambda x: pd.to_datetime(x.data_field_obs+" "+x.front_door_open_time.astype(str),format="%Y-%m-%d %H:%M:%S",errors='coerce'),                              
                              front_door_close_time = lambda x: pd.to_datetime(x.data_field_obs+" "+x.front_door_close_time.astype(str),format="%Y-%m-%d %H:%M:%S",errors='coerce'),                              
                              rear_door_open_time = lambda x: pd.to_datetime(x.data_field_obs+" "+x.rear_door_open_time.astype(str),format="%Y-%m-%d %H:%M:%S",errors='coerce'),                              
                              rear_door_close_time = lambda x: pd.to_datetime(x.data_field_obs+" "+x.rear_door_close_time.astype(str),format="%Y-%m-%d %H:%M:%S",errors='coerce'),
                              dwell_time =lambda x: pd.to_timedelta(x.dwell_time.apply(time_to_sec),unit='s'),
                              number_of_boarding = lambda x: x.number_of_boarding.astype('Int32'),
                              number_of_alightings = lambda x: x.number_of_alightings.astype('Int32'),
                              total_time_at_intersection =lambda x: pd.to_timedelta(x.total_time_at_intersection.apply(time_to_sec),unit='s'),
                              traffic_conditions = lambda x: x.traffic_conditions.astype('str'),
                              notes = lambda x: x.notes.astype('str')
                             )  
    return dat

In [119]:
path_field_dir = r'C:\Users\abibeka\OneDrive - Kittelson & Associates, Inc\Documents\WMATA-AVL\Data\july15_field_data'
path_field_file = os.path.join(path_field_dir,'WMATA AVL field obsv spreadsheet v.2.xlsx')
field_xlsx_wb = pd.ExcelFile(path_field_file)
field_dict = {}

col_keep = ['Metrobus Route','Bus ID',"Today's Date",'Signal Phase','Time Entered Stop Zone','Time Left Stop Zone',
            'Front Door Open Time','Front Door Close Time','Rear Door Open Time','Rear Door Close Time','Dwell Time',
           'Number of boardings','Number of alightings','Total Time at Intersection','Traffic Conditions','Notes']

col_new_names = ['metrobus_route','bus_id','date_field_obs','signal_phase','time_entered_stop_zone','time_left_stop_zone',
                 'front_door_open_time','front_door_close_time','rear_door_open_time','rear_door_close_time','dwell_time',
                'number_of_boarding','number_of_alightings','total_time_at_intersection','traffic_conditions','notes']
field_col_rename_map = {key:value for (key,value) in zip(col_keep,col_new_names)}
    
for sheet in field_xlsx_wb.sheet_names:
    dat = field_xlsx_wb.parse(sheet)
    dat = dat.loc[:,col_keep].rename(columns=field_col_rename_map)
    dat = correct_data_types(dat)
    field_dict[sheet] = dat

In [121]:
field_dict.keys()

dict_keys(['16th & U', 'Georgia & Columbia', 'Georgia & Irving', 'Georgia & Piney Branch', '11th & NY', 'Irving & 16th'])

In [122]:
field_dict['Georgia & Columbia']

Unnamed: 0,metrobus_route,bus_id,date_field_obs,signal_phase,time_entered_stop_zone,time_left_stop_zone,front_door_open_time,front_door_close_time,rear_door_open_time,rear_door_close_time,dwell_time,number_of_boarding,number_of_alightings,total_time_at_intersection,traffic_conditions,notes,data_field_obs
0,70,7270,2020-07-07,Green,2020-07-07 07:01:57,2020-07-07 07:02:00,NaT,NaT,NaT,NaT,00:00:00,0,0,00:00:03,Free-Flowing,Did not stop,2020-07-07
1,79,6040,2020-07-07,Green,2020-07-07 07:13:41,2020-07-07 07:14:15,2020-07-07 07:13:49,2020-07-07 07:14:10,2020-07-07 07:13:49,2020-07-07 07:14:10,00:00:21,4,3,00:00:44,Free-Flowing,,2020-07-07
2,70,7350,2020-07-07,Green,2020-07-07 07:24:36,2020-07-07 07:25:31,NaT,NaT,2020-07-07 07:24:44,2020-07-07 07:24:58,00:00:14,6,0,00:00:55,Free-Flowing,,2020-07-07
3,79,6436,2020-07-07,Green,2020-07-07 07:28:35,2020-07-07 07:29:33,NaT,NaT,2020-07-07 07:28:41,2020-07-07 07:28:54,00:00:13,1,1,00:00:58,Free-Flowing,,2020-07-07
4,70,7377,2020-07-07,Green,2020-07-07 07:35:35,2020-07-07 07:35:59,NaT,NaT,2020-07-07 07:35:42,2020-07-07 07:35:54,00:00:12,1,3,00:00:24,Moderate,,2020-07-07
5,79,6460,2020-07-07,Green,2020-07-07 07:43:41,2020-07-07 07:44:08,NaT,NaT,2020-07-07 07:43:49,2020-07-07 07:44:03,00:00:14,1,3,00:00:27,Moderate,,2020-07-07
6,70,7354,2020-07-07,Green,2020-07-07 07:54:09,2020-07-07 07:54:13,NaT,NaT,NaT,NaT,00:00:00,0,0,00:00:04,Free-Flowing,Did not stop,2020-07-07
7,79,6427,2020-07-07,Green,2020-07-07 07:58:03,2020-07-07 07:58:30,NaT,NaT,2020-07-07 07:58:12,2020-07-07 07:58:23,00:00:11,1,2,00:00:27,Free-Flowing,,2020-07-07
8,70,7367,2020-07-07,Green,2020-07-07 08:05:32,2020-07-07 08:05:52,NaT,NaT,2020-07-07 08:05:41,2020-07-07 08:05:46,00:00:05,1,0,00:00:20,Moderate,,2020-07-07
9,79,6456,2020-07-07,Green,2020-07-07 08:11:43,2020-07-07 08:12:12,NaT,NaT,2020-07-07 08:11:50,2020-07-07 08:12:05,00:00:15,1,6,00:00:29,Moderate,,2020-07-07


## Rawnav data

## Comparison between rawnav and field data


### High-level comparison


### Summary measure comparison


### Trajectory variation when bus stops, bus doesn't stop, bus uses queue jump

## Conclusion