I use the FoCS environment with Python 3.11.6.

In [43]:
# Import useful libraries
import os
import json
import re
import pandas as pd
import numpy as np
from collections import defaultdict

In [23]:
# Configuration file
with open('config.json', 'r') as f:
    config = json.load(f)

You have to work on the [ZTBus: A Large Dataset of Time-Resolved City Bus Driving Missions](https://www.research-collection.ethz.ch/handle/20.500.11850/626723) repository.

It contains:
*  [metaData.csv](https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/626723/metaData.csv?sequence=1&isAllowed=y), hereafter *trips*
*  several other files containing detailed data on some bus parameters, whose names are listed in the *trips* file. Those files can be downloaded as a [zip file](https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/626723/ZTBus_compressed.zip?sequence=3&isAllowed=y). Let us call them the *details* datasets.

# Import

**ZT_bus Folder Structure**: The 'ZT_bus' folder is the main directory containing data related to driving missions.
Inside this folder, you'll find information on 1409 driving missions.

**Metadata Information**: Additionally, there is a 'metadata' folder that stores the metadata on these missions.
The metadata includes information on all 1409 missions.


## Metadata csv

In [44]:
metadata = pd.read_csv(config['path_metadata'])

In [45]:
metadata.head()

|  | name | busNumber | startTime_iso | startTime_unix | endTime_iso | endTime_unix | drivenDistance | busRoute | energyConsumption | itcs_numberOfPassengers_mean | itcs_numberOfPassengers_min | itcs_numberOfPassengers_max | status_gridIsAvailable_mean | temperature_ambient_mean | temperature_ambient_min | temperature_ambient_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B183_2019-04-30_03-18-56_2019-04-30_08-44-20 | 183 | 2019-04-30T03:18:56Z | 1556594336 | 2019-04-30T08:44:20Z | 1556613860 | 77213.87 | - | 478585200.0 | 5.53886 | 0 | 20 | 0.74064 | 282.378 | 281.15 | 293.15 |
| 1 | B183_2019-04-30_13-22-07_2019-04-30_17-54-02 | 183 | 2019-04-30T13:22:07Z | 1556630527 | 2019-04-30T17:54:02Z | 1556646842 | 59029.6 | 31 | 402258500.0 | 33.11458 | 4 | 74 | 0.855234 | 287.5443 | 285.15 | 293.15 |
| 2 | B183_2019-05-01_05-58-51_2019-05-01_22-32-30 | 183 | 2019-05-01T05:58:51Z | 1556690331 | 2019-05-01T22:32:30Z | 1556749950 | 240900.4 | 33 | 1445733000.0 | 19.68914 | 0 | 55 | 0.77786 | 288.749 | 280.15 | 294.15 |
| 3 | B183_2019-05-03_02-50-21_2019-05-03_05-53-20 | 183 | 2019-05-03T02:50:21Z | 1556851821 | 2019-05-03T05:53:20Z | 1556862800 | 42565.48 | - | 281986700.0 | 1.685185 | 0 | 8 | 0.767122 | 282.4129 | 281.15 | 292.15 |
| 4 | B183_2019-05-03_15-41-57_2019-05-03_23-06-24 | 183 | 2019-05-03T15:41:57Z | 1556898117 | 2019-05-03T23:06:24Z | 1556924784 | 125277.2 | 72 | 620725800.0 | 23.75357 | 1 | 67 | 0.907342 | 284.7325 | 282.15 | 287.15 |


In [5]:
metadata.shape

(1409, 16)

In [26]:
# Check metadata dtypes
metadata.dtypes

name                             object
busNumber                         int64
startTime_iso                    object
startTime_unix                    int64
endTime_iso                      object
endTime_unix                      int64
drivenDistance                  float64
busRoute                         object
energyConsumption               float64
itcs_numberOfPassengers_mean    float64
itcs_numberOfPassengers_min       int64
itcs_numberOfPassengers_max       int64
status_gridIsAvailable_mean     float64
temperature_ambient_mean        float64
temperature_ambient_min         float64
temperature_ambient_max         float64
dtype: object

`startTime_iso` has dtype `object` (plain strings), whereas it should be a `datetime`. When this variable is used in the analysis, it should be converted to `datetime` or replaced by `startTime_unix`. The same applies to `endTime_iso`. The dtypes of the other variables match the nature of the data they represent.
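A minimal sketch of the conversion, using the first two rows of `metadata.head()` as a toy stand-in for the full dataframe (the column `startTime_dt` is a hypothetical name). It also checks that the ISO strings and the Unix timestamps describe the same instants, which would justify using either column.

```python
import pandas as pd

# Toy stand-in for metadata: values copied from the first two rows of metadata.head()
meta = pd.DataFrame({
    'startTime_iso': ['2019-04-30T03:18:56Z', '2019-04-30T13:22:07Z'],
    'startTime_unix': [1556594336, 1556630527],
})

# Parse the ISO 8601 strings (the trailing 'Z' means UTC) into timezone-aware datetimes
meta['startTime_dt'] = pd.to_datetime(meta['startTime_iso'], utc=True)

# Convert the Unix timestamps (seconds since the epoch) into the same representation
from_unix = pd.to_datetime(meta['startTime_unix'], unit='s', utc=True)

# Both columns should encode exactly the same instants
print((meta['startTime_dt'] == from_unix).all())  # True
```

On the real data the same two calls would apply to `metadata['startTime_iso']` and `metadata['endTime_iso']`.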

TODO: Check if other data types should be different.

### NaN values check

In [29]:
metadata.isna().sum()

name                            0
busNumber                       0
startTime_iso                   0
startTime_unix                  0
endTime_iso                     0
endTime_unix                    0
drivenDistance                  0
busRoute                        0
energyConsumption               0
itcs_numberOfPassengers_mean    0
itcs_numberOfPassengers_min     0
itcs_numberOfPassengers_max     0
status_gridIsAvailable_mean     0
temperature_ambient_mean        0
temperature_ambient_min         0
temperature_ambient_max         0
dtype: int64

In [31]:
print(f'The metadata dataframe has {metadata.isna().sum().sum()} null values.')

The metadata dataframe has 0 null values.


Although the check above reports no null values, `busRoute` does contain missing values: `metadata.head()` shows some cells with `-`, which stands for a missing route. `isna()` only detects real NaN values, so these placeholders go unnoticed. If this variable is used in the analysis, this aspect has to be taken into account.
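A small illustration of why the null count is misleading, assuming the five `busRoute` values shown in `metadata.head()` as a toy stand-in for the full column: `isna()` ignores the `-` strings, while an explicit comparison counts them.

```python
import pandas as pd

# Toy stand-in for metadata['busRoute']: the five values shown in metadata.head()
bus_route = pd.Series(['-', '31', '33', '-', '72'], name='busRoute')

# isna() only detects real NaN/None values, so the '-' placeholders are not counted
print(bus_route.isna().sum())    # 0

# Counting the placeholder explicitly reveals the hidden missing values
print((bus_route == '-').sum())  # 2
```

On the full dataframe, `(metadata['busRoute'] == '-').sum()` would give the real number of placeholders.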

## Import ZT_bus data

In [27]:
file_names = [file_name for file_name in os.listdir(config['path_ZTbus_folder']) if 
              os.path.isfile(os.path.join(config['path_ZTbus_folder'], file_name))]

In [8]:
file_names

['B183_2020-11-13_14-52-45_2020-11-13_19-13-45.csv',
 'B183_2020-10-06_04-23-44_2020-10-06_07-33-54.csv',
 'B183_2019-10-03_03-04-42_2019-10-03_18-38-45.csv',
 'B183_2022-05-02_03-02-19_2022-05-02_17-07-49.csv',
 'B183_2021-04-23_03-47-54_2021-04-23_07-48-48.csv',
 'B208_2022-08-15_03-31-51_2022-08-15_12-34-10.csv',
 'B183_2020-07-24_04-01-31_2020-07-24_18-04-39.csv',
 'B183_2022-10-28_13-36-23_2022-10-28_16-37-08.csv',
 'B208_2021-04-21_04-10-07_2021-04-21_18-19-32.csv',
 'B183_2022-07-28_14-27-33_2022-07-28_19-17-23.csv',
 'B183_2021-09-14_04-09-57_2021-09-14_11-46-59.csv',
 'B208_2022-04-01_22-29-37_2022-04-02_02-35-00.csv',
 'B208_2021-10-21_12-57-23_2021-10-21_18-26-47.csv',
 'B208_2022-01-13_05-08-12_2022-01-13_19-18-40.csv',
 'B183_2021-12-20_04-59-40_2021-12-20_08-17-47.csv',
 'B183_2022-06-24_23-09-54_2022-06-25_02-17-26.csv',
 'B208_2021-07-29_04-10-00_2021-07-29_17-55-34.csv',
 'B183_2019-07-29_03-07-49_2019-07-29_17-48-21.csv',
 'B208_2022-12-09_23-55-12_2022-12-10_03-24-28

In [10]:
len(file_names) == metadata.shape[0]

True

As expected, the ZT_bus folder contains data on 1409 driving missions, and the metadata contains information on the same number of missions (1409).
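Equal counts alone do not prove that every file has a matching metadata row. A minimal sketch of a stricter check compares the mission names as sets; the two short lists below are toy stand-ins for `metadata['name']` and `file_names`.

```python
import os

# Toy stand-ins: in the notebook these would be metadata['name'] and file_names
metadata_names = [
    'B183_2019-04-30_03-18-56_2019-04-30_08-44-20',
    'B183_2019-04-30_13-22-07_2019-04-30_17-54-02',
]
file_names = [
    'B183_2019-04-30_13-22-07_2019-04-30_17-54-02.csv',
    'B183_2019-04-30_03-18-56_2019-04-30_08-44-20.csv',
]

# Strip the '.csv' extension so that file names and mission names are comparable
file_missions = {os.path.splitext(name)[0] for name in file_names}
meta_missions = set(metadata_names)

# Names present on one side only; both sets are empty when the names match
missing_files = meta_missions - file_missions
missing_metadata = file_missions - meta_missions
print(missing_files, missing_metadata)  # set() set()
```

Together with the equal counts (and no duplicate names), two empty sets mean that files and metadata rows correspond one-to-one.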

In [28]:
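dataframes = {}
for file_name in file_names:
    # Use the mission name (file name without '.csv') as the dictionary key
    mission_name = os.path.splitext(file_name)[0]
    dataframes[mission_name] = pd.read_csv(os.path.join(config['path_ZTbus_folder'], file_name))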

In [12]:
type(dataframes)

dict

`dataframes` is a dictionary of 1409 dataframes, each corresponding to one driving mission.

In [13]:
dataframes.keys()

dict_keys(['B183_2020-11-13_14-52-45_2020-11-13_19-13-45', 'B183_2020-10-06_04-23-44_2020-10-06_07-33-54', 'B183_2019-10-03_03-04-42_2019-10-03_18-38-45', 'B183_2022-05-02_03-02-19_2022-05-02_17-07-49', 'B183_2021-04-23_03-47-54_2021-04-23_07-48-48', 'B208_2022-08-15_03-31-51_2022-08-15_12-34-10', 'B183_2020-07-24_04-01-31_2020-07-24_18-04-39', 'B183_2022-10-28_13-36-23_2022-10-28_16-37-08', 'B208_2021-04-21_04-10-07_2021-04-21_18-19-32', 'B183_2022-07-28_14-27-33_2022-07-28_19-17-23', 'B183_2021-09-14_04-09-57_2021-09-14_11-46-59', 'B208_2022-04-01_22-29-37_2022-04-02_02-35-00', 'B208_2021-10-21_12-57-23_2021-10-21_18-26-47', 'B208_2022-01-13_05-08-12_2022-01-13_19-18-40', 'B183_2021-12-20_04-59-40_2021-12-20_08-17-47', 'B183_2022-06-24_23-09-54_2022-06-25_02-17-26', 'B208_2021-07-29_04-10-00_2021-07-29_17-55-34', 'B183_2019-07-29_03-07-49_2019-07-29_17-48-21', 'B208_2022-12-09_23-55-12_2022-12-10_03-24-28', 'B208_2022-03-25_23-51-19_2022-03-26_03-42-34', 'B208_2021-06-07_03-54-28_202

In [15]:
len(dataframes.keys())

1409

Let's analyse two sample dataframes.

In [32]:
dataframes['B183_2019-04-30_03-18-56_2019-04-30_08-44-20'].head()

|  | time_iso | time_unix | electric_powerDemand | gnss_altitude | gnss_course | gnss_latitude | gnss_longitude | itcs_busRoute | itcs_numberOfPassengers | itcs_stopName | ... | odometry_wheelSpeed_mr | odometry_wheelSpeed_rl | odometry_wheelSpeed_rr | status_doorIsOpen | status_gridIsAvailable | status_haltBrakeIsActive | status_parkBrakeIsActive | temperature_ambient | traction_brakePressure | traction_tractionForce |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-04-30T03:18:56Z | 1556594336 | -13.84551 | NaN | NaN | NaN | NaN | - | NaN | - | ... | 0.0 | 0.0 | 0.0 | 1 | 1 | 0 | 0 | 293.15 | 251666.7 | 0.0 |
| 1 | 2019-04-30T03:18:57Z | 1556594337 | -3.849362 | NaN | NaN | NaN | NaN | - | NaN | - | ... | 0.0 | 0.0 | 0.0 | 1 | 1 | 0 | 0 | 292.3688 | 254876.2 | 0.0 |
| 2 | 2019-04-30T03:18:58Z | 1556594338 | -0.672331 | NaN | NaN | NaN | NaN | - | NaN | - | ... | 0.0 | 0.0 | 0.0 | 1 | 1 | 0 | 0 | 292.931 | 251783.3 | 0.0 |
| 3 | 2019-04-30T03:18:59Z | 1556594339 | -1.087931 | NaN | NaN | NaN | NaN | - | NaN | - | ... | 0.0 | 0.0 | 0.0 | 1 | 1 | 0 | 0 | 293.15 | 255000.0 | 0.0 |
| 4 | 2019-04-30T03:19:00Z | 1556594340 | -0.811985 | NaN | NaN | NaN | NaN | - | NaN | - | ... | 0.0 | 0.0 | 0.0 | 1 | 1 | 0 | 0 | 293.15 | 253000.0 | 0.0 |


In [18]:
dataframes['B183_2019-04-30_03-18-56_2019-04-30_08-44-20'].columns

Index(['time_iso', 'time_unix', 'electric_powerDemand', 'gnss_altitude',
       'gnss_course', 'gnss_latitude', 'gnss_longitude', 'itcs_busRoute',
       'itcs_numberOfPassengers', 'itcs_stopName',
       'odometry_articulationAngle', 'odometry_steeringAngle',
       'odometry_vehicleSpeed', 'odometry_wheelSpeed_fl',
       'odometry_wheelSpeed_fr', 'odometry_wheelSpeed_ml',
       'odometry_wheelSpeed_mr', 'odometry_wheelSpeed_rl',
       'odometry_wheelSpeed_rr', 'status_doorIsOpen', 'status_gridIsAvailable',
       'status_haltBrakeIsActive', 'status_parkBrakeIsActive',
       'temperature_ambient', 'traction_brakePressure',
       'traction_tractionForce'],
      dtype='object')

In [19]:
dataframes['B183_2019-04-30_03-18-56_2019-04-30_08-44-20'].shape

(19525, 26)

In [23]:
dataframes['B183_2019-04-30_03-18-56_2019-04-30_08-44-20'].dtypes

time_iso                       object
time_unix                       int64
electric_powerDemand          float64
gnss_altitude                 float64
gnss_course                   float64
gnss_latitude                 float64
gnss_longitude                float64
itcs_busRoute                  object
itcs_numberOfPassengers       float64
itcs_stopName                  object
odometry_articulationAngle    float64
odometry_steeringAngle        float64
odometry_vehicleSpeed         float64
odometry_wheelSpeed_fl        float64
odometry_wheelSpeed_fr        float64
odometry_wheelSpeed_ml        float64
odometry_wheelSpeed_mr        float64
odometry_wheelSpeed_rl        float64
odometry_wheelSpeed_rr        float64
status_doorIsOpen               int64
status_gridIsAvailable          int64
status_haltBrakeIsActive        int64
status_parkBrakeIsActive        int64
temperature_ambient           float64
traction_brakePressure        float64
traction_tractionForce        float64
dtype: object

In [21]:
dataframes['B208_2022-03-25_23-51-19_2022-03-26_03-42-34'].head()

|  | time_iso | time_unix | electric_powerDemand | gnss_altitude | gnss_course | gnss_latitude | gnss_longitude | itcs_busRoute | itcs_numberOfPassengers | itcs_stopName | ... | odometry_wheelSpeed_mr | odometry_wheelSpeed_rl | odometry_wheelSpeed_rr | status_doorIsOpen | status_gridIsAvailable | status_haltBrakeIsActive | status_parkBrakeIsActive | temperature_ambient | traction_brakePressure | traction_tractionForce |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-03-25T23:51:19Z | 1648252279 | 2795.944 | NaN | NaN | NaN | NaN | - | NaN | - | ... | 0.0 | 0.0 | 0.0 | 1 | 0 | 1 | 1 | 293.839 | 245833.3 | 0.0 |
| 1 | 2022-03-25T23:51:20Z | 1648252280 | 2717.339 | NaN | NaN | NaN | NaN | - | NaN | - | ... | 0.0 | 0.0 | 0.0 | 1 | 0 | 1 | 1 | 293.461 | 245833.3 | 0.0 |
| 2 | 2022-03-25T23:51:21Z | 1648252281 | 2904.655 | NaN | NaN | NaN | NaN | - | NaN | - | ... | 0.0 | 0.0 | 0.0 | 1 | 0 | 1 | 1 | 293.15 | 245833.3 | 0.0 |
| 3 | 2022-03-25T23:51:22Z | 1648252282 | 2862.673 | NaN | NaN | NaN | NaN | - | NaN | - | ... | 0.0 | 0.0 | 0.0 | 1 | 0 | 1 | 1 | 293.839 | 245833.3 | 0.0 |
| 4 | 2022-03-25T23:51:23Z | 1648252283 | 2927.541 | NaN | NaN | NaN | NaN | - | NaN | - | ... | 0.0 | 0.0 | 0.0 | 1 | 0 | 1 | 1 | 293.461 | 245833.3 | 0.0 |


In [22]:
dataframes['B208_2022-03-25_23-51-19_2022-03-26_03-42-34'].shape

(13876, 26)
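The row counts of the two samples (19525 and 13876) look consistent with one record per second, as the consecutive `time_unix` values in the heads suggest. A minimal sketch of this check, assuming 1 Hz sampling and parsing the start and end times from the mission name; the helper `expected_rows` is hypothetical.

```python
from datetime import datetime


def expected_rows(mission_name: str) -> int:
    """Number of rows expected for a mission sampled once per second (assumption)."""
    # Mission names look like B183_2019-04-30_03-18-56_2019-04-30_08-44-20
    _, start_day, start_time, end_day, end_time = mission_name.split('_')
    fmt = '%Y-%m-%d %H-%M-%S'
    start = datetime.strptime(f'{start_day} {start_time}', fmt)
    end = datetime.strptime(f'{end_day} {end_time}', fmt)
    # One row per second, both endpoints included
    return int((end - start).total_seconds()) + 1


print(expected_rows('B183_2019-04-30_03-18-56_2019-04-30_08-44-20'))  # 19525
print(expected_rows('B208_2022-03-25_23-51-19_2022-03-26_03-42-34'))  # 13876
```

Both values match the shapes shown above; comparing `expected_rows(name)` with `df.shape[0]` over the whole dictionary would reveal missions with gaps in the recording.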

In [24]:
dataframes['B208_2022-03-25_23-51-19_2022-03-26_03-42-34'].dtypes

time_iso                       object
time_unix                       int64
electric_powerDemand          float64
gnss_altitude                 float64
gnss_course                   float64
gnss_latitude                 float64
gnss_longitude                float64
itcs_busRoute                  object
itcs_numberOfPassengers       float64
itcs_stopName                  object
odometry_articulationAngle    float64
odometry_steeringAngle        float64
odometry_vehicleSpeed         float64
odometry_wheelSpeed_fl        float64
odometry_wheelSpeed_fr        float64
odometry_wheelSpeed_ml        float64
odometry_wheelSpeed_mr        float64
odometry_wheelSpeed_rl        float64
odometry_wheelSpeed_rr        float64
status_doorIsOpen               int64
status_gridIsAvailable          int64
status_haltBrakeIsActive        int64
status_parkBrakeIsActive        int64
temperature_ambient           float64
traction_brakePressure        float64
traction_tractionForce        float64
dtype: object

As for the metadata, `time_iso` should be a `datetime` but has dtype `object`. For future analyses it will be better to convert it or to use `time_unix` instead. The dtypes of the other variables match the nature of the data they represent. I assume this also holds for the other dataframes, although only these two have been checked.
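The assumption that all dataframes share the same dtypes can be checked directly. A minimal sketch, where the helper `dtype_mismatches` and the three toy missions are hypothetical stand-ins for the `dataframes` dictionary:

```python
import pandas as pd


def dtype_mismatches(frames: dict) -> dict:
    """Return the dataframes whose dtypes differ from those of the first one."""
    reference = next(iter(frames.values())).dtypes
    # Series.equals also catches different column names or column order
    return {
        name: df.dtypes
        for name, df in frames.items()
        if not df.dtypes.equals(reference)
    }


# Toy stand-ins for entries of the dataframes dictionary (values are made up)
frames = {
    'mission_a': pd.DataFrame({'time_unix': [1, 2], 'itcs_busRoute': ['-', '83']}),
    'mission_b': pd.DataFrame({'time_unix': [3, 4], 'itcs_busRoute': ['83', '83']}),
    'mission_c': pd.DataFrame({'time_unix': [5.0, 6.0], 'itcs_busRoute': ['-', '-']}),
}

print(list(dtype_mismatches(frames)))  # ['mission_c']
```

An empty result for `dtype_mismatches(dataframes)` would confirm the assumption for all 1409 missions.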

In [19]:
diz = defaultdict(int)
# defaultdict(int) initializes the dictionary values to 0

for name in file_names:
    prefix = name[:4]
    diz[prefix] += 1

diz

defaultdict(int, {'B183': 864, 'B208': 545})

* 864 driving missions with bus 183
* 545 driving missions with bus 208
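The same counts can be cross-checked against the `busNumber` column of the metadata. A minimal sketch with toy stand-ins for `file_names` and `metadata['busNumber']`; `collections.Counter` replaces the `defaultdict` loop here, as an equivalent alternative.

```python
from collections import Counter

import pandas as pd

# Toy stand-ins for file_names and metadata['busNumber']
file_names = [
    'B183_2019-04-30_03-18-56_2019-04-30_08-44-20.csv',
    'B183_2019-04-30_13-22-07_2019-04-30_17-54-02.csv',
    'B208_2022-03-25_23-51-19_2022-03-26_03-42-34.csv',
]
bus_number = pd.Series([183, 183, 208], name='busNumber')

# Counter does the same job as the defaultdict loop: missions per bus prefix
from_files = Counter(name[:4] for name in file_names)

# Same counts from the metadata, rebuilding the prefix as 'B' + bus number
from_metadata = Counter({f'B{bus}': int(n) for bus, n in bus_number.value_counts().items()})

print(from_files == from_metadata)  # True
```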

### NaN values check

In [34]:
dataframes['B183_2019-04-30_03-18-56_2019-04-30_08-44-20'].isna().sum()

time_iso                          0
time_unix                         0
electric_powerDemand              0
gnss_altitude                 19328
gnss_course                   19328
gnss_latitude                 19328
gnss_longitude                19328
itcs_busRoute                     0
itcs_numberOfPassengers       19332
itcs_stopName                     0
odometry_articulationAngle        0
odometry_steeringAngle            0
odometry_vehicleSpeed             0
odometry_wheelSpeed_fl            0
odometry_wheelSpeed_fr            0
odometry_wheelSpeed_ml            0
odometry_wheelSpeed_mr            0
odometry_wheelSpeed_rl            0
odometry_wheelSpeed_rr            0
status_doorIsOpen                 0
status_gridIsAvailable            0
status_haltBrakeIsActive          0
status_parkBrakeIsActive          0
temperature_ambient               0
traction_brakePressure            0
traction_tractionForce            0
dtype: int64

In [33]:
dataframes['B208_2022-03-25_23-51-19_2022-03-26_03-42-34'].isna().sum()

time_iso                          0
time_unix                         0
electric_powerDemand              0
gnss_altitude                   115
gnss_course                     110
gnss_latitude                   110
gnss_longitude                  110
itcs_busRoute                     0
itcs_numberOfPassengers       13781
itcs_stopName                     0
odometry_articulationAngle        0
odometry_steeringAngle            0
odometry_vehicleSpeed             0
odometry_wheelSpeed_fl            0
odometry_wheelSpeed_fr            0
odometry_wheelSpeed_ml            0
odometry_wheelSpeed_mr            0
odometry_wheelSpeed_rl            0
odometry_wheelSpeed_rr            0
status_doorIsOpen                 0
status_gridIsAvailable            0
status_haltBrakeIsActive          0
status_parkBrakeIsActive          0
temperature_ambient               0
traction_brakePressure            0
traction_tractionForce            0
dtype: int64

In both samples, the columns `gnss_altitude`, `gnss_course`, `gnss_latitude`, `gnss_longitude` and `itcs_numberOfPassengers` contain missing values, and their number varies strongly between the two: in the 2019 mission the GNSS columns are missing in 19328 of 19525 rows, whereas in the 2022 mission only 110 to 115 rows are affected. It is worth investigating why these columns have missing data and how this could affect the analysis or the modelling. In addition, `itcs_busRoute` and `itcs_stopName` use `-` as a placeholder (see the `head()` outputs), which `isna()` does not count.

Based on these two samples, I expect the columns above to contain missing values in the other missions as well. In future analyses this must be taken into account, keeping in mind that other variables may also contain null values, since so far only two samples have been checked.
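To go beyond two samples, the missing values can be summed over every mission. A minimal sketch, where the helper `missing_summary` and the toy frames are hypothetical; it counts both real NaN values and the `-` placeholder per column.

```python
import numpy as np
import pandas as pd


def missing_summary(frames: dict, placeholder: str = '-') -> pd.DataFrame:
    """Per-column totals of NaN values and placeholder strings over all dataframes."""
    # pd.concat aligns the per-mission counts by column name before summing
    nan_counts = pd.concat(
        [df.isna().sum() for df in frames.values()], axis=1
    ).sum(axis=1)
    placeholder_counts = pd.concat(
        [(df == placeholder).sum() for df in frames.values()], axis=1
    ).sum(axis=1)
    return pd.DataFrame({'nan': nan_counts, 'placeholder': placeholder_counts})


# Toy stand-ins for two entries of the dataframes dictionary (values are made up)
frames = {
    'mission_a': pd.DataFrame({
        'gnss_latitude': [np.nan, np.nan, 47.4],
        'itcs_stopName': ['-', '-', 'stop_1'],
    }),
    'mission_b': pd.DataFrame({
        'gnss_latitude': [47.3, np.nan, 47.4],
        'itcs_stopName': ['-', 'stop_2', 'stop_3'],
    }),
}

summary = missing_summary(frames)
print(summary)
```

Here `gnss_latitude` has 3 NaN values and `itcs_stopName` 3 placeholders; on the real data the call would be `missing_summary(dataframes)`.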


TODO: 
* Describe the data briefly

# Project

## 1. Extract all trips with `busRoute` 83

As noted before, `busRoute` contains missing values encoded as `-` (see `metadata.head()`). Before extracting the trips with `busRoute` 83, I replace the `-` values with NaN, since the data for those cells is missing.
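An alternative to replacing the `-` values after loading is to let `pd.read_csv` treat them as missing at load time through its `na_values` parameter. A minimal sketch on a small in-memory CSV; the mission names and routes are toy values, not rows of the real file. Because a night route such as `N1` is present, the column keeps its string values.

```python
import io

import pandas as pd

# Toy excerpt shaped like metaData.csv, with only the columns needed here
csv_text = """name,busRoute
mission_a,-
mission_b,31
mission_c,N1
"""

# na_values adds '-' to the strings that read_csv interprets as NaN
meta = pd.read_csv(io.StringIO(csv_text), na_values=['-'])

print(meta['busRoute'].isna().sum())  # 1
```

On the real data this would read `pd.read_csv(config['path_metadata'], na_values=['-'])`; note that `na_values` applies to every column, not only `busRoute`.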

In [35]:
metadata.head()

|  | name | busNumber | startTime_iso | startTime_unix | endTime_iso | endTime_unix | drivenDistance | busRoute | energyConsumption | itcs_numberOfPassengers_mean | itcs_numberOfPassengers_min | itcs_numberOfPassengers_max | status_gridIsAvailable_mean | temperature_ambient_mean | temperature_ambient_min | temperature_ambient_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B183_2019-04-30_03-18-56_2019-04-30_08-44-20 | 183 | 2019-04-30T03:18:56Z | 1556594336 | 2019-04-30T08:44:20Z | 1556613860 | 77213.87 | - | 478585200.0 | 5.53886 | 0 | 20 | 0.74064 | 282.378 | 281.15 | 293.15 |
| 1 | B183_2019-04-30_13-22-07_2019-04-30_17-54-02 | 183 | 2019-04-30T13:22:07Z | 1556630527 | 2019-04-30T17:54:02Z | 1556646842 | 59029.6 | 31 | 402258500.0 | 33.11458 | 4 | 74 | 0.855234 | 287.5443 | 285.15 | 293.15 |
| 2 | B183_2019-05-01_05-58-51_2019-05-01_22-32-30 | 183 | 2019-05-01T05:58:51Z | 1556690331 | 2019-05-01T22:32:30Z | 1556749950 | 240900.4 | 33 | 1445733000.0 | 19.68914 | 0 | 55 | 0.77786 | 288.749 | 280.15 | 294.15 |
| 3 | B183_2019-05-03_02-50-21_2019-05-03_05-53-20 | 183 | 2019-05-03T02:50:21Z | 1556851821 | 2019-05-03T05:53:20Z | 1556862800 | 42565.48 | - | 281986700.0 | 1.685185 | 0 | 8 | 0.767122 | 282.4129 | 281.15 | 292.15 |
| 4 | B183_2019-05-03_15-41-57_2019-05-03_23-06-24 | 183 | 2019-05-03T15:41:57Z | 1556898117 | 2019-05-03T23:06:24Z | 1556924784 | 125277.2 | 72 | 620725800.0 | 23.75357 | 1 | 67 | 0.907342 | 284.7325 | 282.15 | 287.15 |


In [39]:
metadata.dtypes

name                             object
busNumber                         int64
startTime_iso                    object
startTime_unix                    int64
endTime_iso                      object
endTime_unix                      int64
drivenDistance                  float64
busRoute                         object
energyConsumption               float64
itcs_numberOfPassengers_mean    float64
itcs_numberOfPassengers_min       int64
itcs_numberOfPassengers_max       int64
status_gridIsAvailable_mean     float64
temperature_ambient_mean        float64
temperature_ambient_min         float64
temperature_ambient_max         float64
dtype: object

In [46]:
metadata['busRoute'] = metadata['busRoute'].replace('-', np.nan)

In [47]:
metadata.head()

|  | name | busNumber | startTime_iso | startTime_unix | endTime_iso | endTime_unix | drivenDistance | busRoute | energyConsumption | itcs_numberOfPassengers_mean | itcs_numberOfPassengers_min | itcs_numberOfPassengers_max | status_gridIsAvailable_mean | temperature_ambient_mean | temperature_ambient_min | temperature_ambient_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B183_2019-04-30_03-18-56_2019-04-30_08-44-20 | 183 | 2019-04-30T03:18:56Z | 1556594336 | 2019-04-30T08:44:20Z | 1556613860 | 77213.87 | NaN | 478585200.0 | 5.53886 | 0 | 20 | 0.74064 | 282.378 | 281.15 | 293.15 |
| 1 | B183_2019-04-30_13-22-07_2019-04-30_17-54-02 | 183 | 2019-04-30T13:22:07Z | 1556630527 | 2019-04-30T17:54:02Z | 1556646842 | 59029.6 | 31.0 | 402258500.0 | 33.11458 | 4 | 74 | 0.855234 | 287.5443 | 285.15 | 293.15 |
| 2 | B183_2019-05-01_05-58-51_2019-05-01_22-32-30 | 183 | 2019-05-01T05:58:51Z | 1556690331 | 2019-05-01T22:32:30Z | 1556749950 | 240900.4 | 33.0 | 1445733000.0 | 19.68914 | 0 | 55 | 0.77786 | 288.749 | 280.15 | 294.15 |
| 3 | B183_2019-05-03_02-50-21_2019-05-03_05-53-20 | 183 | 2019-05-03T02:50:21Z | 1556851821 | 2019-05-03T05:53:20Z | 1556862800 | 42565.48 | NaN | 281986700.0 | 1.685185 | 0 | 8 | 0.767122 | 282.4129 | 281.15 | 292.15 |
| 4 | B183_2019-05-03_15-41-57_2019-05-03_23-06-24 | 183 | 2019-05-03T15:41:57Z | 1556898117 | 2019-05-03T23:06:24Z | 1556924784 | 125277.2 | 72.0 | 620725800.0 | 23.75357 | 1 | 67 | 0.907342 | 284.7325 | 282.15 | 287.15 |


In [48]:
metadata.isna().sum()

name                             0
busNumber                        0
startTime_iso                    0
startTime_unix                   0
endTime_iso                      0
endTime_unix                     0
drivenDistance                   0
busRoute                        11
energyConsumption                0
itcs_numberOfPassengers_mean     0
itcs_numberOfPassengers_min      0
itcs_numberOfPassengers_max      0
status_gridIsAvailable_mean      0
temperature_ambient_mean         0
temperature_ambient_min          0
temperature_ambient_max          0
dtype: int64

In [50]:
print(f'busRoute variable contains: {metadata["busRoute"].isna().sum()} missing values')

busRoute variable contains: 11 missing values


In [51]:
metadata.dtypes

name                             object
busNumber                         int64
startTime_iso                    object
startTime_unix                    int64
endTime_iso                      object
endTime_unix                      int64
drivenDistance                  float64
busRoute                         object
energyConsumption               float64
itcs_numberOfPassengers_mean    float64
itcs_numberOfPassengers_min       int64
itcs_numberOfPassengers_max       int64
status_gridIsAvailable_mean     float64
temperature_ambient_mean        float64
temperature_ambient_min         float64
temperature_ambient_max         float64
dtype: object

Let's check the unique values of `busRoute`.

In [52]:
metadata['busRoute'].unique()

array([nan, '31', '33', '72', '46', '32', '83', 'N4', 'N2', 'N1'],
      dtype=object)

Bus routes are not only integers: there are also `N4`, `N2` and `N1`, which probably correspond to night routes. I can therefore keep the `object` dtype with string values and extract the trips whose `busRoute` equals the string `'83'`.
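Because the routes are stored as strings, the comparison value must be a string too. A small illustration on a toy stand-in for `metadata['busRoute']` (after the `-` to NaN replacement) shows that comparing with the integer `83` silently selects nothing.

```python
import numpy as np
import pandas as pd

# Toy stand-in for metadata['busRoute'] after the '-' -> NaN replacement
bus_route = pd.Series([np.nan, '31', '83', 'N4', '83'], dtype=object)

# The routes are strings, so comparing with the integer 83 matches nothing
print((bus_route == 83).sum())    # 0

# Comparing with the string '83' selects the intended trips
print((bus_route == '83').sum())  # 2
```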

In [53]:
metadata[metadata['busRoute']=='83']

|  | name | busNumber | startTime_iso | startTime_unix | endTime_iso | endTime_unix | drivenDistance | busRoute | energyConsumption | itcs_numberOfPassengers_mean | itcs_numberOfPassengers_min | itcs_numberOfPassengers_max | status_gridIsAvailable_mean | temperature_ambient_mean | temperature_ambient_min | temperature_ambient_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 154 | B183_2020-03-03_04-42-38_2020-03-03_19-44-51 | 183 | 2020-03-03T04:42:38Z | 1583210558 | 2020-03-03T19:44:51Z | 1583264691 | 225047.90 | 83 | 1.544278e+09 | 23.47531 | 0 | 118 | 0.472180 | 280.5450 | 279.15 | 289.1500 |
| 155 | B183_2020-03-06_04-53-23_2020-03-06_19-44-42 | 183 | 2020-03-06T04:53:23Z | 1583470403 | 2020-03-06T19:44:42Z | 1583523882 | 224512.30 | 83 | 1.631816e+09 | 17.41578 | 0 | 69 | 0.451028 | 279.8850 | 278.15 | 289.1500 |
| 157 | B183_2020-03-09_14-16-13_2020-03-09_19-34-17 | 183 | 2020-03-09T14:16:13Z | 1583763373 | 2020-03-09T19:34:17Z | 1583782457 | 77824.36 | 83 | 5.406013e+08 | 23.18182 | 0 | 74 | 0.460099 | 281.0489 | 279.15 | 291.1500 |
| 158 | B183_2020-03-10_04-50-03_2020-03-10_19-51-25 | 183 | 2020-03-10T04:50:03Z | 1583815803 | 2020-03-10T19:51:25Z | 1583869885 | 225095.80 | 83 | 1.692171e+09 | 20.96410 | 0 | 86 | 0.475233 | 279.8363 | 279.15 | 291.1500 |
| 159 | B183_2020-03-12_04-56-41_2020-03-12_19-44-57 | 183 | 2020-03-12T04:56:41Z | 1583989001 | 2020-03-12T19:44:57Z | 1584042297 | 224181.20 | 83 | 1.145860e+09 | 17.21235 | 0 | 80 | 0.340882 | 287.3445 | 282.15 | 291.1500 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1399 | B208_2022-11-30_04-47-53_2022-11-30_19-50-22 | 208 | 2022-11-30T04:47:53Z | 1669783673 | 2022-11-30T19:50:22Z | 1669837822 | 223165.00 | 83 | 1.560888e+09 | 27.89066 | 2 | 100 | 0.456196 | 280.6948 | 279.15 | 293.1500 |
| 1400 | B208_2022-12-01_05-19-41_2022-12-01_18-20-57 | 208 | 2022-12-01T05:19:41Z | 1669871981 | 2022-12-01T18:20:57Z | 1669918857 | 190196.00 | 83 | 1.418847e+09 | 26.03927 | 0 | 96 | 0.450413 | 279.7655 | 279.15 | 292.1500 |
| 1401 | B208_2022-12-02_04-47-48_2022-12-02_19-40-01 | 208 | 2022-12-02T04:47:48Z | 1669956468 | 2022-12-02T19:40:01Z | 1670010001 | 224473.40 | 83 | 1.611150e+09 | 24.80384 | 2 | 91 | 0.438693 | 279.7888 | 279.15 | 291.1500 |
| 1405 | B208_2022-12-07_05-13-02_2022-12-07_19-19-53 | 208 | 2022-12-07T05:13:02Z | 1670389982 | 2022-12-07T19:19:53Z | 1670440793 | 210041.60 | 83 | 1.536697e+09 | 28.78539 | 0 | 115 | 0.434858 | 279.5283 | 278.15 | 292.6655 |


In [55]:
print(f'The number of trips with bus route 83 is: {metadata[metadata["busRoute"]=="83"].shape[0]}')

The number of trips with bus route 83 is: 846
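If the detailed time series of these trips are needed as well, their names can serve as keys into the `dataframes` dictionary. A minimal sketch with hypothetical toy stand-ins (`metadata_toy`, `dataframes_toy`) in place of the real objects:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for metadata and for the dataframes dictionary
metadata_toy = pd.DataFrame({
    'name': ['mission_a', 'mission_b', 'mission_c'],
    'busRoute': ['83', np.nan, '83'],
})
dataframes_toy = {
    name: pd.DataFrame({'time_unix': [0, 1]}) for name in metadata_toy['name']
}

# Names of the trips on route 83, taken from the metadata
route_83_names = metadata_toy.loc[metadata_toy['busRoute'] == '83', 'name']

# Keep only the detail dataframes of those trips
route_83_details = {name: dataframes_toy[name] for name in route_83_names}

print(sorted(route_83_details))  # ['mission_a', 'mission_c']
```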


## 2. Extract all trips where `busRoute` is not a number

Here, "not a number" means either a missing value (NaN) or a night route (`N1`, `N2` or `N4`).
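One possible way to build this selection, sketched on a toy stand-in for `metadata['busRoute']` (after the `-` to NaN replacement): `pd.to_numeric` with `errors='coerce'` turns every non-numeric route into NaN, so a single `isna()` flags both missing values and night routes.

```python
import numpy as np
import pandas as pd

# Toy stand-in for metadata['busRoute'] after the '-' -> NaN replacement
bus_route = pd.Series([np.nan, '31', '83', 'N4', 'N1'], dtype=object)

# to_numeric turns night routes ('N1', 'N4', ...) into NaN and keeps NaN as NaN,
# so isna() flags exactly the trips whose route is not a number
not_a_number = pd.to_numeric(bus_route, errors='coerce').isna()

print(not_a_number.tolist())  # [True, False, False, True, True]
```

On the real data the selection would be `metadata[pd.to_numeric(metadata['busRoute'], errors='coerce').isna()]`.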