# Exploratory Data Analysis (EDA)

### Input:

- [LAD+incident.csv](data/LAD+incident.csv)

|ID|DateTime|Length|Volume|Speed|Occupancy|Incident_type|
|--|--------|------|------|-----|---------|-------------|
|1-14|2018-01-01 to 2018-10-25||||||
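
As a quick sanity check on the table above, the expected number of rows can be worked out from the ID range and the date range. This is a minimal sketch that assumes one record per link per minute with no gaps (the one-minute spacing is what the DataFrame preview later in this notebook suggests); the variable names are illustrative only.

```python
import pandas as pd

# Assumption: 14 links, one record per minute, no missing timestamps
n_links = 14
minutes = pd.date_range("2018-01-01 00:00", "2018-10-25 23:59", freq="min")

# 298 days x 1440 minutes per day, then multiplied by the number of links
expected_rows = n_links * len(minutes)
print(len(minutes), expected_rows)  # 429120 6007680
```

If the loaded file has fewer rows than this, some link-minute combinations are probably missing and may need to be handled before modelling.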

### Data Mapping

Each NPI link name is mapped to a short numeric ID for convenience.

| Link ID | NPI Link Name |
|---------|---------------|
|1|Kwinana Fwy NB between Kwinana Fwy Nth Bnd H015 Nth Bnd - H018 East Bnd & Kwinana Fwy Nth Bnd H018 W|
|2|Kwinana Fwy NB between Kwinana Fwy Nth Bnd H018 West Bnd - H015 Nth Bnd & Farrington Rd On - H015 Nt|
|3|Kwinana Fwy NB between Farrington Rd On - H015 Nth Bo & H015 Nth Bound - South St Off|
|4|Kwinana Fwy NB between H015 Nth Bound - South St Off & South St On - H015 Nth Bound|
|5|Kwinana Fwy NB between South St On - H015 Nth Bound & H015 Nth Bound - Leach Hwy Off|
|6|Kwinana Fwy NB between H015 Nth Bound - Leach Hwy Off & Leach Hwy West Bound On - H015|
|7|Kwinana Fwy NB between Leach Hwy West Bound On - H015 & Leach Hwy East Bound On - H015|
|8|Kwinana Fwy NB between Leach Hwy East Bound On - H015 & Cranford Av On - H015 Nth Bou|
|9|Kwinana Fwy NB between Cranford Av On - H015 Nth Bou & H015 Sth Bound - H548|
|10|Kwinana Fwy NB between H015 Sth Bound - H548 & Manning Rd - H547 On Kwinana Fwy Nth Bound|
|11|Kwinana Fwy NWB between Manning Rd - H547 On Kwinana Fwy Nth Bound & Canning Hwy - H549 On Kwinana F|
|12|Kwinana Fwy NB between Kwinana Fwy (northbound) Bus Ln From Canning Hwy: H013 On To H015 Northbound|
|13|Kwinana Fwy NB between Mill Pt Rd - H500 On Kwinana Fwy Nth Bound & Kwinana Fwy Nth Bound H503 Off -|
|14|Kwinana Fwy NB between Kwinana Fwy Nth Bound H503 Off - Mill Pt Rd & Mitchell Fwy Nth Bound|
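
The mapping above could be applied with a plain dictionary lookup. The following is a hedged sketch rather than the notebook's actual preprocessing: the raw frame, its column name `NPI Link Name`, and the variable names are assumptions for illustration, and only two of the fourteen links are included.

```python
import pandas as pd

# Link names copied verbatim from the mapping table (rows 3 and 4)
link_3 = "Kwinana Fwy NB between Farrington Rd On - H015 Nth Bo & H015 Nth Bound - South St Off"
link_4 = "Kwinana Fwy NB between H015 Nth Bound - South St Off & South St On - H015 Nth Bound"
name_to_id = {link_3: 3, link_4: 4}

# Hypothetical raw data keyed by the long link name
raw = pd.DataFrame({"NPI Link Name": [link_4, link_3, link_4]})

# Replace the long name with its numeric ID; unmapped names would become NaN
raw["ID"] = raw["NPI Link Name"].map(name_to_id)
print(raw["ID"].tolist())  # [4, 3, 4]
```

Using `Series.map` with a dictionary makes unmapped names easy to spot, since they show up as missing values instead of raising an error.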

### Contents

1. [Summary](#1) <br>
    1.1. [AA](#1.1) <br>
    1.2. [AAA](#1.2) <br>
    1.3. [AAAA](#1.3) <br>
2. [B](#2) <br>
    2.1. [BB](#2.1) <br>

In [1]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

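# Scikit-Learn ≥0.20 is required
# (compare numeric version parts; a plain string comparison misorders e.g. "0.9" and "0.20")
import re
import sklearn
assert tuple(int(p) for p in re.findall(r"\d+", sklearn.__version__)[:2]) >= (0, 20)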

from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error

# tensorflow
import tensorflow as tf
from tensorflow import keras

from keras import backend as K
from keras.models import Sequential, load_model
from keras.layers import Dense, LSTM, Dropout, Conv1D, RepeatVector, TimeDistributed
from keras.callbacks import ModelCheckpoint, EarlyStopping
# Note: keras.wrappers.scikit_learn is only available in older Keras releases
from keras.wrappers.scikit_learn import KerasRegressor

# Common imports
import os
import timeit
import numpy as np
import pandas as pd
import seaborn as sns
from math import sqrt
from datetime import date
import holidays
sns.set()
import warnings
warnings.filterwarnings("ignore")

# To plot pretty figures
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
mpl.rcParams.update(mpl.rcParamsDefault)
mpl.rcParams["font.family"] = "serif"
mpl.rcParams["font.sans-serif"] = "Verdana"
mpl.rcParams["lines.markersize"] = 20
# mpl.rc('axes', labelsize=14)
# mpl.rc('xtick', labelsize=12)
# mpl.rc('ytick', labelsize=12)

In [2]:
# Read the LAD+incident file (needed when the notebook is not run from the beginning)
df = pd.read_csv("data/LAD+incident.csv", header=0)
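
When loading this file, it can help to parse `DateTime` up front and confirm that the data matches the ranges stated in the Input section. The sketch below uses a small in-memory sample built from the first rows shown in the preview, without the index column; `sample_csv` and `sample_df` are hypothetical names, and the real file may contain an additional index column.

```python
import io
import pandas as pd

# Small in-memory sample; values taken from the first rows of the preview
sample_csv = """ID,DateTime,Length,Volume,Speed,Occupancy,Incident_type
1,2018-01-01 00:00:00,960.0,7.0,96.000000,1.0,
1,2018-01-01 00:01:00,960.0,6.0,94.999998,1.0,
1,2018-01-01 00:02:00,960.0,5.0,90.999999,1.0,
"""

# parse_dates converts the DateTime column to datetime64 while reading
sample_df = pd.read_csv(io.StringIO(sample_csv), header=0, parse_dates=["DateTime"])

# Checks against the ranges given in the Input table
ids_ok = sample_df["ID"].between(1, 14).all()
dates_ok = sample_df["DateTime"].between("2018-01-01", "2018-10-25 23:59:59").all()

# Empty Incident_type fields are read as missing values
no_incident_share = sample_df["Incident_type"].isna().mean()
print(ids_ok, dates_ok, no_incident_share)
```

The same checks could be run on the full `df` after adding `parse_dates=["DateTime"]` to the `pd.read_csv` call.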

## 1. Summary <a class="anchor" id="1"></a>

In [3]:
df

Unnamed: 0,ID,DateTime,Length,Volume,Speed,Occupancy,Incident_type
0,1,2018-01-01 00:00:00,960.0,7.0,96.000000,1.0,
1,1,2018-01-01 00:01:00,960.0,6.0,94.999998,1.0,
2,1,2018-01-01 00:02:00,960.0,5.0,90.999999,1.0,
3,1,2018-01-01 00:03:00,960.0,5.0,94.999997,1.0,
4,1,2018-01-01 00:04:00,960.0,5.0,92.999999,1.0,
...,...,...,...,...,...,...,...
6007675,14,2018-10-25 23:55:00,567.0,6.0,82.999997,0.0,
6007676,14,2018-10-25 23:56:00,567.0,6.0,78.000003,0.0,
6007677,14,2018-10-25 23:57:00,567.0,5.0,78.000001,0.0,
6007678,14,2018-10-25 23:58:00,567.0,6.0,74.000000,0.0,
