# Data Scientist Nanodegree
## Introduction to Data Science
## Project: Write Date Science blog post

#### Roads safety is pressing concern for many countries, where road crash fatalities and disabilities is gradually being recognized as a major public health concern. It is our social responsibility to be aware of our surrounding and use science and knowledge to help solve real-world problems…

>According to World Health Organization (WHO); nearly 1.25 million people die in road crashes each year, on average 3,287 deaths a day. In addition, road traffic crashes rank as the 9th leading cause of death and account for 2.2% of all deaths globally.



## Project Motiviation:

###### The aim of this project is to use data science methodology and machine learning to gain an understanding on the problem at hand, and develop insights and prevention mechanisms for Traffic Accidents and Road Safety. 


This project will use U.K Road Safety Data from (2005–2017). The dataset is published by Department of Trasports, under Open Government Licence. The data consists of detailed road safety data about the circumstances of personal injury road accidents, the types of vehicles involved and the consequential casualties.


First off, we want to study and understand the nature of car accidents, and how it has changed throughout the years? how road safety has developed over time? where do accidents happen? what are the main causes of car crashes? whether we can predict the severity of accidents and prevent them before they happen?

This project will follow CRISP-DM methdology which provides a structed process to approach data science problems, it's constructed of 6 steps: 
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment 



## Business Understanding and Analysis Questions:

#### To better guide us through the analysis, I formulated the problem into the following set of questions, so we can explore it at greater depth: 
1. What is the severity of accidents over the last decade?  
2. When do accidents usually happen?
3. Where do cyclists accidents usually happen?
4. Under which circumstances do accidents happen? Is there any correlation between these features? 
5. What is the age distribution of drivers involved in the accidents?
6. What are the characteristics of casualties impacted in the accidents?
7. What are the main factors causing an accidents, and can we predict the severity based on these factors?


## Data Undertstanding and Analysis: 

In this section we will import libraries and packages necessry for this project, we will be using: Pandas, NumPy, Sklearn, Plotly, and Folimm. After that we will load our datasets and start looking into the data: 

In [1]:
# Import libraries necessary for this project

# Pandas, and NumPy: 
import numpy as np
import pandas as pd
from time import time
import math

# Plotly: 
import plotly.plotly as py
import plotly
import plotly.graph_objs as go
from plotly import tools
import csv
import folium
import matplotlib.pyplot as pl
import matplotlib.patches as mpatches

# sklearn: 
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.model_selection import train_test_split
from sklearn.metrics import fbeta_score
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier

# folium: 
from folium import plugins
from folium.plugins import HeatMap

# dython: 
from dython.nominal import associations

import colorlover as cl
from IPython.display import HTML
from IPython.display import display

# Pretty display for notebooks
%matplotlib inline

In [2]:
# Configurations:
plotly.tools.set_credentials_file(username='rawanmm', api_key='9TIS4IOtNmEoHGTadSEh')
plotly.offline.init_notebook_mode(connected=True)
pd.options.display.max_columns = None
bupu = cl.scales['9']['seq']['BuPu']
HTML( cl.to_html(bupu))

#### Data loading and preprocessing: 

To start the analysis we will load the data which consists of three files (Accidents, Vehicles, and Casualties): 
* Accidents(0514-2017).csv: detailed road safety data about the circumstances of personal injury road accidents in GB, indexed by (Accident_Index). 
* Vehicles(0514-2017).csv: detailed vehicles involved in traffic accidents, it can be linked to accidents using (Accident_Index).
* Casualties(0514-2017).csv: detailed consequential casualties invloved in traffic accidents, it can be linked to accidents using (Accident_Index).

The data is separated based on timestamp so will concatenate them into one dataset. We will follow that with assessment of any necessary preprocessing, transformation or cleaning.



In [4]:
# Load datasets
# 2005 - 2014
accidents_data_0514 = pd.read_csv("dft-accident-data/All_Data/Accidents_0514.csv", parse_dates=['Date'], date_parser = pd.to_datetime)
vehicles_data_0514 = pd.read_csv("dft-accident-data/All_Data/Vehicles_0514.csv")
casualties_data_0514 = pd.read_csv("dft-accident-data/All_Data/Casualties_0514.csv")
# 2015
accidents_data_2015 = pd.read_csv("dft-accident-data/All_Data/Accidents_2015.csv", parse_dates=['Date'], date_parser = pd.to_datetime)
vehicles_data_2015 = pd.read_csv("dft-accident-data/All_Data/Vehicles_2015.csv")
casualties_data_2015 = pd.read_csv("dft-accident-data/All_Data/Casualties_2015.csv")
# 2016
accidents_data_2016 = pd.read_csv("dft-accident-data/All_Data/Accidents_2016.csv", parse_dates=['Date'], date_parser = pd.to_datetime)
vehicles_data_2016 = pd.read_csv("dft-accident-data/All_Data/Vehicles_2016.csv")
casualties_data_2016 = pd.read_csv("dft-accident-data/All_Data/Casualties_2016.csv")
# 2017
accidents_data_2017 = pd.read_csv("dft-accident-data/All_Data/Accidents_2017.csv", parse_dates=['Date'], date_parser = pd.to_datetime)
vehicles_data_2017 = pd.read_csv("dft-accident-data/All_Data/Vehicles_2017.csv")
casualties_data_2017 = pd.read_csv("dft-accident-data/All_Data/Casualties_2017.csv")


Columns (31) have mixed types. Specify dtype option on import or set low_memory=False.


Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.



In [5]:
# Column is missing from old dataset, 
# so to reslove this and avoid running into error colunms will be added and set to (-1) to be handled later
vehicles_data_0514['Vehicle_IMD_Decile']= -1 
casualties_data_0514['Casualty_IMD_Decile'] = -1

In [6]:
# concat all datasets into one: 
accidents_data = pd.concat([accidents_data_0514, accidents_data_2015, accidents_data_2016, accidents_data_2017], ignore_index=True)
vehicles_data = pd.concat([vehicles_data_0514, vehicles_data_2015, vehicles_data_2016, vehicles_data_2017], ignore_index=True)
casualties_data = pd.concat([casualties_data_0514, casualties_data_2015, casualties_data_2016, casualties_data_2017], ignore_index=True)

In [7]:
display(accidents_data.head())

Unnamed: 0,Accident_Index,Location_Easting_OSGR,Location_Northing_OSGR,Longitude,Latitude,Police_Force,Accident_Severity,Number_of_Vehicles,Number_of_Casualties,Date,Day_of_Week,Time,Local_Authority_(District),Local_Authority_(Highway),1st_Road_Class,1st_Road_Number,Road_Type,Speed_limit,Junction_Detail,Junction_Control,2nd_Road_Class,2nd_Road_Number,Pedestrian_Crossing-Human_Control,Pedestrian_Crossing-Physical_Facilities,Light_Conditions,Weather_Conditions,Road_Surface_Conditions,Special_Conditions_at_Site,Carriageway_Hazards,Urban_or_Rural_Area,Did_Police_Officer_Attend_Scene_of_Accident,LSOA_of_Accident_Location
0,200501BS00001,525680.0,178240.0,-0.19117,51.489096,1,2,1,1,2005-04-01,3,17:42,12,E09000020,3,3218,6,30.0,0,-1,-1,0,0,1,1,2,2,0,0,1,1,E01002849
1,200501BS00002,524170.0,181650.0,-0.211708,51.520075,1,3,1,1,2005-05-01,4,17:36,12,E09000020,4,450,3,30.0,6,2,5,0,0,5,4,1,1,0,0,1,1,E01002909
2,200501BS00003,524520.0,182240.0,-0.206458,51.525301,1,3,2,1,2005-06-01,5,00:15,12,E09000020,5,0,6,30.0,0,-1,-1,0,0,0,4,1,1,0,0,1,1,E01002857
3,200501BS00004,526900.0,177530.0,-0.173862,51.482442,1,3,1,1,2005-07-01,6,10:35,12,E09000020,3,3220,6,30.0,0,-1,-1,0,0,0,1,1,1,0,0,1,1,E01002840
4,200501BS00005,528060.0,179040.0,-0.156618,51.495752,1,3,1,1,2005-10-01,2,21:13,12,E09000020,6,0,6,30.0,0,-1,-1,0,0,0,7,1,2,0,0,1,1,E01002863


In [8]:
accidents_data.info()  

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2047256 entries, 0 to 2047255
Data columns (total 32 columns):
Accident_Index                                 object
Location_Easting_OSGR                          float64
Location_Northing_OSGR                         float64
Longitude                                      float64
Latitude                                       float64
Police_Force                                   int64
Accident_Severity                              int64
Number_of_Vehicles                             int64
Number_of_Casualties                           int64
Date                                           datetime64[ns]
Day_of_Week                                    int64
Time                                           object
Local_Authority_(District)                     int64
Local_Authority_(Highway)                      object
1st_Road_Class                                 int64
1st_Road_Number                                int64
Road_Type          

##### Accidents Data:
 
The data consists of many variables describing an accidents, it can be generalized into these main features (location, condition of accidents, timestamp data, severity and effected vehicles and casualties). From what we can see all features are coded as number, in order to make the most of the data we will map it according to provided lookup tables.

In [9]:
display(vehicles_data.head())

Unnamed: 0,Accident_Index,Vehicle_Reference,Vehicle_Type,Towing_and_Articulation,Vehicle_Manoeuvre,Vehicle_Location-Restricted_Lane,Junction_Location,Skidding_and_Overturning,Hit_Object_in_Carriageway,Vehicle_Leaving_Carriageway,Hit_Object_off_Carriageway,1st_Point_of_Impact,Was_Vehicle_Left_Hand_Drive?,Journey_Purpose_of_Driver,Sex_of_Driver,Age_of_Driver,Age_Band_of_Driver,Engine_Capacity_(CC),Propulsion_Code,Age_of_Vehicle,Driver_IMD_Decile,Driver_Home_Area_Type,Vehicle_IMD_Decile
0,200501BS00001,1,9,0,18,0,0,0,0,0,0,1,1,15,2,74,10,-1,-1,-1,7,1,-1
1,200501BS00002,1,11,0,4,0,3,0,0,0,0,4,1,1,1,42,7,8268,2,3,-1,-1,-1
2,200501BS00003,1,11,0,17,0,0,0,4,0,0,4,1,1,1,35,6,8300,2,5,2,1,-1
3,200501BS00003,2,9,0,2,0,0,0,0,0,0,3,1,15,1,62,9,1762,1,6,1,1,-1
4,200501BS00004,1,9,0,18,0,0,0,0,0,0,1,1,15,2,49,8,1769,1,4,2,1,-1


In [10]:
vehicles_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3753696 entries, 0 to 3753695
Data columns (total 23 columns):
Accident_Index                      object
Vehicle_Reference                   int64
Vehicle_Type                        int64
Towing_and_Articulation             int64
Vehicle_Manoeuvre                   int64
Vehicle_Location-Restricted_Lane    int64
Junction_Location                   int64
Skidding_and_Overturning            int64
Hit_Object_in_Carriageway           int64
Vehicle_Leaving_Carriageway         int64
Hit_Object_off_Carriageway          int64
1st_Point_of_Impact                 int64
Was_Vehicle_Left_Hand_Drive?        int64
Journey_Purpose_of_Driver           int64
Sex_of_Driver                       int64
Age_of_Driver                       int64
Age_Band_of_Driver                  int64
Engine_Capacity_(CC)                int64
Propulsion_Code                     int64
Age_of_Vehicle                      int64
Driver_IMD_Decile                   int64
Driv

##### Vehicles Data:
 
The data consists of many variables describing an vehicle involded in an accident, an accident can be linked to more than one vehicle. The data can be generalized into these main features (driver data, condition and features of vehicle). Similar to Accidents Data; all features are coded as number, in order to make the most of the data we will map it according to provided lookup tables. 

In [11]:
display(casualties_data.head())

Unnamed: 0,Accident_Index,Vehicle_Reference,Casualty_Reference,Casualty_Class,Sex_of_Casualty,Age_of_Casualty,Age_Band_of_Casualty,Casualty_Severity,Pedestrian_Location,Pedestrian_Movement,Car_Passenger,Bus_or_Coach_Passenger,Pedestrian_Road_Maintenance_Worker,Casualty_Type,Casualty_Home_Area_Type,Casualty_IMD_Decile
0,200501BS00001,1,1,3,1,37,7,2,1,1,0,0,-1,0,1,-1
1,200501BS00002,1,1,2,1,37,7,3,0,0,0,4,-1,11,1,-1
2,200501BS00003,2,1,1,1,62,9,3,0,0,0,0,-1,9,1,-1
3,200501BS00004,1,1,3,1,30,6,3,5,2,0,0,-1,0,1,-1
4,200501BS00005,1,1,1,1,49,8,3,0,0,0,0,-1,3,-1,-1


In [12]:
casualties_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2755286 entries, 0 to 2755285
Data columns (total 16 columns):
Accident_Index                        object
Vehicle_Reference                     int64
Casualty_Reference                    int64
Casualty_Class                        int64
Sex_of_Casualty                       int64
Age_of_Casualty                       int64
Age_Band_of_Casualty                  int64
Casualty_Severity                     int64
Pedestrian_Location                   int64
Pedestrian_Movement                   int64
Car_Passenger                         int64
Bus_or_Coach_Passenger                int64
Pedestrian_Road_Maintenance_Worker    int64
Casualty_Type                         int64
Casualty_Home_Area_Type               int64
Casualty_IMD_Decile                   int64
dtypes: int64(15), object(1)
memory usage: 336.3+ MB


##### Casualties Data:
 
The data consists of many variables describing an casualties involded in an accident, an accident can be linked to more than one casualty. The data can be generalized into these main features (casuality data, condition of casualtie, data related to spesific casuality groub). Similar to Accidents Data; all features are coded as number, in order to make the most of the data we will map it according to provided lookup tables. 

## Data Preprocessing: 
Since many features are coded into numbers, we will transform them to their textual strings, using provided lookup mapping tables. We will load lookup mapping file, where for each variable we will have seperate file that has the appropriate encoding. 

In [13]:
lookup_mapping = pd.read_csv('dft-accident-data/lookup_mapping.csv', header=None, index_col=0, squeeze=True).to_dict()

In [14]:
lookup_mapping

{'1st_Point_of_Impact': '1st_Point_of_Impact.csv',
 '1st_Road_Class': '1st_Road_Class.csv',
 '2nd_Road_Class': '2nd_Road_Class.csv',
 'Accident_Severity': 'Accident_Severity.csv',
 'Age_Band_of_Casualty': 'Age_Band.csv',
 'Age_Band_of_Driver': 'Age_Band.csv',
 'Bus_or_Coach_Passenger': 'Bus_Passenger.csv',
 'Car_Passenger': 'Car_Passenger.csv',
 'Carriageway_Hazards': 'Carriageway_Hazards.csv',
 'Casualty_Class': 'Casualty_Class.csv',
 'Casualty_Home_Area_Type': 'Home_Area_Type.csv',
 'Casualty_IMD_Decile': 'IMD_Decile.csv',
 'Casualty_Severity': 'Casualty_Severity.csv',
 'Casualty_Type': 'Casualty_Type.csv',
 'Day_of_Week': 'Day_of_Week.csv',
 'Did_Police_Officer_Attend_Scene_of_Accident': 'Police_Officer_Attend.csv',
 'Driver_Home_Area_Type': 'Home_Area_Type.csv',
 'Driver_IMD_Decile': 'IMD_Decile.csv',
 'Hit_Object_in_Carriageway': 'Hit_Object_in_Carriageway.csv',
 'Hit_Object_off_Carriageway': 'Hit_Object_Off_Carriageway.csv',
 'Journey_Purpose_of_Driver': 'Journey_Purpose.csv',
 '

In [15]:
# for all features, (-1) represent a missing data point, to better handle it with pandas, it will be replaced to NaN: 
accidents_data.replace(-1, 'NaN', inplace=True)
casualties_data.replace(-1, 'NaN', inplace=True)
vehicles_data.replace(-1, 'NaN', inplace=True)

In [16]:
# method to iterate through the lookup list to load the nessecery file for each variable and update values accordingly: 
def update_lookup_value (df, col): 
    value = lookup_mapping.get(col, -1)
    if (value != -1): 
        lookup_data = pd.read_csv('dft-accident-data/Road-Accident-Safety-Data-Guide/'+value, header=None, index_col=0, squeeze=True).to_dict()
        df[col] = df[col].astype(str).map(lookup_data)

In [17]:
# map coded features to string: 
for col in accidents_data:
    update_lookup_value(accidents_data, col)

for col in casualties_data:
    update_lookup_value(casualties_data, col)

for col in vehicles_data:
    update_lookup_value(vehicles_data, col)

##### Feature Transformation: 
we used timestamp feature to drive more features; such as, Year, Month, Day of the week. In addition, we will link some feature from Vehicles dataset to the main Accidents dataset (which will be used later on in building a predictive model). Since the relation is one:many, I used my intuition and decided to only link the first vehicle involved in the accident and treated it as the main vehicle causing the accident.

In [18]:
# time series features: 
accidents_data['Year'] = accidents_data.Date.dt.year 
accidents_data['Month_number'] = accidents_data.Date.dt.month
accidents_data['Month'] = accidents_data.Date.dt.month_name()
accidents_data['Hour'] = accidents_data.apply(lambda x: str(x.Time).split(':')[0], axis=1)

In [19]:
# merge vehicle data using key (Accident_Index)
accidents_data = pd.merge(accidents_data,vehicles_data[['Accident_Index','Vehicle_Type','Sex_of_Driver', 'Age_of_Driver', 'Age_Band_of_Driver','Vehicle_Manoeuvre']],on='Accident_Index', how='left')


In [20]:
# Data information after merging: 
print ('accidents_data before cleanup:', accidents_data.shape[0])
print ('casualties_data before cleanup:', casualties_data.shape[0])
print ('vehicles_data before cleanup:', vehicles_data.shape[0])

accidents_data before cleanup: 3723845
casualties_data before cleanup: 2755286
vehicles_data before cleanup: 3753696


In [21]:
# remove dublicate 
accidents_data.drop_duplicates(subset='Accident_Index', keep='first', inplace=True)

In [22]:
print ('accidents_data drop duplicates:', accidents_data.shape[0])
print ('casualties_data drop duplicates:', casualties_data.shape[0])
print ('vehicles_data drop duplicates:', vehicles_data.shape[0])

accidents_data drop duplicates: 2047256
casualties_data drop duplicates: 2755286
vehicles_data drop duplicates: 3753696


#####  Data Cleaning: 
we handled missing values by dropping un-useful columns (missing more that 30% of data). Due to size of dataset and percentage of rows with missing main features (location, time, accidents characteristics 'speed limit, road type, accident severity, etc..'), rather than imputing missing values, I decided to drop them and their linked vehicles/casualties to ensure the consistency of data.

In [23]:
accidents_data_cleanup = ['Longitude','Time','Speed_limit', 'Junction_Detail', 'Light_Conditions', 'Road_Type',
                          'Did_Police_Officer_Attend_Scene_of_Accident','Weather_Conditions', 
                          'Road_Surface_Conditions', 'Special_Conditions_at_Site','Carriageway_Hazards', 
                          'Urban_or_Rural_Area', 'Accident_Severity', '1st_Road_Class', 
                          'Pedestrian_Crossing-Human_Control', 
                          'Pedestrian_Crossing-Physical_Facilities', 
                          'Police_Force','Vehicle_Type', 'Sex_of_Driver', 'Age_of_Driver', 
                          'Vehicle_Manoeuvre']

accidents_data_cleanup_outliers = ['Sex_of_Driver', 'Age_Band_of_Driver']
outliers_list = ['6 - 10', '0 - 5', 'Not known']

In [24]:
# method to clean spesific features, and remove linked data points in other datasets: 
def clean_data (cleanup_features, df, df2, df3): 
    for feature in cleanup_features: 
        accidents_data_list = df[(df[feature].isnull()) | (df[feature] == 'NaN')].Accident_Index.tolist()
        df = df[~df.Accident_Index.isin(accidents_data_list)]
        df2 = df2[~df2.Accident_Index.isin(accidents_data_list)]
        df3 = df3[~df3.Accident_Index.isin(accidents_data_list)]
    return df, df2, df3

In [25]:
# method to remove outliers from spesific features, and remove linked data points in other datasets: 
def clean_outliers (cleanup_outliers, outliers_list, df, df2, df3): 
    for feature in cleanup_outliers: 
        accidents_data_list = df[(df[feature].isin (outliers_list))].Accident_Index.tolist()
        df = df[~df.Accident_Index.isin(accidents_data_list)]
        df2 = df2[~df2.Accident_Index.isin(accidents_data_list)]
        df3 = df3[~df3.Accident_Index.isin(accidents_data_list)]
    return df, df2, df3

In [26]:
accidents_data.isnull().sum(axis = 0)

Accident_Index                                      0
Location_Easting_OSGR                             164
Location_Northing_OSGR                            164
Longitude                                         174
Latitude                                          174
Police_Force                                        0
Accident_Severity                                   0
Number_of_Vehicles                                  0
Number_of_Casualties                                0
Date                                                0
Day_of_Week                                         0
Time                                              156
Local_Authority_(District)                          0
Local_Authority_(Highway)                           0
1st_Road_Class                                      0
1st_Road_Number                                     0
Road_Type                                           1
Speed_limit                                        37
Junction_Detail             

In [27]:

# clean data to remove rows with missing values/outliers and their linked vehicles/casualties data
accidents_data, casualties_data, vehicles_data = clean_data(accidents_data_cleanup, accidents_data, casualties_data, vehicles_data)
accidents_data, casualties_data, vehicles_data = clean_outliers(accidents_data_cleanup_outliers, outliers_list, accidents_data, casualties_data, vehicles_data)



elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison



In [28]:
print ('accidents_data final cleanup:', accidents_data.shape[0])
print ('casualties_data final cleanup:', casualties_data.shape[0])
print ('vehicles_data final cleanup:', vehicles_data.shape[0])

accidents_data final cleanup: 1787487
casualties_data final cleanup: 2482684
vehicles_data final cleanup: 3351315


In [29]:
accidents_data.isnull().sum(axis = 0)/accidents_data.shape[0]*100

Accident_Index                                  0.000000
Location_Easting_OSGR                           0.000000
Location_Northing_OSGR                          0.000000
Longitude                                       0.000000
Latitude                                        0.000000
Police_Force                                    0.000000
Accident_Severity                               0.000000
Number_of_Vehicles                              0.000000
Number_of_Casualties                            0.000000
Date                                            0.000000
Day_of_Week                                     0.000000
Time                                            0.000000
Local_Authority_(District)                      0.000000
Local_Authority_(Highway)                       0.000000
1st_Road_Class                                  0.000000
1st_Road_Number                                 0.000000
Road_Type                                       0.000000
Speed_limit                    

In [30]:
accidents_data.drop(["LSOA_of_Accident_Location", "2nd_Road_Class", "Junction_Control"], axis=1, inplace=True)

In [31]:
casualties_data.isnull().sum(axis = 0)/casualties_data.shape[0]*100

Accident_Index                         0.000000
Vehicle_Reference                      0.000000
Casualty_Reference                     0.000000
Casualty_Class                         0.000000
Sex_of_Casualty                        0.018407
Age_of_Casualty                        0.000000
Age_Band_of_Casualty                   1.383342
Casualty_Severity                      0.000000
Pedestrian_Location                    0.000443
Pedestrian_Movement                    0.000725
Car_Passenger                          0.056068
Bus_or_Coach_Passenger                 0.004833
Pedestrian_Road_Maintenance_Worker    52.444451
Casualty_Type                          0.000040
Casualty_Home_Area_Type               14.335977
Casualty_IMD_Decile                   85.283870
dtype: float64

In [32]:
casualties_data.drop(["Casualty_IMD_Decile", "Pedestrian_Road_Maintenance_Worker"], axis=1, inplace=True)

In [33]:
vehicles_data.isnull().sum(axis = 0)/vehicles_data.shape[0]*100

Accident_Index                       0.000000
Vehicle_Reference                    0.000000
Vehicle_Type                         0.007878
Towing_and_Articulation              0.039716
Vehicle_Manoeuvre                    0.019962
Vehicle_Location-Restricted_Lane     0.020529
Junction_Location                   28.235245
Skidding_and_Overturning             0.027392
Hit_Object_in_Carriageway            0.025035
Vehicle_Leaving_Carriageway          0.021991
Hit_Object_off_Carriageway           0.010623
1st_Point_of_Impact                  0.040074
Was_Vehicle_Left_Hand_Drive?         0.717032
Journey_Purpose_of_Driver            1.201260
Sex_of_Driver                        0.001581
Age_of_Driver                        0.000000
Age_Band_of_Driver                   5.679591
Engine_Capacity_(CC)                 0.000000
Propulsion_Code                     22.550312
Age_of_Vehicle                       0.000000
Driver_IMD_Decile                   30.590649
Driver_Home_Area_Type             

In [34]:
vehicles_data.drop(["Driver_IMD_Decile", "Vehicle_IMD_Decile"], axis=1, inplace=True)

In [35]:
accidents_data.head()

Unnamed: 0,Accident_Index,Location_Easting_OSGR,Location_Northing_OSGR,Longitude,Latitude,Police_Force,Accident_Severity,Number_of_Vehicles,Number_of_Casualties,Date,Day_of_Week,Time,Local_Authority_(District),Local_Authority_(Highway),1st_Road_Class,1st_Road_Number,Road_Type,Speed_limit,Junction_Detail,2nd_Road_Number,Pedestrian_Crossing-Human_Control,Pedestrian_Crossing-Physical_Facilities,Light_Conditions,Weather_Conditions,Road_Surface_Conditions,Special_Conditions_at_Site,Carriageway_Hazards,Urban_or_Rural_Area,Did_Police_Officer_Attend_Scene_of_Accident,Year,Month_number,Month,Hour,Vehicle_Type,Sex_of_Driver,Age_of_Driver,Age_Band_of_Driver,Vehicle_Manoeuvre
0,200501BS00001,525680.0,178240.0,-0.19117,51.489096,Metropolitan Police,Serious,1,1,2005-04-01,Tuesday,17:42,Kensington and Chelsea,Kensington and Chelsea,A,3218,Single carriageway,30.0,Not at junction or within 20 metres,0,None within 50 metres,Zebra,Daylight,Raining no high winds,Wet or damp,,,Urban,Yes,2005,4,April,17,Car,Female,74,66 - 75,Going ahead other
1,200501BS00002,524170.0,181650.0,-0.211708,51.520075,Metropolitan Police,Slight,1,1,2005-05-01,Wednesday,17:36,Kensington and Chelsea,Kensington and Chelsea,B,450,Dual carriageway,30.0,Crossroads,0,None within 50 metres,Pedestrian phase at traffic signal junction,Darkness - lights lit,Fine no high winds,Dry,,,Urban,Yes,2005,5,May,17,Bus or coach (17 or more pass seats),Male,42,36 - 45,Slowing or stopping
2,200501BS00003,524520.0,182240.0,-0.206458,51.525301,Metropolitan Police,Slight,2,1,2005-06-01,Thursday,00:15,Kensington and Chelsea,Kensington and Chelsea,C,0,Single carriageway,30.0,Not at junction or within 20 metres,0,None within 50 metres,No physical crossing facilities within 50 metres,Darkness - lights lit,Fine no high winds,Dry,,,Urban,Yes,2005,6,June,0,Bus or coach (17 or more pass seats),Male,35,26 - 35,Going ahead right-hand bend
4,200501BS00004,526900.0,177530.0,-0.173862,51.482442,Metropolitan Police,Slight,1,1,2005-07-01,Friday,10:35,Kensington and Chelsea,Kensington and Chelsea,A,3220,Single carriageway,30.0,Not at junction or within 20 metres,0,None within 50 metres,No physical crossing facilities within 50 metres,Daylight,Fine no high winds,Dry,,,Urban,Yes,2005,7,July,10,Car,Female,49,46 - 55,Going ahead other
5,200501BS00005,528060.0,179040.0,-0.156618,51.495752,Metropolitan Police,Slight,1,1,2005-10-01,Monday,21:13,Kensington and Chelsea,Kensington and Chelsea,Unclassified,0,Single carriageway,30.0,Not at junction or within 20 metres,0,None within 50 metres,No physical crossing facilities within 50 metres,Darkness - lighting unknown,Fine no high winds,Wet or damp,,,Urban,Yes,2005,10,October,21,Motorcycle 125cc and under,Male,49,46 - 55,Going ahead other


In [38]:
accidents_data.shape

(1787487, 38)

In [36]:
vehicles_data.head()

Unnamed: 0,Accident_Index,Vehicle_Reference,Vehicle_Type,Towing_and_Articulation,Vehicle_Manoeuvre,Vehicle_Location-Restricted_Lane,Junction_Location,Skidding_and_Overturning,Hit_Object_in_Carriageway,Vehicle_Leaving_Carriageway,Hit_Object_off_Carriageway,1st_Point_of_Impact,Was_Vehicle_Left_Hand_Drive?,Journey_Purpose_of_Driver,Sex_of_Driver,Age_of_Driver,Age_Band_of_Driver,Engine_Capacity_(CC),Propulsion_Code,Age_of_Vehicle,Driver_Home_Area_Type
0,200501BS00001,1,Car,No tow/articulation,Going ahead other,On main c'way - not in restricted lane,Not at junction or within 20 metres,,,Did not leave carriageway,,Front,No,Other/Not known (2005-10),Female,74,66 - 75,,,,Urban area
1,200501BS00002,1,Bus or coach (17 or more pass seats),No tow/articulation,Slowing or stopping,On main c'way - not in restricted lane,Stop sign,,,Did not leave carriageway,,Nearside,No,Journey as part of work,Male,42,36 - 45,8268.0,Heavy oil,3.0,
2,200501BS00003,1,Bus or coach (17 or more pass seats),No tow/articulation,Going ahead right-hand bend,On main c'way - not in restricted lane,Not at junction or within 20 metres,,Parked vehicle,Did not leave carriageway,,Nearside,No,Journey as part of work,Male,35,26 - 35,8300.0,Heavy oil,5.0,Urban area
3,200501BS00003,2,Car,No tow/articulation,Parked,On main c'way - not in restricted lane,Not at junction or within 20 metres,,,Did not leave carriageway,,Offside,No,Other/Not known (2005-10),Male,62,56 - 65,1762.0,Petrol,6.0,Urban area
4,200501BS00004,1,Car,No tow/articulation,Going ahead other,On main c'way - not in restricted lane,Not at junction or within 20 metres,,,Did not leave carriageway,,Front,No,Other/Not known (2005-10),Female,49,46 - 55,1769.0,Petrol,4.0,Urban area


In [39]:
vehicles_data.shape

(3351315, 21)

In [37]:
casualties_data.head()

Unnamed: 0,Accident_Index,Vehicle_Reference,Casualty_Reference,Casualty_Class,Sex_of_Casualty,Age_of_Casualty,Age_Band_of_Casualty,Casualty_Severity,Pedestrian_Location,Pedestrian_Movement,Car_Passenger,Bus_or_Coach_Passenger,Casualty_Type,Casualty_Home_Area_Type
0,200501BS00001,1,1,Pedestrian,Male,37,36 - 45,Serious,Crossing on pedestrian crossing facility,Crossing from driver's nearside,Not car passenger,Not a bus or coach passenger,Pedestrian,Urban area
1,200501BS00002,1,1,Passenger,Male,37,36 - 45,Slight,Not a Pedestrian,Not a Pedestrian,Not car passenger,Seated passenger,Bus or coach occupant (17 or more pass seats),Urban area
2,200501BS00003,2,1,Driver or rider,Male,62,56 - 65,Slight,Not a Pedestrian,Not a Pedestrian,Not car passenger,Not a bus or coach passenger,Car occupant,Urban area
3,200501BS00004,1,1,Pedestrian,Male,30,26 - 35,Slight,"In carriageway, crossing elsewhere",Crossing from nearside - masked by parked or s...,Not car passenger,Not a bus or coach passenger,Pedestrian,Urban area
4,200501BS00005,1,1,Driver or rider,Male,49,46 - 55,Slight,Not a Pedestrian,Not a Pedestrian,Not car passenger,Not a bus or coach passenger,Motorcycle 125cc and under rider or passenger,


In [40]:
casualties_data.shape

(2482684, 14)

## Exploratory Data Analysis: 

> Note: all graphs in this analysis are interactive made with Plot.ly, feel free to play around with data yourself :)

In this section we will dig into the data to answer questions related to the nature and characteristics of accidents, we will do necessary manipulatation of data in order to get more insights: 

**1- What is the severity of accidents over the last decade?**

In this question we want to see how the accidents and road safety developed thoughout the years, we will add another dimension to the data by incorporating severity of accidents: 

In [42]:
# Sevirety of accident:  

x = accidents_data.Accident_Severity.groupby([accidents_data.Year]).count().index

trace_3 = go.Bar(
            x=x,
            y=accidents_data[accidents_data.Accident_Severity == 'Fatal'].Accident_Severity.groupby([accidents_data.Year]).count(),
            name='Fatal',
            marker=dict(
            color=bupu[8]))

trace_2 = go.Bar(
            x=x,
            y=accidents_data[accidents_data.Accident_Severity == 'Serious'].Accident_Severity.groupby([accidents_data.Year]).count(),
            name='Serious',
            marker=dict(
            color=bupu[4]))

trace_1 = go.Bar(
            x=x,
            y=accidents_data[accidents_data.Accident_Severity == 'Slight'].Accident_Severity.groupby([accidents_data.Year]).count(),
            name='Slight',
            marker=dict(
            color=bupu[3]))

trace_4 = go.Scatter(
    x = x,
    y = accidents_data.Accident_Severity.groupby([accidents_data.Year]).count(), 
    name = 'Total per year', 
    mode = 'lines',
    legendgroup= 'group2', 
    marker=dict(
            color=bupu[6])
)

layout = go.Layout(title="Accidents Severity Per Year", 
                        barmode='group', 
                        xaxis = dict(ticks='', nticks=24, 
                                    title=go.layout.xaxis.Title(
                                    text='Year')),
                        yaxis = dict(title=go.layout.yaxis.Title(
                                    text='Count')))

fig = go.Figure(data=[trace_1, trace_2, trace_3, trace_4], layout=layout)
py.iplot(fig, filename='Accidents_Severity_Per_Year')

##### Answer: 
this simple graph visualizes the changes in accident severity and distribution over the years. Slight accident is the most frequent type, and overall we can see the count is decreasing, except for two spikes in (2014, 2016).

#####  2- When do accidents usually happen?

In order to answer this question we will take advantage of time series features, from which we have engineered new features, here will also look at data in respect to its severity: 

In [41]:
# accidents_data_month: 

accidents_data_month = accidents_data.groupby('Date')['Month'].count().reset_index()
accidents_data_month = accidents_data_month.set_index('Date')
accidents_data_month = accidents_data_month['Month'].resample('MS').sum()

accidents_data_month_fatal = accidents_data[accidents_data.Accident_Severity == 'Fatal'].groupby('Date')['Month'].count().reset_index()
accidents_data_month_fatal = accidents_data_month_fatal.set_index('Date')
accidents_data_month_fatal = accidents_data_month_fatal['Month'].resample('MS').sum()

accidents_data_month_serious = accidents_data[accidents_data.Accident_Severity == 'Serious'].groupby('Date')['Month'].count().reset_index()
accidents_data_month_serious = accidents_data_month_serious.set_index('Date')
accidents_data_month_serious = accidents_data_month_serious['Month'].resample('MS').sum()

accidents_data_month_slight = accidents_data[accidents_data.Accident_Severity == 'Slight'].groupby('Date')['Month'].count().reset_index()
accidents_data_month_slight = accidents_data_month_slight.set_index('Date')
accidents_data_month_slight = accidents_data_month_slight['Month'].resample('MS').sum()

trace_1 = go.Scatter(x=accidents_data_month_fatal.index, 
                     y=accidents_data_month_fatal, 
                     name='Fatal', 
                     mode='lines',
                    marker=dict(
                    color=bupu[4]))

trace_2 = go.Scatter(x=accidents_data_month_serious.index, 
                     y=accidents_data_month_serious, 
                     name='Serious', 
                     mode='lines',
                    marker=dict(
                    color=bupu[3]))

trace_3 = go.Scatter(x=accidents_data_month_slight.index, 
                     y=accidents_data_month_slight, 
                     name='Slight', 
                     mode='lines',
                    marker=dict(
                    color=bupu[2]))

trace_4 = go.Scatter(x=accidents_data_month.index, 
                     y=accidents_data_month, 
                     name='Total', 
                     mode='lines+markers',
                    marker=dict(
                    color=bupu[7]))

layout = go.Layout(title="Accidents Trends Per Months", 
                        barmode='overlay', 
                        xaxis = dict(ticks='', nticks=48, 
                                    title=go.layout.xaxis.Title(
                                    text='Date')),
                        yaxis = dict(title=go.layout.yaxis.Title(
                                    text='Count')))


fig = go.Figure(data=[trace_1,trace_2, trace_3, trace_4], layout=layout)
py.iplot(fig, filename='Accidents_Trends_Per_Months')

#### Accidents pattern for months:

The graph above shows accidents count on monthly basis. We can quickly identify interesting trends in this time-series data. February has achieved the least accidents throughout the years. On the other hand, November accounts for the most accidents, followed by a decrease going into the holidays and New Years! Summer months reported average accidents with slight ups and downs.

In [43]:
# accident_table_heat: 

accident_table_heat = accidents_data.Hour.groupby([accidents_data.Day_of_Week, accidents_data.Hour]).count()
accident_table_heat = accident_table_heat.rename_axis(['Hour', 'Day_of_Week']).unstack('Day_of_Week')
day_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
accident_table_heat = accident_table_heat.reindex(day_of_week)

color_scale = [[0.0, 'rgb(247,252,253)'], 
               [0.1111111111111111, 'rgb(224,236,244)'], 
               [0.2222222222222222, 'rgb(224,236,244)'], 
               [0.3333333333333333, 'rgb(191,211,230)'], 
               [0.4444444444444444, 'rgb(158,188,218)'], 
               [0.5555555555555556, 'rgb(140,150,198)'], 
               [0.6666666666666666, 'rgb(140,107,177)'], 
               [0.7777777777777778, 'rgb(136,65,157)'], 
               [0.8888888888888888, 'rgb(129,15,124)'], 
               [0.9999999999999999, 'rgb(69,117,180)'], 
               [1.0, 'rgb(77,0,75)']]

data = [go.Heatmap(
        z=accident_table_heat,
        x=accident_table_heat.columns,
        y=accident_table_heat.index,
        colorscale= color_scale)]

layout = go.Layout(
    title='Accidents Per Day/Hour',
    xaxis = dict(ticks='', nticks=36, title='Hour (24)'),
    yaxis = dict(ticks='', title='Day')
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='Accidents Per Day-Hour')



#### Accidents heat-map by Day-Hours:

Heatmaps are great way to show density at particular condition. In this case, we can see Morning rush-hours, and Evening rush-hours are accounting for more accidents in weekdays. As apposed to Weekends, accidents is centered around afternoon, and evening hours. 

##### 3- Where do cyclists accidents usually happen?


Vehicle of type cyclists (Pedal, Motorcycle) come in second as the most frequented vehicles involved in accidents as well as casualties. An important factor in understanding roads safety is knowing where accidents mostly happen. One way to visual this data on a map with respect to its heat and density, this will give us great insights: 
Note: due to computational limitation I wasn't able to visuals all data points, so I filtered the data for the last 4 years (2014–2017):

In [72]:

# Filter the DF for rows, then columns, then remove NaNs
heat_df = accidents_data[(accidents_data['Vehicle_Type'].isin(['Motorcycle 125cc and under', 'Motorcycle over 500cc',
       'Pedal cycle', 'Motorcycle over 125cc and up to 500cc',]))& 
                         (accidents_data['Accident_Severity'].isin(['Fatal','Serious'])) & 
                         (accidents_data['Year'].isin([2014, 2015, 2016, 2017]))].copy() # Reducing data size so it runs faster

# Ensure you're handing it floats
heat_df['Latitude'] = heat_df['Latitude'].astype(float)
heat_df['Longitude'] = heat_df['Longitude'].astype(float)


heat_df = heat_df[['Latitude', 'Longitude']]
heat_df = heat_df.dropna(axis=0, subset=['Latitude','Longitude'])

# List comprehension to make out list of lists
heat_data = [[row['Latitude'],row['Longitude']] for index, row in heat_df.iterrows()]

In [73]:
# Plot it on the map
traffic_map = folium.Map(location=heat_data[0], zoom_start = 13) 
HeatMap(heat_data).add_to(traffic_map)
traffic_map.save('cycalist_traffic_map.html')

In [74]:
# Display the map
traffic_map

From this map view we can spot some hot areas such as (Piccadilly Circus, Oxford Circus, Covet Garden, Warren Street, ect..)

##### 4- Under which circumstances do accidents happen? Is there any correlation between these features?
As we previously seen, the dataset consists of many variables describing the condition of an accidents such as (weather conditions, road type, speed limit, etc..), we will look into these feature a little closer: 

In [69]:

# accidents_by_speed:  
accidents_by_speed = accidents_data.Speed_limit.groupby([accidents_data.Speed_limit]).count().sort_index()
trace_1 = go.Bar(
            x=accidents_by_speed.index,
            y=accidents_by_speed, 
            name= 'Speed Limit', 
            marker=dict(
             color = bupu[1:]))

# accidents_by_Road_Type:  
accidents_by_road = accidents_data.Road_Type.groupby([accidents_data.Road_Type]).count().sort_values()
trace_2 = go.Bar(
            x=accidents_by_road.index,
            y=accidents_by_road,
            name= 'Road Type', 
            marker=dict(
             color = bupu[1:]))


# accidents_by_urban:  
accidents_by_urban = accidents_data.Urban_or_Rural_Area.groupby([accidents_data.Urban_or_Rural_Area]).count().sort_values()
trace_3 = go.Bar(
            x=accidents_by_urban.index,
            y=accidents_by_urban, 
            name= 'Urban or Rural_Area', 
            marker=dict(
             color = bupu[1:]))


# accidents_by_surface:  
accidents_by_surface = accidents_data.Road_Surface_Conditions.groupby([accidents_data.Road_Surface_Conditions]).count().sort_values()
trace_4 = go.Bar(
            x=accidents_by_surface.index,
            y=accidents_by_surface,
            name='Road Surface Conditions', 
            marker=dict(
             color = bupu[1:]))


# accidents_by_weather:  
accidents_by_weather = accidents_data.Weather_Conditions.groupby([accidents_data.Weather_Conditions]).count().sort_values()
trace_5 = go.Bar(
            x=accidents_by_weather.index,
            y=accidents_by_weather, 
            name= 'Weather Conditions', 
            marker=dict(
             color = bupu[1:]))


# accidents_by_condition:  
accidents_by_special = accidents_data.Special_Conditions_at_Site.groupby([accidents_data.Special_Conditions_at_Site]).count().sort_values()
trace_6 = go.Bar(
            x=accidents_by_special.index,
            y=accidents_by_special, 
            name = 'Special Conditions at Site', 
            marker=dict(
             color = bupu[1:]))


# accidents_by_junction:  
accidents_by_junction = accidents_data.Junction_Detail.groupby([accidents_data.Junction_Detail]).count().sort_values()
trace_7 = go.Bar(
            x=accidents_by_junction.index,
            y=accidents_by_junction, 
            name= 'Junction Detail', 
            marker=dict(
             color = bupu[1:]))

# accidents_by_light:  
accidents_by_light = accidents_data.Light_Conditions.groupby([accidents_data.Light_Conditions]).count().sort_values()
trace_8 = go.Bar(
            x=accidents_by_light.index,
            y=accidents_by_light, 
            name= 'Light Conditions', 
            marker=dict(
             color = bupu[1:]))

In [70]:
# will be used

subplot_titles = ('Speed Limit', 'Road Type','Urban or Rural_Area', 'Road Surface Conditions')
fig = tools.make_subplots(rows=1, cols=4, subplot_titles=subplot_titles)

fig.append_trace(trace_1, 1, 1)
fig.append_trace(trace_2, 1, 2)
fig.append_trace(trace_3, 1, 3)
fig.append_trace(trace_4, 1, 4)

layout = go.Layout(
    #width=500,
    #height=700,
    title='Accidents Characteristics',
    xaxis = dict(ticks='', nticks=36, automargin=True),
    yaxis = dict(ticks='', automargin=True), 
    font=dict(size=10), showlegend=False
)

fig['layout'].update(layout)
py.iplot(fig, filename='Accidents Characteristics_1')

This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y2 ]  [ (1,3) x3,y3 ]  [ (1,4) x4,y4 ]



What we can notice in these graphs is: 
1. Roads with (30 speed limit) accounts for (~60%) of accidents, I assume this is reasonable since most accidents are categorized as (Slight - fender bender as they say) so it is more likely to happen in roads with low speed limit. 
2. The conditions from high point of view don't yield surprising results, for example; accidents happen more in Dry Surfaces, Daylight, and Clear weather. So we need to visualize the data from different prospective, so in the following graph we will see how these features correlates to each other. 


Here we will show correlation matrix of these features, for the correlation we used (Cramer's V algorithm as described here), it allows us to get matrix for categorical features:

In [50]:
accidents_data_catigorical = ['Speed_limit', 'Junction_Detail', 'Light_Conditions', 'Road_Type',
                              'Weather_Conditions', 'Road_Surface_Conditions', 'Special_Conditions_at_Site',
                             'Carriageway_Hazards', 'Urban_or_Rural_Area', 'Accident_Severity', '1st_Road_Class', 
                              'Pedestrian_Crossing-Human_Control', 'Pedestrian_Crossing-Physical_Facilities', 'Police_Force',
                             'Vehicle_Type', 'Sex_of_Driver', 'Vehicle_Manoeuvre', 'Age_Band_of_Driver', 
                             'Month', 'Local_Authority_(Highway)', 'Local_Authority_(District)', 'Day_of_Week', 'Hour']

In [77]:
# using dython package implementation of Cramer's V algorithm: 
corr = associations(accidents_data[accidents_data_catigorical], theil_u=False, 
                    return_results=True, plot=False, nominal_columns=accidents_data_catigorical)

In [78]:
data = [
    go.Heatmap(
        z=corr,
        x=corr.columns,
        y=corr.index,
        colorscale= color_scale
    )
]

layout = go.Layout(
    height=900,
    title='Correlation between Catigorical Features',
    xaxis = dict(ticks='', nticks=36, automargin=True),
    yaxis = dict(ticks='', automargin=True)
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='corrilation_matrix')

We can spot some interesting correlations for example: Speed Limit & Road Type, Junction Details, Weather & Surface Conditions, Hour and Light Conditions, Police Force and Locations..

##### 5- What is the age distribution of drivers involved in the accidents?
In order to target road safety campaigns effectively, we must understand our targeted audience. 

In [79]:
# vehicles_by_Age_of_Driver and Gender:  

vehicles_by_age_of_driver = accidents_data.Age_of_Driver.groupby([accidents_data.Age_of_Driver]).count().sort_values()

trace_1 = go.Histogram(
            x=accidents_data[accidents_data.Sex_of_Driver == 'Male'].Age_of_Driver, 
            name = 'Male', 
            marker=dict(color = bupu[5]))

trace_2 = go.Histogram(
            x=accidents_data[accidents_data.Sex_of_Driver == 'Female'].Age_of_Driver, 
            name = 'Female', 
            marker=dict(color = bupu[3]))


layout = go.Layout(title="Age of Driver", 
                        barmode='overlay', 
                        xaxis = dict(ticks='', nticks=48, 
                                    title=go.layout.xaxis.Title(
                                    text='Age'), automargin=True),
                        yaxis = dict(title=go.layout.yaxis.Title(
                                    text='Count'), automargin=True))



#plotly.offline.iplot({
#    "data": [trace_1, trace_2],
#    "layout": layout
#})


fig = go.Figure(data=[trace_1, trace_2], layout=layout)
py.iplot(fig, filename='age of driver')


Woah there! Look at all those points! Due to browser limitations, the Plotly SVG drawing functions have a hard time graphing more than 500k data points for line charts, or 40k points for other types of charts. Here are some suggestions:
(1) Use the `plotly.graph_objs.Scattergl` trace object to generate a WebGl graph.
(2) Trying using the image API to return an image instead of a graph URL
(3) Use matplotlib
(4) See if you can create your visualization with fewer data points



Woah there! Look at all those points! Due to browser limitations, the Plotly SVG drawing functions have a hard time graphing more than 500k data points for line charts, or 40k points for other types of charts. Here are some suggestions:
(1) Use the `plotly.graph_objs.Scattergl` trace object to generate a WebGl graph.
(2) Trying using the image API to return an image instead of a graph URL
(3) Use matplotlib
(4) See if you can create your visualization with fewer data points




The age distribution is centered around (18–35), highest point was for (18) years old, followed by (30s). Some strange strikes are found around (25,30, 35, and 40), wonder what is causing that?

##### Vehicles Charactericts: 
In the following graphs we will explore more features related to vehicles: 

In [84]:
# vehicles_by_Vehicle_Type:  

vehicles_by_vehicle_type = vehicles_data.Vehicle_Type.groupby([vehicles_data.Vehicle_Type]).count().sort_values()

trace_1 = go.Bar(
            x=vehicles_by_vehicle_type.index,
            y=vehicles_by_vehicle_type, 
            marker=dict(color = bupu[5]))

layout = go.Layout(
    #width=500,
    #height=700,
    title='Vehicle Type',
    xaxis = dict(ticks='', nticks=36, automargin=True),
    yaxis = dict(ticks='', automargin=True), 
    font=dict(size=10)
)

plotly.offline.iplot({
    "data": [trace_1],
    "layout": layout
})

In [81]:
# vehicles_by_propulsion_code:  

vehicles_by_propulsion_code= vehicles_data.Propulsion_Code.groupby([vehicles_data.Propulsion_Code]).count().sort_values()

trace_1 = go.Bar(
            x=vehicles_by_propulsion_code.index,
            y=vehicles_by_propulsion_code, 
            marker=dict(color = bupu[4]))

layout = go.Layout(
    #width=500,
    #height=700,
    title='Vehicle Type',
    xaxis = dict(ticks='', nticks=36, automargin=True),
    yaxis = dict(ticks='', automargin=True), 
    font=dict(size=10)
)


plotly.offline.iplot({
    "data": [trace_1],
    "layout": layout
})

In [82]:
# vehicles_by_Age_of_Vehicle:  

vehicles_by_age_of_vehicle= vehicles_data.Age_of_Vehicle.groupby([vehicles_data.Age_of_Vehicle]).count().sort_values()

trace_1 = go.Bar(
            x=vehicles_by_age_of_vehicle.index,
            y=vehicles_by_age_of_vehicle, 
            marker=dict(color = bupu[6]))

layout = go.Layout(
    #width=500,
    #height=700,
    title='Vehicle Age',
    xaxis = dict(ticks='', nticks=36, automargin=True),
    yaxis = dict(ticks='', range=[0, 250000]), 
    font=dict(size=10)
)


plotly.offline.iplot({
    "data": [trace_1],
    "layout": layout
})

In [83]:
# vehicles_by_Vehicle_Manoeuvre:  

vehicles_by_vehicle_manoeuvre = vehicles_data.Vehicle_Manoeuvre.groupby([vehicles_data.Vehicle_Manoeuvre]).count().sort_values()


trace_1 = go.Bar(
            x=vehicles_by_vehicle_manoeuvre.index,
            y=vehicles_by_vehicle_manoeuvre,
            marker=dict(color = bupu[1:]))

layout = go.Layout(
    title='Vehicle Manoeuvre',
    xaxis = dict(ticks='', nticks=36, automargin=True),
    yaxis = dict(ticks='', automargin=True), 
    font=dict(size=10)
)


plotly.offline.iplot({
    "data": [trace_1],
    "layout": layout
})

From above obervations we can get some useful insights: 
1. Cars, and Cyclists come as top vehicles invloved in car accidents
2. Electric cars, Enviroment friendly cars, were the lowest among other type of cars as compared to Patrol fueled cars, which probably becuase of their new emerging.
3. New and relatively new Cars, scored the highest when looking at age of vehicles, this is in important factor to consider for insurance companies and policy makers. 
4. Vehicle Manoeuvre, where Cars are (going ahead other) was the highest condition, from this information policy makers may establish new panelity to limit the posability of getting into an accidents.  

##### 6- What are the characteristics of casualties impacted in the accidents?
In the following graphs we will explore more features related to casualies impacted by accidents: 

In [85]:
# casualties_data_by_Age_of_Casualty:  

trace_1 = go.Histogram(
            x=casualties_data[casualties_data.Sex_of_Casualty == 'Male'].Age_of_Casualty, 
            name = 'Male',
            legendgroup= 'group1', 
            marker=dict(color = bupu[3]))

trace_2 = go.Histogram(
            x=casualties_data[casualties_data.Sex_of_Casualty == 'Female'].Age_of_Casualty, 
            name = 'Female', 
            legendgroup= 'group1', 
            marker=dict(color = bupu[5]))

trace_3 = go.Scatter(
            x=casualties_data[casualties_data.Casualty_Severity == 'Slight'].Age_of_Casualty.groupby([casualties_data.Age_of_Casualty]).count().index, 
            y = casualties_data[casualties_data.Casualty_Severity == 'Slight'].Age_of_Casualty.groupby([casualties_data.Age_of_Casualty]).count(),
            mode = 'lines',
            name = 'Slight',
            legendgroup= 'group2', 
            marker=dict(color = bupu[7]))

trace_4 = go.Scatter(
            x=casualties_data[casualties_data.Casualty_Severity == 'Serious'].Age_of_Casualty.groupby([casualties_data.Age_of_Casualty]).count().index, 
            y = casualties_data[casualties_data.Casualty_Severity == 'Serious'].Age_of_Casualty.groupby([casualties_data.Age_of_Casualty]).count(),
            mode = 'lines',
            name = 'Serious', 
            legendgroup= 'group2', 
            marker=dict(color = bupu[2]))

trace_5 = go.Scatter(
            x=casualties_data[casualties_data.Casualty_Severity == 'Fatal'].Age_of_Casualty.groupby([casualties_data.Age_of_Casualty]).count().index, 
            y = casualties_data[casualties_data.Casualty_Severity == 'Fatal'].Age_of_Casualty.groupby([casualties_data.Age_of_Casualty]).count(),
            mode = 'lines',
            legendgroup= 'group2', 
            name = 'Fatal', 
            marker=dict(color = bupu[8]))


layout = go.Layout(title="Age of Casualties", 
                        barmode='overlay', 
                        xaxis = dict(ticks='', nticks=48, 
                                    title=go.layout.xaxis.Title(
                                    text='Age'), automargin=True),
                        yaxis = dict(title=go.layout.yaxis.Title(
                                    text='Count'), automargin=True))

fig = go.Figure(data=[trace_1, trace_2,trace_3, trace_4, trace_5], layout=layout)
py.iplot(fig, filename='age of casualties')


Woah there! Look at all those points! Due to browser limitations, the Plotly SVG drawing functions have a hard time graphing more than 500k data points for line charts, or 40k points for other types of charts. Here are some suggestions:
(1) Use the `plotly.graph_objs.Scattergl` trace object to generate a WebGl graph.
(2) Trying using the image API to return an image instead of a graph URL
(3) Use matplotlib
(4) See if you can create your visualization with fewer data points



Woah there! Look at all those points! Due to browser limitations, the Plotly SVG drawing functions have a hard time graphing more than 500k data points for line charts, or 40k points for other types of charts. Here are some suggestions:
(1) Use the `plotly.graph_objs.Scattergl` trace object to generate a WebGl graph.
(2) Trying using the image API to return an image instead of a graph URL
(3) Use matplotlib
(4) See if you can create your visualization with fewer data points




In [45]:
# casualties_by_casualty_class:  

casualties_by_casualty_class= casualties_data.Casualty_Class.groupby([casualties_data.Casualty_Class]).count().sort_values()


trace_1 = go.Pie(
            labels=casualties_by_casualty_class.index,
            values=casualties_by_casualty_class, 
            marker=dict(colors = bupu[3:]))

layout = go.Layout(title="Class of Casualties", 
                        barmode='overlay', 
                        xaxis = dict(ticks='', nticks=48, 
                                    title=go.layout.xaxis.Title(
                                    text='Class')),
                        yaxis = dict(title=go.layout.yaxis.Title(
                                    text='Count')))

fig = go.Figure(data=[trace_1], layout=layout)
py.iplot(fig, filename='class of casualties')

When looking at casualties there are many different features and data we can capture. The first graph shows the distribution of age, gender in respect with severity. We can clearly see in increase in young adults, and college students. From the pie chart we can see who's most impacted class of casualties.

In [44]:
# casualties_by_casualty_type:  

casualties_by_casualty_type = casualties_data.Casualty_Type.groupby([casualties_data.Casualty_Type]).count().sort_values()

trace_1 = go.Bar(
            x=casualties_by_casualty_type.index,
            y=casualties_by_casualty_type, 
            name = 'Casualty Type', 
            marker=dict(color = bupu[4]))

layout = go.Layout(title="Type of Casualties", 
                        barmode='stack', 
                        xaxis = dict(ticks='', nticks=48, 
                                    title=go.layout.xaxis.Title(
                                    text='Type'), automargin=True),
                        yaxis = dict(ticks='', title=go.layout.yaxis.Title(
                                    text='Count'), automargin=True))

plotly.offline.iplot({
    "data": [trace_1],
    "layout": layout
})

In [86]:
# casualties_by_pedestrian_location:  

casualties_by_pedestrian_location= casualties_data[casualties_data.Casualty_Type == 'Pedestrian'].Pedestrian_Location.groupby([casualties_data.Pedestrian_Location]).count().sort_values()

trace_1 = go.Bar(
            x=casualties_by_pedestrian_location.index,
            y=casualties_by_pedestrian_location, 
            name = 'Pedestrian Location', 
            marker=dict(color = bupu[4]))

layout = go.Layout(title="Casualty: Pedestrian Location", 
                        barmode='stack', 
                        xaxis = dict(ticks='', nticks=48, 
                                    title=go.layout.xaxis.Title(
                                    text='Type'), automargin=True),
                        yaxis = dict(ticks='', title=go.layout.yaxis.Title(
                                    text='Count'), automargin=True))

plotly.offline.iplot({
    "data": [trace_1],
    "layout": layout
})

In [87]:
# casualties_by_pedestrian_movement:  

casualties_by_pedestrian_movement= casualties_data[casualties_data.Casualty_Type == 'Pedestrian'].Pedestrian_Movement.groupby([casualties_data.Pedestrian_Movement]).count().sort_values()


trace_1 = go.Bar(
            x=casualties_by_pedestrian_movement.index,
            y=casualties_by_pedestrian_movement, 
            name = 'Pedestrian Movement', 
            marker=dict(color = bupu[4]))

layout = go.Layout(title="Casualty: Pedestrian Movement", 
                        barmode='stack', 
                        xaxis = dict(ticks='', nticks=48, 
                                    title=go.layout.xaxis.Title(
                                    text='Type'), automargin=True),
                        yaxis = dict(ticks='', title=go.layout.yaxis.Title(
                                    text='Count'), automargin=True))


plotly.offline.iplot({
    "data": [trace_1],
    "layout": layout
})

Above graphs were specific to Pedestrian casualties, we can see the pedestrian location, and movement. These insights are very useful when developing roads safety, for example to determine the location of pedstrian crossing, increasing junction controls, placing surveillance to closly moniter congested crossings.   

In the exploratory  analysis we aimed to view the data from different perspectives staring with accidents data, where we did time series analysis, as well as geo-location analysis for small subset of the data. In addition to drivers and casualties analysis.

## Building Predictive Model:

This is where we can use data science magic to tell us something we didn't know about our data, here we will answer the main quesiton:

##### 7- What are the main factors causing an accidents, and can we predict the severity based on these factors?

In this part we will do further cleaning and preprocessing so we can use it on ML algorithm, we will use Supervided Learning algorithems to identify the severity of an accident, since the dataset is unbalanced we will focuse on getting good F1 result rather then accuracy, this will help in identifying whether or not an accident is likely to happen, to dispatch necessary actions to manage and reslove the accident.


In [90]:
# assesment of missing data: 
accidents_data.isnull().sum(axis = 0)

Accident_Index                                 0
Location_Easting_OSGR                          0
Location_Northing_OSGR                         0
Longitude                                      0
Latitude                                       0
Police_Force                                   0
Accident_Severity                              0
Number_of_Vehicles                             0
Number_of_Casualties                           0
Date                                           0
Day_of_Week                                    0
Time                                           0
Local_Authority_(District)                     0
Local_Authority_(Highway)                      0
1st_Road_Class                                 0
1st_Road_Number                                0
Road_Type                                      0
Speed_limit                                    0
Junction_Detail                                0
2nd_Road_Number                                0
Pedestrian_Crossing-

In [47]:
# split data to fit in suprvised ML algorithm 
data = accidents_data.copy()
severity_raw = data['Accident_Severity']
features_raw = data.drop('Accident_Severity', axis = 1)

In [48]:
# map severity to code: 
severity = severity_raw.map({'Slight':0, 'Serious':1, 'Fatal':2})

In [51]:
# identify categorical feature to get dummay variables: 
catigorical = accidents_data_catigorical
catigorical.remove('Accident_Severity')

In [52]:
features_raw.columns

Index(['Accident_Index', 'Location_Easting_OSGR', 'Location_Northing_OSGR',
       'Longitude', 'Latitude', 'Police_Force', 'Number_of_Vehicles',
       'Number_of_Casualties', 'Date', 'Day_of_Week', 'Time',
       'Local_Authority_(District)', 'Local_Authority_(Highway)',
       '1st_Road_Class', '1st_Road_Number', 'Road_Type', 'Speed_limit',
       'Junction_Detail', '2nd_Road_Number',
       'Pedestrian_Crossing-Human_Control',
       'Pedestrian_Crossing-Physical_Facilities', 'Light_Conditions',
       'Weather_Conditions', 'Road_Surface_Conditions',
       'Special_Conditions_at_Site', 'Carriageway_Hazards',
       'Urban_or_Rural_Area', 'Did_Police_Officer_Attend_Scene_of_Accident',
       'Year', 'Month_number', 'Month', 'Hour', 'Vehicle_Type',
       'Sex_of_Driver', 'Age_of_Driver', 'Age_Band_of_Driver',
       'Vehicle_Manoeuvre'],
      dtype='object')

In [53]:
features_final = pd.get_dummies(features_raw, columns = catigorical, drop_first=True)

# Print the number of features after one-hot encoding
encoded = list(features_final.columns)
print("{} total features after one-hot encoding.".format(len(encoded)))

834 total features after one-hot encoding.


In [54]:
features_final.columns

Index(['Accident_Index', 'Location_Easting_OSGR', 'Location_Northing_OSGR',
       'Longitude', 'Latitude', 'Number_of_Vehicles', 'Number_of_Casualties',
       'Date', 'Time', '1st_Road_Number',
       ...
       'Hour_14', 'Hour_15', 'Hour_16', 'Hour_17', 'Hour_18', 'Hour_19',
       'Hour_20', 'Hour_21', 'Hour_22', 'Hour_23'],
      dtype='object', length=834)

In [55]:
features_final = features_final.drop(['Accident_Index', 'Did_Police_Officer_Attend_Scene_of_Accident','Location_Easting_OSGR','Location_Northing_OSGR' ,'Date', 'Time', 'Age_of_Driver','Longitude', 'Latitude', '2nd_Road_Number', '1st_Road_Number','Number_of_Casualties', 'Number_of_Vehicles', 'Month_number', 'Year'], axis = 1)

# Print the number of features after dropping columns
encoded = list(features_final.columns)
print("{} total features after dropping columns.".format(len(encoded)))

819 total features after dropping columns.


In [30]:
features_final = features_final.astype(np.float64)

In [31]:
features_final.head()

Unnamed: 0,Speed_limit_10.0,Speed_limit_15.0,Speed_limit_20.0,Speed_limit_30.0,Speed_limit_40.0,Speed_limit_50.0,Speed_limit_60.0,Speed_limit_70.0,Junction_Detail_Mini-roundabout,Junction_Detail_More than 4 arms (not roundabout),Junction_Detail_Not at junction or within 20 metres,Junction_Detail_Other junction,Junction_Detail_Private drive or entrance,Junction_Detail_Roundabout,Junction_Detail_Slip road,Junction_Detail_T or staggered junction,Light_Conditions_Darkness - lights lit,Light_Conditions_Darkness - lights unlit,Light_Conditions_Darkness - no lighting,Light_Conditions_Daylight,Road_Type_One way street,Road_Type_Roundabout,Road_Type_Single carriageway,Road_Type_Slip road,Road_Type_Unknown,Weather_Conditions_Fine no high winds,Weather_Conditions_Fog or mist,Weather_Conditions_Other,Weather_Conditions_Raining + high winds,Weather_Conditions_Raining no high winds,Weather_Conditions_Snowing + high winds,Weather_Conditions_Snowing no high winds,Weather_Conditions_Unknown,Road_Surface_Conditions_Flood over 3cm. deep,Road_Surface_Conditions_Frost or ice,Road_Surface_Conditions_Snow,Road_Surface_Conditions_Wet or damp,Special_Conditions_at_Site_Auto traffic signal - out,Special_Conditions_at_Site_Mud,Special_Conditions_at_Site_None,Special_Conditions_at_Site_Oil or diesel,Special_Conditions_at_Site_Road sign or marking defective or obscured,Special_Conditions_at_Site_Road surface defective,Special_Conditions_at_Site_Roadworks,Carriageway_Hazards_None,Carriageway_Hazards_Other object on road,Carriageway_Hazards_Pedestrian in carriageway - not injured,Carriageway_Hazards_Previous accident,Carriageway_Hazards_Vehicle load on road,Urban_or_Rural_Area_Unallocated,Urban_or_Rural_Area_Urban,1st_Road_Class_A(M),1st_Road_Class_B,1st_Road_Class_C,1st_Road_Class_Motorway,1st_Road_Class_Unclassified,Pedestrian_Crossing-Human_Control_Control by school crossing patrol,Pedestrian_Crossing-Human_Control_None within 50 metres,Pedestrian_Crossing-Physical_Facilities_Footbridge or subway,Pedestrian_Crossing-Physical_Facilities_No physical crossing facilities within 50 metres,Pedestrian_Crossing-Physical_Facilities_Pedestrian phase at traffic signal junction,"Pedestrian_Crossing-Physical_Facilities_Pelican, puffin, toucan or similar non-junction pedestrian light crossing",Pedestrian_Crossing-Physical_Facilities_Zebra,Police_Force_Bedfordshire,Police_Force_Cambridgeshire,Police_Force_Central,Police_Force_Cheshire,Police_Force_City of London,Police_Force_Cleveland,Police_Force_Cumbria,Police_Force_Derbyshire,Police_Force_Devon and Cornwall,Police_Force_Dorset,Police_Force_Dumfries and Galloway,Police_Force_Durham,Police_Force_Dyfed-Powys,Police_Force_Essex,Police_Force_Fife,Police_Force_Gloucestershire,Police_Force_Grampian,Police_Force_Greater Manchester,Police_Force_Gwent,Police_Force_Hampshire,Police_Force_Hertfordshire,Police_Force_Humberside,Police_Force_Kent,Police_Force_Lancashire,Police_Force_Leicestershire,Police_Force_Lincolnshire,Police_Force_Lothian and Borders,Police_Force_Merseyside,Police_Force_Metropolitan Police,Police_Force_Norfolk,Police_Force_North Wales,Police_Force_North Yorkshire,Police_Force_Northamptonshire,Police_Force_Northern,Police_Force_Northumbria,Police_Force_Nottinghamshire,Police_Force_South Wales,Police_Force_South Yorkshire,Police_Force_Staffordshire,Police_Force_Strathclyde,Police_Force_Suffolk,Police_Force_Surrey,Police_Force_Sussex,Police_Force_Tayside,Police_Force_Thames Valley,Police_Force_Warwickshire,Police_Force_West Mercia,Police_Force_West Midlands,Police_Force_West Yorkshire,Police_Force_Wiltshire,Vehicle_Type_Bus or coach (17 or more pass seats),Vehicle_Type_Car,Vehicle_Type_Electric motorcycle,Vehicle_Type_Goods 7.5 tonnes mgw and over,Vehicle_Type_Goods over 3.5t. and under 7.5t,Vehicle_Type_Goods vehicle - unknown weight,Vehicle_Type_Minibus (8 - 16 passenger seats),Vehicle_Type_Mobility scooter,Vehicle_Type_Motorcycle - unknown cc,Vehicle_Type_Motorcycle 125cc and under,Vehicle_Type_Motorcycle 50cc and under,Vehicle_Type_Motorcycle over 125cc and up to 500cc,Vehicle_Type_Motorcycle over 500cc,Vehicle_Type_Other vehicle,Vehicle_Type_Pedal cycle,Vehicle_Type_Ridden horse,Vehicle_Type_Taxi/Private hire car,Vehicle_Type_Tram,Vehicle_Type_Van / Goods 3.5 tonnes mgw or under,Sex_of_Driver_Male,Vehicle_Manoeuvre_Changing lane to right,Vehicle_Manoeuvre_Going ahead left-hand bend,Vehicle_Manoeuvre_Going ahead other,Vehicle_Manoeuvre_Going ahead right-hand bend,Vehicle_Manoeuvre_Moving off,Vehicle_Manoeuvre_Overtaking - nearside,Vehicle_Manoeuvre_Overtaking moving vehicle - offside,Vehicle_Manoeuvre_Overtaking static vehicle - offside,Vehicle_Manoeuvre_Parked,Vehicle_Manoeuvre_Reversing,Vehicle_Manoeuvre_Slowing or stopping,Vehicle_Manoeuvre_Turning left,Vehicle_Manoeuvre_Turning right,Vehicle_Manoeuvre_U-turn,Vehicle_Manoeuvre_Waiting to go - held up,Vehicle_Manoeuvre_Waiting to turn left,Vehicle_Manoeuvre_Waiting to turn right,Age_Band_of_Driver_16 - 20,Age_Band_of_Driver_21 - 25,Age_Band_of_Driver_26 - 35,Age_Band_of_Driver_36 - 45,Age_Band_of_Driver_46 - 55,Age_Band_of_Driver_56 - 65,Age_Band_of_Driver_66 - 75,Age_Band_of_Driver_Over 75,Month_August,Month_December,Month_February,Month_January,Month_July,Month_June,Month_March,Month_May,Month_November,Month_October,Month_September,Local_Authority_(Highway)_Aberdeenshire,Local_Authority_(Highway)_Angus,Local_Authority_(Highway)_Argyll & Bute,Local_Authority_(Highway)_Barking and Dagenham,Local_Authority_(Highway)_Barnet,Local_Authority_(Highway)_Barnsley,Local_Authority_(Highway)_Bath and North East Somerset,Local_Authority_(Highway)_Bedford,Local_Authority_(Highway)_Bexley,Local_Authority_(Highway)_Birmingham,Local_Authority_(Highway)_Blackburn with Darwen,Local_Authority_(Highway)_Blackpool,Local_Authority_(Highway)_Blaenau Gwent,Local_Authority_(Highway)_Bolton,Local_Authority_(Highway)_Bournemouth,Local_Authority_(Highway)_Bracknell Forest,Local_Authority_(Highway)_Bradford,Local_Authority_(Highway)_Brent,Local_Authority_(Highway)_Bridgend,Local_Authority_(Highway)_Brighton and Hove,"Local_Authority_(Highway)_Bristol, City of",Local_Authority_(Highway)_Bromley,Local_Authority_(Highway)_Buckinghamshire,Local_Authority_(Highway)_Bury,Local_Authority_(Highway)_Caerphilly,Local_Authority_(Highway)_Calderdale,Local_Authority_(Highway)_Cambridgeshire,Local_Authority_(Highway)_Camden,Local_Authority_(Highway)_Cardiff,Local_Authority_(Highway)_Carmarthenshire,Local_Authority_(Highway)_Central Bedfordshire,Local_Authority_(Highway)_Ceredigion,Local_Authority_(Highway)_Cheshire East,Local_Authority_(Highway)_Cheshire West and Chester,Local_Authority_(Highway)_City of London,Local_Authority_(Highway)_Clackmannanshire,Local_Authority_(Highway)_Conwy,Local_Authority_(Highway)_Cornwall,Local_Authority_(Highway)_County Durham,Local_Authority_(Highway)_Coventry,Local_Authority_(Highway)_Croydon,Local_Authority_(Highway)_Cumbria,Local_Authority_(Highway)_Darlington,Local_Authority_(Highway)_Denbighshire,Local_Authority_(Highway)_Derby,Local_Authority_(Highway)_Derbyshire,Local_Authority_(Highway)_Devon,Local_Authority_(Highway)_Doncaster,Local_Authority_(Highway)_Dorset,Local_Authority_(Highway)_Dudley,Local_Authority_(Highway)_Dumfries & Galloway,Local_Authority_(Highway)_Dundee City,Local_Authority_(Highway)_Ealing,Local_Authority_(Highway)_East Ayrshire,Local_Authority_(Highway)_East Dunbartonshire,Local_Authority_(Highway)_East Lothian,Local_Authority_(Highway)_East Renfrewshire,Local_Authority_(Highway)_East Riding of Yorkshire,Local_Authority_(Highway)_East Sussex,"Local_Authority_(Highway)_Edinburgh, City of",Local_Authority_(Highway)_Enfield,Local_Authority_(Highway)_Essex,Local_Authority_(Highway)_Falkirk,Local_Authority_(Highway)_Fife,Local_Authority_(Highway)_Flintshire,Local_Authority_(Highway)_Gateshead,Local_Authority_(Highway)_Glasgow City,Local_Authority_(Highway)_Gloucestershire,Local_Authority_(Highway)_Greenwich,Local_Authority_(Highway)_Gwynedd,Local_Authority_(Highway)_Hackney,Local_Authority_(Highway)_Halton,Local_Authority_(Highway)_Hammersmith and Fulham,Local_Authority_(Highway)_Hampshire,Local_Authority_(Highway)_Haringey,Local_Authority_(Highway)_Harrow,Local_Authority_(Highway)_Hartlepool,Local_Authority_(Highway)_Havering,"Local_Authority_(Highway)_Herefordshire, County of",Local_Authority_(Highway)_Hertfordshire,Local_Authority_(Highway)_Highland,Local_Authority_(Highway)_Hillingdon,Local_Authority_(Highway)_Hounslow,Local_Authority_(Highway)_Inverclyde,Local_Authority_(Highway)_Isle of Anglesey,Local_Authority_(Highway)_Isle of Wight,Local_Authority_(Highway)_Isles of Scilly,Local_Authority_(Highway)_Islington,Local_Authority_(Highway)_Kensington and Chelsea,Local_Authority_(Highway)_Kent,"Local_Authority_(Highway)_Kingston upon Hull, City of",Local_Authority_(Highway)_Kingston upon Thames,Local_Authority_(Highway)_Kirklees,Local_Authority_(Highway)_Knowsley,Local_Authority_(Highway)_Lambeth,Local_Authority_(Highway)_Lancashire,Local_Authority_(Highway)_Leeds,Local_Authority_(Highway)_Leicester,Local_Authority_(Highway)_Leicestershire,Local_Authority_(Highway)_Lewisham,Local_Authority_(Highway)_Lincolnshire,Local_Authority_(Highway)_Liverpool,Local_Authority_(Highway)_London Airport (Heathrow),Local_Authority_(Highway)_Luton,Local_Authority_(Highway)_Manchester,Local_Authority_(Highway)_Medway,Local_Authority_(Highway)_Merthyr Tydfil,Local_Authority_(Highway)_Merton,Local_Authority_(Highway)_Middlesbrough,Local_Authority_(Highway)_Midlothian,Local_Authority_(Highway)_Milton Keynes,Local_Authority_(Highway)_Monmouthshire,Local_Authority_(Highway)_Moray,Local_Authority_(Highway)_Na h-Eileanan an Iar (Western Isles),Local_Authority_(Highway)_Neath Port Talbot,Local_Authority_(Highway)_Newcastle upon Tyne,Local_Authority_(Highway)_Newham,Local_Authority_(Highway)_Newport,Local_Authority_(Highway)_Norfolk,Local_Authority_(Highway)_North Ayrshire,Local_Authority_(Highway)_North East Lincolnshire,Local_Authority_(Highway)_North Lanarkshire,Local_Authority_(Highway)_North Lincolnshire,Local_Authority_(Highway)_North Somerset,Local_Authority_(Highway)_North Tyneside,Local_Authority_(Highway)_North Yorkshire,Local_Authority_(Highway)_Northamptonshire,Local_Authority_(Highway)_Northumberland,Local_Authority_(Highway)_Nottingham,Local_Authority_(Highway)_Nottinghamshire,Local_Authority_(Highway)_Oldham,Local_Authority_(Highway)_Orkney Islands,Local_Authority_(Highway)_Oxfordshire,Local_Authority_(Highway)_Pembrokeshire,Local_Authority_(Highway)_Perth and Kinross,Local_Authority_(Highway)_Peterborough,Local_Authority_(Highway)_Plymouth,Local_Authority_(Highway)_Poole,Local_Authority_(Highway)_Portsmouth,Local_Authority_(Highway)_Powys,Local_Authority_(Highway)_Reading,Local_Authority_(Highway)_Redbridge,Local_Authority_(Highway)_Redcar and Cleveland,Local_Authority_(Highway)_Renfrewshire,"Local_Authority_(Highway)_Rhondda, Cynon, Taff",Local_Authority_(Highway)_Richmond upon Thames,Local_Authority_(Highway)_Rochdale,Local_Authority_(Highway)_Rotherham,Local_Authority_(Highway)_Rutland,Local_Authority_(Highway)_Salford,Local_Authority_(Highway)_Sandwell,Local_Authority_(Highway)_Scottish Borders,Local_Authority_(Highway)_Sefton,Local_Authority_(Highway)_Sheffield,Local_Authority_(Highway)_Shetland Islands,Local_Authority_(Highway)_Shropshire,Local_Authority_(Highway)_Slough,Local_Authority_(Highway)_Solihull,Local_Authority_(Highway)_Somerset,Local_Authority_(Highway)_South Ayrshire,Local_Authority_(Highway)_South Gloucestershire,Local_Authority_(Highway)_South Lanarkshire,Local_Authority_(Highway)_South Tyneside,Local_Authority_(Highway)_Southampton,Local_Authority_(Highway)_Southend-on-Sea,Local_Authority_(Highway)_Southwark,Local_Authority_(Highway)_St. Helens,Local_Authority_(Highway)_Staffordshire,Local_Authority_(Highway)_Stirling,Local_Authority_(Highway)_Stockport,Local_Authority_(Highway)_Stockton-on-Tees,Local_Authority_(Highway)_Stoke-on-Trent,Local_Authority_(Highway)_Suffolk,Local_Authority_(Highway)_Sunderland,Local_Authority_(Highway)_Surrey,Local_Authority_(Highway)_Sutton,Local_Authority_(Highway)_Swansea,Local_Authority_(Highway)_Swindon,Local_Authority_(Highway)_Tameside,Local_Authority_(Highway)_Telford and Wrekin,Local_Authority_(Highway)_The Vale of Glamorgan,Local_Authority_(Highway)_Thurrock,Local_Authority_(Highway)_Torbay,Local_Authority_(Highway)_Torfaen,Local_Authority_(Highway)_Tower Hamlets,Local_Authority_(Highway)_Trafford,Local_Authority_(Highway)_Wakefield,Local_Authority_(Highway)_Walsall,Local_Authority_(Highway)_Waltham Forest,Local_Authority_(Highway)_Wandsworth,Local_Authority_(Highway)_Warrington,Local_Authority_(Highway)_Warwickshire,Local_Authority_(Highway)_West Berkshire,Local_Authority_(Highway)_West Dunbartonshire,Local_Authority_(Highway)_West Lothian,Local_Authority_(Highway)_West Sussex,Local_Authority_(Highway)_Westminster,Local_Authority_(Highway)_Wigan,Local_Authority_(Highway)_Wiltshire,Local_Authority_(Highway)_Windsor and Maidenhead,Local_Authority_(Highway)_Wirral,Local_Authority_(Highway)_Wokingham,Local_Authority_(Highway)_Wolverhampton,Local_Authority_(Highway)_Worcestershire,Local_Authority_(Highway)_Wrexham,Local_Authority_(Highway)_York,Local_Authority_(District)_Aberdeenshire,Local_Authority_(District)_Adur,Local_Authority_(District)_Allerdale,Local_Authority_(District)_Alnwick,Local_Authority_(District)_Amber Valley,Local_Authority_(District)_Angus,Local_Authority_(District)_Argyll and Bute,Local_Authority_(District)_Arun,Local_Authority_(District)_Ashfield,Local_Authority_(District)_Ashford,Local_Authority_(District)_Aylesbury Vale,Local_Authority_(District)_Babergh,Local_Authority_(District)_Barking and Dagenham,Local_Authority_(District)_Barnet,Local_Authority_(District)_Barnsley,Local_Authority_(District)_Barrow-in-Furness,Local_Authority_(District)_Basildon,Local_Authority_(District)_Basingstoke and Deane,Local_Authority_(District)_Bassetlaw,Local_Authority_(District)_Bath and North East Somerset,Local_Authority_(District)_Bedford,Local_Authority_(District)_Berwick-upon-Tweed,Local_Authority_(District)_Bexley,Local_Authority_(District)_Birmingham,Local_Authority_(District)_Blaby,Local_Authority_(District)_Blackburn with Darwen,Local_Authority_(District)_Blackpool,Local_Authority_(District)_Blaenau Gwent,Local_Authority_(District)_Blyth Valley,Local_Authority_(District)_Bolsover,Local_Authority_(District)_Bolton,Local_Authority_(District)_Boston,Local_Authority_(District)_Bournemouth,Local_Authority_(District)_Bracknell Forest,Local_Authority_(District)_Bradford,Local_Authority_(District)_Braintree,Local_Authority_(District)_Breckland,Local_Authority_(District)_Brent,Local_Authority_(District)_Brentwood,Local_Authority_(District)_Bridgend,Local_Authority_(District)_Bridgnorth,Local_Authority_(District)_Brighton and Hove,"Local_Authority_(District)_Bristol, City of",Local_Authority_(District)_Broadland,Local_Authority_(District)_Bromley,Local_Authority_(District)_Bromsgrove,Local_Authority_(District)_Broxbourne,Local_Authority_(District)_Broxtowe,Local_Authority_(District)_Burnley,Local_Authority_(District)_Bury,Local_Authority_(District)_Caerphilly,Local_Authority_(District)_Calderdale,Local_Authority_(District)_Cambridge,Local_Authority_(District)_Camden,Local_Authority_(District)_Cannock Chase,Local_Authority_(District)_Canterbury,Local_Authority_(District)_Caradon,Local_Authority_(District)_Cardiff,Local_Authority_(District)_Carlisle,Local_Authority_(District)_Carmarthenshire,Local_Authority_(District)_Carrick,Local_Authority_(District)_Castle Morpeth,Local_Authority_(District)_Castle Point,Local_Authority_(District)_Central Bedfordshire,Local_Authority_(District)_Ceredigion,Local_Authority_(District)_Charnwood,Local_Authority_(District)_Chelmsford,Local_Authority_(District)_Cheltenham,Local_Authority_(District)_Cherwell,Local_Authority_(District)_Cheshire East,Local_Authority_(District)_Cheshire West and Chester,Local_Authority_(District)_Chester,Local_Authority_(District)_Chester-le-Street,Local_Authority_(District)_Chesterfield,Local_Authority_(District)_Chichester,Local_Authority_(District)_Chiltern,Local_Authority_(District)_Chorley,Local_Authority_(District)_Christchurch,Local_Authority_(District)_City of London,Local_Authority_(District)_Clackmannanshire,Local_Authority_(District)_Colchester,Local_Authority_(District)_Congleton,Local_Authority_(District)_Conwy,Local_Authority_(District)_Copeland,Local_Authority_(District)_Corby,Local_Authority_(District)_Cornwall,Local_Authority_(District)_Cotswold,Local_Authority_(District)_County Durham,Local_Authority_(District)_Coventry,Local_Authority_(District)_Craven,Local_Authority_(District)_Crawley,Local_Authority_(District)_Crewe and Nantwich,Local_Authority_(District)_Croydon,Local_Authority_(District)_Dacorum,Local_Authority_(District)_Darlington,Local_Authority_(District)_Dartford,Local_Authority_(District)_Daventry,Local_Authority_(District)_Denbighshire,Local_Authority_(District)_Derby,Local_Authority_(District)_Derbyshire Dales,Local_Authority_(District)_Derwentside,Local_Authority_(District)_Doncaster,Local_Authority_(District)_Dover,Local_Authority_(District)_Dudley,Local_Authority_(District)_Dumfries and Galloway,Local_Authority_(District)_Dundee City,Local_Authority_(District)_Durham,Local_Authority_(District)_Ealing,Local_Authority_(District)_Easington,Local_Authority_(District)_East Ayrshire,Local_Authority_(District)_East Cambridgeshire,Local_Authority_(District)_East Devon,Local_Authority_(District)_East Dorset,Local_Authority_(District)_East Dunbartonshire,Local_Authority_(District)_East Hampshire,Local_Authority_(District)_East Hertfordshire,Local_Authority_(District)_East Lindsey,Local_Authority_(District)_East Lothian,Local_Authority_(District)_East Northamptonshire,Local_Authority_(District)_East Renfrewshire,Local_Authority_(District)_East Riding of Yorkshire,Local_Authority_(District)_East Staffordshire,Local_Authority_(District)_Eastbourne,Local_Authority_(District)_Eastleigh,Local_Authority_(District)_Eden,"Local_Authority_(District)_Edinburgh, City of",Local_Authority_(District)_Ellesmere Port and Neston,Local_Authority_(District)_Elmbridge,Local_Authority_(District)_Enfield,Local_Authority_(District)_Epping Forest,Local_Authority_(District)_Epsom and Ewell,Local_Authority_(District)_Erewash,Local_Authority_(District)_Exeter,Local_Authority_(District)_Falkirk,Local_Authority_(District)_Fareham,Local_Authority_(District)_Fenland,Local_Authority_(District)_Fife,Local_Authority_(District)_Flintshire,Local_Authority_(District)_Forest Heath,Local_Authority_(District)_Forest of Dean,Local_Authority_(District)_Fylde,Local_Authority_(District)_Gateshead,Local_Authority_(District)_Gedling,Local_Authority_(District)_Glasgow City,Local_Authority_(District)_Gloucester,Local_Authority_(District)_Gosport,Local_Authority_(District)_Gravesham,Local_Authority_(District)_Great Yarmouth,Local_Authority_(District)_Greenwich,Local_Authority_(District)_Guildford,Local_Authority_(District)_Gwynedd,Local_Authority_(District)_Hackney,Local_Authority_(District)_Halton,Local_Authority_(District)_Hambleton,Local_Authority_(District)_Hammersmith and Fulham,Local_Authority_(District)_Harborough,Local_Authority_(District)_Haringey,Local_Authority_(District)_Harlow,Local_Authority_(District)_Harrogate,Local_Authority_(District)_Harrow,Local_Authority_(District)_Hart,Local_Authority_(District)_Hartlepool,Local_Authority_(District)_Hastings,Local_Authority_(District)_Havant,Local_Authority_(District)_Havering,"Local_Authority_(District)_Herefordshire, County of",Local_Authority_(District)_Hertsmere,Local_Authority_(District)_High Peak,Local_Authority_(District)_Highland,Local_Authority_(District)_Hillingdon,Local_Authority_(District)_Hinckley and Bosworth,Local_Authority_(District)_Horsham,Local_Authority_(District)_Hounslow,Local_Authority_(District)_Huntingdonshire,Local_Authority_(District)_Hyndburn,Local_Authority_(District)_Inverclyde,Local_Authority_(District)_Ipswich,Local_Authority_(District)_Isle of Anglesey,Local_Authority_(District)_Isle of Wight,Local_Authority_(District)_Islington,Local_Authority_(District)_Kennet,Local_Authority_(District)_Kensington and Chelsea,Local_Authority_(District)_Kerrier,Local_Authority_(District)_Kettering,Local_Authority_(District)_King's Lynn and West Norfolk,"Local_Authority_(District)_Kingston upon Hull, City of",Local_Authority_(District)_Kingston upon Thames,Local_Authority_(District)_Kirklees,Local_Authority_(District)_Knowsley,Local_Authority_(District)_Lambeth,Local_Authority_(District)_Lancaster,Local_Authority_(District)_Leeds,Local_Authority_(District)_Leicester,Local_Authority_(District)_Lewes,Local_Authority_(District)_Lewisham,Local_Authority_(District)_Lichfield,Local_Authority_(District)_Lincoln,Local_Authority_(District)_Liverpool,Local_Authority_(District)_London Airport (Heathrow),Local_Authority_(District)_Luton,Local_Authority_(District)_Macclesfield,Local_Authority_(District)_Maidstone,Local_Authority_(District)_Maldon,Local_Authority_(District)_Malvern Hills,Local_Authority_(District)_Manchester,Local_Authority_(District)_Mansfield,Local_Authority_(District)_Medway,Local_Authority_(District)_Melton,Local_Authority_(District)_Mendip,Local_Authority_(District)_Merthyr Tydfil,Local_Authority_(District)_Merton,Local_Authority_(District)_Mid Bedfordshire,Local_Authority_(District)_Mid Devon,Local_Authority_(District)_Mid Suffolk,Local_Authority_(District)_Mid Sussex,Local_Authority_(District)_Middlesbrough,Local_Authority_(District)_Midlothian,Local_Authority_(District)_Milton Keynes,Local_Authority_(District)_Mole Valley,Local_Authority_(District)_Monmouthshire,Local_Authority_(District)_Moray,Local_Authority_(District)_Neath Port Talbot,Local_Authority_(District)_New Forest,Local_Authority_(District)_Newark and Sherwood,Local_Authority_(District)_Newcastle upon Tyne,Local_Authority_(District)_Newcastle-under-Lyme,Local_Authority_(District)_Newham,Local_Authority_(District)_Newport,Local_Authority_(District)_North Ayrshire,Local_Authority_(District)_North Cornwall,Local_Authority_(District)_North Devon,Local_Authority_(District)_North Dorset,Local_Authority_(District)_North East Derbyshire,Local_Authority_(District)_North East Lincolnshire,Local_Authority_(District)_North Hertfordshire,Local_Authority_(District)_North Kesteven,Local_Authority_(District)_North Lanarkshire,Local_Authority_(District)_North Lincolnshire,Local_Authority_(District)_North Norfolk,Local_Authority_(District)_North Shropshire,Local_Authority_(District)_North Somerset,Local_Authority_(District)_North Tyneside,Local_Authority_(District)_North Warwickshire,Local_Authority_(District)_North West Leicestershire,Local_Authority_(District)_North Wiltshire,Local_Authority_(District)_Northampton,Local_Authority_(District)_Northumberland,Local_Authority_(District)_Norwich,Local_Authority_(District)_Nottingham,Local_Authority_(District)_Nuneaton and Bedworth,Local_Authority_(District)_Oadby and Wigston,Local_Authority_(District)_Oldham,Local_Authority_(District)_Orkney Islands,Local_Authority_(District)_Oswestry,Local_Authority_(District)_Oxford,Local_Authority_(District)_Pembrokeshire,Local_Authority_(District)_Pendle,Local_Authority_(District)_Penwith,Local_Authority_(District)_Perth and Kinross,Local_Authority_(District)_Peterborough,Local_Authority_(District)_Plymouth,Local_Authority_(District)_Poole,Local_Authority_(District)_Portsmouth,Local_Authority_(District)_Powys,Local_Authority_(District)_Preston,Local_Authority_(District)_Purbeck,Local_Authority_(District)_Reading,Local_Authority_(District)_Redbridge,Local_Authority_(District)_Redcar and Cleveland,Local_Authority_(District)_Redditch,Local_Authority_(District)_Reigate and Banstead,Local_Authority_(District)_Renfrewshire,Local_Authority_(District)_Restormel,"Local_Authority_(District)_Rhondda, Cynon, Taff",Local_Authority_(District)_Ribble Valley,Local_Authority_(District)_Richmond upon Thames,Local_Authority_(District)_Richmondshire,Local_Authority_(District)_Rochdale,Local_Authority_(District)_Rochford,Local_Authority_(District)_Rossendale,Local_Authority_(District)_Rother,Local_Authority_(District)_Rotherham,Local_Authority_(District)_Rugby,Local_Authority_(District)_Runnymede,Local_Authority_(District)_Rushcliffe,Local_Authority_(District)_Rushmoor,Local_Authority_(District)_Rutland,Local_Authority_(District)_Ryedale,Local_Authority_(District)_Salford,Local_Authority_(District)_Salisbury,Local_Authority_(District)_Sandwell,Local_Authority_(District)_Scarborough,Local_Authority_(District)_Scottish Borders,Local_Authority_(District)_Sedgefield,Local_Authority_(District)_Sedgemoor,Local_Authority_(District)_Sefton,Local_Authority_(District)_Selby,Local_Authority_(District)_Sevenoaks,Local_Authority_(District)_Sheffield,Local_Authority_(District)_Shepway,Local_Authority_(District)_Shetland Islands,Local_Authority_(District)_Shrewsbury and Atcham,Local_Authority_(District)_Shropshire,Local_Authority_(District)_Slough,Local_Authority_(District)_Solihull,Local_Authority_(District)_South Ayrshire,Local_Authority_(District)_South Bedfordshire,Local_Authority_(District)_South Bucks,Local_Authority_(District)_South Cambridgeshire,Local_Authority_(District)_South Derbyshire,Local_Authority_(District)_South Gloucestershire,Local_Authority_(District)_South Hams,Local_Authority_(District)_South Holland,Local_Authority_(District)_South Kesteven,Local_Authority_(District)_South Lakeland,Local_Authority_(District)_South Lanarkshire,Local_Authority_(District)_South Norfolk,Local_Authority_(District)_South Northamptonshire,Local_Authority_(District)_South Oxfordshire,Local_Authority_(District)_South Ribble,Local_Authority_(District)_South Shropshire,Local_Authority_(District)_South Somerset,Local_Authority_(District)_South Staffordshire,Local_Authority_(District)_South Tyneside,Local_Authority_(District)_Southampton,Local_Authority_(District)_Southend-on-Sea,Local_Authority_(District)_Southwark,Local_Authority_(District)_Spelthorne,Local_Authority_(District)_St. Albans,Local_Authority_(District)_St. Edmundsbury,Local_Authority_(District)_St. Helens,Local_Authority_(District)_Stafford,Local_Authority_(District)_Staffordshire Moorlands,Local_Authority_(District)_Stevenage,Local_Authority_(District)_Stirling,Local_Authority_(District)_Stockport,Local_Authority_(District)_Stockton-on-Tees,Local_Authority_(District)_Stoke-on-Trent,Local_Authority_(District)_Stratford-upon-Avon,Local_Authority_(District)_Stroud,Local_Authority_(District)_Suffolk Coastal,Local_Authority_(District)_Sunderland,Local_Authority_(District)_Surrey Heath,Local_Authority_(District)_Sutton,Local_Authority_(District)_Swale,Local_Authority_(District)_Swansea,Local_Authority_(District)_Swindon,Local_Authority_(District)_Tameside,Local_Authority_(District)_Tamworth,Local_Authority_(District)_Tandridge,Local_Authority_(District)_Taunton Deane,Local_Authority_(District)_Teesdale,Local_Authority_(District)_Teignbridge,Local_Authority_(District)_Telford and Wrekin,Local_Authority_(District)_Tendring,Local_Authority_(District)_Test Valley,Local_Authority_(District)_Tewkesbury,Local_Authority_(District)_Thanet,Local_Authority_(District)_The Vale of Glamorgan,Local_Authority_(District)_Three Rivers,Local_Authority_(District)_Thurrock,Local_Authority_(District)_Tonbridge and Malling,Local_Authority_(District)_Torbay,Local_Authority_(District)_Torfaen,Local_Authority_(District)_Torridge,Local_Authority_(District)_Tower Hamlets,Local_Authority_(District)_Trafford,Local_Authority_(District)_Tunbridge Wells,Local_Authority_(District)_Tynedale,Local_Authority_(District)_Uttlesford,Local_Authority_(District)_Vale Royal,Local_Authority_(District)_Vale of White Horse,Local_Authority_(District)_Wakefield,Local_Authority_(District)_Walsall,Local_Authority_(District)_Waltham Forest,Local_Authority_(District)_Wandsworth,Local_Authority_(District)_Wansbeck,Local_Authority_(District)_Warrington,Local_Authority_(District)_Warwick,Local_Authority_(District)_Watford,Local_Authority_(District)_Waveney,Local_Authority_(District)_Waverley,Local_Authority_(District)_Wealden,Local_Authority_(District)_Wear Valley,Local_Authority_(District)_Wellingborough,Local_Authority_(District)_Welwyn Hatfield,Local_Authority_(District)_West Berkshire,Local_Authority_(District)_West Devon,Local_Authority_(District)_West Dorset,Local_Authority_(District)_West Dunbartonshire,Local_Authority_(District)_West Lancashire,Local_Authority_(District)_West Lindsey,Local_Authority_(District)_West Lothian,Local_Authority_(District)_West Oxfordshire,Local_Authority_(District)_West Somerset,Local_Authority_(District)_West Wiltshire,Local_Authority_(District)_Western Isles,Local_Authority_(District)_Westminster,Local_Authority_(District)_Weymouth and Portland,Local_Authority_(District)_Wigan,Local_Authority_(District)_Wiltshire,Local_Authority_(District)_Winchester,Local_Authority_(District)_Windsor and Maidenhead,Local_Authority_(District)_Wirral,Local_Authority_(District)_Woking,Local_Authority_(District)_Wokingham,Local_Authority_(District)_Wolverhampton,Local_Authority_(District)_Worcester,Local_Authority_(District)_Worthing,Local_Authority_(District)_Wrexham,Local_Authority_(District)_Wychavon,Local_Authority_(District)_Wycombe,Local_Authority_(District)_Wyre,Local_Authority_(District)_Wyre Forest,Local_Authority_(District)_York,Day_of_Week_Monday,Day_of_Week_Saturday,Day_of_Week_Sunday,Day_of_Week_Thursday,Day_of_Week_Tuesday,Day_of_Week_Wednesday,Hour_01,Hour_02,Hour_03,Hour_04,Hour_05,Hour_06,Hour_07,Hour_08,Hour_09,Hour_10,Hour_11,Hour_12,Hour_13,Hour_14,Hour_15,Hour_16,Hour_17,Hour_18,Hour_19,Hour_20,Hour_21,Hour_22,Hour_23
0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


In [56]:
# sanity check for null values 
features_final.isnull().sum(axis = 0)

Speed_limit_10.0                                       0
Speed_limit_15.0                                       0
Speed_limit_20.0                                       0
Speed_limit_30.0                                       0
Speed_limit_40.0                                       0
Speed_limit_50.0                                       0
Speed_limit_60.0                                       0
Speed_limit_70.0                                       0
Junction_Detail_Mini-roundabout                        0
Junction_Detail_More than 4 arms (not roundabout)      0
Junction_Detail_Not at junction or within 20 metres    0
Junction_Detail_Other junction                         0
Junction_Detail_Private drive or entrance              0
Junction_Detail_Roundabout                             0
Junction_Detail_Slip road                              0
Junction_Detail_T or staggered junction                0
Light_Conditions_Darkness - lights lit                 0
Light_Conditions_Darkness - lig

In [32]:

# Split the 'features' and 'income' data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features_final, 
                                                    severity, 
                                                    test_size = 0.2, 
                                                    random_state = 0)

# Show the results of the split
print("Training set has {} samples.".format(X_train.shape[0]))
print("Testing set has {} samples.".format(X_test.shape[0]))

Training set has 1429989 samples.
Testing set has 357498 samples.


In [33]:
# TODO: Import two metrics from sklearn - fbeta_score and accuracy_score
# this code is copied from previouly submitted project (Supervised Learning) as part of Udacity's Nanodegree program: 
def train_predict(learner, sample_size, X_train, y_train, X_test, y_test): 
    '''
    inputs:
       - learner: the learning algorithm to be trained and predicted on
       - sample_size: the size of samples (number) to be drawn from training set
       - X_train: features training set
       - y_train: income training set
       - X_test: features testing set
       - y_test: income testing set
    '''
    
    results = {}
    
    # TODO: Fit the learner to the training data using slicing with 'sample_size' using .fit(training_features[:], training_labels[:])
    start = time() # Get start time
    learner = learner.fit(X_train[:sample_size], y_train[:sample_size])
    end = time() # Get end time
    
    # TODO: Calculate the training time  
    results['train_time'] = end - start
        
    # TODO: Get the predictions on the test set(X_test),
    #       then get predictions on the first 300 training samples(X_train) using .predict()
    start = time() # Get start time
    predictions_test = learner.predict(X_test)
    predictions_train = learner.predict(X_train[:300])
    end = time() # Get end time
    
    # TODO: Calculate the total prediction time
    results['pred_time'] = end - start
            
    # TODO: Compute accuracy on the first 300 training samples which is y_train[:300]
    results['acc_train'] = accuracy_score(y_train[:300], predictions_train)
        
    # TODO: Compute accuracy on test set using accuracy_score()
    results['acc_test'] = accuracy_score(y_test, predictions_test)
    
    # TODO: Compute F-score on the the first 300 training samples using fbeta_score()
    results['f_train'] = fbeta_score(y_train[:300], predictions_train, beta=0.5, average='micro')
        
    # TODO: Compute F-score on the test set which is y_test
    results['f_test'] = fbeta_score(y_test, predictions_test, beta=0.5, average='micro')
       
    # Success
    print("{} trained on {} samples.".format(learner.__class__.__name__, sample_size))
        
    # Return the results
    return results

In [34]:

# Initialize the models
clf_A = DecisionTreeClassifier(random_state=9)
clf_B = AdaBoostClassifier(random_state=9)
clf_C = GradientBoostingClassifier(random_state=9)
clf_D = RandomForestClassifier(random_state=9)

# TODO: Calculate the number of samples for 1%, 10%, and 100% of the training data
# HINT: samples_100 is the entire training set i.e. len(y_train)
# HINT: samples_10 is 10% of samples_100 (ensure to set the count of the values to be `int` and not `float`)
# HINT: samples_1 is 1% of samples_100 (ensure to set the count of the values to be `int` and not `float`)

samples_100 = len(y_train)
samples_10 = int(samples_100 * 0.10)
samples_1 = int (samples_100 * 0.01)

# Collect results on the learners
results = {}
for clf in [clf_A, clf_B, clf_C,clf_D]:
    clf_name = clf.__class__.__name__
    results[clf_name] = {}
    for i, samples in enumerate([samples_1, samples_10, samples_100]):
        results[clf_name][i] = \
        train_predict(clf, samples, X_train, y_train, X_test, y_test)

DecisionTreeClassifier trained on 14299 samples.
DecisionTreeClassifier trained on 142998 samples.
DecisionTreeClassifier trained on 1429989 samples.
AdaBoostClassifier trained on 14299 samples.
AdaBoostClassifier trained on 142998 samples.
AdaBoostClassifier trained on 1429989 samples.
GradientBoostingClassifier trained on 14299 samples.
GradientBoostingClassifier trained on 142998 samples.
GradientBoostingClassifier trained on 1429989 samples.
RandomForestClassifier trained on 14299 samples.
RandomForestClassifier trained on 142998 samples.
RandomForestClassifier trained on 1429989 samples.


In [44]:
results['RandomForestClassifier']

{0: {'acc_test': 0.83905644227380294,
  'acc_train': 0.97666666666666668,
  'f_test': 0.83905644227380294,
  'f_train': 0.97666666666666679,
  'pred_time': 2.200040817260742,
  'train_time': 0.828726053237915},
 1: {'acc_test': 0.83862567063312243,
  'acc_train': 0.98666666666666669,
  'f_test': 0.83862567063312243,
  'f_train': 0.98666666666666658,
  'pred_time': 3.2391297817230225,
  'train_time': 16.818516969680786},
 2: {'acc_test': 0.83822007395845566,
  'acc_train': 0.96666666666666667,
  'f_test': 0.83822007395845566,
  'f_train': 0.96666666666666656,
  'pred_time': 5.612246990203857,
  'train_time': 285.2494161128998}}

In [45]:
clf_D

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=9, verbose=0, warm_start=False)

In [39]:
def evaluate(results):
    """
    Visualization code to display results of various learners.
    
    inputs:
      - learners: a list of supervised learners
      - stats: a list of dictionaries of the statistic results from 'train_predict()'
      - accuracy: The score for the naive predictor
      - f1: The score for the naive predictor
    """
  
    subplot_titles = ('train_time', 'acc_train', 'f_train', 'pred_time', 'acc_test', 'f_test')
    fig = tools.make_subplots(rows=2, cols=3, subplot_titles=subplot_titles)
    result_df = pd.DataFrame.from_dict(results)
    bupu_sub = bupu[3:].copy()
    x = result_df.index.to_list()

    for j, metric in enumerate(['train_time', 'acc_train', 'f_train', 'pred_time', 'acc_test', 'f_test']):
        i = j+3
        showlegend = True
        for ix, col in enumerate(result_df.columns):
            y_data = []
            for z, row in result_df.iterrows():
                y_data.append(row[col][metric])
            fig.append_trace(go.Bar(x=x, y=y_data, marker=dict(color=bupu_sub[ix]),name=col,showlegend= showlegend), i//3, j%3+1)


    layout = go.Layout(
    title='Model Result',
    xaxis = dict(ticks='', nticks=36, automargin=True),
    yaxis = dict(ticks='', automargin=True), 
    font=dict(size=10),
    showlegend=False)

    fig['layout'].update(layout)

    plotly.offline.iplot(fig)
    py.iplot(fig, filename='Model Evaluation Result')


In [52]:
def feature_plot(importances, X_train, y_train):

    # TODO: Extract the feature importances using .feature_importances_ 
    indices = np.argsort(importances)[::-1]
    columns = X_train.columns.values[indices[:20]]
    values = importances[indices][:20]
    cumulative_weight = np.cumsum(values)

    trace_1 = go.Bar(
                x=columns,
                y=values,
                name='Feature Weight',
                marker=dict(
                color=bupu[3]))

    trace_2 = go.Scatter(
        x = columns,
        y = cumulative_weight, 
        name = 'Cumulative Weight', 
        mode = 'lines',
        legendgroup= 'group2', 
        marker=dict(
                color=bupu[6])
    )

    layout = go.Layout(title="Model Feature Importances", 
                            barmode='group', 
                            xaxis = dict(ticks='', nticks=24, 
                                        title=go.layout.xaxis.Title(
                                        text='Feature'), automargin=True),
                            yaxis = dict(title=go.layout.yaxis.Title(
                                        text=''), automargin=True))

    fig = go.Figure(data=[trace_1, trace_2], layout=layout)
    py.iplot(fig, filename='Model Feature Importances')
    plotly.offline.iplot(fig)

In [40]:
# Run metrics visualization for the supervised learning models chosen
evaluate(results)

This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y2 ]  [ (1,3) x3,y3 ]
[ (2,1) x4,y4 ]  [ (2,2) x5,y5 ]  [ (2,3) x6,y6 ]



We experimented with multiple classification algorithms (DecisionTreeClassifier, AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier), this is what we achieved and our findings: 
- All models achaived high score, so we will consider different factors to make a dicision. 
- DecisionTree is probably overfitting the model, since we achived 100% accueracy on training set, and performed poorly in testing set in comparison with other models, thus it will be eliminated. 
- GradientBoosting: achieved great result, however the time consumbtion is very high, and since this problem is very timily senstive when used in production, it will be eliminated
- This leaves us to AdaBoosting, and RandomForest; both scored very similar, however considering the training and prediction time, RandomForest is our final choice. 

In [50]:
# TODO: Import a supervised learning model that has 'feature_importances_'
best_clf = RandomForestClassifier(criterion='gini', 
                             max_depth=15, 
                             max_features='auto', 
                             min_samples_split=100, 
                             n_estimators=50, 
                             random_state=9)

clf = clf_D
# Get the estimator
best_clf = best_clf.fit(X_train, y_train)
print (best_clf)
# Make predictions using the unoptimized and model
predictions = (clf.fit(X_train, y_train)).predict(X_test)
best_predictions = best_clf.predict(X_test)


# Report the before-and-afterscores
print("Unoptimized model\n------")
print("Accuracy score on testing data: {:.4f}".format(accuracy_score(y_test, predictions)))
print("F-score on testing data: {:.4f}".format(fbeta_score(y_test, predictions, beta = 0.5, average='micro')))
print("\nOptimized Model\n------")
print("Final accuracy score on the testing data: {:.4f}".format(accuracy_score(y_test, best_predictions)))
print("Final F-score on the testing data: {:.4f}".format(fbeta_score(y_test, best_predictions, beta = 0.5, average='micro')))

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=15, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=100,
            min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=1,
            oob_score=False, random_state=9, verbose=0, warm_start=False)
Unoptimized model
------
Accuracy score on testing data: 0.8382
F-score on testing data: 0.8382

Optimized Model
------
Final accuracy score on the testing data: 0.8413
Final F-score on the testing data: 0.8413


In [53]:

# TODO: Train the supervised model on the training set using .fit(X_train, y_train)
model = best_clf

# TODO: Extract the feature importances using .feature_importances_ 
importances = model.feature_importances_ 

# Plot
feature_plot(importances, X_train, y_train)

Based on above graph, there's not much value for  features independently, where top 15 features accounted for (0.06–0.01), yet the cumulative weight gives the model good result and accounts for (0.7) of model importance. Overall, some important outtakes are: 
1. Top feature is vehicles of type: Motorcycle over 500c, which supports our previous insight that showed cyclist are the second frequent type invloved in accidents and as casualties. 

2. We can recognize two dominating features before one-hot encoding; Vehicle Type and Vehicles Manoeuvre, it's a good thing we have merged these feature from vehicles dataset into our final features. 

3. Weather Condition and Road-related features, were among the top features, which is alerting to increase safety amont special conditions. 

4. I was expecting some location-based features to effect the model, possiblly that these features were too general to affect the model, however this data is very useful in different analysis approach.

## Final Thoughts:

In this analysis we explored problem from different prospectives, yet leaving so much to uncover. We can summarize our findings as follows:

1. From the day-hour heatmaps, we can suggest to increasing response time during rush-hours, or construct roads to divert traffic from congested areaa. 
2. Investigate high density points in map geolocation analysis to discover construction needs.
3. From model feature importnce we can recognize dangerous juntions which cause spesific manoeuvres.
4. We explored the conditions of Vehicle and Casualties such as (Type of vehicle, Age of vehicles, Pedestrian movements and location when they got in accidents, etc..), these insights are useful for policy makers to understand how accidents happen and provide solutions to limit the cuase.   

Finally, I really appreciate the UK government efforts to provide this open data in very orginazed and well-documented format. I would like to explore similar dataset from my hometown! 

## References: 

The work in this blog has been inspired by these great resources:

* Facts: 
    - https://www.asirt.org/safe-travel/road-safety-facts/
    - https://www.who.int/en/news-room/fact-sheets/detail/road-traffic-injuries


* Datasets: 
    - https://www.kaggle.com/silicon99/dft-accident-data
    - https://data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data  


* Methedology: 
    - https://towardsdatascience.com/the-search-for-categorical-correlation-a1cf7f1888c9
    - https://towardsdatascience.com/an-end-to-end-project-on-time-series-analysis-and-forecasting-with-python-4835e6bf050b


* Technical: 
    - https://stackoverflow.com/questions/6740918/creating-a-dictionary-from-a-csv-file  
    - https://stackoverflow.com/questions/20250771/remap-values-in-pandas-column-with-a-dict
    - https://stackoverflow.com/questions/21269399/datetime-dtypes-in-pandas-read-csv
    - https://www.kaggle.com/daveianhickey/how-to-folium-for-maps-heatmaps-time-data
    - https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/