<a href="https://colab.research.google.com/github/vanderbilt-ml/50-nelson-mlproj-waittime/blob/main/wait_time_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Wait Time Prediction


## Background

Recently, while planning an upcoming vacation, I discovered that a company called Touringplans (touringplans.com) publishes many publicly available data sets of captured wait times for attractions at Walt Disney World in Florida, dating back to 2015. I'm intrigued by this data and interested in building a predictive model that uses the historical wait times to forecast future ones.

## Project Description

Using the captured historical wait time data, I would like to create a predictive model that helps me understand future wait times for attractions at Walt Disney World in Florida.

The following columns represent my core data:


*   date: The date the data was captured
*   datetime: The date and time the data was captured
*   SACTMIN: The actual wait time at the given datetime (if captured)
*   SPOSTMIN: The posted wait time at the given datetime



The metadata.csv file holds a lot of relevant information for each date on which data was collected. I can use it by joining metadata.csv to the sample data on the DATE column. The file includes important fields such as:

*   DayOfWeek
*   DayOfYear
*   WeekOfYear
*   MonthOfYear
*   Season
*   MaxTemp
*   MinTemp
*   MeanTemp
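
The join on the DATE column can be sketched on a couple of made-up rows. The values below are illustrative and not taken from the real files, and `validate='many_to_one'` is an optional safeguard that assumes each date appears only once in the metadata.

```python
import pandas as pd

# Hypothetical miniature versions of the two files (values are made up).
waits = pd.DataFrame({
    "date": ["01/01/2015", "01/01/2015", "01/02/2015"],
    "SPOSTMIN": [5.0, 15.0, 20.0],
})
meta = pd.DataFrame({
    "DATE": ["01/01/2015", "01/02/2015"],
    "DAYOFWEEK": [5, 6],
})

# Align the key name, then join every wait time row to its date's metadata.
# validate raises an error if a date is repeated on the metadata side.
meta = meta.rename(columns={"DATE": "date"})
merged = pd.merge(waits, meta, on="date", how="left", validate="many_to_one")
print(merged)
```

A left join keeps every wait time row even when a date has no metadata, which makes missing dates visible as NaN instead of silently dropping rows.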



## Performance Metric
Given the abundance of available data, I should be able to split it into training and testing sets. I would like to create a predictive model with accuracy somewhere in the 80-90% range. At this point, however, I have no idea whether that is achievable.
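
Because a wait time is a number of minutes, "accuracy" can be measured in more than one way. The sketch below is only one possible reading, not a metric fixed by this project: it computes the mean absolute error and the share of predictions that land within a tolerance. The sample values and the 5-minute `tolerance` are made up.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up actual and predicted waits, in minutes.
y_true = np.array([10.0, 20.0, 35.0, 60.0])
y_pred = np.array([12.0, 20.0, 50.0, 55.0])

# Average size of the miss, in minutes.
mae = mean_absolute_error(y_true, y_pred)

# Fraction of predictions within a hypothetical 5-minute tolerance.
tolerance = 5.0
within_tolerance = np.mean(np.abs(y_true - y_pred) <= tolerance)

print(mae, within_tolerance)
```

Exact-match accuracy treats a prediction that is off by 1 minute the same as one that is off by 60, so a tolerance-based score may be closer to what the 80-90% goal intends.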

## Required Imports

In [384]:
#tables and visualizations
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

#machine learning
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.pipeline import Pipeline 
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, LabelBinarizer, StandardScaler
from sklearn import config_context
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay, roc_curve, roc_auc_score

## Load Data

The metadata is stored in a separate file, so I load both the wait time data and the metadata and then combine them.

In [385]:
wait_time_raw_data = pd.read_csv('https://raw.githubusercontent.com/vanderbilt-ml/50-nelson-mlproj-waittime/main/big_thunder_mtn.csv')
metadata = pd.read_csv('https://raw.githubusercontent.com/vanderbilt-ml/50-nelson-mlproj-waittime/main/provided_data/metadata.csv')
# To minimize training time for now I've limited the number of metadata columns I'm using to just the following:
metadata = metadata[['DATE', 'DAYOFWEEK', 'DAYOFYEAR', 'WEEKOFYEAR','MONTHOFYEAR']] #, 'SEASON']]
metadata.rename(columns = {'DATE':'date'},  inplace=True)
wait_time_data = pd.merge(wait_time_raw_data, metadata, on ='date')
# Currently having some issues with datetime objects during training, here's some of my attempts to remedy the issue
# wait_time_data['date'] = pd.to_datetime(wait_time_data['date'])
# wait_time_data['datetime'] = pd.to_datetime(wait_time_data['datetime'])
# wait_time_data['datetime'] = np.Timestamp(np.datetime64(wait_time_data['datetime']))
# wait_time_data['datetime'] = wait_time_data['datetime'].values.astype('datetime64[D]')
# wait_time_data['date'] = wait_time_data['date'].values.astype('datetime64[D]')
print(wait_time_data.shape)
print(wait_time_data.head())

(268969, 8)
         date             datetime  SACTMIN  SPOSTMIN  DAYOFWEEK  DAYOFYEAR  \
0  01/01/2015  2015-01-01 08:02:13      NaN       5.0          5          0   
1  01/01/2015  2015-01-01 08:09:12      NaN      15.0          5          0   
2  01/01/2015  2015-01-01 08:16:12      NaN      20.0          5          0   
3  01/01/2015  2015-01-01 08:23:12      NaN      20.0          5          0   
4  01/01/2015  2015-01-01 08:23:53      NaN      20.0          5          0   

   WEEKOFYEAR  MONTHOFYEAR  
0           0            1  
1           0            1  
2           0            1  
3           0            1  
4           0            1  


## Data Cleaning and Validation

In [386]:
wait_time_data.isna().sum()


date                0
datetime            0
SACTMIN        260224
SPOSTMIN         8745
DAYOFWEEK           0
DAYOFYEAR           0
WEEKOFYEAR          0
MONTHOFYEAR         0
dtype: int64

In [387]:
wait_time_data.shape

(268969, 8)

We have many entries with -999 entered as their SPOSTMIN entry. I'll go ahead and drop those. 

In [388]:
wait_time_data = wait_time_data[wait_time_data.SPOSTMIN != -999]
print(wait_time_data.shape)

(246931, 8)
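
A small sketch of what the `!= -999` filter does, using made-up values. It also shows an alternative that marks the sentinel as missing instead of dropping the row; whether that is preferable depends on what -999 means in the source data, which is not established here.

```python
import numpy as np
import pandas as pd

# Toy column with one sentinel value and one genuinely missing value.
toy = pd.DataFrame({"SPOSTMIN": [5.0, -999.0, np.nan, 20.0]})

# The filter removes only the sentinel row. The NaN row is kept,
# because NaN != -999 evaluates to True.
kept = toy[toy["SPOSTMIN"] != -999]

# Alternative: keep every row but treat the sentinel as missing.
marked = toy.assign(SPOSTMIN=toy["SPOSTMIN"].replace(-999, np.nan))

print(len(kept), int(marked["SPOSTMIN"].isna().sum()))
```

This is why rows with a missing SPOSTMIN (the ones that carry a SACTMIN value instead) are still present after the filter.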


The SACTMIN and SPOSTMIN entries are mutually exclusive, meaning that each row has a value in only one of the two columns. SACTMIN should be more valuable than SPOSTMIN; I'm not sure yet how to handle this, so I'll leave them as-is for now.

Dropping any columns that are completely empty

In [389]:
# Reassign instead of using inplace=True: wait_time_data is a filtered slice,
# and modifying it in place triggers pandas' SettingWithCopyWarning.
wait_time_data = wait_time_data.dropna(how='all', axis=1)
display(wait_time_data)


Unnamed: 0,date,datetime,SACTMIN,SPOSTMIN,DAYOFWEEK,DAYOFYEAR,WEEKOFYEAR,MONTHOFYEAR
0,01/01/2015,2015-01-01 08:02:13,,5.0,5,0,0,1
1,01/01/2015,2015-01-01 08:09:12,,15.0,5,0,0,1
2,01/01/2015,2015-01-01 08:16:12,,20.0,5,0,0,1
3,01/01/2015,2015-01-01 08:23:12,,20.0,5,0,0,1
4,01/01/2015,2015-01-01 08:23:53,,20.0,5,0,0,1
...,...,...,...,...,...,...,...,...
268962,08/31/2021,2021-08-31 20:32:54,,10.0,3,242,35,8
268963,08/31/2021,2021-08-31 20:40:13,,10.0,3,242,35,8
268964,08/31/2021,2021-08-31 20:47:24,,10.0,3,242,35,8
268965,08/31/2021,2021-08-31 20:54:12,,10.0,3,242,35,8


## Feature Engineering

For now, given the mutually exclusive relationship between SACTMIN and SPOSTMIN, I am going to collapse them into one column. SACTMIN represents a human-captured wait time (someone stood in line and recorded how long they waited), while SPOSTMIN is the posted wait time. In my opinion this makes the SACTMIN data more valuable, but given the small percentage of entries that have SACTMIN data, I'm not sure what other approach to take at this point.

In [390]:
wait_time_data[wait_time_data["SACTMIN"].notna()].head()

Unnamed: 0,date,datetime,SACTMIN,SPOSTMIN,DAYOFWEEK,DAYOFYEAR,WEEKOFYEAR,MONTHOFYEAR
63,01/01/2015,2015-01-01 14:55:16,37.0,,5,0,0,1
142,01/02/2015,2015-01-02 08:40:32,3.0,,6,1,0,1
152,01/02/2015,2015-01-02 09:30:53,35.0,,6,1,0,1
160,01/02/2015,2015-01-02 10:16:26,47.0,,6,1,0,1
190,01/02/2015,2015-01-02 13:16:31,54.0,,6,1,0,1


In [391]:
wait_time_data['wait'] = pd.to_numeric(wait_time_data[['SACTMIN', 'SPOSTMIN']].bfill(axis=1).iloc[:, 0])
wait_time_data[wait_time_data["SACTMIN"].notna()].head()

Unnamed: 0,date,datetime,SACTMIN,SPOSTMIN,DAYOFWEEK,DAYOFYEAR,WEEKOFYEAR,MONTHOFYEAR,wait
63,01/01/2015,2015-01-01 14:55:16,37.0,,5,0,0,1,37.0
142,01/02/2015,2015-01-02 08:40:32,3.0,,6,1,0,1,3.0
152,01/02/2015,2015-01-02 09:30:53,35.0,,6,1,0,1,35.0
160,01/02/2015,2015-01-02 10:16:26,47.0,,6,1,0,1,47.0
190,01/02/2015,2015-01-02 13:16:31,54.0,,6,1,0,1,54.0
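
The same collapse can also be written with `combine_first`, shown here on made-up rows. The `is_actual` flag is a hypothetical extra column, not part of this notebook, that would let a model distinguish actual from posted waits after the two source columns are dropped.

```python
import numpy as np
import pandas as pd

# Toy rows: each row has a value in exactly one of the two columns.
toy = pd.DataFrame({
    "SACTMIN": [np.nan, 37.0, np.nan],
    "SPOSTMIN": [5.0, np.nan, 20.0],
})

# Prefer the actual wait and fall back to the posted wait.
toy["wait"] = toy["SACTMIN"].combine_first(toy["SPOSTMIN"])

# Hypothetical flag: 1 when the value came from SACTMIN, 0 otherwise.
toy["is_actual"] = toy["SACTMIN"].notna().astype(int)

print(toy)
```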


In [392]:
wait_time_data = wait_time_data.drop('SACTMIN', axis=1)
wait_time_data = wait_time_data.drop('SPOSTMIN', axis=1)
wait_time_data.head()

Unnamed: 0,date,datetime,DAYOFWEEK,DAYOFYEAR,WEEKOFYEAR,MONTHOFYEAR,wait
0,01/01/2015,2015-01-01 08:02:13,5,0,0,1,5.0
1,01/01/2015,2015-01-01 08:09:12,5,0,0,1,15.0
2,01/01/2015,2015-01-01 08:16:12,5,0,0,1,20.0
3,01/01/2015,2015-01-01 08:23:12,5,0,0,1,20.0
4,01/01/2015,2015-01-01 08:23:53,5,0,0,1,20.0


## Datetime Issues (don't run)

In [327]:
print(wait_time_data.dtypes)

date            object
datetime        object
DAYOFWEEK        int64
DAYOFYEAR        int64
WEEKOFYEAR       int64
MONTHOFYEAR      int64
wait           float64
dtype: object


In [317]:
# wait_time_data['date'] =  pd.to_datetime(wait_time_data['date'], format='%m/%d/%Y')
wait_time_data['datetime'] =  pd.to_datetime(wait_time_data['datetime'], format='%Y-%m-%d %H:%M:%S')

In [293]:
print(wait_time_data.dtypes)

date           datetime64[ns]
datetime       datetime64[ns]
DAYOFWEEK               int64
DAYOFYEAR               int64
WEEKOFYEAR              int64
MONTHOFYEAR             int64
wait                  float64
dtype: object


In [116]:
# wait_time_data['date'] =  wait_time_data['date'].values.astype('datetime64[D]').dtype
# wait_time_data['datetime'] =  wait_time_data['datetime'].values.astype('datetime64[D]').dtype

In [210]:
# print(wait_time_data.dtypes)
wait_time_data = wait_time_data.drop('date', axis=1)
wait_time_data = wait_time_data.drop('datetime', axis=1)
# wait_time_data.head()

In [247]:
wait_time_data['wait'] = wait_time_data['wait'].astype(np.int64)

In [197]:
# print(wait_time_data['date'].map(type) == pd.datetime)

NameError: ignored

In [274]:
wait_time_data['date'] = wait_time_data['date'].astype(float)
wait_time_data['datetime'] =  wait_time_data['datetime'].astype(float)

TypeError: ignored

In [None]:
wait_time_data['date']

In [294]:
print(wait_time_data.dtypes)

date           datetime64[ns]
datetime       datetime64[ns]
DAYOFWEEK               int64
DAYOFYEAR               int64
WEEKOFYEAR              int64
MONTHOFYEAR             int64
wait                  float64
dtype: object


In [295]:
wait_time_data

Unnamed: 0,date,datetime,DAYOFWEEK,DAYOFYEAR,WEEKOFYEAR,MONTHOFYEAR,wait
0,2015-01-01,2015-01-01 08:02:13,5,0,0,1,5.0
1,2015-01-01,2015-01-01 08:09:12,5,0,0,1,15.0
2,2015-01-01,2015-01-01 08:16:12,5,0,0,1,20.0
3,2015-01-01,2015-01-01 08:23:12,5,0,0,1,20.0
4,2015-01-01,2015-01-01 08:23:53,5,0,0,1,20.0
...,...,...,...,...,...,...,...
268962,2021-08-31,2021-08-31 20:32:54,3,242,35,8,10.0
268963,2021-08-31,2021-08-31 20:40:13,3,242,35,8,10.0
268964,2021-08-31,2021-08-31 20:47:24,3,242,35,8,10.0
268965,2021-08-31,2021-08-31 20:54:12,3,242,35,8,10.0


## Dropping datetimes

I'm still struggling to understand the issues I run into when I include the datetime columns in my data, so I'm removing them for now.

In [393]:
wait_time_data = wait_time_data.drop('date', axis=1)
wait_time_data = wait_time_data.drop('datetime', axis=1)
wait_time_data.dtypes

DAYOFWEEK        int64
DAYOFYEAR        int64
WEEKOFYEAR       int64
MONTHOFYEAR      int64
wait           float64
dtype: object
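
A possible alternative to dropping the datetime column entirely is to derive plain numeric features from it first. The sketch below assumes the training errors come from feeding raw strings or datetime64 values into the numeric preprocessing steps, which is not confirmed here. The feature names `hour` and `minute_of_day` are hypothetical.

```python
import pandas as pd

# Two timestamps in the same string format as the datetime column.
toy = pd.DataFrame({"datetime": ["2015-01-01 08:02:13", "2015-01-01 14:55:16"]})

# Parse the strings once, then extract integer features from the result.
parsed = pd.to_datetime(toy["datetime"], format="%Y-%m-%d %H:%M:%S")
toy["hour"] = parsed.dt.hour
toy["minute_of_day"] = parsed.dt.hour * 60 + parsed.dt.minute

# Drop the original column so only numeric features remain.
features = toy.drop(columns="datetime")
print(features.dtypes)
```

Time of day is likely to matter for wait times, so keeping it in numeric form may be worth more than the date-level metadata alone.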

## Test Train Split

In [394]:
wait_time_data = wait_time_data.dropna(subset=['wait'])
wait_time_data.shape

(246931, 5)

In [395]:
class_column = 'wait'
random_seed = 2435

# Keep only the first 5,000 rows for now
wait_time_data = wait_time_data[:5000]

X_train, X_test, y_train, y_test = train_test_split(wait_time_data.drop(columns=class_column), wait_time_data[class_column],
                                                    test_size=0.25, random_state=random_seed)#, stratify=wait_time_data[class_column])

In [396]:
wait_time_data.shape

(5000, 5)
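
Slicing with `[:5000]` keeps the first 5,000 rows, which for time-ordered data are the earliest observations. A minimal sketch of the difference between that slice and a random sample, on a made-up frame (`DataFrame.sample` is a real pandas method; the data is synthetic):

```python
import numpy as np
import pandas as pd

# Toy frame with 100 rows ordered by day of year, like time-sorted data.
toy = pd.DataFrame({"DAYOFYEAR": np.arange(100), "wait": np.arange(100) % 7})

# First rows only: covers just the earliest days.
head_subset = toy[:10]

# Random rows: drawn from across the whole range, reproducible via the seed.
random_subset = toy.sample(n=10, random_state=2435)

print(head_subset["DAYOFYEAR"].max(), random_subset["DAYOFYEAR"].max())
```

A random subset keeps training time similar while exposing the model to every season, at the cost of mixing past and future rows between the train and test sets.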

In [397]:
# X Train
print('On X train: ')
print('X train dimensions: ', X_train.shape)
display(X_train.head())
display(X_train.dtypes)

# X test
print('\nOn X test: ')
print('X test dimensions: ', X_test.shape)
display(X_test.head())
display(X_test.dtypes)

On X train: 
X train dimensions:  (3750, 4)


Unnamed: 0,DAYOFWEEK,DAYOFYEAR,WEEKOFYEAR,MONTHOFYEAR
4606,1,31,5,2
163,6,1,0,1
4196,5,28,4,1
4821,3,33,5,2
3265,6,22,3,1


DAYOFWEEK      int64
DAYOFYEAR      int64
WEEKOFYEAR     int64
MONTHOFYEAR    int64
dtype: object


On X test: 
X test dimensions:  (1250, 4)


Unnamed: 0,DAYOFWEEK,DAYOFYEAR,WEEKOFYEAR,MONTHOFYEAR
4993,4,34,5,2
2204,6,15,2,1
3903,3,26,4,1
279,6,1,0,1
272,6,1,0,1


DAYOFWEEK      int64
DAYOFYEAR      int64
WEEKOFYEAR     int64
MONTHOFYEAR    int64
dtype: object

In [398]:
# Y Train
print('On y train: ')
print('y train dimensions: ', y_train.shape)
display(y_train.head())
display(y_train.dtypes)

# Y test
print('\nOn y test: ')
print('y test dimensions: ', y_test.shape)
display(y_test.head())
display(y_test.dtypes)

On y train: 
y train dimensions:  (3750,)


4606    40.0
163     60.0
4196    15.0
4821    10.0
3265    20.0
Name: wait, dtype: float64

dtype('float64')


On y test: 
y test dimensions:  (1250,)


4993    10.0
2204    10.0
3903    18.0
279     55.0
272     55.0
Name: wait, dtype: float64

dtype('float64')

## Establish training pipelines

In [399]:
#individual pipelines for differing datatypes
cat_pipeline = Pipeline(steps=[('cat_impute', SimpleImputer(missing_values=np.nan, strategy='most_frequent')),
                               ('onehot_cat', OneHotEncoder(drop='if_binary'))])
num_pipeline = Pipeline(steps=[('impute_num', SimpleImputer(missing_values=np.nan, strategy='mean')),
                               ('scale_num', StandardScaler())])

In [400]:
#establish preprocessing pipeline by columns
preproc = ColumnTransformer([('cat_pipe', cat_pipeline, make_column_selector(dtype_include=object)),
                             ('num_pipe', num_pipeline, make_column_selector(dtype_include=np.number))],
                             remainder='passthrough')

In [401]:
#generate the whole modeling pipeline with preprocessing
logistic_regression_pipeline = Pipeline(steps=[('preproc', preproc), ('mdl', LogisticRegression(penalty='elasticnet', solver='saga', tol=0.01))])

#visualization for steps
with config_context(display='diagram'):
    display(logistic_regression_pipeline)

In [402]:
random_forest_pipeline = Pipeline(steps=[('preproc', preproc), ('mdl', RandomForestClassifier())])

#visualization for steps
with config_context(display='diagram'):
    display(random_forest_pipeline)

In [403]:
gradient_boosting_pipeline = Pipeline(steps=[('preproc', preproc), ('mdl', GradientBoostingClassifier())])

#visualization for steps
with config_context(display='diagram'):
    display(gradient_boosting_pipeline)

## Cross-validation with hyperparameter tuning

In [404]:
logistic_regression_tuning_grid = {'mdl__l1_ratio' : np.linspace(0,1,5), 'mdl__C': np.logspace(-1, 6, 3)}
random_forest_tuning_grid = {'mdl__n_estimators': [10,100]}
gradient_boosting_tuning_grid = {'mdl__n_estimators': [10,100]}

model_scores = []

In [405]:
logistic_regression_grid_search = GridSearchCV(logistic_regression_pipeline, param_grid = logistic_regression_tuning_grid, cv = 5, return_train_score=True)
logistic_regression_grid_search.fit(X_train, y_train)
model_scores.append([logistic_regression_grid_search.best_score_, logistic_regression_grid_search.best_params_])



In [406]:
random_forest_grid_search = GridSearchCV(random_forest_pipeline, param_grid = random_forest_tuning_grid, cv = 5, return_train_score=True)
random_forest_grid_search.fit(X_train, y_train)
model_scores.append([random_forest_grid_search.best_score_, random_forest_grid_search.best_params_])



In [407]:
gradient_boosting_grid_search = GridSearchCV(gradient_boosting_pipeline, param_grid = gradient_boosting_tuning_grid, cv = 5, return_train_score=True)
gradient_boosting_grid_search.fit(X_train, y_train)
model_scores.append([gradient_boosting_grid_search.best_score_, gradient_boosting_grid_search.best_params_])



In [408]:
print("Logistic Regression: \n" + str(model_scores[0]))
print("Random Forest: \n" +  str(model_scores[1]))
print("Gradient Boosting: \n" +  str(model_scores[2]))

Logistic Regression: 
[0.22773333333333334, {'mdl__C': 0.1, 'mdl__l1_ratio': 0.0}]
Random Forest: 
[0.29840000000000005, {'mdl__n_estimators': 10}]
Gradient Boosting: 
[0.2938666666666666, {'mdl__n_estimators': 100}]


## Final Fit

In [409]:
print(logistic_regression_grid_search.best_estimator_)

Pipeline(steps=[('preproc',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('cat_pipe',
                                                  Pipeline(steps=[('cat_impute',
                                                                   SimpleImputer(strategy='most_frequent')),
                                                                  ('onehot_cat',
                                                                   OneHotEncoder(drop='if_binary'))]),
                                                  <sklearn.compose._column_transformer.make_column_selector object at 0x7f9269f52810>),
                                                 ('num_pipe',
                                                  Pipeline(steps=[('impute_num',
                                                                   SimpleImputer()),
                                                                  ('scale_num',
                                

In [410]:
print(random_forest_grid_search.best_estimator_)

Pipeline(steps=[('preproc',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('cat_pipe',
                                                  Pipeline(steps=[('cat_impute',
                                                                   SimpleImputer(strategy='most_frequent')),
                                                                  ('onehot_cat',
                                                                   OneHotEncoder(drop='if_binary'))]),
                                                  <sklearn.compose._column_transformer.make_column_selector object at 0x7f926dd1ee50>),
                                                 ('num_pipe',
                                                  Pipeline(steps=[('impute_num',
                                                                   SimpleImputer()),
                                                                  ('scale_num',
                                

In [411]:
print(gradient_boosting_grid_search.best_estimator_)

Pipeline(steps=[('preproc',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('cat_pipe',
                                                  Pipeline(steps=[('cat_impute',
                                                                   SimpleImputer(strategy='most_frequent')),
                                                                  ('onehot_cat',
                                                                   OneHotEncoder(drop='if_binary'))]),
                                                  <sklearn.compose._column_transformer.make_column_selector object at 0x7f926b086610>),
                                                 ('num_pipe',
                                                  Pipeline(steps=[('impute_num',
                                                                   SimpleImputer()),
                                                                  ('scale_num',
                                

## Variable Importance

In [412]:
print(logistic_regression_grid_search.classes_)

[  1.   2.   3.   4.   5.   6.   7.   8.  10.  11.  12.  13.  14.  15.
  16.  17.  18.  19.  20.  21.  22.  23.  24.  25.  26.  27.  30.  33.
  35.  36.  37.  40.  42.  45.  47.  50.  54.  55.  60.  62.  65.  70.
  75.  80.  85.  90.  95. 105. 125. 140.]


In [413]:
print(random_forest_grid_search.classes_)

[  1.   2.   3.   4.   5.   6.   7.   8.  10.  11.  12.  13.  14.  15.
  16.  17.  18.  19.  20.  21.  22.  23.  24.  25.  26.  27.  30.  33.
  35.  36.  37.  40.  42.  45.  47.  50.  54.  55.  60.  62.  65.  70.
  75.  80.  85.  90.  95. 105. 125. 140.]


In [414]:
print(gradient_boosting_grid_search.classes_)

[  1.   2.   3.   4.   5.   6.   7.   8.  10.  11.  12.  13.  14.  15.
  16.  17.  18.  19.  20.  21.  22.  23.  24.  25.  26.  27.  30.  33.
  35.  36.  37.  40.  42.  45.  47.  50.  54.  55.  60.  62.  65.  70.
  75.  80.  85.  90.  95. 105. 125. 140.]
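
The `classes_` attribute lists the target values each model learned, not how much each input column matters. For tree ensembles, a rough view of variable importance is available through `feature_importances_`. The sketch below uses synthetic data and a simplified pipeline (both are assumptions for illustration); in this notebook the equivalent lookup would go through `random_forest_grid_search.best_estimator_.named_steps['mdl']`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features; the label depends on DAYOFWEEK only.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "DAYOFWEEK": rng.integers(1, 8, size=200),
    "DAYOFYEAR": rng.integers(0, 365, size=200),
})
y = (X["DAYOFWEEK"] > 4).astype(int)

# Simplified stand-in for the preprocessing + model pipeline.
pipe = Pipeline(steps=[
    ("scale", StandardScaler()),
    ("mdl", RandomForestClassifier(n_estimators=50, random_state=0)),
])
pipe.fit(X, y)

# Impurity-based importances, one value per input column.
importances = pd.Series(pipe.named_steps["mdl"].feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

Matching importances to column names is only this direct when preprocessing keeps one output per input column; a one-hot encoder would expand the feature list.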


## Performance Metrics On Test Data

In [416]:
print(classification_report(y_test, logistic_regression_grid_search.best_estimator_.predict(X_test)))

              precision    recall  f1-score   support

         2.0       0.00      0.00      0.00         1
         4.0       0.00      0.00      0.00         1
         5.0       0.00      0.00      0.00        17
         8.0       0.00      0.00      0.00         4
        10.0       0.00      0.00      0.00       163
        11.0       0.00      0.00      0.00         3
        12.0       0.00      0.00      0.00         2
        13.0       0.00      0.00      0.00         1
        14.0       0.00      0.00      0.00         1
        15.0       0.00      0.00      0.00        26
        18.0       0.00      0.00      0.00         4
        19.0       0.00      0.00      0.00         1
        20.0       0.00      0.00      0.00       213
        21.0       0.00      0.00      0.00         1
        22.0       0.00      0.00      0.00         2
        23.0       0.00      0.00      0.00         1
        25.0       0.00      0.00      0.00         1
        27.0       0.00    



In [418]:
print(classification_report(y_test, random_forest_grid_search.best_estimator_.predict(X_test)))

              precision    recall  f1-score   support

         2.0       0.00      0.00      0.00         1
         4.0       0.00      0.00      0.00         1
         5.0       0.20      0.06      0.09        17
         8.0       0.00      0.00      0.00         4
        10.0       0.31      0.10      0.16       163
        11.0       0.00      0.00      0.00         3
        12.0       0.00      0.00      0.00         2
        13.0       0.00      0.00      0.00         1
        14.0       0.00      0.00      0.00         1
        15.0       0.58      1.00      0.73        26
        18.0       0.00      0.00      0.00         4
        19.0       0.00      0.00      0.00         1
        20.0       0.36      0.15      0.21       213
        21.0       0.00      0.00      0.00         1
        22.0       0.00      0.00      0.00         2
        23.0       0.00      0.00      0.00         1
        25.0       0.00      0.00      0.00         1
        27.0       0.00    



In [419]:
print(classification_report(y_test, gradient_boosting_grid_search.best_estimator_.predict(X_test)))

              precision    recall  f1-score   support

         2.0       0.00      0.00      0.00         1
         4.0       0.00      0.00      0.00         1
         5.0       0.00      0.00      0.00        17
         8.0       0.00      0.00      0.00         4
        10.0       0.47      0.09      0.15       163
        11.0       0.00      0.00      0.00         3
        12.0       0.00      0.00      0.00         2
        13.0       0.00      0.00      0.00         1
        14.0       0.00      0.00      0.00         1
        15.0       0.58      1.00      0.73        26
        18.0       0.00      0.00      0.00         4
        19.0       0.00      0.00      0.00         1
        20.0       0.36      0.15      0.21       213
        21.0       0.00      0.00      0.00         1
        22.0       0.00      0.00      0.00         2
        23.0       0.00      0.00      0.00         1
        25.0       0.00      0.00      0.00         1
        27.0       0.00    



## Models Value

Because my target is a numeric wait time with dozens of distinct values rather than a handful of categories, a confusion matrix is not a practical way to compare the three models. The most important aspect of the models is their accuracy. On the test data, the three models performed as follows:


Logistic Regression: 21% Accuracy w/ Weighted Average of 10%

Random Forest: 30% Accuracy w/ Weighted Average of 26%

Gradient Boosting: 28% Accuracy w/ Weighted Average of 25%


Given the above results, I would select the Random Forest model. However, this may change once I figure out my datetime issues and can include that data in my inputs.
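
Since the wait time is a number, another framing worth considering is regression, where the model predicts minutes directly instead of choosing among the observed wait values as classes. The sketch below is an assumption about a possible next step, not what this notebook does. It uses synthetic data and scikit-learn's `RandomForestRegressor`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data: the wait grows with DAYOFWEEK, plus a little noise.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "DAYOFWEEK": rng.integers(1, 8, size=300),
    "DAYOFYEAR": rng.integers(0, 365, size=300),
})
y = 10.0 * X["DAYOFWEEK"] + rng.normal(0, 2, size=300)

# Same split proportions and seed as the classification setup.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=2435)

# Fit a regressor and score it by its average miss in minutes.
reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, reg.predict(X_te))
print(round(mae, 2))
```

With a regressor, a prediction of 19 minutes for a true wait of 20 counts as nearly right instead of as a miss, which may describe model quality more fairly than class accuracy.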