## **Project Overview**

The goal of this analysis is to predict whether a flight's departure will be delayed, based on a set of flight, airport, and weather features.
The results should indicate how likely a given flight is to be delayed when certain conditions are met, such as the weather or the time of travel.

**Target:**

- Departure delay over 15 minutes (the `DEP_DEL15` column)

**Features:**
- Month
- Age of Departing Aircraft
- Departure Block (time of day)
- Carrier Name
- Max Temp
- Wind Speed
- Snowfall
- Precipitation
- Departing Airport
- Airport Flights


**Models:**
- Logistic Regression
- Support vector machine (SVM)

**Dataset used:**
[2019 Airline Delays w/ Weather and Airport Detail](https://www.kaggle.com/threnjen/2019-airline-delays-and-cancellations)

**SVM Explanation:**

A Support Vector Machine (SVM) is similar to a logistic regression model in that it is also a binary classifier: it splits the samples into two categories, which in our dataset are Delayed and Not Delayed. An SVM looks for the boundary that separates the two classes with the widest possible margin. Unlike a hard-margin classifier, which requires every training point to fall on the correct side of the boundary, SVM allows for "soft" margins: outliers may sit inside the margin, or even on the wrong side of the boundary, at a penalty. This means some training points past the "cut off" line may still belong to the opposite class, which keeps a handful of outliers from dictating where the boundary is drawn.
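
The soft-margin idea can be illustrated with the hinge loss, the penalty a linear SVM assigns to each training point. The sketch below is plain Python and is not part of the project code; the weights, bias, and sample points are invented for illustration. In scikit-learn's `SVC`, the `C` parameter controls how heavily such penalties count against a wide margin.

```python
# Soft-margin illustration in plain Python (no libraries needed).
# The boundary and the sample points are made up for demonstration only.

def decision_value(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_loss(w, b, x, y):
    """Penalty for one sample whose label y is -1 or +1.

    Zero when the point is on the correct side and outside the margin,
    growing linearly as it moves into the margin or across the boundary.
    """
    return max(0.0, 1.0 - y * decision_value(w, b, x))


w, b = [1.0, 0.0], 0.0  # hypothetical boundary: the vertical line x0 = 0

samples = [
    ([2.0, 1.0], +1),   # correct side, outside the margin
    ([0.5, 1.0], +1),   # correct side, but inside the margin
    ([-1.0, 1.0], +1),  # wrong side of the boundary (a tolerated outlier)
]

losses = [hinge_loss(w, b, x, y) for x, y in samples]
print(losses)
```

A hard-margin classifier would require every loss to be zero; a soft-margin SVM only asks that their total stay small relative to the margin width.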








In [2]:
# SVM Machine Learning Mockup

# Note: Work is mainly conducted on Google Colab; the file uploaded to GitHub is for instructor/public access.

# Importing dependencies
import warnings
from collections import Counter
from pathlib import Path

import numpy as np
import pandas as pd
from imblearn.metrics import classification_report_imbalanced
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    classification_report,
    confusion_matrix,
)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')


In [None]:
# Loading in the data
# Note: Depending on how the DB team handles the data and table structure, some of the code in the following two blocks may change.
data = Path('TO_BE_DECIDED')
df = pd.read_csv(data)
df.head()

In [None]:
# Segmenting the features from the target
y = df["DEP_DEL15"]
X = df.drop(columns="DEP_DEL15")
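
`SVC` only accepts numeric input, so if the final table still contains text columns (carrier name, departing airport, departure block), they would need to be encoded before fitting. The following is a minimal sketch of one way to do that, assuming scikit-learn's `ColumnTransformer` is acceptable for the project. The toy rows and the column names `CARRIER_NAME`, `DEP_TIME_BLK`, `TMAX`, and `AWND` are assumptions for illustration and should be replaced with whatever the final table uses.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Tiny made-up frame; column names and values are assumptions, not project data.
toy = pd.DataFrame({
    "CARRIER_NAME": ["A", "B", "A", "C", "B", "C"],
    "DEP_TIME_BLK": ["0600-0659", "1800-1859", "0600-0659",
                     "1200-1259", "1800-1859", "1200-1259"],
    "TMAX": [40.0, 85.0, 42.0, 60.0, 90.0, 65.0],
    "AWND": [5.0, 12.0, 4.0, 8.0, 15.0, 7.0],
    "DEP_DEL15": [0, 1, 0, 0, 1, 0],
})
y_toy = toy["DEP_DEL15"]
X_toy = toy.drop(columns="DEP_DEL15")

categorical = ["CARRIER_NAME", "DEP_TIME_BLK"]
numeric = ["TMAX", "AWND"]

# One-hot encode the text columns and standardize the numeric ones.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numeric),
])

# Inspect the encoded matrix: 3 carriers + 3 time blocks + 2 numeric columns.
encoded = preprocess.fit_transform(X_toy)
print(encoded.shape)

# Chaining the preprocessing and the model keeps train and test data
# going through exactly the same transformation.
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("svm", SVC(kernel="linear")),
])
pipeline.fit(X_toy, y_toy)
predictions = pipeline.predict(X_toy)
```

Scaling is included because SVMs are sensitive to feature magnitudes; a temperature in degrees and a monthly flight count in the thousands would otherwise carry very different weight.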

In [None]:
# Utilizing train_test_split function to create training and testing subsets
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    random_state=1, 
                                                    stratify=y)
X_train.shape
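
The `stratify=y` argument keeps the share of delayed flights the same in the training and testing subsets, which matters when delays are the minority class. The sketch below demonstrates this on made-up labels (20 delayed out of 100); `X_demo` and `y_demo` are placeholder names, not project data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up labels: 20 delayed flights out of 100 (a 20% delay rate).
y_demo = np.array([1] * 20 + [0] * 80)
X_demo = np.arange(100).reshape(-1, 1)  # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, random_state=1, stratify=y_demo
)

# With no test_size given, scikit-learn holds out 25% of the rows.
print(len(y_tr), len(y_te))

# Both subsets keep the original 20% delay rate.
print(y_tr.mean(), y_te.mean())
```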

In [None]:
# Instantiate a linear SVM model
model = SVC(kernel='linear')

In [None]:
# Fitting the model to the training data
model.fit(X_train, y_train)

The following section makes predictions on the test data and then scores the model by comparing those predictions with the actual labels.


In [None]:
# Making predictions using the test data
y_pred = model.predict(X_test)
results = pd.DataFrame({
    "Prediction": y_pred, 
    "Actual": y_test
}).reset_index(drop=True)
results.head()

In [None]:
# Generating an accuracy score 
accuracy_score(y_test, y_pred)


In [None]:
# Generating a Confusion Matrix 
confusion_matrix(y_test, y_pred)
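
When most flights depart on time, accuracy alone can look good even if the model misses most delays. The sketch below shows how the figures in a classification report follow from the four cells of a confusion matrix. The counts are hypothetical and do not come from this dataset; the layout matches what scikit-learn's `confusion_matrix` returns for labels `[0, 1]` (rows are actual classes, columns are predicted classes).

```python
# Hypothetical counts, not results from this project.
matrix = [[800, 20],   # actual on time: 800 predicted on time, 20 predicted delayed
          [150, 30]]   # actual delayed: 150 predicted on time, 30 predicted delayed

(tn, fp), (fn, tp) = matrix

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision_delayed = tp / (tp + fp)   # of flights flagged as delayed, how many were
recall_delayed = tp / (tp + fn)      # of flights actually delayed, how many were caught
recall_on_time = tn / (tn + fp)

# Balanced accuracy is the average recall over both classes.
balanced_accuracy = (recall_delayed + recall_on_time) / 2

print(f"accuracy:          {accuracy:.3f}")
print(f"precision (delay): {precision_delayed:.3f}")
print(f"recall (delay):    {recall_delayed:.3f}")
print(f"balanced accuracy: {balanced_accuracy:.3f}")
```

In this made-up case accuracy is 83% while only one in six delayed flights is caught, which is why `balanced_accuracy_score` and the per-class recall are worth checking alongside `accuracy_score`.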

In [None]:
# Generating a Classification Report
print(classification_report(y_test, y_pred))
