# Support Vector Machines

A Support Vector Machine (SVM) is a supervised machine learning algorithm. In the linear form used here, it is applied to classification problems: the algorithm finds a line or hyperplane that separates the data into classes, and each data point is classified according to which side of that boundary it falls on.
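
As a rough illustration of this idea, a linear classifier assigns a class according to which side of the hyperplane `w . x + b = 0` a point falls on. The weights, offset, and points below are made-up values for a two-feature case, not parameters learned from the flag data.

```python
# Minimal sketch of a linear decision rule (hypothetical weights, not fitted values).
def decision_value(w, b, x):
    """Return w . x + b, the signed score of x relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Assign class +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

w = [1.0, -1.0]  # assumed normal vector of the separating line
b = 0.0          # assumed offset

points = [[2.0, 1.0], [1.0, 3.0]]
labels = [classify(w, b, p) for p in points]
print(labels)
```

For a problem with more than two classes, such as the five religion classes used here, scikit-learn's `LinearSVC` by default fits one such boundary per class (one-vs-rest).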

# Introduction

We have a dataset describing national flags, which we use to predict the religion of each flag's country. We also have images of other flags that look similar to the flags in the dataset. Given the description of a flag, the Support Vector Machine model uses its features to assign the country to one of five religion classes: Christian, Muslim, Buddhist, Hindu, or Other.

In [None]:
#Imports
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np
import os

In [None]:
#Loading Dataset
flag = pd.read_csv("flags_with_headers_v_5.csv")
flag.head()

In [None]:
flag = pd.get_dummies(flag)
flag.head()

In [None]:
# Assign X (data) and y (target)
# Drop the target column and the unnecessary index column 'Unnamed: 0' from the input set.
X = flag.drop(columns=['religion', 'Unnamed: 0'])
y = flag.religion

# Scaling the Data

In [None]:
#Split the training and testing data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=35)

In [None]:
#Create a StandardScaler model and fit it to the training data
from sklearn.preprocessing import StandardScaler
X_scaler = StandardScaler().fit(X_train)

In [None]:
#Transform the training and testing data using the fitted X_scaler
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
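
The scaling step above can be sketched in plain Python to show why the scaler is fit on the training data only: the mean and standard deviation come from the training set and are then reused unchanged on the test set. The column values below are hypothetical, and the helper names are for illustration only; this mirrors the standardization formula `(x - mean) / std` rather than scikit-learn's actual implementation.

```python
from statistics import mean, pstdev

def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature column."""
    return mean(column), pstdev(column)

def transform(column, mu, sigma):
    """Standardize values using statistics learned elsewhere."""
    return [(v - mu) / sigma for v in column]

train_col = [1.0, 2.0, 3.0]  # hypothetical training values for one feature
test_col = [2.0, 4.0]        # hypothetical test values for the same feature

mu, sigma = fit_scaler(train_col)                   # statistics from training data only
train_scaled = transform(train_col, mu, sigma)
test_scaled = transform(test_col, mu, sigma)        # test data reuses the training statistics
print(train_scaled, test_scaled)
```

After the transform, the training column has mean 0 and standard deviation 1, while the test column generally does not, because it was never used to compute the statistics.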

# Testing the Support Vector Machine Classifier

In [None]:
#Support vector machine linear classifier
from sklearn.svm import LinearSVC
model = LinearSVC(random_state=0, tol=1e-5)
model.fit(X_train_scaled, y_train.to_numpy())

In [None]:
#Model Accuracy
print('Test Acc: %.3f' % model.score(X_test_scaled, y_test))

In [None]:
#Calculate classification report
from sklearn.metrics import classification_report
predictions = model.predict(X_test_scaled)
print(classification_report(
    y_test, predictions,
    target_names=['Christian', 'Muslim', 'Buddhist', 'Hindu', 'Other']))

# Classification Report Interpretation

Precision is the proportion of positive identifications that were actually correct; a model that produces no false positives has a precision of 1.0. For the Christian class, the Support Vector Machines precision is .65, which is moderate and indicates that there were some false positives in our classification.

Recall is the proportion of actual positives that were correctly classified; a model that produces no false negatives has a recall of 1.0. For the Christian class, the Support Vector Machines recall is .89, which means the model correctly identified most of the actual Christian samples.

The F1 score is the harmonic mean of precision and recall; a perfect model has an F1 score of 1.0. For the Christian class, the Support Vector Machines F1 score is .75, which is fairly high.
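
The F1 score is the harmonic mean of precision and recall, so the reported value for the Christian class can be checked from the two figures quoted above. The function name below is chosen for illustration.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall for the Christian class, as reported above.
christian_f1 = f1_from(0.65, 0.89)
print(round(christian_f1, 2))  # 0.75
```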

The support is the number of samples each metric was calculated on. The overall accuracy was calculated on 49 samples, while the per-class metrics were calculated on 27 samples for Christian, 9 for Muslim, 3 for Buddhist, 1 for Hindu, and 9 for Other. These per-class supports add up to the 49 samples used for the accuracy.

Accuracy is the proportion of all samples that the model classified correctly; perfect accuracy is 1.0. For the Support Vector Machines model, the accuracy is .59, which indicates the model classifies a little more than half of the test samples correctly.

The macro average is the unweighted mean of the per-class precision, recall, and F1 scores. Every class counts equally regardless of how many samples it has, so this metric is a useful measure of performance across all classes when the class sizes are imbalanced.

The weighted average is also a mean of the per-class precision, recall, and F1 scores, but each class is weighted by its support. For example, it will be high when one religion with many samples performs well, even if the religions with fewer samples perform poorly.
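
The difference between the two averages can be sketched with the supports listed above. The per-class scores below are invented for illustration and are not the values from our classification report; only the supports come from this notebook.

```python
# Supports from the classification report; scores are hypothetical.
supports = {'Christian': 27, 'Muslim': 9, 'Buddhist': 3, 'Hindu': 1, 'Other': 9}
scores = {'Christian': 0.9, 'Muslim': 0.5, 'Buddhist': 0.0, 'Hindu': 0.0, 'Other': 0.5}

# Macro average: every class counts equally.
macro = sum(scores.values()) / len(scores)

# Weighted average: every class counts in proportion to its support.
total = sum(supports.values())
weighted = sum(scores[c] * supports[c] for c in scores) / total

print(round(macro, 2), round(weighted, 2))
```

With scores like these, the large Christian class pulls the weighted average well above the macro average, which is why the macro average is the more cautious summary for an imbalanced data set.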

# Prediction Example

In [None]:
model.predict(X_test_scaled)

In [None]:
y_test.values
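
The two outputs above can be compared element by element to see where the model went wrong. The sketch below uses short hypothetical label lists in place of `model.predict(X_test_scaled)` and `y_test.values`, and the helper name is an assumption for illustration.

```python
def compare(predicted, actual):
    """Return the number of correct predictions and the positions of the mismatches."""
    mismatches = [i for i, (p, a) in enumerate(zip(predicted, actual)) if p != a]
    return len(actual) - len(mismatches), mismatches

predicted = [0, 0, 1, 0, 4]  # hypothetical model output
actual = [0, 1, 1, 0, 2]     # hypothetical true labels

n_correct, wrong_at = compare(predicted, actual)
print(n_correct, wrong_at)
```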

# Analysis

With the Support Vector Machines model, we have an accuracy of 59% when using flag colors, shapes, images, and text to predict the country's religion. Per the classification report, the Christian class was classified reliably: precision (the percentage of Christian predictions that were correct) is 65%, recall (the percentage of actual Christian cases classified correctly) is 89%, and the F1 score (the harmonic mean of precision and recall) is 75%. However, this has to do with its support (sample size) of 27, whereas the other religions have supports in the single digits.

In conclusion, because the supports of the religion classes are imbalanced, the macro average gives the clearest picture of our overall results. Although the Support Vector Machines model has an accuracy of 59% in classifying the religion, because of our imbalanced data set the overall (macro-averaged) precision is 31%, recall is 29%, and F1 score is 28% over a sample size of 49. To improve the overall result, we would need a larger data set or to narrow the religion classes down to 3-4. The percentages are fairly low overall; however, I can conclude that the Support Vector Machines model classifies countries whose religion is Christianity reasonably well based on the attributes of the country’s flag.
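
One way to put the 59% accuracy in context, given the imbalance described above, is to compare it with a majority-class baseline: a hypothetical classifier that always predicts the most common religion in the test set. The supports are the ones from the classification report; the baseline itself is an added illustration, not something computed in this notebook.

```python
# Test-set supports per class, as listed in the classification report.
supports = {'Christian': 27, 'Muslim': 9, 'Buddhist': 3, 'Hindu': 1, 'Other': 9}

majority_class = max(supports, key=supports.get)
baseline_accuracy = supports[majority_class] / sum(supports.values())

print(majority_class, round(baseline_accuracy, 2))  # Christian 0.55
```

Always predicting Christian would already be right on about 55% of the test samples, so the model's 59% accuracy is only a modest improvement over that baseline.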
