# __Applying Support Vector Machine__

Let's examine how to construct a support vector machine.

## Step 1: Import Required Libraries and Load the Dataset

- Install and import required libraries: NumPy, pandas, Seaborn, matplotlib, and scikit-learn
- Load the heart.csv dataset


In [None]:
pip install --upgrade scikit-learn

In [None]:
import numpy as np 
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split 
from sklearn import svm
from sklearn import metrics
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import ConfusionMatrixDisplay

In [None]:
df = pd.read_csv("heart.csv")

## Step 2: Explore and Visualize the Dataset

- Preview the first rows of the dataset
- Display dataset information and summary statistics
- Create a scatter plot of age and cholesterol
- Check for missing values


In [None]:
df.head()

__Observation__
- Here, we can see a few rows of the dataset.

In [None]:
df.info()

__Observations__
- The dataset has 303 observations and 14 features.
- All the features have a numeric data type except for ChestPain and Thal, and there are no missing values.

Let’s describe and see the basic statistics of these features.

In [None]:
df.describe()

__Observations__
- The average age is 54, and the standard deviation is 9.
- The average cholesterol is 246.

Let's plot age against cholesterol to see whether there is any relationship between them.

In [None]:
df.plot(kind='scatter', x='Age', y='Chol', alpha=0.5, c='Chol', cmap='Reds')
plt.xlabel('Age')
plt.ylabel('Cholesterol')
plt.title('Age-Cholesterol Plot')

__Observations__
- The plot suggests that cholesterol tends to increase with age; a scatter plot alone does not show how strong the relationship is.
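To put a number on the trend seen in the scatter plot, one option is the Pearson correlation coefficient. The sketch below uses a small hypothetical table (`toy`) so that it runs on its own; in the notebook, the same call would be applied to `df['Age']` and `df['Chol']`.

```python
import pandas as pd

# Hypothetical sample values for illustration only; they are not taken from heart.csv.
toy = pd.DataFrame({
    "Age": [29, 41, 50, 58, 63, 70],
    "Chol": [204, 198, 243, 230, 254, 269],
})

# Pearson correlation ranges from -1 to 1; values near 0 indicate a weak linear trend.
r = toy["Age"].corr(toy["Chol"])
print(f"Pearson r between Age and Chol: {r:.2f}")
```

A value close to 0 on the real data would mean the visual impression of a trend is weak, even if the plot looks suggestive.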

In [None]:
df.isna().sum()

## Step 3: Preprocess the Dataset

- Create dummy variables for categorical features
- Separate feature and target matrices
- Split the dataset into training and testing sets


In [None]:
df_new = pd.get_dummies(df, columns=['ChestPain', 'Thal'], drop_first=True)
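As a small illustration of what `drop_first=True` does, the sketch below encodes a toy column. The category labels are hypothetical and may not match the values in heart.csv.

```python
import pandas as pd

# Hypothetical category labels, used only for illustration.
toy = pd.DataFrame({"ChestPain": ["typical", "asymptomatic", "nonanginal", "typical"]})

# One indicator column is created per category; drop_first=True drops the first
# category (in sorted order), which then acts as the baseline.
encoded = pd.get_dummies(toy, columns=["ChestPain"], drop_first=True)
print(encoded.columns.tolist())
```

With three categories, only two indicator columns remain; a row with both indicators off belongs to the dropped baseline category.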

Let's create x and y.

In [None]:
x = df_new.drop('AHD', axis=1)
y = df_new['AHD']

Let's split the data into train and test.

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=100)
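To see how many rows a 20% test split produces for 303 observations, here is a minimal sketch on placeholder arrays (`X_toy` and `y_toy` are stand-ins for illustration, not the heart data).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same number of rows as the dataset (303).
X_toy = np.arange(303 * 2).reshape(303, 2)
y_toy = np.array([0, 1, 1] * 101)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=100
)

# The test size is rounded up, and the remaining rows go to the training set.
print(X_tr.shape, X_te.shape)
```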

## Step 4: Perform Hyperparameter Tuning and Fit the SVM Model

- Import GridSearchCV from sklearn.model_selection
- Create an SVM classifier
- Define the parameter grid for tuning; the values in param_grid are candidate settings for the SVM hyperparameters C, gamma, and kernel
- Use GridSearchCV for hyperparameter tuning with 5-fold cross-validation
- Fit the model to the training data
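
Before running a grid search, it can help to estimate how many models will be trained. The sketch below counts the parameter combinations with `ParameterGrid`, using the same candidate values as this step and assuming 5-fold cross-validation; the variable names are illustrative.

```python
from sklearn.model_selection import ParameterGrid

# Same candidate values as the grid used in this step.
param_grid = {
    "C": [1, 10, 100, 1000, 10000],
    "gamma": [1, 0.1, 0.01, 0.001, 0.0001],
    "kernel": ["rbf"],
}

n_candidates = len(ParameterGrid(param_grid))  # every combination of C, gamma, kernel
n_folds = 5                                    # assumed number of cross-validation folds
n_fits = n_candidates * n_folds                # excludes the final refit on all training data

print(n_candidates, n_fits)
```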


In [None]:
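from sklearn.model_selection import GridSearchCV

# Base classifier whose hyperparameters will be tuned
ml = svm.SVC()

# Candidate values for C, gamma, and the kernel
param_grid = {'C': [1, 10, 100, 1000, 10000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf']}

# 5-fold cross-validated grid search; refit=True retrains the best model on all training data
grid = GridSearchCV(ml, param_grid, refit=True, verbose=1, cv=5, n_jobs=-1)

# fit() returns the fitted GridSearchCV object, so grid_search and grid are the same object
grid_search = grid.fit(x_train, y_train)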

In [None]:
print(grid_search.best_params_)

__Observation__
- Based on the grid search, the best parameters are C = 10, gamma = 0.001, and kernel = rbf.
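
Beyond `best_params_`, the fitted search object also stores the score of every combination in `cv_results_`. The sketch below shows one way to view it as a table; it runs on synthetic data from `make_classification` with a reduced grid, so the numbers will not match the heart dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data so the example is self-contained.
X_toy, y_toy = make_classification(n_samples=200, n_features=5, random_state=0)

toy_grid = GridSearchCV(
    SVC(),
    {"C": [1, 10], "gamma": [0.1, 0.01], "kernel": ["rbf"]},
    cv=5,
)
toy_grid.fit(X_toy, y_toy)

# One row per parameter combination, ranked by mean cross-validated score.
results = pd.DataFrame(toy_grid.cv_results_)
cols = ["param_C", "param_gamma", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[cols].sort_values("rank_test_score"))
```

Comparing the top few rows shows whether the best combination is clearly ahead or only marginally better than its neighbors.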

## Step 5: Evaluate the Model

- Retrieve the mean cross-validated accuracy of the best model on the training data
- Predict the target variable for the test data
- Calculate the accuracy of the test data
- Display the confusion matrix and classification report

In [None]:
accuracy = grid_search.best_score_ 


In [None]:
accuracy

__Observation__
- The mean cross-validated accuracy of the best parameter combination is about 75%.
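
Note that `best_score_` is the mean score over the cross-validation folds, which is not the same as accuracy measured directly on the training or test data. The sketch below contrasts the three quantities on synthetic data; all names in it are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data so the example is self-contained.
X_toy, y_toy = make_classification(n_samples=300, n_features=6, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=100)

toy_grid = GridSearchCV(SVC(), {"C": [1, 10], "gamma": [0.1, 0.01]}, cv=5)
toy_grid.fit(X_tr, y_tr)

cv_acc = toy_grid.best_score_            # mean accuracy over the held-out CV folds
train_acc = toy_grid.score(X_tr, y_tr)   # accuracy of the refit model on the training data
test_acc = toy_grid.score(X_te, y_te)    # accuracy of the refit model on unseen test data

print(f"CV: {cv_acc:.3f}  train: {train_acc:.3f}  test: {test_acc:.3f}")
```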

In [None]:
y_test_hat = grid.predict(x_test)

Let's check the confusion matrix for the test set.

In [None]:
confusion_mat = confusion_matrix(y_test, y_test_hat)
disp = ConfusionMatrixDisplay(confusion_matrix=confusion_mat, display_labels=grid.classes_)
disp.plot(cmap=plt.cm.Blues, ax=plt.gca())
plt.show()

__Observations__
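- The diagonal of the matrix holds the correct predictions: 31 cases are correctly classified as **No** and 23 cases are correctly classified as **Yes**.
- The off-diagonal cells hold the misclassified cases. With 54 correct predictions out of 61 test cases, this agrees with the test accuracy of 0.89 in the classification report.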
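
To make the layout of the matrix concrete, the sketch below builds one from a handful of hypothetical labels. In scikit-learn, rows correspond to true classes and columns to predicted classes, in the order given by `labels`.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels, for illustration only.
y_true = ["No", "No", "No", "Yes", "Yes", "No"]
y_pred = ["No", "Yes", "No", "Yes", "No", "No"]

cm = confusion_matrix(y_true, y_pred, labels=["No", "Yes"])

# For a binary problem with labels ordered [negative, positive],
# ravel() yields the cells in the order TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"correct: {tn + tp}, misclassified: {fp + fn}")
```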

Let's check the classification report.

In [None]:
print(classification_report(y_test,y_test_hat))

__Observation__
- On the test set, the accuracy is 0.89, while the precision and recall are 0.88 and 0.85, respectively.
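
The figures in the report can also be computed individually. The sketch below does so on hypothetical labels, treating **Yes** as the positive class (an assumption made for this illustration).

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels: 3 true positives, 1 false negative, 3 true negatives, 1 false positive.
y_true = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes"]
y_pred = ["Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes"]

acc = accuracy_score(y_true, y_pred)                     # (TP + TN) / total
prec = precision_score(y_true, y_pred, pos_label="Yes")  # TP / (TP + FP)
rec = recall_score(y_true, y_pred, pos_label="Yes")      # TP / (TP + FN)

print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```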

In [None]:
model = grid.best_estimator_

In [None]:
len(model.support_vectors_)

In [None]:
model.support_vectors_
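
Besides `support_vectors_`, a fitted `SVC` exposes `n_support_` (the number of support vectors per class) and `support_` (the row positions of the support vectors in the training data). The sketch below shows these attributes on synthetic data; the hyperparameter values are chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data so the example is self-contained.
X_toy, y_toy = make_classification(n_samples=100, n_features=4, random_state=0)
toy_model = SVC(kernel="rbf", C=10, gamma=0.001).fit(X_toy, y_toy)

print(toy_model.support_vectors_.shape)  # (number of support vectors, number of features)
print(toy_model.n_support_)              # support vectors per class
print(toy_model.support_[:5])            # positions of the first few support vectors
```

A large share of training points ending up as support vectors can be a sign that the decision boundary is soft or that the classes overlap substantially.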