# Support Vector Machines (SVM) with scikit-learn
***
![SVM visualization](svm.jpg)

Support Vector Machines (SVMs) are a powerful class of supervised learning algorithms used for classification and regression tasks. They work by finding the optimal hyperplane that best separates the data into different classes. Here, we'll explore the theory behind SVMs and provide a step-by-step guide to implementing them using `scikit-learn` on the Iris dataset.

### What is an SVM?

An SVM is a type of supervised machine learning algorithm used primarily for classification tasks. It constructs a hyperplane in a high-dimensional space that separates different classes of data points. The key idea is to find the hyperplane that maximizes the margin between the classes.

### Key Concepts

1. **Hyperplane**: A decision boundary that separates different classes. In a 2D space, this is a line; in 3D space, it's a plane; in general, for an n-dimensional space, it is an (n-1)-dimensional hyperplane.

2. **Margin**: The distance between the hyperplane and the closest data points from each class. SVM aims to maximize this margin.

3. **Support Vectors**: The data points that lie closest to the hyperplane. These are crucial in defining the position and orientation of the hyperplane.

4. **Kernel Trick**: A technique that implicitly maps data into a higher-dimensional space, by computing inner products there through a kernel function, so that a separating hyperplane is easier to find. Common kernels include linear, polynomial, and radial basis function (RBF).

5. **Soft Margin**: In practice, data may not be linearly separable. The soft margin approach allows for some misclassification to achieve a better, more generalizable decision boundary.

## Mathematics behind SVMs

This section summarizes the mathematics behind Support Vector Machines (SVMs).

### 1. **Linear SVM**

**Objective**: Maximize the margin between two classes.

**Hyperplane Equation**:
$ \mathbf{w} \cdot \mathbf{x} + b = 0 $

**Margin**:
$ \text{Margin} = \frac{2}{\|\mathbf{w}\|} $

**Optimization Problem**:
Minimize:
$ \frac{1}{2} \|\mathbf{w}\|^2 $

Subject to:
$ y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 $
for all $ i $, where $ y_i \in \{-1, +1\} $.
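
The quantities above can be checked numerically on a small example. The sketch below is illustrative only: the four toy points are hypothetical, and a very large `C` is used as an assumption to approximate the hard-margin problem with `SVC`. Because the two classes sit at $ x_1 = 0 $ and $ x_1 = 2 $, the computed margin should come out close to 2, up to solver tolerance.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2D data (hypothetical values chosen for illustration).
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y_toy = np.array([-1, -1, 1, 1])

# Assumption: a very large C approximates the hard-margin formulation.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X_toy, y_toy)

w = clf.coef_[0]       # weight vector w
b = clf.intercept_[0]  # bias term b

# Margin = 2 / ||w||
margin = 2.0 / np.linalg.norm(w)

# Constraint values y_i (w . x_i + b); each should be at least 1.
constraints = y_toy * (X_toy @ w + b)

print("w =", w, "b =", b)
print("margin =", margin)
print("constraint values =", constraints)
```

The points whose constraint value is approximately 1 are the ones lying on the margin.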

### 2. **Non-Linear SVM with Kernels**

**Kernel Function**: Computes the inner product of two points in a higher-dimensional feature space without explicitly mapping them there.

Common Kernels:
- **Polynomial Kernel**:
  $ K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + c)^d $
  
- **RBF Kernel**:
  $ K(\mathbf{x}_i, \mathbf{x}_j) = \exp \left( -\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2} \right) $
  
- **Sigmoid Kernel**:
  $ K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\alpha \mathbf{x}_i \cdot \mathbf{x}_j + c) $
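
As a quick check of the kernel formulas above, the following sketch evaluates the polynomial and RBF kernels directly with NumPy and compares them with the helpers in `sklearn.metrics.pairwise`. The random data and the values of `sigma`, `c`, and `d` are arbitrary assumptions. Note that scikit-learn parameterizes the RBF kernel with `gamma`, which corresponds to $ 1 / (2\sigma^2) $ in the notation used here, and that its polynomial kernel has an extra `gamma` factor that is set to 1 to match the formula.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))  # 5 points with 3 features (arbitrary data)
B = rng.normal(size=(4, 3))  # 4 points with 3 features (arbitrary data)

# RBF kernel: exp(-||x_i - x_j||^2 / (2 sigma^2))
sigma = 1.5                        # hypothetical kernel width
gamma = 1.0 / (2.0 * sigma ** 2)   # scikit-learn's equivalent parameter
sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
K_rbf_manual = np.exp(-sq_dists / (2.0 * sigma ** 2))
K_rbf_sklearn = rbf_kernel(A, B, gamma=gamma)

# Polynomial kernel: (x_i . x_j + c)^d
c, d = 1.0, 3                      # hypothetical offset and degree
K_poly_manual = (A @ B.T + c) ** d
K_poly_sklearn = polynomial_kernel(A, B, degree=d, gamma=1.0, coef0=c)

print("RBF kernels match:       ", np.allclose(K_rbf_manual, K_rbf_sklearn))
print("Polynomial kernels match:", np.allclose(K_poly_manual, K_poly_sklearn))
```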

**Dual Formulation**: Solve for Lagrange multipliers $ \alpha_i $:

Maximize:
$ \sum_{i=1}^N \alpha_i - \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) $

Subject to:
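$ 0 \leq \alpha_i \leq C $ for all $ i $, and $ \sum_{i=1}^N \alpha_i y_i = 0 $

where $ C $ is the regularization parameter that controls the soft margin: a smaller $ C $ tolerates more margin violations.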
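
To see how the upper bound $ C $ on the multipliers affects a fitted model, the sketch below trains a linear `SVC` on the Iris data for a few values of `C` and counts the support vectors. The specific `C` values are arbitrary assumptions; the general tendency is that a smaller `C` leaves more points inside or on the margin, and therefore more support vectors.

```python
from sklearn import datasets
from sklearn.svm import SVC

X_iris, y_iris = datasets.load_iris(return_X_y=True)

n_support_by_C = {}
for C in (0.01, 1.0, 100.0):  # arbitrary values spanning weak to strong penalties
    clf = SVC(kernel="linear", C=C).fit(X_iris, y_iris)
    # n_support_ holds the number of support vectors per class
    n_support_by_C[C] = int(clf.n_support_.sum())
    print(f"C={C:>6}: {n_support_by_C[C]} support vectors")
```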

### 3. **Decision Function**

**Prediction**: For a new data point $ \mathbf{x} $:

$ f(\mathbf{x}) = \sum_{i=1}^N \alpha_i y_i K(\mathbf{x}, \mathbf{x}_i) + b $

Here:
- Support vectors are the data points where $ \alpha_i \neq 0 $.
- $ b $ is the bias term.

This gives the SVM model the ability to make predictions with both linear and non-linear decision boundaries. For binary classification, the predicted class is given by the sign of $ f(\mathbf{x}) $.
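
The decision function can be reproduced from a fitted `SVC` as a sanity check. The sketch below is a minimal illustration, assuming a binary problem (two of the three Iris classes) and an explicitly chosen `gamma`; both choices are assumptions made for the example. In scikit-learn, `dual_coef_` stores the products $ \alpha_i y_i $ for the support vectors, and `intercept_` stores $ b $.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X_iris, y_iris = datasets.load_iris(return_X_y=True)

# Keep only classes 1 and 2 so there is a single decision function.
mask = y_iris != 0
X_bin, y_bin = X_iris[mask], y_iris[mask]

gamma = 0.5  # hypothetical value, fixed so the kernel can be recomputed by hand
clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X_bin, y_bin)

# K[n, i] = K(x_n, x_i) for every sample x_n and support vector x_i
K = rbf_kernel(X_bin, clf.support_vectors_, gamma=gamma)

# f(x) = sum_i (alpha_i * y_i) * K(x, x_i) + b
f_manual = K @ clf.dual_coef_[0] + clf.intercept_[0]
f_sklearn = clf.decision_function(X_bin)

# A positive value corresponds to the second entry of clf.classes_.
pred_manual = np.where(f_manual > 0, clf.classes_[1], clf.classes_[0])

print("Decision values match:", np.allclose(f_manual, f_sklearn))
print("Predictions match:    ", np.array_equal(pred_manual, clf.predict(X_bin)))
```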


## Implementing SVM with scikit-learn

### 1: Import Libraries

First, we need to import the necessary libraries and modules.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

### 2: Load and preprocess dataset

In [2]:
iris = datasets.load_iris()
X = iris.data
y = iris.target
cname = iris.target_names

In [3]:
scaler = MinMaxScaler(feature_range=(0, 1))

In [4]:
X = scaler.fit_transform(X)

In [5]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=22, shuffle=True)
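
In the cells above, the scaler is fitted on the full dataset before the split, so the test samples influence the scaling. A common alternative, sketched here as an optional variant rather than as the method used in this notebook, is to split first and wrap the scaler and the classifier in a `Pipeline` so that scaling is learned from the training data only. The variable names (`X_raw`, `pipe`, and so on) are hypothetical and chosen so they do not overwrite the notebook's own variables; the resulting scores may differ from the ones reported later.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X_raw, y_raw = datasets.load_iris(return_X_y=True)

# Split first, using the same split settings as the notebook.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=22, shuffle=True
)

# The pipeline fits MinMaxScaler on the training data only,
# then applies the same transformation at prediction time.
pipe = make_pipeline(MinMaxScaler(feature_range=(0, 1)), SVC(kernel="linear", C=1.0))
pipe.fit(X_tr, y_tr)

test_accuracy = pipe.score(X_te, y_te)
print(f"Test accuracy: {test_accuracy:.2f}")
```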


### 3: Model building and training

In [6]:
# kernel can be any one of: 'linear', 'poly', 'rbf', 'sigmoid'
model = SVC(kernel='linear', C=1.0)

model.fit(X_train, y_train)
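
Since the kernel is a free choice, one way to compare the options is cross-validation. The sketch below is a self-contained illustration and not part of the original workflow: it assumes 5-fold cross-validation, default kernel hyperparameters, and hypothetical variable names (`X_cv`, `cv_scores`).

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X_cv, y_cv = datasets.load_iris(return_X_y=True)

cv_scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # Scaling lives inside the pipeline so each fold is scaled independently.
    pipe = make_pipeline(MinMaxScaler(), SVC(kernel=kernel, C=1.0))
    scores = cross_val_score(pipe, X_cv, y_cv, cv=5)
    cv_scores[kernel] = scores.mean()
    print(f"{kernel:>8}: mean accuracy = {cv_scores[kernel]:.3f}")
```

The kernel with the highest mean accuracy is a reasonable starting point, although `C` and kernel-specific parameters would normally be tuned as well.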


### 4: Evaluation

In [7]:

y_pred = model.predict(X_test)


In [8]:

conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Confusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)


Confusion Matrix:
 [[ 6  0  0]
 [ 0 10  0]
 [ 0  2 12]]

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00         6
           1       0.83      1.00      0.91        10
           2       1.00      0.86      0.92        14

    accuracy                           0.93        30
   macro avg       0.94      0.95      0.94        30
weighted avg       0.94      0.93      0.93        30
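
The figures in the classification report can be derived by hand from the confusion matrix printed above, where rows are true classes and columns are predicted classes. The sketch below copies that matrix and recomputes precision, recall, F1, and accuracy with NumPy; the variable names are hypothetical.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true, columns: predicted).
conf = np.array([[6, 0, 0],
                 [0, 10, 0],
                 [0, 2, 12]])

true_pos = np.diag(conf)

# Precision: correct predictions divided by everything predicted as that class.
precision = true_pos / conf.sum(axis=0)

# Recall: correct predictions divided by the true count (support) of that class.
recall = true_pos / conf.sum(axis=1)

# F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Accuracy: all correct predictions divided by all samples.
accuracy = true_pos.sum() / conf.sum()

print("precision:", np.round(precision, 2))
print("recall:   ", np.round(recall, 2))
print("f1-score: ", np.round(f1, 2))
print("accuracy: ", round(float(accuracy), 2))
```

For example, class 1 has 10 correct predictions out of 12 samples predicted as class 1, giving a precision of 10/12, or about 0.83, as in the report.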



### 5: Prediction

In [9]:
def predict_single_data(input_data) -> int:
    """
    Predict the class for a single data sample.

    Parameters:
    - input_data (array-like): A list or array of feature values for the sample.

    Returns:
    - prediction (int): The predicted class label.
    """
    input_data = np.array(input_data).reshape(1, -1)
    input_data_scaled = scaler.transform(input_data)
    prediction = model.predict(input_data_scaled)
    
    # Cast the NumPy integer to a plain Python int to match the return annotation
    return int(prediction[0])

In [10]:
x = [5.8, 4.5, 4.2, 3.5]

In [11]:
p = predict_single_data(x)

In [12]:
p

2

In [13]:
cname[p]

'virginica'