Q1. What is the mathematical formula for a linear SVM?

The mathematical formula for a linear Support Vector Machine (SVM) can be expressed as follows:

Given a training dataset consisting of \(n\) samples with \(m\) features, represented as \((\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_n, y_n)\), where \(\mathbf{x}_i\) is the feature vector of the \(i\)th sample, and \(y_i\) is its corresponding class label (-1 for the negative class and +1 for the positive class), the linear SVM aims to find the optimal hyperplane that separates the two classes with the maximum margin.

The decision function of a linear SVM is defined as:

\[ f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b \]

Where:
- \(\mathbf{x}\) is the input feature vector.
- \(\mathbf{w}\) is the weight vector perpendicular to the hyperplane.
- \(b\) is the bias term.

The classification rule is then determined by the sign of the decision function:

\[ \hat{y} = \text{sign}(f(\mathbf{x})) \]

The optimization problem associated with linear SVM can be formulated as:

\[ \min_{\mathbf{w}, b} \frac{1}{2} \| \mathbf{w} \|^2 \]

Subject to the constraints:

\[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1, \text{ for } i = 1, 2, \ldots, n \]

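This is the formulation for the hard-margin linear SVM, which requires the two classes to be linearly separable. In practice, the soft-margin SVM is more common. It introduces slack variables \(\xi_i\) that allow some samples to violate the margin (or even be misclassified), and it penalizes those violations through the cost parameter \(C\).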

The optimization problem for soft-margin SVM becomes:

\[ \min_{\mathbf{w}, b, \boldsymbol{\xi}} \frac{1}{2} \| \mathbf{w} \|^2 + C \sum_{i=1}^{n} \xi_i \]

Subject to the constraints:

\[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - \xi_i, \text{ for } i = 1, 2, \ldots, n \]
\[ \xi_i \geq 0, \text{ for } i = 1, 2, \ldots, n \]

Here, \(C > 0\) is the regularization parameter that controls the trade-off between maximizing the margin and minimizing margin violations on the training data. Larger values of \(C\) penalize violations more heavily, which gives a narrower margin with fewer training errors (and a higher risk of overfitting). Smaller values of \(C\) give a wider margin that tolerates more violations.
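The soft-margin objective can be evaluated numerically. For a fixed \((\mathbf{w}, b)\), the smallest slacks that satisfy both constraints are \(\xi_i = \max(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))\), which is the hinge loss. The sketch below is a minimal illustration using NumPy; the toy dataset, the weight vector, the bias and the helper name `soft_margin_objective` are all hypothetical choices made for this example.

```python
import numpy as np


def soft_margin_objective(w, b, X, y, C):
    """Hypothetical helper: evaluate 0.5*||w||^2 + C*sum(xi) for a fixed (w, b).

    For fixed (w, b), the smallest slacks satisfying y_i(w.x_i + b) >= 1 - xi_i
    and xi_i >= 0 are xi_i = max(0, 1 - y_i(w.x_i + b)), i.e. the hinge loss.
    """
    margins = y * (X @ w + b)            # y_i * f(x_i)
    xi = np.maximum(0.0, 1.0 - margins)  # slack needed by each sample
    return 0.5 * np.dot(w, w) + C * xi.sum(), xi


# Toy data and parameters chosen only for illustration
X = np.array([[2.0, 2.0], [0.5, 0.5], [-1.0, -1.0], [0.6, 0.2]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([1.0, 1.0])
b = -0.5

objective, xi = soft_margin_objective(w, b, X, y, C=1.0)
print("slacks:", xi)
print("objective:", objective)
```

For this data the slacks are approximately 0, 0.5, 0 and 1.3. The second point is correctly classified but lies inside the margin (\(0 < \xi_i < 1\)), while the last point is misclassified (\(\xi_i > 1\)). With \(\frac{1}{2}\|\mathbf{w}\|^2 = 1\) and \(C = 1\), the objective is about 2.8.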

Q2. What is the objective function of a linear SVM?

The objective function of a linear Support Vector Machine (SVM) is a mathematical expression that the SVM aims to minimize during the training process. The objective function is typically formulated to find the hyperplane that maximizes the margin between the classes while minimizing the classification error.

For a linear SVM, the objective function is defined as:

\[ \min_{\mathbf{w}, b} \frac{1}{2} \| \mathbf{w} \|^2 \]

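This objective finds the weight vector \(\mathbf{w}\) and bias term \(b\) that minimize the squared norm of \(\mathbf{w}\), subject to the constraints below. The distance between the two marginal planes \(\mathbf{w} \cdot \mathbf{x} + b = +1\) and \(\mathbf{w} \cdot \mathbf{x} + b = -1\) is \(2 / \| \mathbf{w} \|\), so minimizing \(\| \mathbf{w} \|\) maximizes the margin. The square and the factor \(\frac{1}{2}\) only make the objective a smooth convex quadratic with a simple gradient.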

The constraints require every training sample to be correctly classified and to lie on or outside the margin. They are expressed as:

\[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 \]

where \(y_i\) is the class label of the \(i\)th training sample, \(\mathbf{x}_i\) is its feature vector, and \(\mathbf{w} \cdot \mathbf{x}_i + b\) is the decision function. This constraint ensures that each sample lies on the correct side of the decision boundary at a distance of at least \(1 / \| \mathbf{w} \|\) from it.
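The link between \(\| \mathbf{w} \|\) and the margin can be checked on a small example. The sketch below fits scikit-learn's `SVC` with a linear kernel on four hand-made points (an illustrative assumption) and computes \(2 / \| \mathbf{w} \|\). Because `SVC` always solves the soft-margin problem, a very large `C` is used to approximate the hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (illustrative): one class in the column x1 = 0, the other in the column x1 = 2
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1, -1, 1, 1])

# A very large C approximates the hard-margin problem
clf = SVC(kernel='linear', C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

margin_width = 2.0 / np.linalg.norm(w)  # distance between the planes w.x + b = +1 and -1
functional = y * (X @ w + b)            # y_i (w . x_i + b); the constraints require >= 1

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("y_i f(x_i) =", functional)
```

Here the optimum is the vertical line \(x_1 = 1\), with \(\mathbf{w} \approx (1, 0)\) and \(b \approx -1\), so the margin width is about 2. Every point has \(y_i f(\mathbf{x}_i) \approx 1\): the constraints are tight for all four points, which all lie on the marginal planes.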

For the soft-margin SVM, which tolerates some margin violations, the objective adds a penalty on the slack variables:

\[ \min_{\mathbf{w}, b, \boldsymbol{\xi}} \frac{1}{2} \| \mathbf{w} \|^2 + C \sum_{i=1}^{n} \xi_i \]

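where \(C\) is the regularization parameter that controls the trade-off between maximizing the margin and minimizing margin violations, and \(\xi_i \geq 0\) are slack variables subject to \(y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - \xi_i\). A sample with \(\xi_i = 0\) respects the margin, a sample with \(0 < \xi_i < 1\) lies inside the margin but is correctly classified, and a sample with \(\xi_i > 1\) is misclassified.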

Overall, the objective of a linear SVM is to find the hyperplane that maximizes the margin between the classes, while either requiring every training sample to lie on the correct side of the margin (hard margin) or penalizing the samples that do not (soft margin).

Q3. What is the kernel trick in SVM?

The kernel trick is a technique used in Support Vector Machines (SVMs) to implicitly map input data into a higher-dimensional feature space without explicitly computing the transformed feature vectors. It allows SVMs to efficiently handle non-linearly separable data by transforming it into a higher-dimensional space where it may become linearly separable.

The kernel trick works by introducing a kernel function \( K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j) \), where \(\phi\) is the (implicit) mapping into the higher-dimensional feature space. The kernel returns the inner product of the transformed vectors, but it is evaluated directly from \(\mathbf{x}_i\) and \(\mathbf{x}_j\) in the original feature space, so \(\phi\) never has to be computed.
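A small numerical check of this identity, assuming 2-D inputs and the degree-2 polynomial kernel \(K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T \mathbf{z})^2\). Expanding \((x_1 z_1 + x_2 z_2)^2\) shows that one explicit feature map for this kernel is \(\phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)\). The helper names `phi` and `poly_kernel` are illustrative.

```python
import numpy as np


def phi(x):
    """Explicit degree-2 feature map for a 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def poly_kernel(x, z):
    """Polynomial kernel (x . z)^2, i.e. gamma = 1, r = 0, d = 2."""
    return np.dot(x, z) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))  # map to 3-D, then take the inner product
implicit = poly_kernel(x, z)       # same quantity, computed in the original 2-D space
print(explicit, implicit)
```

Both values equal 1 for these vectors (up to floating-point rounding), but the kernel never builds the 3-D vectors. For higher degrees, or for the RBF kernel, whose feature space is infinite-dimensional, this saving is what makes the method practical.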

Mathematically, the decision function of an SVM using the kernel trick can be expressed as:

\[ f(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \]

Where:
- \( \alpha_i \geq 0 \) are the Lagrange multipliers obtained by solving the dual optimization problem; they are non-zero only for the support vectors.
- \( y_i \) are the class labels of the training samples.
- \( \mathbf{x}_i \) are the training feature vectors.
- \( \mathbf{x} \) is the input feature vector.
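- \( K(\mathbf{x}_i, \mathbf{x}) \) is the kernel function. It equals the inner product \( \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}) \) in the transformed feature space (where \(\phi\) is the implicit feature map), but it is computed from \( \mathbf{x}_i \) and \( \mathbf{x} \) in the original feature space.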

Commonly used kernel functions include:

1. Linear kernel: \( K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j \)
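2. Polynomial kernel: \( K(\mathbf{x}_i, \mathbf{x}_j) = (\gamma \mathbf{x}_i^T \mathbf{x}_j + r)^d \), with scale \(\gamma > 0\), offset \(r \geq 0\) and degree \(d\)
3. Gaussian (RBF) kernel: \( K(\mathbf{x}_i, \mathbf{x}_j) = \exp \left( - \frac{\| \mathbf{x}_i - \mathbf{x}_j \|^2}{2\sigma^2} \right) \), where \(\sigma\) sets the kernel width (scikit-learn writes this as \( \exp(-\gamma \| \mathbf{x}_i - \mathbf{x}_j \|^2) \) with \( \gamma = 1/(2\sigma^2) \))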
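The dual decision function \( f(\mathbf{x}) = \sum_{i} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \) can be reproduced from a fitted scikit-learn model. In scikit-learn, `SVC.dual_coef_` stores \(\alpha_i y_i\) for the support vectors only, `intercept_` stores \(b\), and `rbf_kernel` uses the \(\gamma\) parameterization of the RBF kernel. The dataset (`make_circles`) and the values of `gamma` and `C` are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=200, factor=0.5, noise=0.05, random_state=0)

gamma = 1.0  # scikit-learn RBF: exp(-gamma * ||x_i - x_j||^2), gamma = 1 / (2 sigma^2)
clf = SVC(kernel='rbf', gamma=gamma, C=1.0).fit(X, y)

# K(x_i, x) for every support vector x_i (rows) and every input x (columns)
K = rbf_kernel(clf.support_vectors_, X, gamma=gamma)
# sum_i alpha_i y_i K(x_i, x) + b, using only the support vectors
f_manual = (clf.dual_coef_ @ K + clf.intercept_).ravel()
f_sklearn = clf.decision_function(X)

print("support vectors:", len(clf.support_vectors_), "of", len(X))
print("max |difference|:", np.max(np.abs(f_manual - f_sklearn)))
print("training accuracy:", clf.score(X, y))
```

The maximum difference should be at floating-point level. This confirms that only the support vectors, the points with non-zero \(\alpha_i\), enter the sum.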

Using the kernel trick, SVMs can learn decision boundaries that are non-linear in the original feature space (but linear in the implicit feature space) without ever computing the transformed feature vectors. This makes them versatile classifiers for data that is not linearly separable.

Q4. What is the role of support vectors in SVM? Explain with an example.

In Support Vector Machines (SVMs), support vectors are the data points that lie closest to the decision boundary (hyperplane) and influence the position and orientation of the hyperplane. These points are crucial in determining the optimal hyperplane that maximizes the margin between the classes.

The role of support vectors in SVM can be understood through the following key points:

1. **Defining the Margin**: In SVM, the margin is the distance between the decision boundary and the closest data points from each class. In a hard-margin SVM, the support vectors lie exactly on the marginal planes \(\mathbf{w} \cdot \mathbf{x} + b = \pm 1\). In a soft-margin SVM, points inside the margin or on the wrong side of the boundary are also support vectors. These points determine the width of the margin.

2. **Determining the Decision Boundary**: The decision boundary of an SVM is determined by the support vectors. The hyperplane is positioned such that it is equidistant from the closest support vectors of each class. This ensures that the decision boundary is optimally positioned to separate the classes while maximizing the margin.

3. **Influence on Model's Parameters**: In the dual solution, only the support vectors receive non-zero Lagrange multipliers \(\alpha_i\), so the weight vector \(\mathbf{w} = \sum_{i} \alpha_i y_i \mathbf{x}_i\) and the bias term \(b\) depend on the support vectors alone. Removing a non-support vector and retraining leaves the hyperplane unchanged.

4. **Robustness and Generalization**: Points far from the boundary have no influence on it, so the model is insensitive to them. However, an outlier close to or across the boundary can become a support vector and shift a hard-margin hyperplane considerably. The soft margin limits this by capping each point's influence at \(\alpha_i \leq C\). Models that rely on relatively few support vectors also tend to generalize well to unseen data.

Example:
Consider a binary classification problem where we aim to classify points in a two-dimensional feature space into two classes, 'A' and 'B', using an SVM. Let's assume the following points are our training data:

- Class 'A': (-1, -1), (0, 0)
- Class 'B': (1, 1), (2, 2)

In this case, the closest pair of opposite-class points is (0, 0) and (1, 1), and the maximum-margin boundary is the perpendicular bisector of the segment joining them: the line \(x_1 + x_2 = 1\), with \(\mathbf{w} = (1, 1)\) and \(b = -1\). The support vectors are therefore (0, 0) and (1, 1), which lie exactly on the marginal planes \(\mathbf{w} \cdot \mathbf{x} + b = \pm 1\), and the margin width is \(2 / \| \mathbf{w} \| = \sqrt{2}\). The points (-1, -1) and (2, 2) lie farther from the boundary, so they are not support vectors: moving or removing them (as long as they stay outside the margin) would not change the hyperplane.
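This example can be checked with scikit-learn, assuming a very large `C` to approximate a hard margin. The fitted model exposes the training points it kept as support vectors (`support_vectors_`) along with \(\mathbf{w}\) (`coef_`) and \(b\) (`intercept_`).

```python
import numpy as np
from sklearn.svm import SVC

# Training data from the example above
X = np.array([[-1, -1], [0, 0], [1, 1], [2, 2]], dtype=float)
y = np.array(['A', 'A', 'B', 'B'])

# Large C approximates a hard margin
clf = SVC(kernel='linear', C=1e6).fit(X, y)

print("support vectors:\n", clf.support_vectors_)
print("w =", clf.coef_[0], "b =", clf.intercept_[0])
```

The fit should report only (0, 0) and (1, 1) as support vectors, with \(\mathbf{w} \approx (1, 1)\) and \(b \approx -1\). The points (-1, -1) and (2, 2) receive \(\alpha_i = 0\).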

Q5. Illustrate with examples and graphs: hyperplane, marginal plane, soft margin and hard margin in SVM.

Let's illustrate the concepts of hyperplane, marginal plane, soft margin, and hard margin in SVM with examples and graphs.

### Example 1: Linearly Separable Data (Hard Margin)

Consider a simple example with two classes, 'A' and 'B', that are linearly separable.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs, make_classification  # make_classification is used in Example 2
from sklearn.svm import SVC

# Generate linearly separable data: two well-separated Gaussian blobs
X, y = make_blobs(n_samples=100, centers=[[-2, -2], [2, 2]], cluster_std=0.5, random_state=42)

# Approximate a hard margin with a very large C (SVC has no exact hard-margin mode)
svm_hard = SVC(kernel='linear', C=1e6)
svm_hard.fit(X, y)

# Plot data points
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', marker='o', edgecolors='k', label='Data Points')

# Use the current axis limits to define the evaluation grid
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# Create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = svm_hard.decision_function(xy).reshape(XX.shape)

# Plot decision boundary (f = 0) and marginal planes (f = -1 and f = +1)
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
ax.scatter(svm_hard.support_vectors_[:, 0], svm_hard.support_vectors_[:, 1], s=100, linewidth=1, facecolors='none',
           edgecolors='k', label='Support Vectors')

plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('SVM with Hard Margin')
plt.legend()
plt.grid(True)
plt.show()
```

In this example, a linear SVM with a very large \(C\) approximates a hard margin. The decision boundary (hyperplane, \(f(\mathbf{x}) = 0\)) is the solid line, and the two marginal planes (\(f(\mathbf{x}) = \pm 1\)) are the dashed lines. The support vectors lie on the marginal planes and are circled in black; because the data are separable, no training point falls inside the margin.

### Example 2: Linearly Inseparable Data (Soft Margin)

Now, let's consider an example with data that is not linearly separable.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Generate overlapping (not linearly separable) data; n_redundant=0 is required when n_features=2
X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.1, class_sep=0.5, random_state=42)

# Fit SVM with soft margin (a small C tolerates margin violations)
svm_soft = SVC(kernel='linear', C=0.1)
svm_soft.fit(X, y)

# Plot data points
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', marker='o', edgecolors='k', label='Data Points')

# Use the current axis limits to define the evaluation grid
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# Create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = svm_soft.decision_function(xy).reshape(XX.shape)

# Plot decision boundary (f = 0) and marginal planes (f = -1 and f = +1)
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
ax.scatter(svm_soft.support_vectors_[:, 0], svm_soft.support_vectors_[:, 1], s=100, linewidth=1, facecolors='none',
           edgecolors='k', label='Support Vectors')

plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('SVM with Soft Margin')
plt.legend()
plt.grid(True)
plt.show()
```

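In this example, a linear SVM with a soft margin (small \(C\)) handles the linearly inseparable data. Because a small \(C\) makes margin violations cheap, the solver accepts a wider margin: some points fall inside the margin (between the dashed lines) or on the wrong side of the decision boundary (solid line). Every such point (\(\xi_i > 0\)) becomes a support vector.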

These examples illustrate the difference between a hard margin and a soft margin in SVM, and how the regularization parameter \(C\) moves the model between the two.
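To quantify how \(C\) trades margin width against training errors, the sketch below refits a linear `SVC` on overlapping data for a few values of \(C\). For each fit it reports the margin width \(2 / \| \mathbf{w} \|\), the number of support vectors and the training accuracy. The dataset parameters follow the soft-margin example, with `n_redundant=0` because `make_classification` requires it when `n_features=2`. The grid of `C` values is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Overlapping two-class data, as in the soft-margin example
X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.1, class_sep=0.5, random_state=42)

widths = []
for C in [0.01, 1, 100]:
    clf = SVC(kernel='linear', C=C).fit(X, y)
    width = 2.0 / np.linalg.norm(clf.coef_[0])  # distance between the two marginal planes
    widths.append(width)
    print(f"C={C:>6}: margin width={width:.3f}, "
          f"support vectors={clf.n_support_.sum()}, "
          f"training accuracy={clf.score(X, y):.2f}")
```

As \(C\) grows, the margin width can only shrink or stay the same, because the optimal \(\| \mathbf{w} \|\) is non-decreasing in \(C\). The number of support vectors usually falls as well, although that is not guaranteed.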

Q6. SVM Implementation through Iris dataset.

- Load the Iris dataset from the scikit-learn library and split it into a training set and a testing set.
- Train a linear SVM classifier on the training set and predict the labels for the testing set.
- Compute the accuracy of the model on the testing set.
- Plot the decision boundaries of the trained model using two of the features.
- Try different values of the regularisation parameter C and see how it affects the performance of the model.

Bonus task: Implement a linear SVM classifier from scratch using Python and compare its performance with the scikit-learn implementation.


Here's how you can implement SVM on the Iris dataset using scikit-learn:

```python
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from matplotlib.colors import ListedColormap

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Only take the first two features (sepal length, sepal width) for visualization
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a linear SVM classifier
svm_classifier = SVC(kernel='linear', C=1.0)
svm_classifier.fit(X_train, y_train)

# Predict the labels for the testing set
y_pred = svm_classifier.predict(X_test)

# Compute the accuracy of the model on the testing set
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Plot the decision regions of a trained two-feature classifier
def plot_decision_boundary(X, y, classifier, title):
    h = .02  # Step size in the mesh
    cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
    cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

    # Grid covering the data with a 1-unit border
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))

    # Predict a class for every grid point and color the regions
    Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light, shading='auto')
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold, edgecolor='k', s=20)
    plt.title(title)
    plt.xlabel('Sepal length')
    plt.ylabel('Sepal width')
    plt.show()

plot_decision_boundary(X_train, y_train, svm_classifier, title="Decision Boundary of Linear SVM (Training Set)")

# Try different values of the regularization parameter C
for C in [0.1, 1, 10]:
    svm_classifier = SVC(kernel='linear', C=C)
    svm_classifier.fit(X_train, y_train)
    y_pred = svm_classifier.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print("Accuracy (C={}):".format(C), accuracy)
    plot_decision_boundary(X_train, y_train, svm_classifier, title="Decision Boundary of Linear SVM (C={})".format(C))
```

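This code loads the Iris dataset and keeps only the first two features (sepal length and sepal width) so that the decision regions can be drawn in 2-D. It splits the data into training and testing sets, trains a linear SVM classifier on the training set, predicts the labels for the testing set, and reports the accuracy. Because the model sees only two of the four features, this accuracy reflects a two-feature model. Finally, it retrains with several values of the regularization parameter \( C \) and reports the test accuracy and decision regions for each. Choosing \( C \) by test-set accuracy leaks information from the test set; in practice \( C \) is usually selected by cross-validation on the training set.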
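For the bonus task, here is a minimal from-scratch sketch. It does not reproduce how scikit-learn trains (`SVC` uses libsvm's SMO-type solver on the dual problem). Instead, it runs full-batch subgradient descent on the primal soft-margin objective \(\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \max(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))\). To keep the problem binary, it assumes a two-class subset of Iris (setosa vs. versicolor) on the first two features and standardizes them. The class name `LinearSVMScratch`, the learning rate and the number of epochs are hypothetical choices.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class LinearSVMScratch:
    """Hypothetical from-scratch soft-margin linear SVM (labels must be -1/+1).

    Minimizes 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b))
    with full-batch subgradient descent and a constant learning rate.
    """

    def __init__(self, C=1.0, lr=0.001, n_epochs=2000):
        self.C = C
        self.lr = lr
        self.n_epochs = n_epochs

    def fit(self, X, y):
        self.w = np.zeros(X.shape[1])
        self.b = 0.0
        for _ in range(self.n_epochs):
            margins = y * (X @ self.w + self.b)
            active = margins < 1  # samples with non-zero hinge loss (slack xi_i > 0)
            # Subgradient: w from the regularizer, -C * y_i * x_i from each active sample
            grad_w = self.w - self.C * (y[active] @ X[active])
            grad_b = -self.C * y[active].sum()
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
        return self

    def decision_function(self, X):
        return X @ self.w + self.b

    def predict(self, X):
        return np.where(self.decision_function(X) >= 0, 1.0, -1.0)


# Assumed binary subset: setosa (class 0) vs. versicolor (class 1), first two features
iris = datasets.load_iris()
mask = iris.target < 2
X = iris.data[mask, :2]
y = np.where(iris.target[mask] == 1, 1.0, -1.0)  # map labels to -1/+1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize features (fit on the training set only) so the gradient steps are well scaled
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

scratch = LinearSVMScratch(C=1.0).fit(X_train_s, y_train)
sk_svm = SVC(kernel='linear', C=1.0).fit(X_train_s, y_train)

acc_scratch = accuracy_score(y_test, scratch.predict(X_test_s))
acc_sklearn = accuracy_score(y_test, sk_svm.predict(X_test_s))
agreement = np.mean(scratch.predict(X_test_s) == sk_svm.predict(X_test_s))

print("From-scratch accuracy:", acc_scratch)
print("scikit-learn accuracy:", acc_sklearn)
print("Prediction agreement: ", agreement)
```

Both models should reach similar test accuracy on this nearly separable subset. The from-scratch version has no convergence check and simply runs a fixed number of epochs, which is one reason dedicated solvers are preferred in practice. Extending it to all three Iris classes would need a one-vs-rest or one-vs-one scheme (`SVC` uses one-vs-one internally).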