Q1. What is the Mathematical Formula for a Linear SVM?
A Support Vector Machine (SVM) aims to find the optimal hyperplane that separates classes in the feature space. For a linear SVM, the decision function can be represented as:
$$f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b$$
where:
•	$\mathbf{w}$ is the weight vector (normal to the hyperplane),
•	$\mathbf{x}$ is the feature vector,
•	$b$ is the bias term.
The hyperplane is defined as:
$$\mathbf{w}^T \mathbf{x} + b = 0$$
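The decision function can be evaluated directly with NumPy. The sketch below uses made-up values for the weight vector and bias (they are illustrative assumptions, not fitted parameters) and classifies each point by the sign of f(x); the helper name `decision_function` is hypothetical.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration
w = np.array([2.0, -1.0])   # weight vector, normal to the hyperplane
b = -1.0                    # bias term

def decision_function(X, w, b):
    """Return f(x) = w^T x + b for each row of X."""
    return X @ w + b

X = np.array([
    [2.0, 1.0],   # f = 4 - 1 - 1 =  2 -> positive side
    [0.0, 1.0],   # f = 0 - 1 - 1 = -2 -> negative side
    [1.0, 1.0],   # f = 2 - 1 - 1 =  0 -> exactly on the hyperplane
])

scores = decision_function(X, w, b)
labels = np.where(scores >= 0, 1, -1)            # points on the hyperplane go to +1 here
distances = np.abs(scores) / np.linalg.norm(w)   # geometric distance to the hyperplane

print("scores:   ", scores)
print("labels:   ", labels)
print("distances:", distances)
```

Dividing |f(x)| by the norm of w converts the raw score into a geometric distance, which is the quantity the margin is measured in.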
Q2. What is the Objective Function of a Linear SVM?
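A linear SVM maximizes the margin between the classes while minimizing the classification error. The margin is defined by the distance between the hyperplane and the nearest data points from either class (the support vectors). When $\mathbf{w}$ and $b$ are scaled so that these nearest points satisfy $y_i (\mathbf{w}^T \mathbf{x}_i + b) = 1$, each of them lies at distance $1 / \| \mathbf{w} \|$ from the hyperplane, so the total margin width is $2 / \| \mathbf{w} \|$. Maximizing the margin is therefore equivalent to minimizing $\| \mathbf{w} \|^2$. The optimization problem can be formulated as: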
$$\min_{\mathbf{w}, b} \frac{1}{2} \| \mathbf{w} \|^2$$
subject to:
$$y_i (\mathbf{w}^T \mathbf{x}_i + b) \geq 1$$
where $y_i \in \{-1, +1\}$ are the class labels and $\mathbf{x}_i$ are the feature vectors. This formulation is known as the hard-margin SVM. For soft-margin SVMs, slack variables $\xi_i$ are introduced to handle data that is not linearly separable, and the objective function becomes:
$$\min_{\mathbf{w}, b, \xi} \frac{1}{2} \| \mathbf{w} \|^2 + C \sum_{i=1}^{N} \xi_i$$
subject to $y_i (\mathbf{w}^T \mathbf{x}_i + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$, where $C > 0$ is the regularization parameter that controls the trade-off between maximizing the margin and minimizing the classification error. A large $C$ penalizes margin violations heavily and gives a narrower margin; a small $C$ tolerates more violations and gives a wider margin.
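To make the soft-margin objective concrete, the sketch below evaluates it for a fixed, hand-picked hyperplane on four toy points. The data, the values of w and b, and the helper name `soft_margin_objective` are assumptions made for illustration. For a given hyperplane, the smallest slack that satisfies the constraints is max(0, 1 - y_i (w^T x_i + b)).

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slack) for labels y in {-1, +1}."""
    margins = y * (X @ w + b)
    # Smallest slack satisfying y_i (w^T x_i + b) >= 1 - xi_i and xi_i >= 0
    slack = np.maximum(0.0, 1.0 - margins)
    objective = 0.5 * np.dot(w, w) + C * slack.sum()
    return objective, slack

# Toy data (illustrative only)
X = np.array([[2.0, 0.0],    # correct, outside the margin -> slack 0
              [0.5, 0.0],    # correct, inside the margin  -> slack 0.5
              [-2.0, 0.0],   # correct, outside the margin -> slack 0
              [0.5, 1.0]])   # misclassified               -> slack 1.5
y = np.array([1, 1, -1, -1])

w = np.array([1.0, 0.0])
b = 0.0

objective, slack = soft_margin_objective(w, b, X, y, C=1.0)
margin_width = 2.0 / np.linalg.norm(w)

print("slack:       ", slack)
print("objective:   ", objective)
print("margin width:", margin_width)
```

A slack between 0 and 1 marks a point that is correctly classified but inside the margin; a slack above 1 marks a misclassified point.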
Q3. What is the Kernel Trick in SVM?
The kernel trick is a method used in SVMs to handle non-linearly separable data by mapping the input features into a higher-dimensional space where a linear separation is possible. Instead of explicitly computing the coordinates in the higher-dimensional space, the kernel trick uses a kernel function $K(\mathbf{x}_i, \mathbf{x}_j)$ to compute the inner product of data points in this space.
Common kernel functions include:
•	Linear Kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j$
•	Polynomial Kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = (\alpha \mathbf{x}_i^T \mathbf{x}_j + c)^d$
•	Radial Basis Function (RBF) Kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \| \mathbf{x}_i - \mathbf{x}_j \|^2)$
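The three kernels above can be written in a few lines of NumPy. The sketch below also checks the kernel trick for one small case: for 2D inputs, the polynomial kernel with α = 1, c = 1, d = 2 equals the inner product of an explicit 6-dimensional feature map. The function names, default parameter values, and the map `phi` are illustrative assumptions.

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, alpha=1.0, c=1.0, d=2):
    return (alpha * (x @ z) + c) ** d

def rbf_kernel(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def phi(x):
    """Explicit feature map for (x^T z + 1)^2 with 2D inputs."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_implicit = polynomial_kernel(x, z)   # computed in the original 2D space
k_explicit = phi(x) @ phi(z)           # computed in the mapped 6D space

print("linear:             ", linear_kernel(x, z))
print("polynomial (kernel):", k_implicit)
print("polynomial (phi):   ", k_explicit)
print("rbf:                ", rbf_kernel(x, z))
```

Both polynomial values agree, but the kernel version never builds the 6D vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so the kernel form is the only practical way to compute it.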
Q4. What is the Role of Support Vectors in SVM? Explain with Example
Support vectors are the training points that lie closest to the decision boundary (hyperplane); they alone determine its position and orientation. In a hard-margin SVM they lie exactly on the margin boundaries. In a soft-margin SVM, points inside the margin or on the wrong side of the hyperplane are support vectors as well.
Example: Consider a 2D dataset with two classes. The support vectors are the points on the edge of the margin on either side of the hyperplane, and the hyperplane is positioned to maximize the margin between them. Removing or moving a support vector can change the hyperplane, whereas removing any other point leaves it unchanged.
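The example can be checked numerically. The sketch below fits scikit-learn's `SVC` on six hand-made 2D points (an assumed toy dataset) with a large C to approximate a hard margin, then refits after dropping one non-support vector and after dropping one support vector. The helper `fit_linear` is a hypothetical convenience wrapper.

```python
import numpy as np
from sklearn.svm import SVC

# Toy dataset (illustrative): the closest pair across classes is (0, 0) and (2, 0)
X = np.array([[0.0, 0.0], [-1.0, 0.0], [-1.0, 1.0],
              [2.0, 0.0], [3.0, 0.0], [3.0, 1.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

def fit_linear(X, y):
    # A large C approximates a hard margin on this separable data
    return SVC(kernel="linear", C=1e6).fit(X, y)

full = fit_linear(X, y)
print("support vectors:\n", full.support_vectors_)
print("w =", full.coef_[0], " b =", full.intercept_[0])

# Drop a point that is not a support vector: (3, 1) at index 5
keep = np.arange(len(X)) != 5
without_non_sv = fit_linear(X[keep], y[keep])
print("without (3, 1): w =", without_non_sv.coef_[0], " b =", without_non_sv.intercept_[0])

# Drop a support vector: (2, 0) at index 3
keep = np.arange(len(X)) != 3
without_sv = fit_linear(X[keep], y[keep])
print("without (2, 0): w =", without_sv.coef_[0], " b =", without_sv.intercept_[0])
```

On this data the hyperplane should stay at roughly x1 = 1 when (3, 1) is removed, and move toward x1 = 1.5 when the support vector (2, 0) is removed, because (3, 0) then becomes the nearest positive point.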
Q5. Illustrate with Examples and Graphs of Hyperplane, Marginal Plane, Soft Margin, and Hard Margin in SVM
Hard Margin SVM:
•	The hyperplane is positioned such that the margin between the classes is maximized, and there are no misclassified points.
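•	Every training point must satisfy $y_i (\mathbf{w}^T \mathbf{x}_i + b) \geq 1$, so a solution exists only if the data is linearly separable.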
Soft Margin SVM:
•	Allows some margin violations and misclassifications, measured by slack variables, to handle data that is not linearly separable.
•	A balance is struck between maximizing the margin and minimizing misclassification errors.
Visual Example:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Use only the first two features for 2D visualization
y = iris.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

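# Approximate a hard-margin SVM with a large finite C.
# A true hard margin (C -> infinity) has no solution here: the classes overlap
# in these two features, so the data is not linearly separable.
clf_hard = SVC(C=1e3, kernel='linear', max_iter=100_000)  # max_iter caps the solver's run time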
clf_hard.fit(X_train, y_train)

# Train a soft-margin SVM
clf_soft = SVC(C=1, kernel='linear')
clf_soft.fit(X_train, y_train)

# Plot decision boundaries
def plot_decision_boundary(clf, X, y, title):
    h = .02  # Step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o')
    plt.title(title)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.show()

plot_decision_boundary(clf_hard, X_train, y_train, 'Hard Margin SVM')
plot_decision_boundary(clf_soft, X_train, y_train, 'Soft Margin SVM')
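The filled-contour plots show the decision regions but not the marginal planes. As a rough numerical complement, the sketch below measures the margin width 2/‖w‖ (the distance between the planes w^T x + b = +1 and w^T x + b = -1) for several values of C. It assumes a two-class subset of Iris (setosa vs. versicolor) so that a single hyperplane is fitted; the C values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
mask = iris.target < 2            # keep two classes only (setosa, versicolor)
X = iris.data[mask][:, :2]        # first two features, as in the plots above
y = iris.target[mask]

results = []
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    margin_width = 2.0 / np.linalg.norm(w)   # distance between the two marginal planes
    n_sv = int(clf.n_support_.sum())         # total number of support vectors
    results.append((C, margin_width, n_sv))
    print(f"C={C:<6} margin width={margin_width:.3f} support vectors={n_sv}")
```

A small C gives a wide, soft margin and typically many support vectors; a large C moves toward hard-margin behaviour with a narrower margin.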
Q6. SVM Implementation through Iris Dataset
Implementation Using Scikit-Learn:
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load and prepare the dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a Linear SVM
svm = SVC(kernel='linear', C=1)
svm.fit(X_train, y_train)

# Predict on the test set
y_pred = svm.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
Bonus Task: Implement a Linear SVM Classifier from Scratch
From Scratch Implementation:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

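class LinearSVM:
    """Binary linear soft-margin SVM trained by stochastic subgradient descent.

    Minimizes (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b)).
    """

    def __init__(self, C=1.0, lr=0.01, epochs=1000):
        self.C = C
        self.lr = lr
        self.epochs = epochs
        self.w = None
        self.b = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0

        y = np.where(y == 0, -1, 1)  # Convert 0/1 labels to -1/+1

        for _ in range(self.epochs):
            for i in range(n_samples):
                # Spread the regularizer's gradient evenly over the samples
                grad_w = self.w / n_samples
                if y[i] * (np.dot(X[i], self.w) + self.b) < 1:
                    # Margin violated: the hinge-loss subgradient is
                    # -C * y_i * x_i for w and -C * y_i for b
                    grad_w = grad_w - self.C * y[i] * X[i]
                    self.b += self.lr * self.C * y[i]
                self.w -= self.lr * grad_w
        return self

    def predict(self, X):
        # Map the sign of the decision function back to the original 0/1 labels
        return np.where(np.dot(X, self.w) + self.b >= 0, 1, 0)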

# Load the dataset
iris = load_iris()
X = iris.data[:, :2]  # Using only the first two features
y = iris.target

# Binary classification for simplicity (class 2 vs. the rest)
y = np.where(y == 2, 1, 0)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train and evaluate the LinearSVM model
model = LinearSVM(C=1.0, lr=0.01, epochs=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Linear SVM Accuracy (from scratch): {accuracy:.2f}")

