Q1. What is the mathematical formula for a linear SVM?
The mathematical formula for a linear Support Vector Machine (SVM) is given by the decision function:

$$f(x) = w^T x + b$$

where:

$w$ is the weight vector
$x$ is the input feature vector
$b$ is the bias term

The decision boundary (hyperplane) is defined as the set of points where the decision function is zero:

$$w^T x + b = 0$$
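
As a minimal sketch of the decision function above, the snippet below evaluates $f(x) = w^T x + b$ for a few points and classifies them by the sign of the result. The values of `w`, `b`, and `X` are hypothetical and chosen only for illustration; treating a score of exactly zero as the positive class is also an assumed convention.

```python
import numpy as np

def decision_function(X, w, b):
    """Return f(x) = w^T x + b for every row x of X."""
    return X @ w + b

# Hypothetical parameters, chosen only for illustration
w = np.array([1.0, -1.0])
b = 0.5

X = np.array([
    [2.0, 0.0],   # positive side of the hyperplane
    [0.0, 2.0],   # negative side of the hyperplane
    [0.0, 0.5],   # exactly on the hyperplane (f = 0)
])

scores = decision_function(X, w, b)
# Assumed convention: a score of exactly 0 is assigned to class +1
labels = np.where(scores >= 0, 1, -1)

print("scores:", scores)
print("labels:", labels)
```

The third point satisfies $w^T x + b = 0$, so it lies on the decision boundary itself.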

Q2. What is the objective function of a linear SVM?
A linear SVM seeks the hyperplane that maximizes the margin between the two classes while penalizing classification errors. The optimization problem can be written as:

Minimize the following primal objective function:

$$\frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i$$

subject to the constraints:

$$y_i (w^T x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \dots, n$$

where:

$w$ is the weight vector
$b$ is the bias term
$C$ is the regularization parameter, which trades off a wide margin against a small total slack
$\xi_i$ are the slack variables that allow for margin violations and misclassifications
$y_i \in \{-1, +1\}$ are the class labels
$x_i$ are the input feature vectors
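
The objective above can be evaluated numerically for any candidate $(w, b)$. The sketch below relies on the fact that, for fixed $w$ and $b$, the smallest slack satisfying both constraints is $\xi_i = \max(0, 1 - y_i (w^T x_i + b))$. The function name `primal_objective` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def primal_objective(w, b, X, y, C):
    """Return (objective value, slack vector) for the soft-margin primal."""
    margins = y * (X @ w + b)
    # Smallest slack that satisfies y_i (w^T x_i + b) >= 1 - xi_i and xi_i >= 0
    xi = np.maximum(0.0, 1.0 - margins)
    value = 0.5 * np.dot(w, w) + C * xi.sum()
    return value, xi

# Hypothetical candidate hyperplane and toy data
w = np.array([1.0, 0.0])
b = 0.0
X = np.array([[2.0, 0.0], [0.5, 1.0], [-1.0, 0.0], [0.5, -1.0]])
y = np.array([1, 1, -1, -1])

value, xi = primal_objective(w, b, X, y, C=1.0)
print("slack variables:", xi)
print("objective value:", value)
```

Here the second point lies inside the margin ($0 < \xi_i < 1$) and the fourth point is misclassified ($\xi_i > 1$); the other two have zero slack.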

Q3. What is the kernel trick in SVM?
The kernel trick lets SVMs classify in a higher-dimensional feature space without explicitly computing coordinates in that space. Instead of mapping each point with a feature map $\phi$ and then taking inner products, it computes the inner products $\phi(x_i)^T \phi(x_j)$ directly from the original inputs using a kernel function $K(x_i, x_j)$.

Common kernel functions include:

Linear kernel: $K(x_i, x_j) = x_i^T x_j$
Polynomial kernel: $K(x_i, x_j) = (x_i^T x_j + 1)^d$
Radial Basis Function (RBF) kernel: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$
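
To make the kernel trick concrete, the sketch below implements the three kernels listed above and checks, for the degree-2 polynomial kernel on 2D inputs, that the kernel value equals the inner product of an explicit 6-dimensional feature map. The helper names (`polynomial_kernel`, `phi_poly2`, and so on), the default `gamma`, and the sample points are assumptions made for illustration.

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, d=2):
    return (x @ z + 1.0) ** d

def rbf_kernel(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def phi_poly2(x):
    """Explicit feature map whose inner product equals (x^T z + 1)^2 in 2D."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

# Hypothetical sample points
x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_linear = linear_kernel(x, z)
k_implicit = polynomial_kernel(x, z, d=2)      # works in the original 2D space
k_explicit = phi_poly2(x) @ phi_poly2(z)       # works in the 6D feature space
k_rbf = rbf_kernel(x, z)

print("linear:", k_linear)
print("polynomial (kernel):", k_implicit)
print("polynomial (explicit map):", k_explicit)
print("RBF:", k_rbf)
```

Both polynomial values agree, but the kernel version never builds the 6-dimensional vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so only the kernel form is practical.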

Q4. What is the role of support vectors in SVM? Explain with an example.
Support vectors are the data points that lie closest to the decision boundary (hyperplane) and are critical in defining the position and orientation of the hyperplane. These points are the most difficult to classify and directly influence the optimal hyperplane. In essence, support vectors are the points that determine the margin between the classes.

Example:
Consider a binary classification problem where we have two classes (red and blue). The SVM algorithm will find the hyperplane that maximizes the margin between the two classes. The support vectors are the data points that lie on the edge of this margin.


In a plot of this example, the support vectors are typically marked with circles. These points are crucial: removing one of them can change the position of the hyperplane, whereas removing any other point leaves it unchanged.
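
The role of support vectors can be checked on a small example. The sketch below fits scikit-learn's `SVC` on a hypothetical, linearly separable toy set, reads the support vectors from the fitted model, and refits after dropping one point that is not a support vector. The data and the large value of `C` (used to approximate a hard margin) are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3).fit(X, y)

print("support vector indices:", clf.support_)
print("support vectors:\n", clf.support_vectors_)

# Support vectors sit on the marginal planes, so y_i * f(x_i) is close to 1
margins = y[clf.support_] * clf.decision_function(X[clf.support_])
print("margins of support vectors:", margins)

# Refit without one point that is NOT a support vector
non_sv = np.setdiff1d(np.arange(len(X)), clf.support_)
keep = np.ones(len(X), dtype=bool)
keep[non_sv[0]] = False
clf_reduced = SVC(kernel="linear", C=1e3).fit(X[keep], y[keep])

print("w (all points):   ", clf.coef_, clf.intercept_)
print("w (one non-SV removed):", clf_reduced.coef_, clf_reduced.intercept_)
```

Up to solver tolerance, the two fits should report the same weights and intercept, since only the support vectors enter the solution.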

Q5. Illustrate with examples and graphs of Hyperplane, Marginal plane, Soft margin, and Hard margin in SVM.
Hyperplane: The decision boundary that separates the classes. For a linear SVM, it is a straight line in 2D, a plane in 3D, and a hyperplane in higher dimensions.

Marginal planes: The two planes parallel to the hyperplane that pass through the support vectors, $w^T x + b = \pm 1$. The distance between them, $2 / \|w\|$, is the margin.

Hard margin: SVM with no tolerance for misclassification. It requires that all data points are correctly classified and lie on or outside the margin. This is feasible only if the data is linearly separable.

Soft margin: SVM that allows some misclassification to achieve a better overall model, especially when the data is not linearly separable. It introduces slack variables $\xi_i$ to allow some points to lie inside the margin or be misclassified.
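
The difference between a soft and a (nearly) hard margin can be seen by varying `C` in scikit-learn's `SVC`. The sketch below uses synthetic overlapping data, so a true hard margin is infeasible; the large `C` only approximates one. The data, the seed, the two `C` values, and the helper `margin_width` are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping Gaussian blobs (synthetic data, for illustration only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-1.0, scale=1.0, size=(50, 2)),
               rng.normal(loc=1.0, scale=1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

def margin_width(clf):
    """Distance between the marginal planes w^T x + b = +1 and -1."""
    return 2.0 / np.linalg.norm(clf.coef_)

soft = SVC(kernel="linear", C=0.01).fit(X, y)        # tolerant of violations
hard_like = SVC(kernel="linear", C=100.0).fit(X, y)  # penalizes violations heavily

for name, clf in [("C=0.01", soft), ("C=100", hard_like)]:
    print(f"{name}: margin width = {margin_width(clf):.3f}, "
          f"support vectors = {clf.n_support_.sum()}")
```

A small `C` yields a wider margin and typically more support vectors, because more points are allowed to fall inside the margin; a large `C` narrows the margin to reduce the total slack.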

Q6. SVM implementation on the Iris dataset

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from mlxtend.plotting import plot_decision_regions  # third-party package: mlxtend

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Using only the first two features for visualization purposes
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a linear SVM classifier
svm_clf = SVC(kernel='linear', C=1.0)
svm_clf.fit(X_train, y_train)

# Predict the labels for the testing set
y_pred = svm_clf.predict(X_test)

# Compute the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Plot the decision boundaries
plt.figure(figsize=(10, 6))
plot_decision_regions(X_train, y_train, clf=svm_clf, legend=2)
plt.title('SVM Decision Boundary')
plt.xlabel('Sepal length (cm)')
plt.ylabel('Sepal width (cm)')
plt.show()

# Confusion Matrix and Classification Report
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))
print('\nClassification Report:')
print(classification_report(y_test, y_pred))


Implement a linear SVM classifier from scratch using Python

In [None]:
# Import necessary libraries
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data[:, :2]  # Using only the first two features for visualization purposes
y = iris.target

# Convert labels to binary: setosa (class 0) -> -1, versicolor and virginica -> +1
y = np.where(y == 0, -1, 1)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

class LinearSVM:
    def __init__(self, learning_rate=0.001, lambda_param=0.01, n_iters=1000):
        self.learning_rate = learning_rate
        self.lambda_param = lambda_param
        self.n_iters = n_iters
        self.w = None
        self.b = None

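    def fit(self, X, y):
        # Labels are expected to be -1 or +1
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0

        # Stochastic subgradient descent on lambda * ||w||^2 + hinge loss
        for _ in range(self.n_iters):
            for idx, x_i in enumerate(X):
                # True when the sample is on or outside its margin (zero hinge loss)
                condition = y[idx] * (np.dot(x_i, self.w) + self.b) >= 1
                if condition:
                    # Only the regularization term contributes to the gradient
                    self.w -= self.learning_rate * (2 * self.lambda_param * self.w)
                else:
                    # Hinge-loss subgradient: -y_i * x_i for w and -y_i for b
                    self.w -= self.learning_rate * (2 * self.lambda_param * self.w - y[idx] * x_i)
                    self.b += self.learning_rate * y[idx]

    def predict(self, X):
        # Decision function f(x) = w^T x + b, matching the formula in Q1
        approx = np.dot(X, self.w) + self.b
        # Return -1 or +1; a score of exactly 0 is assigned to +1
        return np.where(approx >= 0, 1, -1)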

# Train the SVM from scratch
svm_scratch = LinearSVM()
svm_scratch.fit(X_train, y_train)

# Predict the labels for the testing set
y_pred_scratch = svm_scratch.predict(X_test)

# Compute the accuracy of the model
accuracy_scratch = accuracy_score(y_test, y_pred_scratch)
print(f'Accuracy (Scratch): {accuracy_scratch:.2f}')

# Comparison with scikit-learn implementation
from sklearn.svm import SVC

# Train a linear SVM classifier using scikit-learn
svm_sklearn = SVC(kernel='linear', C=1.0)
svm_sklearn.fit(X_train, y_train)

# Predict the labels for the testing set using scikit-learn
y_pred_sklearn = svm_sklearn.predict(X_test)

# Compute the accuracy of the model using scikit-learn
accuracy_sklearn = accuracy_score(y_test, y_pred_sklearn)
print(f'Accuracy (scikit-learn): {accuracy_sklearn:.2f}')
