In [None]:
SVM & Naive Bayes

Q1.What is a Support Vector Machine (SVM)?
Ans.A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It is particularly powerful for binary classification problems.

How SVM Works
Hyperplane: SVM finds the optimal hyperplane that best separates the data into different classes.
Support Vectors: These are the data points closest to the hyperplane, which influence its position and orientation.
Margin Maximization: SVM tries to maximize the margin (distance) between the hyperplane and the nearest data points from each class.
Kernel Trick: If the data is not linearly separable, SVM can transform it into a higher-dimensional space using kernel functions (e.g., linear, polynomial, radial basis function (RBF)).
SVM in Python (Using Scikit-Learn)
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the SVM model
model = SVC(kernel='linear')  # You can try 'rbf', 'poly', 'sigmoid' as well
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
When to Use SVM
When you have a small to medium-sized dataset.
When the data is not too noisy.
When you need a strong classifier for complex, high-dimensional data.
SVM is widely used in text classification, image recognition, and bioinformatics.



Q2.What is the difference between Hard Margin and Soft Margin SVM?
Ans.The key difference between Hard Margin SVM and Soft Margin SVM is how they handle misclassified points in the dataset.

1. Hard Margin SVM
Used when the data is perfectly linearly separable (i.e., there exists a hyperplane that separates the classes without any errors).
It strictly enforces that no training data points can be misclassified.
Maximizes the margin while ensuring all points are correctly classified.
Problem: It is very sensitive to outliers and may not work well for real-world noisy data.
Mathematical Formulation
The optimization problem for hard margin SVM:

$$\min \; \frac{1}{2}\|w\|^2$$

Subject to:

$$y_i (w \cdot x_i + b) \ge 1, \quad \forall i$$

where:

$w$ is the weight vector,
$b$ is the bias,
$y_i$ is the class label ($+1$ or $-1$),
$x_i$ is the feature vector.
2. Soft Margin SVM
Used when the data is not perfectly separable.
Introduces a slack variable $\xi_i$ to allow some misclassifications.
Tries to balance maximizing the margin while minimizing classification errors.
Less sensitive to outliers compared to Hard Margin SVM.
Controlled by the regularization parameter C:
High C → Less tolerance for misclassification (tries to fit the data tightly).
Low C → More tolerance for misclassification (allows some margin violations).
Mathematical Formulation
The optimization problem for soft margin SVM:

$$\min \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$$

Subject to:

$$y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad \forall i$$

where:

$\xi_i$ represents the slack variables allowing misclassification.
$C$ is the regularization parameter controlling the trade-off.
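
The slack variables can be recovered from a fitted model as $\xi_i = \max(0, 1 - y_i f(x_i))$. The sketch below is illustrative only: the synthetic dataset, its parameters, and the variable names are assumptions and not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic, slightly noisy 2-D data (assumed for illustration)
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           flip_y=0.1, random_state=0)
y = np.where(y == 0, -1, 1)  # labels in {-1, +1}

model = SVC(kernel="linear", C=1.0).fit(X, y)

# Functional margin y_i * f(x_i) and slack xi_i = max(0, 1 - y_i * f(x_i))
functional_margin = y * model.decision_function(X)
slack = np.maximum(0.0, 1.0 - functional_margin)

print("Points violating the margin (xi > 0):", int(np.sum(slack > 0)))
print("Misclassified points (xi > 1):", int(np.sum(slack > 1)))
```

A point with $0 < \xi_i \le 1$ sits inside the margin but on the correct side; a point with $\xi_i > 1$ is misclassified.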


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Generate a simple dataset
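# n_informative/n_redundant must be set explicitly: the defaults (2 + 2) exceed n_features=2
X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, n_classes=2, n_clusters_per_class=1, random_state=42)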
y = np.where(y == 0, -1, 1)  # Convert to {-1, 1}

# Hard Margin SVM (C is very high)
hard_margin_svm = SVC(kernel='linear', C=1e5)
hard_margin_svm.fit(X, y)

# Soft Margin SVM (C is smaller)
soft_margin_svm = SVC(kernel='linear', C=1.0)
soft_margin_svm.fit(X, y)

# Plotting decision boundaries
def plot_svm(model, title):
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')

    # Create grid
    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 50), np.linspace(ylim[0], ylim[1], 50))
    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

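    # Plot decision boundary (level 0) and the margins (levels -1 and +1)
    ax.contour(xx, yy, Z, levels=[-1, 0, 1], linestyles=['dashed', 'solid', 'dashed'], colors='k')
    plt.title(title)
    plt.show()

plot_svm(hard_margin_svm, "Hard Margin SVM (C=1e5)")
plot_svm(soft_margin_svm, "Soft Margin SVM (C=1.0)")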


Q3.What is the mathematical intuition behind SVM?
Ans.The mathematical intuition behind Support Vector Machines (SVMs) is rooted in maximizing the margin between different classes while minimizing classification errors. Below is a step-by-step breakdown of its core concepts.

1. Finding the Optimal Hyperplane
Given a labeled dataset $(x_i, y_i)$, where:

$x_i$ is the feature vector,
$y_i \in \{-1, +1\}$ is the class label.
The goal of SVM is to find a hyperplane that best separates the classes:

$$w \cdot x + b = 0$$

where:

$w$ is the weight vector (normal to the hyperplane),
$b$ is the bias term.
For binary classification, we define two decision boundaries:

$$w \cdot x + b = +1 \quad \text{(for the positive class)}$$
$$w \cdot x + b = -1 \quad \text{(for the negative class)}$$

Margin Definition
The margin is the distance between these two boundaries:

$$\text{Margin} = \frac{2}{\|w\|}$$

Our objective is to maximize the margin, ensuring better generalization.
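
As a quick numerical check of the margin formula, the width $2 / \|w\|$ can be read off a fitted linear SVM. This is a minimal sketch; the well-separated blob dataset and the large `C` (used to approximate a hard margin) are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters (assumed data, chosen so a hard margin exists)
X, y = make_blobs(n_samples=60, centers=[[-3, -3], [3, 3]],
                  cluster_std=0.8, random_state=0)
y_signed = np.where(y == 0, -1, 1)

# A large C approximates the hard-margin problem
model = SVC(kernel="linear", C=1e3).fit(X, y)

w = model.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)
print(f"||w|| = {np.linalg.norm(w):.4f}, margin width = {margin_width:.4f}")

# Every point should satisfy y_i (w . x_i + b) >= 1 (up to solver tolerance)
print("Smallest functional margin:", (y_signed * model.decision_function(X)).min())
```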

2. Optimization Problem (Hard Margin SVM)
To maximize the margin while ensuring correct classification, we solve:

$$\min_{w, b} \; \frac{1}{2}\|w\|^2$$

subject to:

$$y_i (w \cdot x_i + b) \ge 1, \quad \forall i$$
This ensures that all data points lie outside or on their respective margins.

Lagrange Multipliers & Dual Form
Using Lagrange multipliers $\alpha_i$, we transform this into a dual problem:

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

subject to:

$$\sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C$$

where $K(x_i, x_j) = x_i \cdot x_j$ (for linear SVM) or a kernel function for non-linear SVM. The upper bound $C$ belongs to the soft-margin case; for a strict hard margin the constraint is simply $\alpha_i \ge 0$.

The solution gives support vectors, which are the only points with $\alpha_i > 0$, and they define the decision boundary.
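
The dual constraints can be inspected on a fitted scikit-learn model, whose `dual_coef_` attribute stores the products $\alpha_i y_i$ for the support vectors. The sketch below assumes a synthetic dataset and `C = 1.0`; both are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters so that some alphas hit the upper bound (assumed data)
X, y = make_blobs(n_samples=80, centers=[[-1, -1], [1, 1]],
                  cluster_std=1.2, random_state=1)
C = 1.0
model = SVC(kernel="linear", C=C).fit(X, y)

# dual_coef_ holds alpha_i * y_i, for support vectors only
signed_alpha = model.dual_coef_[0]
alpha = np.abs(signed_alpha)

print("Number of support vectors:", alpha.size, "of", len(X), "points")
print("sum_i alpha_i * y_i =", signed_alpha.sum())
print("alpha range:", alpha.min(), "to", alpha.max())
```

All other training points have $\alpha_i = 0$ and do not appear in `dual_coef_` at all.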

3. Soft Margin SVM (Handling Misclassification)
For non-linearly separable data, we introduce slack variables $\xi_i$ to allow some violations:

$$y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

The new objective function balances margin maximization with misclassification:

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$$

where $C$ controls the trade-off between maximizing the margin and allowing misclassifications.

4. Kernel Trick (Handling Non-Linearity)
For complex datasets where a linear separator doesn’t work, we use the kernel trick to map data to a higher-dimensional space. The kernel function $K(x_i, x_j)$ replaces the dot product:

Linear Kernel:
$$K(x_i, x_j) = x_i \cdot x_j$$
Polynomial Kernel:
$$K(x_i, x_j) = (x_i \cdot x_j + c)^d$$
Radial Basis Function (RBF) Kernel:
$$K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$$
This allows SVM to learn non-linear decision boundaries efficiently.
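
To make the RBF formula concrete, the sketch below evaluates it by hand and compares the result with scikit-learn's `rbf_kernel`. The three sample points, the value of `gamma`, and the helper name `rbf_by_hand` are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf_by_hand(x_i, x_j, gamma):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    return np.exp(-gamma * np.sum((x_i - x_j) ** 2))

# Three illustrative points and an arbitrary gamma
X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]])
gamma = 0.5

K_manual = np.array([[rbf_by_hand(a, b, gamma) for b in X] for a in X])
K_sklearn = rbf_kernel(X, X, gamma=gamma)

print(np.round(K_manual, 4))
print("Matches scikit-learn:", np.allclose(K_manual, K_sklearn))
```

The diagonal is always 1 (a point is maximally similar to itself), and entries shrink as points move apart.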

5. Final Decision Function
Once we solve for $w$ and $b$, predictions are made using:

$$f(x) = \text{sign}(w \cdot x + b)$$

For the dual form with kernels:

$$f(x) = \text{sign}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right)$$
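
The dual-form decision function can be rebuilt from a fitted model's attributes. This sketch assumes an RBF kernel with a fixed `gamma` on the `make_moons` dataset; it relies on scikit-learn storing $\alpha_i y_i$ in `dual_coef_` and $b$ in `intercept_`.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.1, random_state=42)
gamma = 0.5
model = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

# K[i, j] = K(support_vector_i, x_j), shape (n_support_vectors, n_samples)
K = rbf_kernel(model.support_vectors_, X, gamma=gamma)

# f(x) = sum_i (alpha_i * y_i) * K(x_i, x) + b, summed over support vectors only
manual_scores = model.dual_coef_[0] @ K + model.intercept_[0]

print("Matches decision_function:",
      np.allclose(manual_scores, model.decision_function(X)))

# The sign of the score selects the class (positive -> model.classes_[1])
manual_pred = np.where(manual_scores > 0, model.classes_[1], model.classes_[0])
print("Matches predict:", np.array_equal(manual_pred, model.predict(X)))
```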


Q4.What is the role of Lagrange Multipliers in SVM?
Ans.Role of Lagrange Multipliers in SVM
Lagrange multipliers play a crucial role in Support Vector Machines (SVMs) by transforming the constrained optimization problem into a solvable dual problem. This allows SVM to efficiently find the optimal hyperplane and support vectors.

1. The Primal Optimization Problem (Hard Margin SVM)
The objective of SVM is to find a hyperplane that maximizes the margin while ensuring all data points are correctly classified. The problem is formulated as:

$$\min_{w, b} \; \frac{1}{2}\|w\|^2$$

subject to the constraints:

$$y_i (w \cdot x_i + b) \ge 1, \quad \forall i$$

where:

$w$ is the weight vector (normal to the hyperplane),
$b$ is the bias term,
$x_i$ is the feature vector,
$y_i$ is the class label ($\pm 1$).
2. Introducing Lagrange Multipliers
To solve this constrained optimization problem, we introduce Lagrange multipliers $\alpha_i$ for each constraint. The Lagrangian function is defined as:

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]$$

where:

$\alpha_i \ge 0$ are the Lagrange multipliers,
The term $[y_i (w \cdot x_i + b) - 1]$ ensures the constraints are satisfied.
Karush-Kuhn-Tucker (KKT) Conditions
To find the optimal solution, we differentiate $L$ with respect to $w$ and $b$, and set the derivatives to zero:

Gradient w.r.t. $w$:

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{n} \alpha_i y_i x_i = 0$$
$$w = \sum_{i=1}^{n} \alpha_i y_i x_i$$

Gradient w.r.t. $b$:

$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i = 0$$

Complementary Slackness:

$$\alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] = 0$$

This means $\alpha_i$ is nonzero only for support vectors (points that lie on the margin boundary).
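
The stationarity condition $w = \sum_i \alpha_i y_i x_i$ can be checked directly on a fitted linear SVM. The sketch below uses an assumed synthetic dataset; `dual_coef_` holds $\alpha_i y_i$ for the support vectors, so one matrix product recovers $w$.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Illustrative data; any binary dataset would do
X, y = make_blobs(n_samples=80, centers=[[-2, -2], [2, 2]],
                  cluster_std=1.5, random_state=0)
model = SVC(kernel="linear", C=1.0).fit(X, y)

# w = sum_i (alpha_i * y_i) * x_i, summed over the support vectors
w_from_dual = model.dual_coef_ @ model.support_vectors_  # shape (1, n_features)

print("w from dual coefficients:", w_from_dual[0])
print("w reported by the model: ", model.coef_[0])
```

Only the support vectors enter the sum, because every other point has $\alpha_i = 0$.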

3. Converting to the Dual Problem
By substituting $w$ into the Lagrangian, we eliminate $w$ and obtain the dual form:

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$

subject to:

$$\sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \ge 0$$
This quadratic optimization problem is easier to solve than the original primal problem.

4. Support Vectors & Decision Function
Only points with $\alpha_i > 0$ are support vectors, meaning they define the decision boundary.

The final decision function is:

$$f(x) = \text{sign}\left( \sum_{i=1}^{n} \alpha_i y_i (x_i \cdot x) + b \right)$$

For non-linearly separable data, we replace $x_i \cdot x_j$ with a kernel function $K(x_i, x_j)$, allowing us to compute decision boundaries in higher-dimensional spaces.

5. Soft Margin SVM (Handling Misclassification)
For noisy data, we introduce slack variables $\xi_i$ and modify the optimization:

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$$

subject to:

$$y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

The dual problem now includes an upper bound on $\alpha_i$:

$$0 \le \alpha_i \le C$$
where C controls the trade-off between margin maximization and misclassification.
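
The box constraint splits support vectors into two groups: "free" ones with $0 < \alpha_i < C$, which sit on the margin, and "bounded" ones with $\alpha_i = C$, which lie inside the margin or are misclassified. The sketch below illustrates this on assumed synthetic data; the tighter `tol` is an illustrative choice to make the numerical check cleaner.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping classes so that some multipliers reach the bound C (assumed data)
X, y = make_blobs(n_samples=100, centers=[[-1, -1], [1, 1]],
                  cluster_std=1.2, random_state=0)
y_signed = np.where(y == 0, -1, 1)

C = 1.0
model = SVC(kernel="linear", C=C, tol=1e-5).fit(X, y)

alpha = np.abs(model.dual_coef_[0])
sv = model.support_
margins = y_signed[sv] * model.decision_function(X[sv])  # y_i * f(x_i)

at_bound = np.isclose(alpha, C)   # alpha_i = C
free = ~at_bound                  # 0 < alpha_i < C

print("Free support vectors (on the margin):", int(free.sum()))
print("Bounded support vectors (margin violators):", int(at_bound.sum()))
print("y*f(x) for free SVs:", np.round(margins[free], 3))
```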



Q5.What are Support Vectors in SVM?
Ans.Support Vectors in SVM
Support Vectors are the most important data points in Support Vector Machines (SVMs) because they define the optimal hyperplane that separates different classes.

1. Definition of Support Vectors
In an SVM model, the goal is to find a decision boundary (hyperplane) that maximizes the margin between two classes. The support vectors are the data points that lie closest to this hyperplane. These points determine:

The position of the hyperplane.
The width of the margin.
The robustness of the model.
Mathematically, for a given hyperplane:

$$w \cdot x + b = 0$$

The support vectors are the points that satisfy the condition:

$$y_i (w \cdot x_i + b) = 1$$

where:

$w$ is the weight vector,
$x_i$ is a support vector,
$y_i$ is the class label ($+1$ or $-1$),
$b$ is the bias term.
2. Importance of Support Vectors
They define the margin of the classifier.
Removing any non-support vector has no effect on the decision boundary.
Removing a support vector can change the decision boundary significantly.
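
The claim that non-support vectors do not affect the boundary can be tested by refitting on the support vectors alone. This is a minimal sketch on assumed, well-separated synthetic data; the small `tol` is chosen so both fits converge tightly enough to compare.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.0, random_state=0)

# Fit on all points
full = SVC(kernel="linear", C=1.0, tol=1e-6).fit(X, y)

# Refit using only the support vectors found above
idx = full.support_
sv_only = SVC(kernel="linear", C=1.0, tol=1e-6).fit(X[idx], y[idx])

print("Training points used:", len(X), "vs", len(idx))
print("w (all points):        ", full.coef_[0], full.intercept_[0])
print("w (support vectors only):", sv_only.coef_[0], sv_only.intercept_[0])
```

Both fits should report essentially the same hyperplane, even though the second one saw only a handful of points.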
3. Finding Support Vectors in Python
Using Scikit-Learn, you can find the support vectors after training an SVM model:


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Generate synthetic dataset
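# n_informative/n_redundant must be set explicitly: the defaults (2 + 2) exceed n_features=2
X, y = make_classification(n_samples=50, n_features=2, n_informative=2, n_redundant=0, n_classes=2, n_clusters_per_class=1, random_state=42)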
y = np.where(y == 0, -1, 1)  # Convert labels to {-1, 1}

# Train SVM model
svm = SVC(kernel='linear', C=1.0)
svm.fit(X, y)

# Get support vectors
support_vectors = svm.support_vectors_

# Plot decision boundary and support vectors
def plot_svm(model, X, y, support_vectors):
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')
    plt.scatter(support_vectors[:, 0], support_vectors[:, 1], color='gold', s=200, edgecolors='black', label="Support Vectors")

    # Create grid
    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 50), np.linspace(ylim[0], ylim[1], 50))
    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot decision boundary
    ax.contour(xx, yy, Z, levels=[0], linestyles=['solid'])
    plt.legend()
    plt.title("SVM with Support Vectors")
    plt.show()

plot_svm(svm, X, y, support_vectors)


4. Support Vectors in Soft Margin SVM
When data is not perfectly separable, SVM allows some misclassification using slack variables $\xi_i$:

$$y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

Support vectors can either be on the margin or inside the margin.
Some support vectors may be misclassified, especially when $C$ (regularization parameter) is low.
You can identify support vectors in a soft-margin SVM using:

In [None]:
print("Support Vector Indices:", svm.support_)
print("Number of Support Vectors per Class:", svm.n_support_)


Q6.What is a Support Vector Classifier (SVC)?
Ans.Support Vector Classifier (SVC) in SVM
A Support Vector Classifier (SVC) is a type of Support Vector Machine (SVM) used for classification tasks. It finds the best decision boundary (hyperplane) that maximizes the margin between different classes.

1. What is SVC?
SVC is the classification version of SVM.
It aims to separate data points belonging to different classes using a hyperplane.
If data is not linearly separable, SVC uses a soft margin and kernel trick to handle non-linearity.
2. Mathematical Formulation of SVC
For a binary classification problem with dataset $(x_i, y_i)$, where:

$x_i$ is the feature vector.
$y_i \in \{-1, +1\}$ is the class label.
The decision boundary is given by:

$$w \cdot x + b = 0$$
Hard Margin SVC (for linearly separable data)
Objective function:

$$\min_{w, b} \; \frac{1}{2}\|w\|^2$$

subject to:

$$y_i (w \cdot x_i + b) \ge 1, \quad \forall i$$
Soft Margin SVC (for non-linearly separable data)
We introduce slack variables $\xi_i$:

$$y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

The objective function balances margin maximization and misclassification:

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$$

where:

$C$ controls the trade-off between margin width and classification accuracy.
3. Kernel Trick in SVC
If data is not linearly separable, SVC uses the kernel trick to project data into a higher-dimensional space where it is separable.

Common Kernels:

Linear Kernel:
$$K(x_i, x_j) = x_i \cdot x_j$$
Polynomial Kernel:
$$K(x_i, x_j) = (x_i \cdot x_j + c)^d$$
Radial Basis Function (RBF) Kernel:
$$K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$$


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Generate synthetic dataset
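# n_informative/n_redundant must be set explicitly: the defaults (2 + 2) exceed n_features=2
X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, n_classes=2, n_clusters_per_class=1, random_state=42)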
y = np.where(y == 0, -1, 1)  # Convert labels to {-1, 1}

# Train SVC model with a linear kernel
svc = SVC(kernel='linear', C=1.0)
svc.fit(X, y)

# Get support vectors
support_vectors = svc.support_vectors_

# Plot decision boundary and support vectors
def plot_svc(model, X, y, support_vectors):
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')
    plt.scatter(support_vectors[:, 0], support_vectors[:, 1], color='gold', s=200, edgecolors='black', label="Support Vectors")

    # Create grid
    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 50), np.linspace(ylim[0], ylim[1], 50))
    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot decision boundary
    ax.contour(xx, yy, Z, levels=[0], linestyles=['solid'])
    plt.legend()
    plt.title("SVC with Support Vectors")
    plt.show()

plot_svc(svc, X, y, support_vectors)


Q7.What is a Support Vector Regressor (SVR)?
Ans.Support Vector Regressor (SVR) in SVM
A Support Vector Regressor (SVR) is a type of Support Vector Machine (SVM) used for regression tasks. Unlike Support Vector Classifier (SVC), which finds a decision boundary to separate classes, SVR finds a function that best fits the data while maintaining a margin of tolerance.

1. What is SVR?
SVR is the regression version of SVM.
It finds a function $f(x)$ that predicts $y$ with minimal error.
Instead of maximizing the margin like SVC, SVR fits a function within an $\epsilon$-tube, allowing some flexibility.
Mathematical Formulation of SVR
Given a dataset $(x_i, y_i)$, SVR aims to find a function:

$$f(x) = w \cdot x + b$$

such that most predictions fall within a margin $\epsilon$:

$$|y_i - (w \cdot x_i + b)| \le \epsilon$$
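
The tolerance band corresponds to the $\epsilon$-insensitive loss, which is zero inside the tube and grows linearly outside it. The sketch below is a plain NumPy illustration; the function name and the sample values are assumptions.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon-tube, |error| - epsilon outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.3, 2.0])  # errors of 0.05, 0.3 and 1.0

loss = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(loss)  # approximately [0, 0.2, 0.9]
```

The first prediction is off by 0.05, which is inside the tube, so it incurs no penalty at all.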
2. Hard vs Soft Margin SVR
Hard Margin SVR (Exact Fit)
For perfectly predictable data, SVR minimizes:

$$\min_{w, b} \; \frac{1}{2}\|w\|^2$$

subject to:

$$|y_i - (w \cdot x_i + b)| \le \epsilon$$

where $\epsilon$ is a tolerance threshold.

Soft Margin SVR (Handling Noise)
For real-world data with noise, we introduce slack variables $\xi_i^+$ and $\xi_i^-$ to allow some points to be outside the $\epsilon$-tube, one for points above the tube and one for points below it:

$$y_i - (w \cdot x_i + b) \le \epsilon + \xi_i^+, \quad (w \cdot x_i + b) - y_i \le \epsilon + \xi_i^-, \quad \xi_i^+, \xi_i^- \ge 0$$

Objective function:

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i^+ + \xi_i^-)$$

where:

$C$ controls the trade-off between flatness of the function and margin violations.
Higher $C$ → Lower tolerance for outliers.
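
In a fitted SVR, points strictly inside the $\epsilon$-tube carry no weight; only points on or outside the tube become support vectors. The sketch below checks this on assumed noisy sine data; the hyperparameters and the tight `tol` are illustrative choices.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy sine curve (assumed data)
rng = np.random.default_rng(0)
X = np.sort(5 * rng.random((60, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(60)

epsilon = 0.1
model = SVR(kernel="rbf", C=100, gamma=0.5, epsilon=epsilon, tol=1e-5).fit(X, y)

residual = np.abs(y - model.predict(X))
is_sv = np.zeros(len(X), dtype=bool)
is_sv[model.support_] = True

print("Support vectors:", int(is_sv.sum()), "of", len(X))
print("Largest residual among non-support vectors:",
      residual[~is_sv].max() if (~is_sv).any() else "n/a")
print("Smallest residual among support vectors:", residual[is_sv].min())
```

Widening the tube (larger `epsilon`) generally leaves fewer support vectors and a flatter fit.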
3. Kernel Trick in SVR
Like SVC, SVR can model non-linear relationships using kernels:

Linear Kernel:
$$K(x_i, x_j) = x_i \cdot x_j$$
Polynomial Kernel:
$$K(x_i, x_j) = (x_i \cdot x_j + c)^d$$
Radial Basis Function (RBF) Kernel:
$$K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$$

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR

# Generate synthetic regression data
np.random.seed(42)
X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * np.random.randn(40)  # Add some noise

# Train SVR models with different kernels
svr_linear = SVR(kernel='linear', C=100, epsilon=0.1)
svr_rbf = SVR(kernel='rbf', C=100, epsilon=0.1, gamma=0.5)

# Fit models
svr_linear.fit(X, y)
svr_rbf.fit(X, y)

# Predict
X_test = np.linspace(0, 5, 100).reshape(-1, 1)
y_pred_linear = svr_linear.predict(X_test)
y_pred_rbf = svr_rbf.predict(X_test)

# Plot results
plt.scatter(X, y, color='black', label='Data')
plt.plot(X_test, y_pred_linear, color='blue', label='SVR (Linear Kernel)')
plt.plot(X_test, y_pred_rbf, color='red', label='SVR (RBF Kernel)')
plt.legend()
plt.title("Support Vector Regression")
plt.show()


Q8.What is the Kernel Trick in SVM?
Ans.Kernel Trick in SVM
The Kernel Trick is a technique in Support Vector Machines (SVMs) that allows them to handle non-linearly separable data by transforming the input space into a higher-dimensional space where a linear separation is possible.

1. Why is the Kernel Trick Needed?
In many real-world scenarios, data is not linearly separable in its original feature space. Consider a dataset like this:

 Class A: (inside a circle)
 Class B: (outside the circle)

A linear SVM cannot separate these classes with a straight line. The Kernel Trick helps by mapping the data to a higher-dimensional space where a hyperplane can be used for separation.

2. How Does the Kernel Trick Work?
Instead of explicitly computing the transformation into a higher-dimensional space (which may be computationally expensive), we use a Kernel Function $K(x, x')$ that computes the dot product directly in the higher-dimensional space:

$$K(x, x') = \phi(x) \cdot \phi(x')$$

where:

$x, x'$ are two input feature vectors.
$\phi(x)$ is the mapping function to a higher-dimensional space.
$K(x, x')$ computes the dot product in that high-dimensional space without explicitly transforming $x$.
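
A small worked case shows the identity $K(x, x') = \phi(x) \cdot \phi(x')$ in action. For 2-D inputs and the kernel $K(x, x') = (x \cdot x')^2$, one valid explicit map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The function names and sample vectors below are assumptions for illustration.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 kernel (x . x')^2 in 2-D."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def kernel(x, z):
    """Same quantity computed in the original 2-D space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # dot product in the 3-D feature space
implicit = kernel(x, z)            # no transformation needed

print(explicit, implicit)  # both are 121 (up to floating-point rounding)
```

Here $x \cdot z = 11$, so the kernel gives $11^2 = 121$ without ever forming the 3-D vectors. For the RBF kernel the feature space is infinite-dimensional, so the implicit route is the only practical one.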
3. Common Kernel Functions
Different kernel functions are used in SVMs, depending on the data:

(a) Linear Kernel (For Linearly Separable Data)
$$K(x, x') = x \cdot x'$$

Equivalent to a standard SVM without kernel trick.
Used when data is already linearly separable.

In [None]:
from sklearn.svm import SVC
svm_linear = SVC(kernel='linear')


(b) Polynomial Kernel (For Polynomial Relationships)
$$K(x, x') = (x \cdot x' + c)^d$$

Projects data into a higher-degree polynomial space.
Hyperparameters:
$c$ (coefficient) controls bias.
$d$ (degree) controls complexity.
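
In scikit-learn, $c$ and $d$ correspond to the `coef0` and `degree` arguments, and the dot product is additionally scaled by `gamma`, giving $(\gamma\, x \cdot x' + c)^d$. The sketch below sets `gamma=1.0` to match the formula above; the sample matrix is an assumption.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [3.0, 4.0]])
c, d = 1.0, 3

# Formula from the text: (x . x' + c)^d
K_manual = (X @ X.T + c) ** d

# scikit-learn computes (gamma * x . x' + coef0)^degree
K_sklearn = polynomial_kernel(X, X, degree=d, gamma=1.0, coef0=c)

print(K_manual)
print("Matches scikit-learn:", np.allclose(K_manual, K_sklearn))

# The same hyperparameters on a classifier
svm_poly = SVC(kernel="poly", degree=d, coef0=c, gamma=1.0)
```

Note that `SVC` defaults to `coef0=0.0` and `gamma='scale'`, so leaving them unset does not reproduce the formula exactly.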

(c) Radial Basis Function (RBF) Kernel (For Complex Non-Linear Data)
$$K(x, x') = \exp(-\gamma \|x - x'\|^2)$$

Maps data into an infinite-dimensional space.
Hyperparameter:
$\gamma$ controls how much influence each data point has.
🔹 Example in Python:

svm_rbf = SVC(kernel='rbf', gamma=0.5)
(d) Sigmoid Kernel (Similar to Neural Networks)
$$K(x, x') = \tanh(\alpha \, x \cdot x' + c)$$

Mimics the behavior of a neural network activation function.
🔹 Example in Python:

svm_sigmoid = SVC(kernel='sigmoid')
4. Visualizing the Effect of Kernels
Here’s a Python example demonstrating different kernels:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_moons

# Generate non-linearly separable data
X, y = make_moons(n_samples=100, noise=0.1, random_state=42)

# Train SVMs with different kernels
svm_linear = SVC(kernel='linear').fit(X, y)
svm_rbf = SVC(kernel='rbf', gamma=1).fit(X, y)
svm_poly = SVC(kernel='poly', degree=3).fit(X, y)

# Function to plot decision boundaries
def plot_svm(model, X, y, title):
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')
    
    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    
    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100), np.linspace(ylim[0], ylim[1], 100))
    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    ax.contour(xx, yy, Z, levels=[0], linestyles=['solid'])
    plt.title(title)
    plt.show()

# Plot results
plot_svm(svm_linear, X, y, "SVM with Linear Kernel")
plot_svm(svm_rbf, X, y, "SVM with RBF Kernel")
plot_svm(svm_poly, X, y, "SVM with Polynomial Kernel")
Output Explanation
Linear Kernel: Tries to draw a straight line (fails for complex shapes).
Polynomial Kernel: Fits curves (good for polynomial relationships).
RBF Kernel: Best fit for highly non-linear data.


Q9.Compare Linear Kernel, Polynomial Kernel, and RBF Kernel:
Ans.Comparison of Linear Kernel, Polynomial Kernel, and RBF Kernel in SVM
| Feature | Linear Kernel | Polynomial Kernel | RBF (Radial Basis Function) Kernel |
|---|---|---|---|
| Formula | $K(x, x') = x \cdot x'$ | $K(x, x') = (x \cdot x' + c)^d$ | $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$ |
| Type of Mapping | No mapping (remains in original space) | Maps to a higher-degree polynomial space | Maps to an infinite-dimensional space |
| When to Use? | Data is linearly separable | Data has polynomial relationships | Data is highly non-linear |
| Hyperparameters | None | $c$ (coefficient), $d$ (degree) | $\gamma$ (controls variance) |
| Computational Cost | Low (fastest) | Medium (depends on degree $d$) | High (most computationally expensive) |
| Flexibility | Rigid (only linear separation) | More flexible (can fit polynomial patterns) | Most flexible (captures complex patterns) |
| Overfitting Risk | Low | Moderate (depends on degree $d$) | High (for large $\gamma$) |
| Best For | Linearly separable data (e.g., text classification) | Moderately complex data (e.g., images, gene data) | Highly complex data (e.g., image recognition, speech classification) |
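
The qualitative comparison can be backed by a quick cross-validation run. The sketch below is illustrative only: the `make_moons` dataset, the 5-fold setup, and the default hyperparameters are assumptions, and the ranking of kernels can differ on other data.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Non-linearly separable toy data
X, y = make_moons(n_samples=200, noise=0.1, random_state=42)

models = {
    "linear": SVC(kernel="linear"),
    "poly": SVC(kernel="poly", degree=3, coef0=1),
    "rbf": SVC(kernel="rbf"),
}

# Mean 5-fold cross-validated accuracy per kernel
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}

for name, score in scores.items():
    print(f"{name:>6}: {score:.3f}")
```

On this curved dataset the linear kernel is expected to trail the RBF kernel, which matches the "When to Use?" row of the table.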
1. Linear Kernel: Best for Linearly Separable Data
Formula:
$$K(x, x') = x \cdot x'$$

Pros:
 Simple and fast.
 Works well for high-dimensional sparse data (e.g., text classification).
Cons:
 Cannot handle non-linear patterns.
Example Use Case: Spam detection, document classification.
Python Example:

In [None]:
from sklearn.svm import SVC
svm_linear = SVC(kernel='linear')


2. Polynomial Kernel: Best for Polynomial Relationships
Formula:
$$K(x, x') = (x \cdot x' + c)^d$$

Pros:
 Captures polynomial relationships.
 More flexible than a linear kernel.
Cons:
 Can be computationally expensive for high-degree polynomials.
 Risk of overfitting if $d$ is too high.
Example Use Case: Image recognition with moderate complexity.
 Python Example:

In [None]:
svm_poly = SVC(kernel='poly', degree=3, coef0=1)


3. RBF Kernel: Best for Highly Non-Linear Data
Formula:
$$K(x, x') = \exp(-\gamma \|x - x'\|^2)$$
Pros:
 Extremely powerful for complex patterns.
Handles data that cannot be separated linearly or polynomially.
Cons:
 Computationally expensive.
 Can overfit if $\gamma$ is too high.
Example Use Case: Image classification, speech recognition.
Python Example:

svm_rbf = SVC(kernel='rbf', gamma=0.5)
4. Visualization of Different Kernels
Let's compare the decision boundaries of Linear, Polynomial, and RBF kernels:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_moons

# Generate synthetic non-linearly separable data
X, y = make_moons(n_samples=100, noise=0.1, random_state=42)

# Train SVM models with different kernels
svm_linear = SVC(kernel='linear').fit(X, y)
svm_poly = SVC(kernel='poly', degree=3).fit(X, y)
svm_rbf = SVC(kernel='rbf', gamma=1).fit(X, y)

# Function to plot decision boundaries
def plot_svm(model, X, y, title):
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')

    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()

    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100), np.linspace(ylim[0], ylim[1], 100))
    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    ax.contour(xx, yy, Z, levels=[0], linestyles=['solid'])
    plt.title(title)
    plt.show()

# Plot results
plot_svm(svm_linear, X, y, "SVM with Linear Kernel")
plot_svm(svm_poly, X, y, "SVM with Polynomial Kernel")
plot_svm(svm_rbf, X, y, "SVM with RBF Kernel")

Q10.What is the effect of the C parameter in SVM?
Ans.Effect of the C Parameter in SVM
The C parameter in Support Vector Machines (SVMs) controls the trade-off between achieving a perfect classification and maximizing the margin. It acts as a regularization parameter that determines how much misclassification is tolerated.

1. What is C in SVM?
C is a hyperparameter that influences the decision boundary.
A higher C means fewer misclassifications but a smaller margin.
A lower C allows a larger margin but permits some misclassification.
Mathematically, the objective function of SVM includes C as:

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$$

where:

$\|w\|^2$ ensures a maximum margin.
$\sum \xi_i$ represents misclassified points (soft margin violations).
$C$ balances margin size vs. misclassification.
2. Effect of High vs. Low C
 High C (Weak Regularization, Smaller Margin)
SVM tries to classify all points correctly.
Small margin and complex decision boundary.
Risk of overfitting (good on training data, bad on new data).
 Example in Python:

from sklearn.svm import SVC
svm_high_c = SVC(kernel='linear', C=1000)  # High C
 Visualization:

Decision boundary is tight around data points.
Little tolerance for misclassified points.
May fail on new (test) data.
 Low C (Strong Regularization, Larger Margin)
Larger margin, but some misclassifications are allowed.
SVM prioritizes a simpler model (generalization over accuracy).
Risk of underfitting (not capturing patterns well).
 Example in Python:

svm_low_c = SVC(kernel='linear', C=0.01)  # Low C
 Visualization:

Decision boundary is smooth and wide.
More misclassification is accepted.
Better generalization to new data.
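
The effect of C on the margin can also be measured numerically instead of read off a plot. The sketch below fits a linear SVM for several values of C on assumed overlapping synthetic data and reports the margin width $2 / \|w\|$ and the number of support vectors.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters, so the soft margin actually matters (assumed data)
X, y = make_blobs(n_samples=100, centers=[[-1, -1], [1, 1]],
                  cluster_std=1.5, random_state=0)

summary = {}
for C in (0.01, 1.0, 100.0):
    model = SVC(kernel="linear", C=C).fit(X, y)
    margin_width = 2.0 / np.linalg.norm(model.coef_[0])
    summary[C] = {"margin": margin_width,
                  "n_support": int(model.n_support_.sum())}
    print(f"C={C:<6} margin width={margin_width:.3f} "
          f"support vectors={summary[C]['n_support']}")
```

A lower C should give a wider margin with more points inside it (hence more support vectors); a higher C shrinks the margin to reduce training violations.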
3. Visualization of C's Effect
Here’s a code snippet to visualize the effect of different C values on an SVM classifier:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Generate synthetic classification data
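# n_informative/n_redundant must be set explicitly: the defaults (2 + 2) exceed n_features=2
X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, n_classes=2, random_state=42)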

# Train SVMs with different C values
svm_low_c = SVC(kernel='linear', C=0.1).fit(X, y)
svm_high_c = SVC(kernel='linear', C=1000).fit(X, y)

# Function to plot decision boundaries
def plot_svm(model, X, y, title):
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')

    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()

    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100), np.linspace(ylim[0], ylim[1], 100))
    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    ax.contour(xx, yy, Z, levels=[0], linestyles=['solid'])
    plt.title(title)
    plt.show()

# Plot results
plot_svm(svm_low_c, X, y, "SVM with Low C (Large Margin, More Errors)")
plot_svm(svm_high_c, X, y, "SVM with High C (Small Margin, Fewer Errors)")


Q11.What is the role of the Gamma parameter in RBF Kernel SVM?
Ans.Role of the Gamma Parameter in RBF Kernel SVM
The Gamma (𝛾) parameter in Radial Basis Function (RBF) Kernel SVM controls the influence of individual training points on the decision boundary. It determines how far the impact of a single training example reaches.

1. What is Gamma (𝛾) in SVM?
In the RBF kernel function:

$$K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^2\right)$$
where:

$x, x'$ are feature vectors.
$\lVert x - x' \rVert^2$ is the squared Euclidean distance.
$\gamma$ determines how much influence a single training point has.
Effect of Gamma (𝛾)
High $\gamma$ → Each point has a short-range influence (decision boundary becomes very tight around points).
Low $\gamma$ → Each point has a long-range influence (decision boundary is smoother).
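
To make the "range of influence" concrete, the kernel can be evaluated directly. This is a small sketch, assuming two illustrative points `a` and `b` at distance 1 and a hypothetical helper `rbf_by_hand`; it compares the hand-written formula with scikit-learn's `rbf_kernel`.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf_by_hand(x, x_prime, gamma):
    """K(x, x') = exp(-gamma * ||x - x'||^2), written out directly."""
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

# Two illustrative points at squared distance 1
a = np.array([0.0, 0.0])
b = np.array([1.0, 0.0])

for gamma in [0.01, 1.0, 10.0]:
    by_hand = rbf_by_hand(a, b, gamma)
    by_sklearn = rbf_kernel(a.reshape(1, -1), b.reshape(1, -1), gamma=gamma)[0, 0]
    print(f"gamma={gamma}: by hand={by_hand:.5f}, sklearn={by_sklearn:.5f}")
```

At distance 1 the similarity is simply exp(−γ), so a larger gamma makes the same pair of points look far less similar, which is why each training point then only affects its immediate neighbourhood.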
2. Effect of High vs. Low Gamma
 High Gamma (𝛾 → Large)
Each training point has a very local influence.
The decision boundary is highly flexible, capturing tiny variations.
Leads to overfitting (good on training data but bad on test data).
 Example in Python:

python

from sklearn.svm import SVC
svm_high_gamma = SVC(kernel='rbf', gamma=10)  # High Gamma
 Visualization:

The decision boundary wraps tightly around individual data points.
Captures too much noise, leading to poor generalization.
 Low Gamma (𝛾 → Small)
Each training point has a wide influence.
The decision boundary is smooth and less complex.
Leads to underfitting (does not capture complex patterns).
 Example in Python:

python

svm_low_gamma = SVC(kernel='rbf', gamma=0.01)  # Low Gamma
 Visualization:

The decision boundary is broad and smooth.
May miss important patterns in data.
3. Visualizing the Effect of Gamma
The following Python script demonstrates how different $\gamma$ values impact decision boundaries:

python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_moons

# Generate synthetic non-linearly separable data
X, y = make_moons(n_samples=100, noise=0.1, random_state=42)

# Train SVMs with different gamma values
svm_low_gamma = SVC(kernel='rbf', gamma=0.1).fit(X, y)
svm_high_gamma = SVC(kernel='rbf', gamma=10).fit(X, y)

# Function to plot decision boundaries
def plot_svm(model, X, y, title):
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')

    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()

    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100), np.linspace(ylim[0], ylim[1], 100))
    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    ax.contour(xx, yy, Z, levels=[0], linestyles=['solid'])
    plt.title(title)
    plt.show()

# Plot results
plot_svm(svm_low_gamma, X, y, "SVM with Low Gamma (Smooth Decision Boundary)")
plot_svm(svm_high_gamma, X, y, "SVM with High Gamma (Tight Decision Boundary)")
4. Summary of Gamma’s Effect
Gamma Value	Effect on Decision Boundary	Risk
Low γ (e.g., 0.01)	Smooth and generalized	Underfitting
High γ (e.g., 10)	Complex, wraps tightly around data	Overfitting
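
The underfitting and overfitting risks can be observed by comparing training and test accuracy. The following is a rough sketch, assuming a noisier `make_moons` sample than the one plotted earlier and an illustrative dictionary `gamma_scores`; the exact numbers depend on the data and the split.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: a noisy two-moons sample, split into train and test parts
X_m, y_m = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_m, y_m, test_size=0.3, random_state=0)

gamma_scores = {}
for gamma in [0.01, 1, 1000]:
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    # Store (training accuracy, test accuracy) for each gamma
    gamma_scores[gamma] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
    print(f"gamma={gamma}: train={gamma_scores[gamma][0]:.2f}, test={gamma_scores[gamma][1]:.2f}")
```

A large gap between training and test accuracy at very high gamma is the typical sign of overfitting, while low accuracy on both sets at very low gamma points to underfitting.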
5. Choosing the Right Gamma
 Low Gamma → If you want a smooth, general boundary (good for noisy data).
 High Gamma → If you want a tight fit around training data (good for clean data).

Grid Search for Optimal Gamma
To automatically find the best Gamma, use GridSearchCV:

python

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {'gamma': [0.01, 0.1, 1, 10, 100]}
svm = SVC(kernel='rbf')
grid_search = GridSearchCV(svm, param_grid, cv=5)
grid_search.fit(X, y)

print("Best Gamma:", grid_search.best_params_['gamma'])

Q12.What is the Naïve Bayes classifier, and why is it called "Naïve"
Ans.Naïve Bayes Classifier: Explanation & Why It's "Naïve"
1. What is the Naïve Bayes Classifier?
Naïve Bayes is a probabilistic machine learning algorithm based on Bayes' Theorem. It is mainly used for classification tasks such as spam filtering, sentiment analysis, and document classification.

Bayes' Theorem:
$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$

where:

$P(Y \mid X)$ → Posterior Probability (probability of class Y given features X).
$P(X \mid Y)$ → Likelihood (probability of features X given class Y).
$P(Y)$ → Prior Probability (probability of class Y occurring).
$P(X)$ → Evidence (overall probability of features X).
2. Why is it Called "Naïve"?
The "Naïve" part comes from the assumption that all features (X) are independent of each other given the class $Y$.

 Example:
In a spam classifier, words like "free", "win", and "money" might appear in spam emails. Naïve Bayes assumes:

$$P(\text{"free"}, \text{"win"}, \text{"money"} \mid \text{Spam}) = P(\text{"free"} \mid \text{Spam}) \times P(\text{"win"} \mid \text{Spam}) \times P(\text{"money"} \mid \text{Spam})$$
 Real-world fact: Words in a sentence are not truly independent, but Naïve Bayes assumes they are to simplify computation.
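
The factorization can be worked through by hand in a few lines. The sketch below uses made-up per-word likelihoods and priors (all numbers and names such as `p_word_given_spam` are hypothetical) purely to show how the naïve product feeds into Bayes' Theorem.

```python
import math

# Hypothetical per-word likelihoods and priors, made up for illustration only
p_word_given_spam = {"free": 0.30, "win": 0.20, "money": 0.25}
p_word_given_ham = {"free": 0.02, "win": 0.01, "money": 0.03}
p_spam, p_ham = 0.4, 0.6

words = ["free", "win", "money"]

# Naive assumption: the joint likelihood is the product of per-word likelihoods
like_spam = math.prod(p_word_given_spam[w] for w in words)
like_ham = math.prod(p_word_given_ham[w] for w in words)

# Bayes' theorem, with the evidence P(X) obtained by summing over both classes
evidence = like_spam * p_spam + like_ham * p_ham
posterior_spam = like_spam * p_spam / evidence
print(f"P(Spam | free, win, money) = {posterior_spam:.4f}")
```

Because each word's likelihood is multiplied in independently, three moderately spammy words are enough to push the posterior very close to 1 with these numbers.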

3. Types of Naïve Bayes Classifiers
 Gaussian Naïve Bayes (GNB) → Assumes features are normally distributed (good for continuous data).
 Multinomial Naïve Bayes (MNB) → Used for text classification (counts word frequencies).
 Bernoulli Naïve Bayes (BNB) → Works with binary features (e.g., word presence in spam detection).

4. Python Example: Naïve Bayes for Spam Classification
python

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Example data: spam vs. non-spam messages
messages = ["Win a free lottery now", "Call me later", "Congratulations, you won!", "Let's meet for lunch"]
labels = [1, 0, 1, 0]  # 1 = Spam, 0 = Not Spam

# Convert text to numerical feature vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# Train Naïve Bayes model
nb = MultinomialNB()
nb.fit(X, labels)

# Predict a new message
new_message = ["Free money, claim now"]
X_new = vectorizer.transform(new_message)
prediction = nb.predict(X_new)

print("Prediction:", "Spam" if prediction[0] == 1 else "Not Spam")
Output: Prediction: Spam

5. Advantages & Disadvantages of Naïve Bayes
Feature	Pros 	Cons
Speed	Very fast to train and predict	Assumes feature independence (not always true)
Works Well on Small Data	Performs well with limited training data	Struggles with correlated features
Handles High-Dimensional Data	Used in text classification (spam filtering, sentiment analysis)	Requires proper feature engineering
Performs Well on Probabilistic Data	Great for medical diagnosis, spam filtering, NLP	Can overestimate probabilities when assumptions don’t hold


Q13.What is Bayes’ Theorem
Ans.Bayes’ Theorem: Explanation & Formula
Bayes’ Theorem is a fundamental rule in probability that describes how to update our beliefs based on new evidence. It is widely used in machine learning, statistics, and decision-making.

1. Bayes’ Theorem Formula
The theorem states:

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$

where:

$P(A \mid B)$ → Posterior Probability (probability of event A happening given B has occurred).
$P(B \mid A)$ → Likelihood (probability of event B occurring given A is true).
$P(A)$ → Prior Probability (initial belief about event A before evidence).
$P(B)$ → Evidence (total probability of event B across all possible scenarios).
2. Intuition Behind Bayes’ Theorem
 It helps update our beliefs based on new information.
 Used in classification problems (e.g., Spam Detection, Medical Diagnosis).

3. Example: Medical Diagnosis (Cancer Test)
Suppose:

1% of people have cancer → $P(\text{Cancer}) = 0.01$.
If a person has cancer, the test is positive 90% of the time → $P(\text{Positive} \mid \text{Cancer}) = 0.9$.
If a person doesn’t have cancer, the test still gives a false positive 5% of the time → $P(\text{Positive} \mid \text{NoCancer}) = 0.05$.
What is the probability that a person actually has cancer if they test positive?
Using Bayes’ Theorem:

$$P(\text{Cancer} \mid \text{Positive}) = \frac{P(\text{Positive} \mid \text{Cancer}) \cdot P(\text{Cancer})}{P(\text{Positive})}$$

where:

$$P(\text{Positive}) = P(\text{Positive} \mid \text{Cancer}) \cdot P(\text{Cancer}) + P(\text{Positive} \mid \text{NoCancer}) \cdot P(\text{NoCancer})$$
Substituting values:

$$P(\text{Positive}) = (0.9 \times 0.01) + (0.05 \times 0.99) = 0.009 + 0.0495 = 0.0585$$
Now:

$$P(\text{Cancer} \mid \text{Positive}) = \frac{0.9 \times 0.01}{0.0585} = \frac{0.009}{0.0585} \approx 0.1538$$
So, even if the test is positive, the probability of actually having cancer is only ~15.38%!

4. Python Implementation of Bayes' Theorem
python

def bayes_theorem(prior_A, likelihood_B_given_A, prior_not_A, likelihood_B_given_not_A):
    P_B = (likelihood_B_given_A * prior_A) + (likelihood_B_given_not_A * prior_not_A)
    posterior_A_given_B = (likelihood_B_given_A * prior_A) / P_B
    return posterior_A_given_B

# Given probabilities
prior_cancer = 0.01
likelihood_positive_given_cancer = 0.9
prior_no_cancer = 1 - prior_cancer
likelihood_positive_given_no_cancer = 0.05

# Compute probability of having cancer given a positive test result
posterior_cancer_given_positive = bayes_theorem(prior_cancer, likelihood_positive_given_cancer,
                                                 prior_no_cancer, likelihood_positive_given_no_cancer)
print(f"Probability of having cancer given a positive test: {posterior_cancer_given_positive:.4f}")
Output: Probability of having cancer given a positive test: 0.1538 (~15.38%)
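
As a sanity check on the formula, the same scenario can be simulated. This is a minimal Monte Carlo sketch, assuming NumPy's `default_rng` with an arbitrary seed and the illustrative names `n_people` and `simulated_posterior`; the estimate should land close to the value returned by `bayes_theorem` above, up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people = 1_000_000

# Same numbers as the worked example: 1% prevalence, 90% true positives, 5% false positives
has_cancer = rng.random(n_people) < 0.01
test_positive = np.where(has_cancer,
                         rng.random(n_people) < 0.90,
                         rng.random(n_people) < 0.05)

# Among people who tested positive, what fraction actually has cancer?
simulated_posterior = has_cancer[test_positive].mean()
print(f"Simulated P(Cancer | Positive): {simulated_posterior:.4f}")
```

Counting outcomes this way makes the intuition visible: most positive tests come from the much larger healthy group, which is why the posterior stays low.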

5. Applications of Bayes' Theorem
 Naïve Bayes Classifier (Spam filtering, Sentiment Analysis).
 Medical Diagnosis (Predicting diseases based on test results).
Fraud Detection (Finding probability of fraud given transaction data).
 Speech & Image Recognition (Updating probabilities based on features).



Q14.Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes:
Ans.Differences Between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes
Naïve Bayes classifiers are probabilistic models based on Bayes’ Theorem with an assumption that features are conditionally independent given the class. Different variations exist depending on the type of data.

Type	Data Type	Assumption	Best Use Cases
Gaussian Naïve Bayes (GNB)	Continuous (real-valued)	Assumes features follow a normal (Gaussian) distribution	Used for numerical features like age, height, salary, temperature
Multinomial Naïve Bayes (MNB)	Discrete (counts)	Features represent word counts or frequency	Used in text classification (spam detection, NLP, sentiment analysis)
Bernoulli Naïve Bayes (BNB)	Binary (0/1)	Features represent presence/absence of words	Used for binary features like "word exists or not" in spam filtering
 Gaussian Naïve Bayes (GNB)
 Used for continuous numerical features.
 Assumes features follow a Gaussian (Normal) distribution.
 Example: Height, Weight, Age, Temperature, Income, Exam Scores

 Probability Formula (Normal Distribution Assumption):
For each feature $x_i$, the probability is computed as:

$$P(x_i \mid Y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$

where:

$\mu$ = Mean of feature $x_i$ in class $Y$.
$\sigma^2$ = Variance of feature $x_i$ in class $Y$.
 Python Example:

python


from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

# Train Gaussian Naïve Bayes
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict
y_pred = gnb.predict(X_test)

print("GaussianNB Accuracy:", gnb.score(X_test, y_test))
 Best for:

Continuous numerical data
Weather prediction, medical diagnosis (e.g., blood pressure, cholesterol levels)
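
To connect the Gaussian density to what the model actually computes, the posterior for one sample can be rebuilt by hand from the fitted means and variances. This is a sketch under assumptions: it refits a model on the full Iris data (named `gnb_demo` here), uses a hypothetical helper `gaussian_log_pdf`, and reads the variances from `var_` (called `sigma_` in older scikit-learn releases).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X_iris, y_iris = load_iris(return_X_y=True)
gnb_demo = GaussianNB().fit(X_iris, y_iris)

# Per-class feature means and variances learned by the model
# (var_ in recent scikit-learn releases; older releases call it sigma_)
means = gnb_demo.theta_
variances = getattr(gnb_demo, "var_", None)
if variances is None:
    variances = gnb_demo.sigma_

def gaussian_log_pdf(x, mu, var):
    """Log of the normal density for each class/feature pair."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

x_new = X_iris[0]
# log P(Y) + sum_i log P(x_i | Y) for each class
log_joint = np.log(gnb_demo.class_prior_) + gaussian_log_pdf(x_new, means, variances).sum(axis=1)
posterior_by_hand = np.exp(log_joint - log_joint.max())
posterior_by_hand /= posterior_by_hand.sum()

print("By hand:      ", np.round(posterior_by_hand, 4))
print("predict_proba:", np.round(gnb_demo.predict_proba([x_new])[0], 4))
```

Working in log space avoids numerical underflow when many small densities are multiplied together.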
 Multinomial Naïve Bayes (MNB)
 Used for discrete features (word counts, frequencies).
 Common in Natural Language Processing (NLP).
 Example: Spam filtering, sentiment analysis, text classification.

 Probability Formula (Word Counts Assumption):

$$P(x_i \mid Y) = \frac{\text{count}(x_i \text{ in class } Y) + \alpha}{\sum_j \text{count}(x_j \text{ in class } Y) + \alpha n}$$

where:

$\alpha$ = Smoothing parameter (Laplace Smoothing, avoids zero probabilities).
$n$ = Number of distinct features (vocabulary size), so that the smoothed probabilities sum to 1 within each class.
 Python Example:

python

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Sample text data
documents = ["Free money now", "Call me later", "Win a lottery now", "Meet me for dinner"]
labels = [1, 0, 1, 0]  # 1 = Spam, 0 = Not Spam

# Convert text to word count vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(documents)

# Train Multinomial Naïve Bayes
mnb = MultinomialNB()
mnb.fit(X, labels)

# Predict new text
new_doc = ["Win a free lottery"]
X_new = vectorizer.transform(new_doc)
prediction = mnb.predict(X_new)

print("Prediction:", "Spam" if prediction[0] == 1 else "Not Spam")
 Best for:

Text classification, NLP (spam detection, news categorization)
Document classification, word frequency-based tasks
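
The smoothed word probabilities can be reproduced from raw counts. The sketch below assumes the standard additive-smoothing estimate, (count + α) / (total count in class + α · vocabulary size), and checks it against the `feature_log_prob_` attribute of a fitted `MultinomialNB`; names such as `probs_by_hand` and `mnb_demo` are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["Free money now", "Call me later", "Win a lottery now", "Meet me for dinner"]
doc_labels = np.array([1, 0, 1, 0])  # 1 = Spam, 0 = Not Spam

counts = CountVectorizer().fit_transform(docs).toarray()
alpha = 1.0
n_features = counts.shape[1]  # vocabulary size

# Smoothed estimate: (count of word in class + alpha) / (total words in class + alpha * n_features)
probs_by_hand = []
for label in [0, 1]:
    word_counts = counts[doc_labels == label].sum(axis=0)
    probs_by_hand.append((word_counts + alpha) / (word_counts.sum() + alpha * n_features))
probs_by_hand = np.array(probs_by_hand)

mnb_demo = MultinomialNB(alpha=alpha).fit(counts, doc_labels)
print("Matches scikit-learn:", np.allclose(probs_by_hand, np.exp(mnb_demo.feature_log_prob_)))
```

Each row of `probs_by_hand` sums to 1, and no entry is exactly zero even for words that never occur in a class.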
 Bernoulli Naïve Bayes (BNB)
 Used for binary (0/1) features.
 Each feature is treated as a "Yes/No" (Present/Absent).
 Example: Spam filtering (word present or not), sentiment analysis.

 Probability Formula (Binary Feature Assumption):

$$P(x_i \mid Y) = p_i^{x_i} (1 - p_i)^{(1 - x_i)}$$

where $x_i$ is 1 if the feature is present, else 0, and $p_i = P(x_i = 1 \mid Y)$ is the probability that feature $i$ is present in class $Y$.

 Python Example:

python

from sklearn.naive_bayes import BernoulliNB

# Binary word presence dataset
X = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y = [1, 0, 1, 0]  # 1 = Spam, 0 = Not Spam

# Train Bernoulli Naïve Bayes
bnb = BernoulliNB()
bnb.fit(X, y)

# Predict
new_sample = [[1, 0, 0]]  # Example with "first word present, others absent"
prediction = bnb.predict(new_sample)

print("Prediction:", "Spam" if prediction[0] == 1 else "Not Spam")
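
The Bernoulli likelihood can be evaluated by hand for the same toy data. This is a small sketch, assuming add-one smoothing (`alpha = 1.0`, the scikit-learn default) and illustrative names such as `X_bin` and `bernoulli_posterior`; note that absent features contribute a factor of (1 − p_i), which is what distinguishes this variant from the multinomial one.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X_bin = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_bin = np.array([1, 0, 1, 0])  # 1 = Spam, 0 = Not Spam
x_query = np.array([1, 0, 0])
alpha = 1.0

joint = []
for label in [0, 1]:
    rows = X_bin[y_bin == label]
    # p_i: smoothed probability that feature i is present in this class
    p = (rows.sum(axis=0) + alpha) / (len(rows) + 2 * alpha)
    # Bernoulli likelihood: present features contribute p_i, absent ones contribute (1 - p_i)
    likelihood = np.prod(p ** x_query * (1 - p) ** (1 - x_query))
    prior = len(rows) / len(X_bin)
    joint.append(prior * likelihood)

bernoulli_posterior = np.array(joint) / np.sum(joint)
print("By hand:      ", bernoulli_posterior)
print("predict_proba:", BernoulliNB(alpha=alpha).fit(X_bin, y_bin).predict_proba([x_query])[0])
```

With these four samples the hand calculation gives a posterior of 0.25 for "Not Spam" and 0.75 for "Spam", matching `predict_proba`.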

Q15.When should you use Gaussian Naïve Bayes over other variants
Ans.When to Use Gaussian Naïve Bayes (GNB) Over Other Variants?
Gaussian Naïve Bayes (GNB) is best suited for continuous numerical data that follows a normal (Gaussian) distribution. It differs from MultinomialNB and BernoulliNB, which are designed for count-based features (e.g., word counts) and binary features, respectively.

 Use GaussianNB When:
1 Features Are Continuous (Real Numbers)
 If your dataset contains numerical variables like height, weight, age, income, temperature, blood pressure, etc., then GaussianNB is the best choice.
 It models each feature using a normal distribution, which works well for real-valued inputs.

 Example: Medical Diagnosis (Diabetes Prediction)

Features: Blood pressure, glucose levels, BMI.
Target: Diabetic (Yes/No).
Why GNB? These values are continuous and assumed to be normally distributed.
2 Features Are Normally Distributed (or Close to It)
 If your feature values follow a bell-shaped curve (Gaussian distribution), GNB will perform well.
 Even if the distribution isn’t perfectly normal, GNB is still effective in many cases.

 Example: Iris Flower Classification

Features: Sepal length, petal width (real-valued).
Why GNB? These features are continuous and approximately Gaussian.
python


from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load Iris dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

# Train Gaussian Naïve Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict & Evaluate
accuracy = gnb.score(X_test, y_test)
print("GaussianNB Accuracy:", accuracy)
3 When You Need a Fast & Simple Classifier
 GNB is extremely fast to train and predict because it computes probabilities analytically without iterative optimization.
 Works well for small to medium-sized datasets.

 Example: Real-time Predictions

Use Case: Fraud detection in financial transactions.
Why GNB? It provides quick predictions on continuous features like transaction amount, time, and frequency.
 Avoid GaussianNB When:
 Data Is Categorical or Discrete (Use MultinomialNB)
 If your features represent word counts, frequency, or categorical values, GaussianNB is not suitable.
 Instead, use MultinomialNB for text classification tasks.

 Example: Spam Detection

Wrong choice: GaussianNB (doesn’t handle word counts well).
Right choice: MultinomialNB (counts word occurrences).
 Data Has Binary Features (Use BernoulliNB)
 If features are binary (0 or 1), like "word present or not", use BernoulliNB instead.
 GNB assumes continuous inputs, so binary data won't work well.

 Example: Spam Filtering Based on Presence of Words

Wrong choice: GaussianNB (assumes numerical values).
Right choice: BernoulliNB (better for binary presence/absence).
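
The mismatch between variant and data type also shows up in the other direction. As a rough sketch (variable names are illustrative), the comparison below runs GaussianNB and BernoulliNB on the continuous Iris features with 5-fold cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB

X_cont, y_cont = load_iris(return_X_y=True)

# 5-fold cross-validated accuracy for each variant on the same continuous features
gaussian_acc = cross_val_score(GaussianNB(), X_cont, y_cont, cv=5).mean()
# BernoulliNB binarizes at 0.0 by default, so every (positive) Iris measurement becomes 1
bernoulli_acc = cross_val_score(BernoulliNB(), X_cont, y_cont, cv=5).mean()

print(f"GaussianNB:  {gaussian_acc:.3f}")
print(f"BernoulliNB: {bernoulli_acc:.3f}")
```

Because all Iris measurements are positive, the default binarization throws away the information in the features, so BernoulliNB has little to work with here, whereas GaussianNB models the actual values.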
 Features Are Highly Correlated
 Naïve Bayes assumes feature independence. If features are strongly correlated, the assumption breaks down.
 Solution: Consider Logistic Regression or Decision Trees instead.

 Example: Predicting House Prices

Features: Square footage, number of rooms, lot size.
These features are strongly correlated, making GNB less effective.
Alternative: Use Linear Regression or Decision Trees.


Q16.What are the key assumptions made by Naïve Bayes
Ans.Key Assumptions of Naïve Bayes
Naïve Bayes is a probabilistic classifier based on Bayes' Theorem. It is called "naïve" because it makes strong independence assumptions about the features. Here are the key assumptions:

1 Conditional Independence Assumption
 Assumption: Features are conditionally independent given the class label.

Mathematically, for a given class $Y$, the probability of input features $X = (x_1, x_2, \ldots, x_n)$ is:

$$P(X \mid Y) = P(x_1 \mid Y) \times P(x_2 \mid Y) \times \cdots \times P(x_n \mid Y)$$
 Why it's useful?

Simplifies calculations, making Naïve Bayes fast and efficient.
Works well even if the independence assumption is not strictly true.
 When this assumption fails?

If features are highly correlated, the same evidence is effectively counted several times, so predicted probabilities become overconfident and accuracy can drop.
Example: Predicting house prices using lot size, number of rooms, and total square footage (which are correlated).
Alternative: Use Logistic Regression, Decision Trees, or Random Forest instead.
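
The effect of correlated features can be demonstrated in the most extreme case, where one feature is simply copied. The sketch below uses synthetic data and illustrative names (`x_single`, `x_copied`, `conf_single`, `conf_copied`); it compares the average confidence of GaussianNB with one feature against the same feature repeated five times.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# One informative feature, two balanced and overlapping classes (synthetic data)
x_single = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(1.0, 1.0, 200)]).reshape(-1, 1)
y_dup = np.array([0] * 200 + [1] * 200)

# The same feature copied five times: perfectly correlated, no new information
x_copied = np.repeat(x_single, 5, axis=1)

conf_single = GaussianNB().fit(x_single, y_dup).predict_proba(x_single).max(axis=1).mean()
conf_copied = GaussianNB().fit(x_copied, y_dup).predict_proba(x_copied).max(axis=1).mean()

print(f"Mean confidence, single feature: {conf_single:.3f}")
print(f"Mean confidence, five copies:    {conf_copied:.3f}")
```

The copies add no information, yet the model becomes more confident because it multiplies the same likelihood five times; the predicted labels stay the same, only the probabilities get pushed toward 0 and 1.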
2 Feature Probability Distribution Assumption
 Assumption: Each feature follows a specific probability distribution depending on the variant of Naïve Bayes used.

Variants of Naïve Bayes and Their Assumptions:
Naïve Bayes Type	Assumed Feature Distribution	Suitable for
GaussianNB	Features follow a normal (Gaussian) distribution	Continuous data (e.g., age, height, blood pressure)
MultinomialNB	Features represent word counts or frequencies	Text classification (spam filtering, sentiment analysis)
BernoulliNB	Features are binary (0/1)	Binary classification (word presence/absence in spam detection)
 What happens if the assumption is violated?

If data is not normally distributed, GaussianNB may not perform well.
If features are not word counts, MultinomialNB will give incorrect probabilities.
Solution: Use a different Naïve Bayes variant or try other models (e.g., SVM, Decision Trees).
3 Class Conditional Independence
 Assumption: Each feature contributes independently to the class label.
 Mathematically,

$$P(Y \mid X) \propto P(Y) \times P(x_1 \mid Y) \times P(x_2 \mid Y) \times \cdots \times P(x_n \mid Y)$$
This means:

The effect of one feature does not depend on another.
Example: In email classification, the presence of "free" and "lottery" are treated as independent, even though they often appear together in spam emails.
 Why it's useful?

Makes Naïve Bayes computationally efficient.
Works well in text classification, where the independence approximation is usually good enough to rank the classes correctly even though words are not truly independent.
 When does it fail?

If features have strong dependencies (e.g., "free" and "lottery" always appear together).
Solution:
Use feature selection techniques (e.g., remove highly correlated words).
Use models that handle dependencies (e.g., Random Forest, Neural Networks).
4 All Features Are Equally Important
 Assumption: All features contribute equally to the final classification.

 Works well when:

Features are balanced and have similar levels of importance.
 Fails when:

Some features are more influential than others.
Example: In fraud detection, transaction amount may be more important than time of day.
 Solution:

Feature engineering: Scale important features properly.
Use different models: Logistic Regression, SVM, or Decision Trees can assign different feature weights.


Q17. What are the advantages and disadvantages of Naïve Bayes
Ans.Advantages and Disadvantages of Naïve Bayes
Naïve Bayes is a powerful, simple, and efficient algorithm, but it has some limitations. Let’s break it down.

 Advantages of Naïve Bayes
1 Fast and Efficient
Training and prediction are very fast, even on large datasets.
Computationally lightweight, making it ideal for real-time applications.
Example: Spam detection in emails—can classify thousands of emails in seconds.

2 Works Well with Small Datasets
Unlike deep learning or complex models, Naïve Bayes works well with limited data.
Can generalize well even with a few training examples.
Example: Classifying medical conditions based on a small set of patient records.

3 Effective for High-Dimensional Data
Performs well with high-dimensional datasets (lots of features).
Commonly used in text classification (e.g., sentiment analysis, spam detection).
 Example: Classifying news articles based on thousands of words in the text.

4 Handles Missing Data Well
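Since it computes probabilities independently for each feature, a missing feature can in principle be left out of the product without affecting the others. Note that scikit-learn's Naïve Bayes estimators do not accept NaN inputs, so missing values must be imputed or dropped before fitting.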
 Example: In a dataset where some features (e.g., blood pressure) are missing, it can still predict disease probability.

5 Works Well for Categorical and Text Data
Multinomial and Bernoulli Naïve Bayes work exceptionally well for text classification and categorical data.
 Example:

Spam detection (word frequency-based).
Sentiment analysis (positive/negative review classification).
6 Handles Class Imbalance Well
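Can work reasonably well even when one class is much more frequent than others.
Class priors are estimated from class frequencies, so rarity is built into the prediction; with severe imbalance, predictions can still be biased toward the majority class.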
 Example: Fraud detection, where fraud cases are rare but still need accurate classification.

 Disadvantages of Naïve Bayes
1 Assumes Feature Independence
The biggest drawback: It assumes all features are independent (which is rarely true).
If features are correlated, Naïve Bayes performs poorly.
 Example:

Predicting house prices using square footage and number of rooms (which are correlated).
Solution: Use models like Logistic Regression, SVM, or Random Forest instead.
2 Struggles with Continuous Data
Gaussian Naïve Bayes assumes continuous data follows a normal distribution, which may not always be true.
If the real data distribution is skewed or multimodal, performance drops.
 Example:

Predicting customer age range for an online service. If the distribution is not Gaussian, predictions may be off.
Solution: Try Kernel Density Estimation (KDE) or other classifiers like Decision Trees.
3 Zero Probability Problem (Smoothing Required)
If a word or feature never appeared in training data, it gets a probability of 0, which can cause issues.
Solution: Laplace Smoothing (adds a small value to avoid zero probabilities).
 Example:

If a new word appears in an email, Naïve Bayes might classify it incorrectly as non-spam.
Using Laplace Smoothing: Adjust probabilities to avoid hard zeros.
python
from sklearn.naive_bayes import MultinomialNB

# Use Laplace smoothing (alpha > 0 to avoid zero probabilities)
nb = MultinomialNB(alpha=1.0)
4 Not Good for Complex Decision Boundaries
Works best for simple decision boundaries but fails for complex, non-linear relationships.
Solution: Use SVM, Neural Networks, or Decision Trees.
 Example:

Naïve Bayes struggles in image classification where pixels interact in complex ways.
Instead, CNNs (Convolutional Neural Networks) are a better choice.
5 Sensitive to Noisy or Redundant Features
If there are too many irrelevant or correlated features, performance drops.
Solution: Perform feature selection before applying Naïve Bayes.
 Example:

If spam detection includes unrelated features like email font size or background color, Naïve Bayes may get confused.

Q18.Why is Naïve Bayes a good choice for text classification
Ans.Naïve Bayes is one of the best algorithms for text classification, including spam detection, sentiment analysis, topic categorization, and email filtering. Here’s why:

 1 Works Well with High-Dimensional Data (Many Features)
 In text classification, every unique word is a feature. Documents can have thousands of words (features), leading to high-dimensional data.
 Naïve Bayes handles high-dimensional data efficiently because it treats each word independently and only computes probabilities.

 Example:

A dataset with 10,000 unique words (features) and 1 million emails (examples).
Naïve Bayes still trains and predicts efficiently because it only needs word frequency counts.
Linear SVMs also handle sparse, high-dimensional text well, but they and deep learning models are typically slower to train than Naïve Bayes on large corpora.

 2 Very Fast to Train and Predict
 Naïve Bayes uses simple probability calculations, making it one of the fastest classifiers.
 Training takes seconds or minutes, even for millions of documents.

 Example:

Spam filtering: Gmail processes billions of emails daily.
Naïve Bayes can classify emails in milliseconds based on word probabilities.
 Alternative models (like SVM or deep learning) take longer to train and predict.

 3 Handles Missing Words Well (Sparse Data)
 In text classification, not every word appears in every document.
 Naïve Bayes is robust to missing features (unseen words) because it calculates probabilities independently for each word.

 Example:

If a word like "crypto" appears only in some emails, it doesn't break the model because Naïve Bayes still works with other available words.
 Other models like Decision Trees or SVM may struggle with missing words.

 4 Works Well for Imbalanced Datasets (Spam vs. Non-Spam)
 Many real-world datasets are imbalanced (e.g., spam vs. non-spam emails, fake vs. real reviews).
 Naïve Bayes still performs well because it assigns probabilities based on prior frequencies.

 Example:

If 95% of emails are non-spam and 5% are spam, Naïve Bayes automatically adjusts for imbalance using prior probabilities:
$$P(\text{Spam}) = 0.05, \quad P(\text{Not Spam}) = 0.95$$
Other models (like Logistic Regression) often need class weights or resampling to cope with strong imbalance.
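
The priors are learned directly from label frequencies, which can be inspected on a fitted model. This is a minimal sketch on a hypothetical toy set (19 "ham" rows, 1 "spam" row, random word counts); `class_log_prior_` is the scikit-learn attribute that stores log P(class).

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical imbalanced toy set: 19 ham messages, 1 spam message, 3 word-count features
rng = np.random.default_rng(0)
X_counts = rng.integers(0, 5, size=(20, 3))
y_imbalanced = np.array(["ham"] * 19 + ["spam"])

nb_prior_demo = MultinomialNB().fit(X_counts, y_imbalanced)

# class_log_prior_ stores log P(class), estimated from label frequencies
for cls, log_prior in zip(nb_prior_demo.classes_, nb_prior_demo.class_log_prior_):
    print(f"P({cls}) = {np.exp(log_prior):.2f}")
```

With 19 of 20 labels being "ham", the learned priors are 0.95 and 0.05, matching the example above.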

5 Simple and Interpretable
 Naïve Bayes provides clear probability scores, making it easy to understand.
 You can see which words contribute most to a classification.

 Example:

If "lottery" and "win" appear frequently in spam emails, Naïve Bayes assigns high probability to spam.
You can interpret why an email is classified as spam.
 Deep learning models are complex and hard to interpret.
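
One way to see which words drive a decision is to compare the learned per-class word probabilities. The sketch below is an illustration on a hypothetical four-message corpus; it ranks words by the log-odds log P(word | spam) − log P(word | ham), computed from `feature_log_prob_`.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus
texts = ["win lottery now", "lottery win prize", "meeting tomorrow", "lunch meeting today"]
text_labels = ["spam", "spam", "ham", "ham"]

vec = CountVectorizer()
nb_words = MultinomialNB().fit(vec.fit_transform(texts), text_labels)

# Vocabulary ordered by column index
vocab = sorted(vec.vocabulary_, key=vec.vocabulary_.get)

# Log-odds per word: log P(word | spam) - log P(word | ham); classes_ is ['ham', 'spam']
log_odds = nb_words.feature_log_prob_[1] - nb_words.feature_log_prob_[0]
for idx in np.argsort(log_odds)[::-1][:3]:
    print(f"{vocab[idx]}: {log_odds[idx]:+.2f}")
```

Words with large positive log-odds push a message toward spam, and words with negative log-odds (such as "meeting" in this toy corpus) push it toward ham.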

 6 Different Variants for Different Text Data
Naïve Bayes has different versions that handle different types of text data:

Variant	Assumed Feature Type	Best Use Case
MultinomialNB	Word counts or frequencies	Spam filtering, topic classification
BernoulliNB	Word presence (0/1)	Short text, binary features (e.g., "word exists or not")
GaussianNB	Continuous values	Uncommon for text, more for numerical features
 Example:

MultinomialNB works best for email classification (spam filtering).
BernoulliNB is useful when only word presence matters (e.g., document categorization).
 Other models don’t have specific versions optimized for different text types.
 Example: Using Naïve Bayes for Spam Classification
 Step 1: Import and Load Data
python

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Sample dataset (spam vs. non-spam messages)
emails = [
    ("Win a free iPhone now!", "spam"),
    ("Meeting at 10 AM tomorrow", "ham"),
    ("Congratulations! You won $1000", "spam"),
    ("Reminder: Doctor appointment at 4 PM", "ham"),
]

# Split data into texts and labels
X_texts, y_labels = zip(*emails)

# Convert text into word frequency features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(X_texts)

# Train on all four messages: the sample is too small for a meaningful train/test split
X_train, y_train = X, y_labels

# Train Naïve Bayes Model
nb = MultinomialNB()
nb.fit(X_train, y_train)

# Predict on new messages
new_email = ["Win a free vacation trip now!"]
X_new = vectorizer.transform(new_email)
prediction = nb.predict(X_new)

print("Prediction:", prediction[0])  # Expected output: spam

Q19.Compare SVM and Naïve Bayes for classification tasks
Ans.SVM (Support Vector Machine) and Naïve Bayes (NB) are both popular classifiers, but they work differently and are suited for different types of problems. Let's compare them in detail.

 Key Differences Between SVM and Naïve Bayes
Feature	SVM (Support Vector Machine)	Naïve Bayes (NB)
Approach	Discriminative Model (Finds a decision boundary)	Generative Model (Estimates probabilities)
Algorithm Type	Maximizes margin between classes (Optimization-based)	Uses Bayes’ Theorem for probability calculation
Feature Independence Assumption	No assumption (Considers feature relationships)	Assumes all features are independent
Performance on High-Dimensional Data	Works well, but requires kernel tricks for non-linear data	Works exceptionally well, especially for text classification
Computational Complexity	Slower, especially on large datasets	Extremely fast and scalable
Robustness to Outliers	More sensitive to outliers	Less sensitive to outliers
Interpretability	Harder to interpret (complex decision boundary)	Simple and interpretable (probabilities assigned)
Best for	Complex classification problems, image recognition, non-linear datasets	Text classification, spam filtering, sentiment analysis
 When to Use Naïve Bayes?
 Best suited for:

Text classification tasks (spam filtering, sentiment analysis, topic categorization).
Real-time applications (fast training and prediction).
High-dimensional data (thousands of features, like words in text).
Small datasets (performs well even with limited data).
 Not ideal when:

Features are correlated (NB assumes independence, which may not be true).
You need a complex decision boundary (e.g., image classification).
 Example:
Naïve Bayes is a strong baseline for email spam detection: words in emails are not truly independent, but the approximation works well enough in practice, and the model only needs word frequency counts.

 When to Use SVM?
 Best suited for:

Binary classification tasks (e.g., fraud detection, medical diagnosis).
Complex decision boundaries (SVM can learn non-linear relationships using kernels).
Small to medium datasets (handles non-linearly separable data well).
Structured data with feature relationships (since it doesn’t assume independence).
 Not ideal when:

The dataset is very large (SVM is computationally expensive).
You need probabilistic outputs (SVM doesn’t provide direct probabilities like NB).
 Example:
SVM is a better choice for image classification (e.g., handwritten digit recognition) because the relationships between pixels are complex and not independent.

 Example: Comparing SVM and Naïve Bayes for Text Classification
python
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample dataset (Spam vs. Ham)
emails = [
    ("Win a free iPhone now!", "spam"),
    ("Meeting at 10 AM tomorrow", "ham"),
    ("Congratulations! You won $1000", "spam"),
    ("Reminder: Doctor appointment at 4 PM", "ham"),
]

# Prepare data
X_texts, y_labels = zip(*emails)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(X_texts)
X_train, X_test, y_train, y_test = train_test_split(X, y_labels, test_size=0.2, random_state=42)

# Train Naïve Bayes Model
nb_model = MultinomialNB()
nb_model.fit(X_train, y_train)
nb_preds = nb_model.predict(X_test)

# Train SVM Model
svm_model = SVC(kernel="linear")  # Using linear kernel for text data
svm_model.fit(X_train, y_train)
svm_preds = svm_model.predict(X_test)

# Compare accuracy (with only four emails the test set is a single message, so the numbers are purely illustrative)
print("Naïve Bayes Accuracy:", accuracy_score(y_test, nb_preds))
print("SVM Accuracy:", accuracy_score(y_test, svm_preds))

Q20.How does Laplace Smoothing help in Naïve Bayes?
Ans.The Problem: Zero Probability Issue
Naïve Bayes calculates the probability of a class given certain features using Bayes' Theorem:

$$P(\text{Class} \mid \text{Features}) \propto P(\text{Features} \mid \text{Class}) \cdot P(\text{Class})$$
When calculating $P(\text{Features} \mid \text{Class})$, we multiply the probabilities of individual features (e.g., words in text classification):

$$P(w_1, w_2, w_3 \mid \text{Class}) = P(w_1 \mid \text{Class}) \times P(w_2 \mid \text{Class}) \times P(w_3 \mid \text{Class}) \times \dots$$
However, if a word never appears in the training data for a certain class, its probability becomes zero:

$$P(w_i \mid \text{Class}) = 0$$
Since probabilities are multiplied, the entire prediction collapses to zero, making classification impossible.
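
The collapse described above can be reproduced in a few lines. This is a minimal sketch with made-up likelihood values (they are not taken from any dataset); the helper names naive_score and log_score are hypothetical.

```python
import math

def naive_score(prior, likelihoods):
    """Unnormalized Naive Bayes score: prior times the product of per-word likelihoods."""
    score = prior
    for p in likelihoods:
        score *= p
    return score

def log_score(prior, likelihoods):
    """Same score in log space; a single zero likelihood gives minus infinity."""
    if any(p == 0.0 for p in likelihoods):
        return float("-inf")
    return math.log(prior) + sum(math.log(p) for p in likelihoods)

# Hypothetical values of P(w_i | Spam); the last word was never seen in spam
seen_only = [0.20, 0.10, 0.05]
with_unseen = seen_only + [0.0]

print(naive_score(0.5, seen_only))    # small but positive
print(naive_score(0.5, with_unseen))  # 0.0
print(log_score(0.5, with_unseen))    # -inf
```

One zero factor erases the evidence from every other word, no matter how strongly those words point to the class.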

 The Solution: Laplace Smoothing (Additive Smoothing)
To fix the zero probability issue, we apply Laplace Smoothing, which adds a small constant $k$ to all word counts before calculating probabilities.

The smoothed probability formula is:

$$P(w_i \mid \text{Class}) = \frac{\text{Count}(w_i, \text{Class}) + k}{\text{Total Words in Class} + k \times \text{Vocabulary Size}}$$

Where:

$k$ (Laplace constant) is typically 1.
$\text{Count}(w_i, \text{Class})$ = Number of times word $w_i$ appears in documents of a given class.
Total Words in Class = Sum of all word counts in documents of that class.
Vocabulary Size = Number of unique words in the dataset.
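
The formula and its terms can be sketched as a small function. The toy corpus, the whitespace tokenizer, and the names tokenize and smoothed_word_probs are assumptions made for illustration; they are not part of any library.

```python
from collections import Counter

def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace
    return text.lower().split()

def smoothed_word_probs(class_docs, vocabulary, k=1.0):
    """Return P(w | Class) for every vocabulary word using additive smoothing."""
    counts = Counter(tok for doc in class_docs for tok in tokenize(doc))
    total_words = sum(counts[w] for w in vocabulary)   # Total Words in Class
    denominator = total_words + k * len(vocabulary)    # + k x Vocabulary Size
    return {w: (counts[w] + k) / denominator for w in vocabulary}

# Hypothetical toy corpus
spam_docs = ["cheap pills cheap", "cheap offer"]
ham_docs = ["lunch offer today"]
vocabulary = sorted({tok for doc in spam_docs + ham_docs for tok in tokenize(doc)})

unsmoothed = smoothed_word_probs(spam_docs, vocabulary, k=0.0)
spam_probs = smoothed_word_probs(spam_docs, vocabulary, k=1.0)

print("P(lunch | spam) with k=0:", unsmoothed["lunch"])
print("P(lunch | spam) with k=1:", spam_probs["lunch"])
print("Sum over vocabulary:", sum(spam_probs.values()))
```

Adding $k \times \text{Vocabulary Size}$ to the denominator is what keeps the smoothed values a valid distribution: they still sum to 1 over the vocabulary.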
 Example: Without vs. With Laplace Smoothing
 Without Laplace Smoothing:
Consider spam detection with two classes: Spam and Not Spam.
Suppose we have this training data:

Email	Content	Class
1	"Win a free iPhone now"	Spam
2	"Meeting at 10 AM"	Not Spam
3	"Congratulations! You won"	Spam
Now, let's predict whether "You won a free trip" is spam.

Word counts in spam emails:
"win" = 1, "a" = 1, "free" = 1, "iPhone" = 1, "now" = 1, "congratulations" = 1, "you" = 1, "won" = 1 (8 words in total)
"trip" never appeared in training data.
Probability of "trip" in Spam:

$$P(\text{"trip"} \mid \text{Spam}) = \frac{0}{\text{Total Words in Spam}}$$

Since P("trip" | Spam) = 0, the entire probability of the email being spam collapses to zero.

 With Laplace Smoothing (k=1):
Using Laplace Smoothing:

$$P(\text{"trip"} \mid \text{Spam}) = \frac{0 + 1}{\text{Total Words in Spam} + \text{Vocabulary Size}}$$

Even though "trip" never appeared, it now gets a nonzero probability, allowing the classifier to still make a meaningful prediction.
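
To make the two cases concrete, the arithmetic can be worked through under some simplifying assumptions that the example does not state: every whitespace-separated token counts as a word (including "a", "now", and "you"), case and punctuation are ignored, and the vocabulary is taken from the three training emails only. That gives 8 words in the Spam class and a vocabulary of 12 unique words.

```latex
\begin{aligned}
\text{Without smoothing:}\quad P(\text{"trip"} \mid \text{Spam}) &= \frac{0}{8} = 0 \\
\text{With } k = 1\text{:}\quad P(\text{"trip"} \mid \text{Spam}) &= \frac{0 + 1}{8 + 1 \times 12} = \frac{1}{20} = 0.05 \\
\text{With } k = 1\text{:}\quad P(\text{"free"} \mid \text{Spam}) &= \frac{1 + 1}{8 + 1 \times 12} = \frac{2}{20} = 0.10
\end{aligned}
```

Under these assumptions a seen word such as "free" still receives twice the probability of the unseen "trip", so smoothing removes the zero without erasing the evidence in the counts.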

 Key Benefits of Laplace Smoothing
 Prevents Zero Probabilities – Ensures unseen words don’t make the entire classification fail.
 Improves Generalization – Helps the model handle words that show up in test data but were never seen with a given class during training.
 Works Well for Text Classification – Commonly used in Naïve Bayes for spam filtering, sentiment analysis, and document categorization.

 Python Example: Naïve Bayes with Laplace Smoothing
By default, Scikit-learn’s Naïve Bayes (MultinomialNB) uses Laplace Smoothing with $\alpha = 1$ (equivalent to $k$).

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Sample training dataset
emails = ["Win a free iPhone", "Meeting at 10 AM", "Congratulations! You won"]
labels = ["spam", "ham", "spam"]

# Convert text to word count vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Train Naïve Bayes with Laplace Smoothing (alpha=1)
nb = MultinomialNB(alpha=1)
nb.fit(X, labels)

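# Predict new email ("trip" is not in the fitted vocabulary, so the vectorizer ignores it;
# smoothing keeps the words unseen in the ham class from zeroing out that class)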
new_email = ["You won a free trip"]
X_new = vectorizer.transform(new_email)
prediction = nb.predict(X_new)

print("Prediction:", prediction[0])  # Expected output: spam

Q21.Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy:
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM classifier with an RBF kernel
svm_model = SVC(kernel='rbf', C=1.0, gamma='scale')
svm_model.fit(X_train, y_train)

# Predict on test data
y_pred = svm_model.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")


Q22.Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then
compare their accuracies
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = datasets.load_wine()
X, y = wine.data, wine.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM with Linear Kernel
svm_linear = SVC(kernel='linear', C=1.0)
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
accuracy_linear = accuracy_score(y_test, y_pred_linear)

# Train SVM with RBF Kernel
svm_rbf = SVC(kernel='rbf', C=1.0, gamma='scale')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

# Compare Accuracies
print(f"Linear Kernel Accuracy: {accuracy_linear:.2f}")
print(f"RBF Kernel Accuracy: {accuracy_rbf:.2f}")


Q23.Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean
Squared Error (MSE)
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Load a regression dataset. The Boston Housing loader was removed from scikit-learn,
# so the Diabetes dataset stands in for a housing dataset here.
diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target  # Features and target

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVR model with an RBF kernel
svr_model = SVR(kernel='rbf', C=100, gamma='scale', epsilon=0.1)
svr_model.fit(X_train, y_train)

# Predict on test data
y_pred = svr_model.predict(X_test)

# Evaluate using Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.2f}")


Q24. Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision
boundary
Ans.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Generate a synthetic dataset
X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

# Standardize features for better SVM performance
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Train SVM with a Polynomial Kernel
svm_poly = SVC(kernel='poly', degree=3, C=1.0)
svm_poly.fit(X, y)

# Function to plot decision boundary
def plot_decision_boundary(model, X, y):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                         np.linspace(y_min, y_max, 100))

    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k", cmap=plt.cm.coolwarm)
    plt.title("SVM with Polynomial Kernel (degree=3)")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.show()

# Visualize decision boundary
plot_decision_boundary(svm_poly, X, y)


Q25.Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and
evaluate accuracy
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
breast_cancer = datasets.load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Gaussian Naïve Bayes Classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict on test data
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")


Q26.Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20
Newsgroups dataset.
Ans.

In [None]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the 20 Newsgroups dataset (only some categories for speed)
categories = ['sci.space', 'rec.sport.baseball', 'comp.graphics', 'talk.politics.mideast']
newsgroups = fetch_20newsgroups(subset='all', categories=categories, remove=('headers', 'footers', 'quotes'))

X, y = newsgroups.data, newsgroups.target  # Text data and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a pipeline for text preprocessing and classification
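text_clf = Pipeline([
    ('vect', CountVectorizer()),          # Convert text to token counts
    ('tfidf', TfidfTransformer()),        # Apply TF-IDF transformation
    ('clf', MultinomialNB())              # Train the Multinomial Naïve Bayes classifier
])

# Train the pipeline on the raw training documents
text_clf.fit(X_train, y_train)

# Predict on test data
y_pred = text_clf.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")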


Q27.Write a Python program to train an SVM Classifier with different C values and compare the decision
boundaries visually
Ans.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Generate a synthetic dataset (non-linearly separable)
X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

# Standardize the dataset for better SVM performance
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Different C values to compare
C_values = [0.01, 1, 100]

# Function to plot decision boundary
def plot_decision_boundary(model, X, y, title):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                         np.linspace(y_min, y_max, 100))

    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k", cmap=plt.cm.coolwarm)
    plt.title(title)
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")

# Train SVM classifiers with different C values and plot their decision boundaries
plt.figure(figsize=(12, 4))

for i, C in enumerate(C_values):
    svm_model = SVC(kernel='rbf', C=C, gamma='scale')
    svm_model.fit(X, y)

    plt.subplot(1, len(C_values), i + 1)
    plot_decision_boundary(svm_model, X, y, f"SVM with C={C}")

plt.tight_layout()
plt.show()


Q28.Write a Python program to train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with
binary features
Ans.

In [None]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score

# Generate a synthetic dataset with binary features
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_redundant=0, n_classes=2, random_state=42)

# Convert features into binary (0 or 1) using a threshold
X = (X > 0).astype(int)

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Bernoulli Naïve Bayes Classifier
bnb = BernoulliNB()
bnb.fit(X_train, y_train)

# Predict on test data
y_pred = bnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")


Q29.Write a Python program to apply feature scaling before training an SVM model and compare results with
unscaled data
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
breast_cancer = datasets.load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM model WITHOUT feature scaling
svm_unscaled = SVC(kernel='rbf', C=1.0, gamma='scale')
svm_unscaled.fit(X_train, y_train)
y_pred_unscaled = svm_unscaled.predict(X_test)
accuracy_unscaled = accuracy_score(y_test, y_pred_unscaled)

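# Apply feature scaling (fit the scaler on the training set only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train an SVM model WITH feature scaling
svm_scaled = SVC(kernel='rbf', C=1.0, gamma='scale')
svm_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = svm_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# Compare results
print(f"Accuracy WITHOUT scaling: {accuracy_unscaled:.4f}")
print(f"Accuracy WITH scaling: {accuracy_scaled:.4f}")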


Q30.Write a Python program to train a Gaussian Naïve Bayes model and compare the predictions before and
after Laplace Smoothing
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Note: GaussianNB models continuous features, so Laplace (additive) smoothing does not apply to it.
# Its closest analogue is var_smoothing, which adds a fraction of the largest feature
# variance to every variance for numerical stability.

# Train Gaussian Naïve Bayes WITHOUT variance smoothing
gnb_no_smoothing = GaussianNB(var_smoothing=0)  # Smoothing disabled
gnb_no_smoothing.fit(X_train, y_train)
y_pred_no_smoothing = gnb_no_smoothing.predict(X_test)
accuracy_no_smoothing = accuracy_score(y_test, y_pred_no_smoothing)

# Train Gaussian Naïve Bayes WITH variance smoothing (the default value)
gnb_smoothing = GaussianNB(var_smoothing=1e-9)  # Default var_smoothing
gnb_smoothing.fit(X_train, y_train)
y_pred_smoothing = gnb_smoothing.predict(X_test)
accuracy_smoothing = accuracy_score(y_test, y_pred_smoothing)

# Compare results
print(f"Accuracy WITHOUT smoothing (var_smoothing=0): {accuracy_no_smoothing:.4f}")
print(f"Accuracy WITH smoothing (var_smoothing=1e-9): {accuracy_smoothing:.4f}")

# Compare predictions
print("\nPredictions WITHOUT Smoothing:", y_pred_no_smoothing[:10])
print("Predictions WITH Smoothing:", y_pred_smoothing[:10])


Q31.Write a Python program to train an SVM Classifier and use GridSearchCV to tune the hyperparameters (C,
gamma, kernel)
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the parameter grid for GridSearchCV
param_grid = {
    'C': [0.1, 1, 10, 100],             # Regularization parameter
    'gamma': ['scale', 'auto', 0.01, 0.1, 1],  # Kernel coefficient (for RBF, poly, sigmoid)
    'kernel': ['linear', 'rbf', 'poly']  # Types of kernel
}

# Initialize SVM model
svm_model = SVC()

# Perform Grid Search with cross-validation
grid_search = GridSearchCV(svm_model, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)

# Get the best parameters and best model
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_

# Make predictions using the best model
y_pred = best_model.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print results
print("Best Hyperparameters:", best_params)
print(f"Best Model Accuracy on Test Data: {accuracy:.4f}")


Q32.Write a Python program to train an SVM Classifier on an imbalanced dataset, apply class weighting, and
check whether it improves accuracy
Ans.

In [None]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

# Generate an imbalanced dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2,
                           weights=[0.9, 0.1], flip_y=0, random_state=42)

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Train an SVM model WITHOUT class weighting
svm_no_weight = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
svm_no_weight.fit(X_train, y_train)
y_pred_no_weight = svm_no_weight.predict(X_test)

# Train an SVM model WITH class weighting
svm_with_weight = SVC(kernel='rbf', C=1.0, gamma='scale', class_weight='balanced', random_state=42)
svm_with_weight.fit(X_train, y_train)
y_pred_with_weight = svm_with_weight.predict(X_test)

# Evaluate both models
accuracy_no_weight = accuracy_score(y_test, y_pred_no_weight)
accuracy_with_weight = accuracy_score(y_test, y_pred_with_weight)

print(f"Accuracy WITHOUT class weighting: {accuracy_no_weight:.4f}")
print(f"Accuracy WITH class weighting: {accuracy_with_weight:.4f}")

# Print classification reports for better insight
print("\nClassification Report WITHOUT class weighting:")
print(classification_report(y_test, y_pred_no_weight))

print("\nClassification Report WITH class weighting:")
print(classification_report(y_test, y_pred_with_weight))


Q33.Write a Python program to implement a Naïve Bayes classifier for spam detection using email data
Ans.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Load the SMS Spam Collection dataset (tab-separated file with no header row)
url = "https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv"
df = pd.read_csv(url, sep="\t", header=None, names=["label", "message"])

# Convert labels to binary values (ham = 0, spam = 1)
df["label"] = df["label"].map({"ham": 0, "spam": 1})

# Split dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(df["message"], df["label"], test_size=0.2, random_state=42)

# Convert text messages into numerical feature vectors using TF-IDF
vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)  # Limit features for efficiency
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Train a Multinomial Naïve Bayes classifier
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train_tfidf, y_train)

# Predict on the test set
y_pred = nb_classifier.predict(X_test_tfidf)

# Evaluate accuracy and classification report
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

print("\nClassification Report:")
print(classification_report(y_test, y_pred))


Q34.Write a Python program to train an SVM Classifier and a Naïve Bayes Classifier on the same dataset and
compare their accuracy
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM Classifier
svm_model = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
svm_model.fit(X_train, y_train)
y_pred_svm = svm_model.predict(X_test)

# Train a Naïve Bayes Classifier (GaussianNB)
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
y_pred_nb = nb_model.predict(X_test)

# Evaluate accuracy of both models
accuracy_svm = accuracy_score(y_test, y_pred_svm)
accuracy_nb = accuracy_score(y_test, y_pred_nb)

# Print results
print(f"Accuracy of SVM Classifier: {accuracy_svm:.4f}")
print(f"Accuracy of Naïve Bayes Classifier: {accuracy_nb:.4f}")

# Compare results
if accuracy_svm > accuracy_nb:
    print("SVM performs better than Naïve Bayes on this dataset.")
elif accuracy_nb > accuracy_svm:
    print("Naïve Bayes performs better than SVM on this dataset.")
else:
    print("Both classifiers have similar accuracy.")


Q35.Write a Python program to perform feature selection before training a Naïve Bayes classifier and compare
results
Ans.

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Naïve Bayes classifier WITHOUT feature selection
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
y_pred_no_fs = nb_model.predict(X_test)
accuracy_no_fs = accuracy_score(y_test, y_pred_no_fs)

# Perform feature selection (SelectKBest with Chi-Square Test)
k = 10  # Select top 10 features
selector = SelectKBest(chi2, k=k)
X_train_selected = selector.fit_transform(X_train, y_train)
X_test_selected = selector.transform(X_test)

# Train a Naïve Bayes classifier WITH feature selection
nb_model_fs = GaussianNB()
nb_model_fs.fit(X_train_selected, y_train)
y_pred_fs = nb_model_fs.predict(X_test_selected)
accuracy_fs = accuracy_score(y_test, y_pred_fs)

# Print results
print(f"Accuracy WITHOUT Feature Selection: {accuracy_no_fs:.4f}")
print(f"Accuracy WITH Feature Selection (Top {k} Features): {accuracy_fs:.4f}")

# Compare results
if accuracy_fs > accuracy_no_fs:
    print(f"Feature selection improved accuracy by {accuracy_fs - accuracy_no_fs:.4f}")
else:
    print(f"Feature selection did not improve accuracy. Consider tuning the number of features.")


Q36.Write a Python program to train an SVM Classifier using One-vs-Rest (OvR) and One-vs-One (OvO)
strategies on the Wine dataset and compare their accuracy
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = datasets.load_wine()
X, y = wine.data, wine.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM using One-vs-Rest (OvR)
svm_ovr = OneVsRestClassifier(SVC(kernel='linear', C=1.0, random_state=42))
svm_ovr.fit(X_train, y_train)
y_pred_ovr = svm_ovr.predict(X_test)
accuracy_ovr = accuracy_score(y_test, y_pred_ovr)

# Train SVM using One-vs-One (OvO)
svm_ovo = OneVsOneClassifier(SVC(kernel='linear', C=1.0, random_state=42))
svm_ovo.fit(X_train, y_train)
y_pred_ovo = svm_ovo.predict(X_test)
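accuracy_ovo = accuracy_score(y_test, y_pred_ovo)

# Compare accuracies
print(f"One-vs-Rest (OvR) Accuracy: {accuracy_ovr:.4f}")
print(f"One-vs-One (OvO) Accuracy: {accuracy_ovo:.4f}")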


Q37.Write a Python program to train an SVM Classifier using Linear, Polynomial, and RBF kernels on the Breast
Cancer dataset and compare their accuracy
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM with Linear Kernel
svm_linear = SVC(kernel='linear', C=1.0, random_state=42)
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
accuracy_linear = accuracy_score(y_test, y_pred_linear)

# Train SVM with Polynomial Kernel
svm_poly = SVC(kernel='poly', degree=3, C=1.0, gamma='scale', random_state=42)
svm_poly.fit(X_train, y_train)
y_pred_poly = svm_poly.predict(X_test)
accuracy_poly = accuracy_score(y_test, y_pred_poly)

# Train SVM with RBF Kernel
svm_rbf = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

# Print results
print(f"Accuracy using Linear Kernel: {accuracy_linear:.4f}")
print(f"Accuracy using Polynomial Kernel: {accuracy_poly:.4f}")
print(f"Accuracy using RBF Kernel: {accuracy_rbf:.4f}")

# Compare results
best_kernel = max(zip(["Linear", "Polynomial", "RBF"], [accuracy_linear, accuracy_poly, accuracy_rbf]), key=lambda x: x[1])
print(f"\nBest performing kernel: {best_kernel[0]} with accuracy {best_kernel[1]:.4f}")


Q38.Write a Python program to train an SVM Classifier using Stratified K-Fold Cross-Validation and compute the
average accuracy
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC
import numpy as np

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target  # Features and labels

# Define Stratified K-Fold Cross-Validation
k = 5  # Number of folds
skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=42)

# Train SVM Classifier using Stratified K-Fold Cross-Validation
svm_model = SVC(kernel='linear', C=1.0, random_state=42)

# Perform cross-validation and compute accuracy for each fold
accuracies = cross_val_score(svm_model, X, y, cv=skf, scoring='accuracy')

# Print results
print(f"Accuracies for each fold: {accuracies}")
print(f"Average Accuracy: {np.mean(accuracies):.4f}")


Q39.Write a Python program to train a Naïve Bayes classifier using different prior probabilities and compare
performance
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Naïve Bayes with default prior probabilities
nb_default = GaussianNB()
nb_default.fit(X_train, y_train)
y_pred_default = nb_default.predict(X_test)
accuracy_default = accuracy_score(y_test, y_pred_default)

# Define custom prior probabilities (e.g., assuming class imbalance)
custom_priors = [0.3, 0.7]  # Adjust based on domain knowledge

# Train Naïve Bayes with custom priors
nb_custom = GaussianNB(priors=custom_priors)
nb_custom.fit(X_train, y_train)
y_pred_custom = nb_custom.predict(X_test)
accuracy_custom = accuracy_score(y_test, y_pred_custom)

# Print results
print(f"Accuracy with Default Priors: {accuracy_default:.4f}")
print(f"Accuracy with Custom Priors {custom_priors}: {accuracy_custom:.4f}")

# Compare results
if accuracy_custom > accuracy_default:
    print("Custom prior probabilities improved accuracy.")
elif accuracy_custom < accuracy_default:
    print("Default prior probabilities performed better.")
else:
    print("Both models have the same accuracy.")


Q40.Write a Python program to perform Recursive Feature Elimination (RFE) before training an SVM Classifier and
compare accuracy
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM without feature selection
svm_full = SVC(kernel='linear', C=1.0, random_state=42)
svm_full.fit(X_train, y_train)
y_pred_full = svm_full.predict(X_test)
accuracy_full = accuracy_score(y_test, y_pred_full)

# Perform Recursive Feature Elimination (RFE) with SVM
n_features_to_select = 10  # Choose number of features to keep
rfe = RFE(estimator=SVC(kernel='linear', C=1.0), n_features_to_select=n_features_to_select)
rfe.fit(X_train, y_train)

# Train SVM with selected features
X_train_rfe = rfe.transform(X_train)
X_test_rfe = rfe.transform(X_test)
svm_rfe = SVC(kernel='linear', C=1.0, random_state=42)
svm_rfe.fit(X_train_rfe, y_train)
y_pred_rfe = svm_rfe.predict(X_test_rfe)
accuracy_rfe = accuracy_score(y_test, y_pred_rfe)

# Print results
print(f"Accuracy with all features: {accuracy_full:.4f}")
print(f"Accuracy after RFE (Top {n_features_to_select} features): {accuracy_rfe:.4f}")


Q41.Write a Python program to train an SVM Classifier and evaluate its performance using Precision, Recall, and
F1-Score instead of accuracy
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM Classifier
svm_model = SVC(kernel='linear', C=1.0, random_state=42)
svm_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = svm_model.predict(X_test)

# Evaluate performance using Precision, Recall, and F1-Score
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print results
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")

# Print full classification report
print("\nClassification Report:\n", classification_report(y_test, y_pred))


Q42.Write a Python program to train a Naïve Bayes Classifier and evaluate its performance using Log Loss
(Cross-Entropy Loss)
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import log_loss

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Naïve Bayes Classifier
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Predict probabilities for log loss calculation
y_prob = nb_model.predict_proba(X_test)  # Get probability estimates for both classes

# Compute Log Loss (Cross-Entropy Loss)
logloss = log_loss(y_test, y_prob)

# Print results
print(f"Log Loss (Cross-Entropy Loss): {logloss:.4f}")
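
To see what the number above measures, binary log loss can be written out by hand as the average negative log-probability assigned to the true class. The sketch below is a simplified illustration, not scikit-learn's implementation; the helper `binary_log_loss`, the clipping constant `eps`, and the toy values are all assumptions:

```python
import math

def binary_log_loss(y_true, p_pos, eps=1e-15):
    """Average cross-entropy for binary labels, given P(class 1) per sample."""
    total = 0.0
    for t, p in zip(y_true, p_pos):
        # Clip so that log(0) never occurs (the clipping value is an assumption)
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Made-up labels and predicted probabilities of class 1
toy_true = [1, 0, 1, 1]
toy_prob = [0.9, 0.2, 0.6, 0.99]

toy_loss = binary_log_loss(toy_true, toy_prob)
print(f"Toy log loss: {toy_loss:.4f}")

# A confident wrong prediction costs far more than a hesitant wrong one
print(binary_log_loss([1], [0.01]), binary_log_loss([1], [0.4]))
```

Because the penalty grows without bound as the predicted probability of the true class approaches zero, a few confidently wrong predictions can dominate the score. This is worth keeping in mind with Naïve Bayes, whose probability estimates are often pushed toward 0 or 1.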


Q43.Write a Python program to train an SVM Classifier and visualize the Confusion Matrix using seaborn
Ans.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM Classifier
svm_model = SVC(kernel='linear', C=1.0, random_state=42)
svm_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = svm_model.predict(X_test)

# Compute the Confusion Matrix
cm = confusion_matrix(y_test, y_pred)

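# Plot the Confusion Matrix using Seaborn
# Rows are true labels, columns are predicted labels; label 0 = malignant, 1 = benign
class_names = ["Malignant", "Benign"]
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=class_names, yticklabels=class_names)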
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix - SVM Classifier")
plt.show()
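
The layout of the matrix plotted above can be reproduced with plain counting: entry `[i][j]` is the number of samples whose true label is `i` and whose predicted label is `j`. The helper `confusion_counts` and the toy labels below are hypothetical and only sketch that convention:

```python
def confusion_counts(y_true, y_pred, n_classes=2):
    """Build a confusion matrix as nested lists: rows = true label, columns = predicted label."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

# Made-up labels (0 = malignant, 1 = benign, matching the dataset's encoding)
toy_true = [0, 0, 1, 1, 1, 0]
toy_pred = [0, 1, 1, 1, 0, 0]

toy_cm = confusion_counts(toy_true, toy_pred)
for row in toy_cm:
    print(row)
```

With this convention the diagonal holds the correct predictions, and the off-diagonal cells show which class was mistaken for which.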


Q44.Write a Python program to train an SVM Regressor (SVR) and evaluate its performance using Mean Absolute
Error (MAE) instead of MSE
Ans.

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Load the California Housing dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target  # Features and target variable

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply Feature Scaling (important for SVR)
scaler_X = StandardScaler()
scaler_y = StandardScaler()

X_train_scaled = scaler_X.fit_transform(X_train)
X_test_scaled = scaler_X.transform(X_test)

# StandardScaler expects 2D input, so reshape y to a single column for scaling;
# ravel() flattens it back to the 1D target that SVR expects
y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).ravel()

# Train an SVM Regressor (SVR)
# Note: fitting an RBF SVR on the full training set can take a while
svr_model = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)
svr_model.fit(X_train_scaled, y_train_scaled)

# Make predictions on the test set
y_pred_scaled = svr_model.predict(X_test_scaled)

# Inverse transform predictions to original scale
y_pred = scaler_y.inverse_transform(y_pred_scaled.reshape(-1, 1)).ravel()

# Compute Mean Absolute Error (MAE) on the original scale
# (the target is the median house value in units of $100,000)
mae = mean_absolute_error(y_test, y_pred)

# Print results
print(f"Mean Absolute Error (MAE): {mae:.4f}")
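
The reason to prefer MAE over MSE is easiest to see on a small example with one large error. The sketch below uses two hypothetical helpers and made-up values; it is only meant to show the arithmetic, not to replace the scikit-learn functions:

```python
def mae_manual(y_true, y_pred):
    """Mean of absolute errors; stays in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse_manual(y_true, y_pred):
    """Mean of squared errors; large errors are weighted quadratically."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets and predictions; the last prediction is off by 4.0
toy_true = [2.0, 3.0, 5.0, 4.0]
toy_pred = [2.5, 3.0, 4.0, 8.0]

toy_mae = mae_manual(toy_true, toy_pred)
toy_mse = mse_manual(toy_true, toy_pred)
print(f"MAE: {toy_mae:.4f}")
print(f"MSE: {toy_mse:.4f}")
```

Here the single error of 4.0 contributes 16 of the 17.25 total squared error but only 4 of the 5.5 total absolute error, so MAE is the less outlier-sensitive of the two.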


Q45.Write a Python program to train a Naïve Bayes classifier and evaluate its performance using the ROC-AUC
score
Ans.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target  # Features and labels

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Naïve Bayes Classifier
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Predict probabilities for ROC-AUC computation
y_prob = nb_model.predict_proba(X_test)[:, 1]  # Probability of class 1 (Benign)

# Compute the ROC-AUC Score
roc_auc = roc_auc_score(y_test, y_prob)

# Print results
print(f"ROC-AUC Score: {roc_auc:.4f}")
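
ROC-AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise interpretation follows; the helper `roc_auc_pairwise` and the toy scores are hypothetical, and the quadratic loop is for illustration only rather than how scikit-learn computes the score:

```python
def roc_auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")

    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted probabilities of class 1
toy_true = [1, 0, 1, 0, 1]
toy_score = [0.9, 0.4, 0.35, 0.1, 0.8]

toy_auc = roc_auc_pairwise(toy_true, toy_score)
print(f"Toy ROC-AUC: {toy_auc:.4f}")
```

Since only the ordering of the scores matters, ROC-AUC is unaffected by how well calibrated the Naïve Bayes probabilities are, unlike log loss.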


Q46.Write a Python program to train an SVM Classifier and visualize the Precision-Recall Curve.
Ans.