In [None]:
#SVM & Naive Bayes: Theoretical

#1. What is a Support Vector Machine (SVM)?

Answer: A Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. For classification, it finds the hyperplane that separates the classes with the largest possible margin, i.e. the largest distance to the nearest training points of each class. Because the solution depends only on those nearest points and can be combined with kernels, SVMs remain effective in high-dimensional spaces, even when the number of features exceeds the number of samples.

#2. What is the difference between Hard Margin and Soft Margin SVM?

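Answer: Hard Margin SVM: It strictly separates the classes and allows no training point inside the margin or on the wrong side of the boundary. A solution exists only when the data is linearly separable (in the chosen feature space), and a single outlier can shift the boundary drastically.

Soft Margin SVM: It introduces slack variables $\xi_i \ge 0$ that let some points violate the margin or be misclassified, and adds the penalty $C \sum_i \xi_i$ to the objective. The C parameter controls how heavily these violations are punished, which makes soft-margin SVM robust to noisy, overlapping real-world data.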

#3. What is the mathematical intuition behind SVM?
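Answer: SVM aims to find a hyperplane defined by the equation:

$$w^\top x + b = 0$$

where $w$ is the weight vector and $b$ is the bias. The two margin hyperplanes $w^\top x + b = \pm 1$ are $2 / \lVert w \rVert$ apart, so maximizing the margin between the support vectors (the points closest to the hyperplane) is equivalent to minimizing $\lVert w \rVert$. This is formulated as an optimization problem:

$$\min_{w,\,b} \ \frac{1}{2} \lVert w \rVert^2$$

subject to: $y_i \, (w^\top x_i + b) \ge 1$ for all training samples $(x_i, y_i)$, with labels $y_i \in \{-1, +1\}$.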
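To make the margin formula concrete, here is a minimal sketch, assuming scikit-learn and synthetic, linearly separable blob data. A linear `SVC` with a very large C (an assumption standing in for the hard-margin problem) exposes $w$ as `coef_` and $b$ as `intercept_`, so the margin width $2/\lVert w \rVert$ and the constraint values $y_i(w^\top x_i + b)$ can be checked directly.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters, so a hard-margin solution exists.
X, y = make_blobs(n_samples=40, centers=[[-3, -3], [3, 3]], cluster_std=0.6, random_state=0)
y_signed = np.where(y == 1, 1, -1)  # map labels {0, 1} to {-1, +1}

# Assumption: C=1000 is large enough here to approximate the hard-margin problem.
svc = SVC(kernel="linear", C=1000).fit(X, y)
w, b = svc.coef_[0], svc.intercept_[0]

# Constraint values y_i (w^T x_i + b): all should be >= 1, with equality at the support vectors.
constraint = y_signed * (X @ w + b)
margin_width = 2 / np.linalg.norm(w)

print(f"w = {w}, b = {b:.3f}")
print(f"margin width 2/||w|| = {margin_width:.3f}")
print(f"smallest y_i (w^T x_i + b) = {constraint.min():.3f}")
```

The smallest constraint value should come out at (approximately) 1: those are exactly the points that sit on the margin.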

#4. What is the role of Lagrange Multipliers in SVM?

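Answer: Lagrange multipliers $\alpha_i \ge 0$, one per training constraint, are used to fold the constraints into a single Lagrangian function. Solving the resulting dual problem gives $w = \sum_i \alpha_i y_i x_i$, where only the support vectors have $\alpha_i > 0$, and a decision function $f(x) = \sum_i \alpha_i y_i \langle x_i, x \rangle + b$ that depends on the data only through inner products. Replacing those inner products with a kernel $K(x_i, x)$ is what allows SVM to handle non-linearly separable data. In the soft-margin case the multipliers are additionally bounded: $0 \le \alpha_i \le C$.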
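The dual form can be checked numerically. This sketch assumes scikit-learn, where for a binary `SVC` the attribute `dual_coef_` stores the products $\alpha_i y_i$ for the support vectors; the moons data and the values `gamma=1.0`, `C=1.0` are illustrative choices. Rebuilding $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$ by hand should reproduce `decision_function`.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Synthetic, non-linearly separable data.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# gamma is set explicitly so the same kernel can be recomputed by hand.
gamma, C = 1.0, 1.0
svc = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)

# Binary case: dual_coef_ has shape (1, n_SV) and holds alpha_i * y_i per support vector.
alpha_y = svc.dual_coef_[0]
K = rbf_kernel(X, svc.support_vectors_, gamma=gamma)  # K[j, i] = K(x_j, x_i) for support vector x_i

# Dual-form decision function: f(x) = sum_i alpha_i y_i K(x_i, x) + b
f_manual = K @ alpha_y + svc.intercept_[0]

print("number of support vectors:", len(alpha_y))
print("max |manual - decision_function|:", np.abs(f_manual - svc.decision_function(X)).max())
```

Only the support vectors appear in the sum, and every $|\alpha_i y_i|$ respects the box constraint $\le C$.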

#5. What are Support Vectors in SVM?

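Answer: Support vectors are the training points with non-zero Lagrange multipliers: the points lying on the margin boundaries and, in a soft-margin SVM, the points inside the margin or misclassified. They alone determine the position of the hyperplane; removing any other training point and retraining leaves the solution unchanged.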
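A small sketch (scikit-learn, synthetic overlapping blobs, `C=1.0` chosen for illustration) that checks the defining property: every support vector has $y_i f(x_i) \le 1$ (on or inside the margin), while every other point has $y_i f(x_i) \ge 1$, up to the solver's tolerance.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping synthetic clusters, so some points fall inside the margin.
X, y = make_blobs(n_samples=100, centers=[[-2, 0], [2, 0]], cluster_std=1.5, random_state=1)
y_signed = np.where(y == 1, 1, -1)

svc = SVC(kernel="linear", C=1.0).fit(X, y)
yf = y_signed * svc.decision_function(X)  # y_i f(x_i): >= 1 means outside the margin on the correct side

is_sv = np.zeros(len(X), dtype=bool)
is_sv[svc.support_] = True  # support_ holds the indices of the support vectors

print("support vectors per class:", svc.n_support_)
print(f"largest  y_i f(x_i) among support vectors: {yf[is_sv].max():.3f}")
print(f"smallest y_i f(x_i) among all other points: {yf[~is_sv].min():.3f}")
```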

#6. What is a Support Vector Classifier (SVC)?

Answer: Support Vector Classifier (SVC) is the classification version of SVM, which assigns labels to data points by finding the best possible decision boundary.

#7. What is a Support Vector Regressor (SVR)?

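Answer: Support Vector Regressor (SVR) is the regression variant of SVM. Instead of separating classes, it fits a function that keeps as many training targets as possible inside a tube of width ε around the prediction (the ε-tube). Errors smaller than ε are not penalized at all; only points on or outside the tube become support vectors and contribute to the loss.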
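The ε-tube can be observed directly. In this sketch (scikit-learn's `SVR` on a synthetic noisy sine curve; `C=10` and both ε values are arbitrary illustrative choices), points that are not support vectors should all have residuals of at most ε, and widening the tube should leave fewer support vectors.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D data: a noisy sine curve (noise std 0.1).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

n_sv, max_inside_residual = {}, {}
for eps in (0.01, 0.3):
    svr = SVR(kernel="rbf", C=10, epsilon=eps).fit(X, y)
    residuals = np.abs(y - svr.predict(X))
    non_sv = np.ones(len(X), dtype=bool)
    non_sv[svr.support_] = False  # points that are not support vectors
    n_sv[eps] = len(svr.support_)
    max_inside_residual[eps] = residuals[non_sv].max() if non_sv.any() else 0.0
    print(f"epsilon={eps}: {n_sv[eps]} of {len(X)} points are support vectors; "
          f"largest residual among the rest = {max_inside_residual[eps]:.3f}")
```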

#8. What is the Kernel Trick in SVM?

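Answer: The Kernel Trick lets SVM behave as if the data had been mapped to a higher-dimensional feature space $\phi(x)$ without ever computing that mapping: a kernel function returns the inner product $K(x, z) = \langle \phi(x), \phi(z) \rangle$ directly. Since the dual problem needs only inner products, a linear separator in the feature space becomes a non-linear boundary in the original space, which solves problems where the data is not linearly separable.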
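A worked check of the kernel trick for a degree-2 polynomial kernel in 2-D. The explicit feature map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ (a standard textbook choice, used here for illustration) gives the same inner product as $(x^\top z)^2$, which is computed without leaving the original 2-D space. scikit-learn's `polynomial_kernel` with `gamma=1` and `coef0=0` computes the same quantity.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (no bias term)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = phi(x) @ phi(z)  # inner product after mapping to 3-D
implicit = (x @ z) ** 2     # same quantity, computed in the original 2-D space

# scikit-learn's polynomial kernel is (gamma * <x, z> + coef0) ** degree
sk = polynomial_kernel(x.reshape(1, -1), z.reshape(1, -1), degree=2, gamma=1, coef0=0)[0, 0]

print(explicit, implicit, sk)
```

For higher degrees or the RBF kernel the explicit map grows very large (infinite-dimensional for RBF), which is why computing $K$ directly matters.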

#9. Compare Linear Kernel, Polynomial Kernel, and RBF Kernel.
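Answer: Linear Kernel: $K(x, z) = x^\top z$. Used when the data is (close to) linearly separable or very high-dimensional, such as text.
Polynomial Kernel: $K(x, z) = (\gamma \, x^\top z + r)^d$. Maps data to a higher-dimensional space of polynomial feature interactions up to degree $d$.
RBF Kernel: $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$. Uses a Gaussian similarity that can model complex, highly non-linear boundaries; it is the default kernel of scikit-learn's SVC.
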
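The three formulas can be verified against scikit-learn's pairwise kernel functions. This sketch uses random inputs and the illustrative values `gamma=0.5`, `coef0=1.0` (the $r$ in the polynomial formula) and `degree=3`.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
Z = rng.normal(size=(5, 3))
gamma, coef0, degree = 0.5, 1.0, 3

# Kernel matrices computed directly from the formulas
K_lin = X @ Z.T
K_poly = (gamma * X @ Z.T + coef0) ** degree
sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)  # ||x - z||^2 for every pair
K_rbf = np.exp(-gamma * sq_dists)

print(np.allclose(K_lin, linear_kernel(X, Z)))
print(np.allclose(K_poly, polynomial_kernel(X, Z, degree=degree, gamma=gamma, coef0=coef0)))
print(np.allclose(K_rbf, rbf_kernel(X, Z, gamma=gamma)))
```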
#10. What is the effect of the C parameter in SVM?

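Answer: The C parameter controls the trade-off between maximizing the margin and minimizing classification errors on the training data. A higher C penalizes violations heavily, giving a narrower margin with fewer training misclassifications but a higher risk of overfitting; a lower C allows a wider margin at the cost of misclassifying more training points, which often generalizes better on noisy data.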
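A sketch of the C trade-off on synthetic overlapping data (scikit-learn, linear kernel; the three C values are illustrative): as C grows, the margin $2/\lVert w \rVert$ shrinks and fewer points end up as support vectors.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping synthetic classes, so the margin/error trade-off matters.
X, y = make_blobs(n_samples=200, centers=[[-1, 0], [1, 0]], cluster_std=1.2, random_state=0)

results = {}
for C in (0.01, 1, 100):
    svc = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2 / np.linalg.norm(svc.coef_[0])
    results[C] = (margin, len(svc.support_), svc.score(X, y))
    print(f"C={C:<6} margin width={margin:.2f}  "
          f"support vectors={len(svc.support_)}  train acc={svc.score(X, y):.2f}")
```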

#11. What is the role of the Gamma parameter in RBF Kernel SVM?

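Answer: Gamma (the $\gamma$ in $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$) determines how far the influence of a single training example reaches. A high gamma means each example influences only points very close to it, leading to a more complex model with tight, wiggly decision boundaries that can overfit; a low gamma gives each example a far-reaching influence, resulting in a smoother, more generalized model that can underfit.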
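A sketch of gamma's effect, assuming scikit-learn. The first loop shows how quickly the RBF similarity between two points one unit apart decays as gamma grows; the second compares training accuracy on noisy synthetic moons data for a very small and a very large gamma (`C=10` is an arbitrary choice). A near-perfect training score at high gamma is a symptom of overfitting, not evidence of a better model.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Similarity between two points at distance 1 for increasing gamma
a, b = np.array([[0.0, 0.0]]), np.array([[1.0, 0.0]])
for g in (0.1, 1, 10):
    print(f"gamma={g:<4} K(a, b) = {rbf_kernel(a, b, gamma=g)[0, 0]:.5f}")

# Training accuracy for a very smooth vs. a very local model
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)
train_acc = {}
for g in (0.01, 100):
    train_acc[g] = SVC(kernel="rbf", gamma=g, C=10).fit(X, y).score(X, y)
    print(f"gamma={g:<4} training accuracy = {train_acc[g]:.2f}")
```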

#12. What is the Naïve Bayes classifier, and why is it called "Naïve"?

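Answer: Naïve Bayes is a probabilistic classifier based on Bayes’ Theorem. It is called "naïve" because it assumes that the features are conditionally independent given the class, so the likelihood factorizes as $P(x_1, \dots, x_n \mid y) = \prod_j P(x_j \mid y)$. This assumption is rarely true in real-world data, yet the classifier often works well in practice.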

#13. What is Bayes’ Theorem?

Answer: Bayes’ Theorem states:

$$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$$

It calculates the probability of event A occurring given that event B has occurred.
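A worked numeric example with hypothetical screening-test numbers (1% prevalence, 99% sensitivity, 5% false-positive rate; all assumed for illustration). The denominator $P(B)$ comes from the law of total probability.

```python
# Hypothetical screening test: all numbers are illustrative assumptions.
p_disease = 0.01            # P(A): prior probability of disease
p_pos_given_disease = 0.99  # P(B|A): sensitivity
p_pos_given_healthy = 0.05  # P(B|not A): false-positive rate

# P(B) by the law of total probability
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")  # → 0.167
```

Even with an accurate test, a positive result here implies only about a 1-in-6 chance of disease, because the prior is so small.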

#14. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.
Answer:
Gaussian Naïve Bayes: Assumes continuous data follows a normal distribution.

Multinomial Naïve Bayes: Used for text classification with word frequency counts.

Bernoulli Naïve Bayes: Used for binary features, like presence/absence of a word.
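The difference between the Bernoulli and Multinomial variants shows up clearly on a word-count matrix. In this sketch (hypothetical counts; scikit-learn defaults), `BernoulliNB` thresholds its input at 0 by default (`binarize=0.0`), so fitting it on counts or on presence/absence gives identical probabilities, whereas `MultinomialNB` uses the counts themselves.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical word-count matrix: 6 documents x 4 vocabulary words, 2 classes.
counts = np.array([
    [3, 0, 1, 0],
    [2, 1, 0, 0],
    [4, 0, 2, 1],
    [0, 2, 0, 3],
    [0, 3, 1, 2],
    [1, 2, 0, 4],
])
labels = np.array([0, 0, 0, 1, 1, 1])
presence = (counts > 0).astype(int)  # 1 if the word occurs at all

# BernoulliNB only sees presence/absence, so counts and presence give the same model.
same_bernoulli = np.allclose(BernoulliNB().fit(counts, labels).predict_proba(counts),
                             BernoulliNB().fit(presence, labels).predict_proba(presence))

# MultinomialNB uses the counts themselves, so the two fits differ.
same_multinomial = np.allclose(MultinomialNB().fit(counts, labels).predict_proba(counts),
                               MultinomialNB().fit(presence, labels).predict_proba(presence))

print("Bernoulli identical:", same_bernoulli)
print("Multinomial identical:", same_multinomial)
```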

#15. When should you use Gaussian Naïve Bayes over other variants?

Answer: Gaussian Naïve Bayes is best when features are continuous and follow a normal distribution.

#16. What are the key assumptions made by Naïve Bayes?
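Answer: Features are conditionally independent of each other given the class label.

Within each class, every feature follows the distribution assumed by the chosen variant (Gaussian, multinomial, or Bernoulli).

Every feature contributes to the prediction in the same way: its log-likelihood is simply added, with no learned per-feature weights, so irrelevant or redundant features are not down-weighted automatically.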

#17. What are the advantages and disadvantages of Naïve Bayes?

Answer:
Advantages:

Works well with small datasets.

Fast and efficient.

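Handles missing data well in principle: a missing feature's term can simply be dropped from the product. (scikit-learn's implementations do not accept NaN values, though, so missing data must be imputed or handled manually.)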

Disadvantages:

Assumption of independence is often unrealistic.

Not ideal for complex feature relationships.

#18. Why is Naïve Bayes a good choice for text classification?

Answer:
It performs well on high-dimensional sparse data, such as text documents, and is computationally efficient.

#19. Compare SVM and Naïve Bayes for classification tasks.

Answer:

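SVM: Handles complex, non-linear decision boundaries through kernels and is often more accurate, but kernel SVMs are computationally expensive on large datasets (fit time grows at least quadratically with the number of samples) and are sensitive to feature scaling and to hyperparameters such as C and gamma.

Naïve Bayes: Much faster to train and predict, needs little data, and is a strong baseline for text classification, but its independence assumption limits accuracy when features are strongly correlated.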

#20. How does Laplace Smoothing help in Naïve Bayes?

Answer:
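Laplace Smoothing prevents zero-probability problems by adding a pseudo-count α (α = 1 for classic Laplace smoothing) to every feature count before the probabilities are estimated:

$$P(x_i \mid y) = \frac{N_{yi} + \alpha}{N_y + \alpha n}$$

where $N_{yi}$ is the count of feature $i$ in class $y$, $N_y$ is the total count of all features in class $y$, and $n$ is the number of features. A word never seen with a class therefore gets a small non-zero probability instead of zeroing out the whole product.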
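The formula can be checked against scikit-learn's `MultinomialNB`, whose `feature_count_` holds the raw per-class counts and `feature_log_prob_` the smoothed log-probabilities. The tiny count matrix below is hypothetical; word 2 never occurs in class 0.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical counts: 4 documents x 3 words; word 2 never appears in class 0.
X = np.array([[2, 1, 0],
              [3, 0, 0],
              [0, 1, 2],
              [1, 0, 3]])
y = np.array([0, 0, 1, 1])
alpha = 1.0

nb = MultinomialNB(alpha=alpha).fit(X, y)

# Laplace (add-alpha) estimate: P(word i | class c) = (N_ci + alpha) / (N_c + alpha * n_words)
N_ci = nb.feature_count_  # per-class word counts, shape (n_classes, n_words)
n_words = X.shape[1]
manual = (N_ci + alpha) / (N_ci.sum(axis=1, keepdims=True) + alpha * n_words)

print(np.allclose(manual, np.exp(nb.feature_log_prob_)))
print("P(word 2 | class 0) =", manual[0, 2])  # non-zero even though the count is 0
```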

In [None]:
#SVM & Naive Bayes: Practical

In [None]:
#21. Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM classifier
svm_model = SVC(kernel='linear')
svm_model.fit(X_train, y_train)

# Make predictions
y_pred = svm_model.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")


In [None]:
#22. Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM with Linear Kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)

# Train SVM with RBF Kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)

# Compare accuracies
accuracy_linear = accuracy_score(y_test, y_pred_linear)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

print(f"Linear Kernel Accuracy: {accuracy_linear:.2f}")
print(f"RBF Kernel Accuracy: {accuracy_rbf:.2f}")

In [None]:
#23. Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean Squared Error (MSE).

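from sklearn.datasets import fetch_california_housing
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load Housing dataset (downloaded on first use)
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVR model; the RBF kernel is distance-based, so features are standardized first.
# Kernel SVR fit time grows at least quadratically with the number of samples,
# so this fit (about 16,500 rows) can take a while.
svr_model = make_pipeline(StandardScaler(), SVR(kernel='rbf'))
svr_model.fit(X_train, y_train)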

# Make predictions
y_pred = svr_model.predict(X_test)

# Evaluate using Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

In [None]:
#24. Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision boundary.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Generate synthetic dataset
X, y = make_classification(n_samples=100, n_features=2, n_classes=2, n_clusters_per_class=1, random_state=42)

# Train SVM with polynomial kernel
svm_poly = SVC(kernel='poly', degree=3)
svm_poly.fit(X, y)

# Plot decision boundary
def plot_decision_boundary(model, X, y):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
    plt.title("SVM with Polynomial Kernel")
    plt.show()

plot_decision_boundary(svm_poly, X, y)

In [None]:
#25. Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load dataset
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Gaussian Naïve Bayes model
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Predict and evaluate accuracy
accuracy = nb_model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2f}")

In [None]:
#26. Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20 Newsgroups dataset.

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Load dataset
categories = ['rec.sport.baseball', 'rec.sport.hockey', 'sci.space', 'comp.graphics']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories, remove=('headers', 'footers', 'quotes'))
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories, remove=('headers', 'footers', 'quotes'))

# Create a text classification pipeline
model = make_pipeline(CountVectorizer(), TfidfTransformer(), MultinomialNB())

# Train the model
model.fit(newsgroups_train.data, newsgroups_train.target)

# Predict on test data
y_pred = model.predict(newsgroups_test.data)

# Evaluate accuracy
accuracy = accuracy_score(newsgroups_test.target, y_pred)
print(f"Accuracy: {accuracy:.2f}")

In [None]:
#27. Write a Python program to train an SVM Classifier with different C values and compare the decision boundaries visually.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Generate dataset
X, y = make_classification(n_samples=100, n_features=2, n_classes=2, n_clusters_per_class=1, random_state=42)

# Different C values
C_values = [0.01, 1, 100]
models = [SVC(kernel='linear', C=C_val).fit(X, y) for C_val in C_values]

# Plot decision boundaries
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

for i, (model, C_val) in enumerate(zip(models, C_values)):
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 100),
                         np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 100))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    axes[i].contourf(xx, yy, Z, alpha=0.3)
    axes[i].scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
    axes[i].set_title(f"SVM with C = {C_val}")

plt.show()

In [None]:
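#28. Write a Python program to train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with binary features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Synthetic binary dataset: 5 binary features. The label is (x0 OR x1) with 10% label noise,
# since purely random labels would leave nothing to learn.
rng = np.random.default_rng(42)
X_bin = rng.integers(0, 2, size=(500, 5))
y_bin = (X_bin[:, 0] | X_bin[:, 1]).astype(int)
flip = rng.random(500) < 0.1
y_bin[flip] = 1 - y_bin[flip]

Xb_train, Xb_test, yb_train, yb_test = train_test_split(X_bin, y_bin, test_size=0.2, random_state=42)

# Train Bernoulli Naïve Bayes model
bnb = BernoulliNB()
bnb.fit(Xb_train, yb_train)

# Evaluate on held-out data (scoring on the training set would overstate accuracy)
accuracy = bnb.score(Xb_test, yb_test)
print(f"Accuracy: {accuracy:.2f}")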

In [None]:
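#29. Write a Python program to apply feature scaling before training an SVM model and compare results with unscaled data.

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Wine features have very different ranges (proline is in the hundreds, hue is around 1),
# which distorts the distances the RBF kernel relies on
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
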
# Train SVM without feature scaling
svm_no_scaling = SVC(kernel='rbf')
svm_no_scaling.fit(X_train, y_train)
accuracy_no_scaling = svm_no_scaling.score(X_test, y_test)

# Train SVM with feature scaling
svm_with_scaling = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
svm_with_scaling.fit(X_train, y_train)
accuracy_with_scaling = svm_with_scaling.score(X_test, y_test)

print(f"Accuracy without Scaling: {accuracy_no_scaling:.2f}")
print(f"Accuracy with Scaling: {accuracy_with_scaling:.2f}")

In [None]:
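#30. Write a Python program to train a Gaussian Naïve Bayes model and compare the predictions before and after Laplace Smoothing.

# Note: GaussianNB has no Laplace smoothing. Its closest analogue is `var_smoothing`, which adds
# var_smoothing * (largest feature variance) to every class variance for numerical stability.
# True Laplace (add-alpha) smoothing is the `alpha` parameter of MultinomialNB, BernoulliNB and CategoricalNB.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Default variance smoothing
nb_default = GaussianNB(var_smoothing=1e-9)
nb_default.fit(X_train, y_train)

# Much stronger variance smoothing
nb_smoothed = GaussianNB(var_smoothing=1e-2)
nb_smoothed.fit(X_train, y_train)

n_changed = (nb_default.predict(X_test) != nb_smoothed.predict(X_test)).sum()
print(f"Accuracy with var_smoothing=1e-9 (default): {nb_default.score(X_test, y_test):.2f}")
print(f"Accuracy with var_smoothing=1e-2: {nb_smoothed.score(X_test, y_test):.2f}")
print(f"Test predictions that changed: {n_changed} of {len(y_test)}")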

In [None]:
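#31. Write a Python program to train an SVM Classifier and use GridSearchCV to tune the hyperparameters (C, gamma, kernel).

from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling inside the pipeline means each CV fold is scaled with statistics from its own training part.
# make_pipeline names the SVC step "svc", hence the "svc__" prefixes (gamma is ignored by the linear kernel).
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {
    'svc__C': [0.1, 1, 10],
    'svc__gamma': ['scale', 'auto'],
    'svc__kernel': ['linear', 'rbf']
}

# Perform GridSearch
grid_search = GridSearchCV(pipe, param_grid, cv=5)
grid_search.fit(X_train, y_train)

# best_score_ is the mean cross-validation accuracy; the test score is measured on unseen data
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best CV Accuracy: {grid_search.best_score_:.2f}")
print(f"Test Accuracy: {grid_search.score(X_test, y_test):.2f}")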

In [None]:
#32. Write a Python program to train an SVM Classifier on an imbalanced dataset and apply class weighting to improve accuracy.

from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Create imbalanced dataset (about 90% class 0, 10% class 1) and hold out a stratified test set
X_imbalanced, y_imbalanced = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_imbalanced, y_imbalanced, test_size=0.25,
                                          stratify=y_imbalanced, random_state=42)

# Plain accuracy is misleading here (always predicting class 0 already scores about 90%),
# so report balanced accuracy and minority-class recall on the held-out data instead
for label, weight in [("without class weighting", None), ("with class_weight='balanced'", "balanced")]:
    svm_imb = SVC(kernel='linear', class_weight=weight).fit(X_tr, y_tr)
    y_hat = svm_imb.predict(X_te)
    print(f"{label}: balanced accuracy = {balanced_accuracy_score(y_te, y_hat):.2f}, "
          f"minority recall = {recall_score(y_te, y_hat):.2f}")

In [None]:
#33. Write a Python program to implement a Naïve Bayes classifier for spam detection using email data.

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Sample dataset (email texts and labels)
emails = ["Win a free iPhone now!", "Limited time offer, claim your prize", "Meeting at 10 AM", "Your invoice is attached"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# Train Multinomial Naïve Bayes classifier
spam_model = make_pipeline(CountVectorizer(), TfidfTransformer(), MultinomialNB())
spam_model.fit(emails, labels)

# Predict on new emails
new_emails = ["Exclusive deal, win big!", "Let's meet tomorrow"]
predictions = spam_model.predict(new_emails)

print("Predictions:", predictions)

In [None]:
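#34. Write a Python program to train an SVM Classifier and a Naïve Bayes Classifier on the same dataset and compare their accuracy.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Both models use the same explicit train/test split so the comparison is fair
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM Classifier (features standardized, since SVMs are sensitive to feature scale)
svm_model = make_pipeline(StandardScaler(), SVC(kernel='linear'))
svm_model.fit(X_train, y_train)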
svm_accuracy = svm_model.score(X_test, y_test)

# Train Naïve Bayes Classifier
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
nb_accuracy = nb_model.score(X_test, y_test)

print(f"SVM Accuracy: {svm_accuracy:.2f}")
print(f"Naïve Bayes Accuracy: {nb_accuracy:.2f}")

In [None]:
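#35. Write a Python program to perform feature selection before training a Naïve Bayes classifier and compare results.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# chi2 requires non-negative features; the Breast Cancer features are all non-negative
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline: all 30 features
nb_all = GaussianNB().fit(X_train, y_train)

# Select the top 10 features on the training data only, then apply the same selection to the test data
selector = SelectKBest(chi2, k=10).fit(X_train, y_train)
nb_selected = GaussianNB().fit(selector.transform(X_train), y_train)

print(f"Accuracy with all features: {nb_all.score(X_test, y_test):.2f}")
print(f"Accuracy after feature selection: {nb_selected.score(selector.transform(X_test), y_test):.2f}")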

In [None]:
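#36. Write a Python program to train an SVM Classifier using One-vs-Rest (OvR) and One-vs-One (OvO) strategies on the Wine dataset and compare their accuracy.

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize using training-set statistics only
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Note: SVC already uses one-vs-one internally for multiclass problems;
# the wrappers below make each strategy explicit.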

# Train One-vs-Rest SVM
ovr_model = OneVsRestClassifier(SVC(kernel='linear'))
ovr_model.fit(X_train, y_train)
ovr_accuracy = ovr_model.score(X_test, y_test)

# Train One-vs-One SVM
ovo_model = OneVsOneClassifier(SVC(kernel='linear'))
ovo_model.fit(X_train, y_train)
ovo_accuracy = ovo_model.score(X_test, y_test)

print(f"OvR Accuracy: {ovr_accuracy:.2f}")
print(f"OvO Accuracy: {ovo_accuracy:.2f}")

In [None]:
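#37. Write a Python program to train an SVM Classifier using Linear, Polynomial, and RBF kernels on the Breast Cancer dataset and compare their accuracy.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

kernels = ['linear', 'poly', 'rbf']

for kernel in kernels:
    # Standardize features first; all three kernels depend on feature scale
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    print(f"Kernel: {kernel}, Accuracy: {accuracy:.2f}")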

In [None]:
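#38. Write a Python program to train an SVM Classifier using Stratified K-Fold Cross-Validation and compute the average accuracy.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Stratified 5-fold CV keeps the class ratio the same in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
svm_model = make_pipeline(StandardScaler(), SVC(kernel='linear'))
scores = cross_val_score(svm_model, X, y, cv=cv)

print(f"Fold accuracies: {scores.round(2)}")
print(f"Average Accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")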

In [None]:
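#39. Write a Python program to train a Naïve Bayes classifier using different prior probabilities and compare performance.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# priors needs one entry per class (here: malignant=0, benign=1) summing to 1;
# priors=None estimates them from the training class frequencies
for priors in [None, [0.5, 0.5], [0.3, 0.7], [0.7, 0.3]]:
    nb_prior = GaussianNB(priors=priors).fit(X_train, y_train)
    print(f"Priors {priors}: accuracy = {nb_prior.score(X_test, y_test):.2f}")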

In [None]:
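#40. Write a Python program to perform Recursive Feature Elimination (RFE) before training an SVM Classifier and compare accuracy.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline: linear SVM on all 30 features
svm_all = make_pipeline(StandardScaler(), SVC(kernel='linear')).fit(X_train, y_train)

# RFE ranks features by the linear SVM's coef_, so it needs kernel='linear';
# inside the pipeline it is fitted on the training data only
svm_rfe = make_pipeline(StandardScaler(),
                        RFE(SVC(kernel='linear'), n_features_to_select=5),
                        SVC(kernel='linear')).fit(X_train, y_train)

print(f"Accuracy with all features: {svm_all.score(X_test, y_test):.2f}")
print(f"Accuracy after RFE (5 features): {svm_rfe.score(X_test, y_test):.2f}")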

In [None]:
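#41. Write a Python program to train an SVM Classifier and evaluate its performance using Precision, Recall, and F1-Score instead of accuracy.

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM Classifier
svm_model = make_pipeline(StandardScaler(), SVC(kernel='linear')).fit(X_train, y_train)

# Make predictions
y_pred = svm_model.predict(X_test)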

# Compute precision, recall, and F1-score
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {f1:.2f}")

In [None]:
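#42. Write a Python program to train a Naïve Bayes Classifier and evaluate its performance using Log Loss (Cross-Entropy Loss).

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Gaussian Naïve Bayes
nb_model = GaussianNB().fit(X_train, y_train)

# Predict probabilities (log loss needs probabilities, not hard labels)
y_prob = nb_model.predict_proba(X_test)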

# Compute Log Loss
log_loss_value = log_loss(y_test, y_prob)
print(f"Log Loss: {log_loss_value:.2f}")

In [None]:
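#43. Write a Python program to train an SVM Classifier and visualize the Confusion Matrix using seaborn.

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Train an SVM here so the cell does not depend on predictions left over from earlier cells
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
svm_model = make_pipeline(StandardScaler(), SVC(kernel='linear')).fit(X_train, y_train)
y_pred = svm_model.predict(X_test)

# Compute confusion matrix (rows = actual class, columns = predicted class)
cm = confusion_matrix(y_test, y_pred)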

# Plot confusion matrix
sns.heatmap(cm, annot=True, cmap='Blues', fmt='d')
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()

In [None]:
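#44. Write a Python program to train an SVM Regressor (SVR) and evaluate its performance using Mean Absolute Error (MAE) instead of MSE.

from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = load_diabetes(return_X_y=True)  # small regression dataset (442 samples), so SVR trains quickly
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# The target is unscaled (roughly 25 to 350), so C is raised from its default of 1.0
svr_model = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=100)).fit(X_train, y_train)
mae = mean_absolute_error(y_test, svr_model.predict(X_test))  # same units as the target
print(f"Mean Absolute Error: {mae:.2f}")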

In [None]:
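#45. Write a Python program to train a Naïve Bayes classifier and evaluate its performance using the ROC-AUC score.

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
y_prob = GaussianNB().fit(X_train, y_train).predict_proba(X_test)

# ROC-AUC is computed from the predicted probability of the positive class (column 1)
roc_auc = roc_auc_score(y_test, y_prob[:, 1])
print(f"ROC-AUC Score: {roc_auc:.2f}")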

In [None]:
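#46. Write a Python program to train an SVM Classifier and visualize the Precision-Recall Curve.

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
svm_model = make_pipeline(StandardScaler(), SVC(kernel='rbf')).fit(X_train, y_train)

# The curve only needs a ranking score, so SVC's decision_function works (no probability=True needed)
scores = svm_model.decision_function(X_test)
precision, recall, _ = precision_recall_curve(y_test, scores)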

plt.plot(recall, precision, marker='.')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.show()