# SVM & Naïve Bayes — Question-by-Question (Detailed Answers + Code)

## Part A — Theory (Question-by-question answers)


### Q1. What is a Support Vector Machine (SVM)?

**Answer:**

A Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression. For classification, SVM finds the hyperplane (in input feature space) that maximizes the margin between classes — the margin being the distance between the hyperplane and the nearest training points from each class. SVM is effective in high-dimensional spaces and can use kernel functions to handle non-linearly separable data.


### Q2. What is the difference between Hard Margin and Soft Margin SVM?

**Answer:**

**Hard Margin SVM** requires the training data to be linearly separable and finds a hyperplane that makes zero classification errors. It maximizes the margin subject to all points being correctly classified. This is sensitive to noise/outliers.

**Soft Margin SVM** introduces slack variables (ξ_i) allowing some misclassifications. It trades off margin width and misclassification using a penalty parameter C. Soft margin is robust to non-separable data and noise.
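
To make the slack variables concrete, the sketch below fits a linear soft-margin SVC on a hand-made 1D data set (an illustrative assumption, not part of the original questions) in which one positive point sits among the negatives. Each ξ_i is then recovered as max(0, 1 − y_i f(x_i)), where f is the value returned by `decision_function`.

```python
import numpy as np
from sklearn.svm import SVC

# Hand-made 1D toy data: the last +1 point sits among the -1 points,
# so no threshold separates the classes (the hard-margin problem is infeasible).
X = np.array([[0.0], [0.5], [1.0], [3.0], [3.5], [4.0], [0.2]])
y = np.array([-1, -1, -1, 1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Slack of each training point: xi_i = max(0, 1 - y_i * f(x_i)).
f = clf.decision_function(X)
xi = np.maximum(0.0, 1.0 - y * f)

for x_i, y_i, xi_i in zip(X[:, 0], y, xi):
    print(f"x={x_i:4.1f}  y={int(y_i):+d}  slack={xi_i:.3f}")
```

A slack of 0 means the point is on or outside the margin, a value between 0 and 1 means it is inside the margin but on the correct side, and a value above 1 means it is misclassified.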


### Q3. What is the mathematical intuition behind SVM?

**Answer (mathematical intuition):**

SVM solves the optimization problem:

Minimize (1/2) ||w||^2 subject to y_i (w·x_i + b) ≥ 1 for all i (separable case).

Maximizing the margin is equivalent to minimizing the norm of the weight vector w while enforcing correct classification constraints. For non-separable data, slack variables ξ_i are added: y_i (w·x_i + b) ≥ 1 − ξ_i, with ξ_i ≥ 0, and the objective becomes

Minimize (1/2)||w||^2 + C Σ ξ_i

where C controls the penalty for misclassification.
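
As a small numerical check of the separable case, the following sketch fits a linear SVC with a very large C (used here as an approximation of the hard-margin problem) on four hand-picked points, then computes the margin width 2/||w|| and the constraint values y_i (w·x_i + b). The data and the choice C=1e6 are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Four separable points in 2D: two columns of points, two units apart.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1, -1, 1, 1])

# A very large C approximates the hard-margin problem.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

margin_width = 2.0 / np.linalg.norm(w)     # distance between the two margin lines
functional_margins = y * (X @ w + b)       # y_i (w.x_i + b); the constraint asks for >= 1

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("y_i (w.x_i + b) =", functional_margins)
```

The width should come out close to 2, the horizontal gap between the two columns of points, and every constraint value should be close to 1 because all four points lie on the margin.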


### Q4. What is the role of Lagrange Multipliers in SVM?

**Answer:**

Lagrange multipliers turn the constrained primal problem into its dual, a quadratic program over the multipliers α_i with simpler constraints (α_i ≥ 0, or 0 ≤ α_i ≤ C in the soft-margin case, together with Σ α_i y_i = 0). The dual depends on the training vectors only through dot products (x_i·x_j), which is what enables kernelization (replacing dot products with kernels). At the optimum, α_i is non-zero only for support vectors, so the solution is sparse.
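
The relationship between the multipliers and the primal weights can be inspected on a fitted model. The sketch below assumes scikit-learn's `SVC`, whose `dual_coef_` attribute stores y_i·α_i for the support vectors, and uses synthetic blobs as stand-in data.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two Gaussian blobs as a stand-in binary problem (synthetic data).
X, y = make_blobs(n_samples=60, centers=2, cluster_std=1.5, random_state=0)

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

# dual_coef_ holds y_i * alpha_i, for the support vectors only.
signed_alpha = clf.dual_coef_[0]
alpha = np.abs(signed_alpha)

# Primal weights recovered from the dual: w = sum_i alpha_i y_i x_i.
w_from_dual = signed_alpha @ clf.support_vectors_

print("support vectors:", len(alpha), "of", len(X))
print("w from dual    :", w_from_dual)
print("w from coef_   :", clf.coef_[0])
print("sum alpha_i y_i:", signed_alpha.sum())  # dual equality constraint, ~0
```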


### Q5. What are Support Vectors in SVM?

**Answer:**

Support vectors are the training samples that lie closest to the decision boundary (i.e., on the margin or inside it). In the dual problem they correspond to non-zero Lagrange multipliers α_i. They uniquely define the separating hyperplane — removing non-support vectors does not change the solution.
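
The claim that non-support vectors can be dropped is easy to check empirically. This sketch (synthetic data, with a tight solver tolerance chosen so the two fits can be compared) trains once on all points and once on the support vectors alone.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=80, centers=2, cluster_std=1.5, random_state=1)

# Tight solver tolerance so the two fits can be compared numerically.
full = SVC(kernel="linear", C=1.0, tol=1e-6).fit(X, y)

# Keep only the support vectors and refit from scratch.
sv = full.support_
reduced = SVC(kernel="linear", C=1.0, tol=1e-6).fit(X[sv], y[sv])

print("training points:", len(X), "| support vectors:", len(sv))
print("full    w, b:", full.coef_[0], full.intercept_[0])
print("reduced w, b:", reduced.coef_[0], reduced.intercept_[0])
```

Up to solver tolerance, both fits should report the same w and b.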


### Q6. What is a Support Vector Classifier (SVC)?

**Answer:**

A Support Vector Classifier (SVC) typically refers to the SVM algorithm used for classification tasks. It separates data into discrete categories by finding the optimal hyperplane with maximum margin. In scikit-learn, `sklearn.svm.SVC` implements the classifier with kernels.


### Q7. What is a Support Vector Regressor (SVR)?

**Answer:**

Support Vector Regressor (SVR) applies the SVM idea to regression. It fits a function that is as flat as possible while keeping the training targets within an ε-wide tube around the prediction: errors smaller than ε cost nothing (the ε-insensitive loss), and points outside the tube are penalized through slack variables weighted by C. SVR can also be kernelized. In scikit-learn, it's `sklearn.svm.SVR`.
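
The ε-insensitive loss has a visible consequence: training points that fall strictly inside the ε-tube do not become support vectors. The sketch below illustrates this on a synthetic noisy sine curve; the data and the values of `C`, `epsilon` and `tol` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=80)  # noisy sine (synthetic)

eps = 0.2
svr = SVR(kernel="rbf", C=10.0, epsilon=eps, tol=1e-5).fit(X, y)

# Absolute training residuals and a mask marking the support vectors.
residual = np.abs(y - svr.predict(X))
is_sv = np.zeros(len(X), dtype=bool)
is_sv[svr.support_] = True

print("support vectors:", is_sv.sum(), "of", len(X))
print("smallest |residual| among support vectors:", residual[is_sv].min())
print("largest  |residual| among the rest       :", residual[~is_sv].max())
```

Support vectors should sit on or outside the tube (|residual| at least about ε), and all remaining points inside it.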


### Q8. What is the Kernel Trick in SVM?

**Answer (Kernel Trick):**

The kernel trick replaces the dot product x·z in the SVM dual formulation with a kernel function K(x, z) that corresponds to an inner product in some (often higher-dimensional) feature space φ(x)·φ(z). This allows the algorithm to learn non-linear decision boundaries without explicitly computing φ(x), avoiding the computational cost of transforming to high-dimensional spaces.
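
A minimal numerical illustration, assuming the homogeneous quadratic kernel K(x, z) = (x·z)^2 on 2D inputs: its explicit feature map is φ(x) = (x1², √2·x1·x2, x2²), and the inner product in that 3D space matches the kernel evaluated directly in the input space. The function names `phi` and `poly2_kernel` are hypothetical helpers.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2D input (no bias term)."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly2_kernel(x, z):
    """Homogeneous quadratic kernel K(x, z) = (x.z)^2."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = float(np.dot(phi(x), phi(z)))  # inner product in the 3D feature space
implicit = poly2_kernel(x, z)             # same value, computed in the 2D input space
print(explicit, implicit)
```

Both numbers agree (here x·z = 1, so both equal 1 up to rounding), but the kernel never builds φ(x). For the RBF kernel the corresponding feature space is infinite-dimensional, so the implicit route is the only practical one.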


### Q9. Compare Linear Kernel, Polynomial Kernel, and RBF Kernel.

**Answer (Compare kernels):**

- **Linear kernel:** K(x,y) = x·y. Use when the data are (close to) linearly separable or when the number of features is large relative to the number of samples, so that a linear decision boundary is enough. Fast to train, with fewer hyperparameters.

- **Polynomial kernel:** K(x,y) = (γ x·y + r)^d. Allows polynomial decision boundaries. Degree d controls complexity; can capture interactions up to degree d. More hyperparameters (degree, γ, r).

- **RBF (Gaussian) kernel:** K(x,y) = exp(−γ||x − y||^2). Highly flexible, maps to infinite-dimensional space. γ controls kernel width. Good default when no prior on feature interactions.

Trade-offs: linear is simple/fast; polynomial can capture global non-linearities of certain degrees; RBF is flexible but risks overfitting if γ is large.
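
The three formulas can be checked against scikit-learn's pairwise kernel helpers. The sketch below uses random synthetic data and arbitrary values for γ, r and d.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 synthetic samples, 3 features

gamma, r, d = 0.5, 1.0, 3     # arbitrary illustrative hyperparameters

# Kernel (Gram) matrices written out from the formulas above.
K_lin = X @ X.T
K_poly = (gamma * (X @ X.T) + r) ** d
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_rbf = np.exp(-gamma * sq_dists)

print(np.allclose(K_lin, linear_kernel(X)))
print(np.allclose(K_poly, polynomial_kernel(X, degree=d, gamma=gamma, coef0=r)))
print(np.allclose(K_rbf, rbf_kernel(X, gamma=gamma)))
```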


### Q10. What is the effect of the C parameter in SVM?

**Answer:**

`C` is the regularization parameter controlling the penalty for misclassification. High `C` values aim to minimize misclassification (small margin, possible overfitting). Low `C` values increase regularization, allowing more misclassified points but increasing margin (simpler model).
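
One way to see this trade-off without plotting is to count support vectors. With a small `C` the margin is wide and many points fall inside it, while a large `C` typically narrows the margin and leaves fewer support vectors. The sketch assumes synthetic, overlapping blobs.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping blobs so that the soft margin matters (synthetic data).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

n_support = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_support[C] = int(clf.n_support_.sum())
    print(f"C={C:<6} support vectors={n_support[C]:3d}  train acc={clf.score(X, y):.3f}")
```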


### Q11. What is the role of the Gamma parameter in RBF Kernel SVM?

**Answer:**

In RBF kernel SVM, γ (gamma) defines how much influence a single training example has. Low γ → far reach, smoother decision boundary. High γ → narrow reach, complex/wiggly boundary that may overfit. It effectively sets the kernel bandwidth.
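
The sketch below varies γ on a synthetic two-moons data set (an assumption made for illustration) and records training and test accuracy. A very large γ usually fits the training set almost perfectly while the test score lags behind, which is the overfitting pattern described above.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

train_acc, test_acc = {}, {}
for gamma in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X_tr, y_tr)
    train_acc[gamma] = clf.score(X_tr, y_tr)
    test_acc[gamma] = clf.score(X_te, y_te)
    print(f"gamma={gamma:<6} train={train_acc[gamma]:.3f} test={test_acc[gamma]:.3f}")
```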


### Q12. What is the Naïve Bayes classifier, and why is it called 'Naïve'?

**Answer:**

Naïve Bayes is a probabilistic classifier that applies Bayes' theorem with the simplifying assumption that features are conditionally independent given the class label. It's called 'naïve' because this independence assumption is often unrealistic in practice.


### Q13. What is Bayes’ Theorem?

**Answer (Bayes' Theorem):**

Bayes' theorem states:

P(C|X) = P(X|C) P(C) / P(X)

where P(C|X) is the posterior probability of class C given features X, P(X|C) is the likelihood, P(C) is the prior, and P(X) is evidence (normalizing constant).
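
A worked example with made-up numbers (all three probabilities below are hypothetical) shows how the pieces combine for a single word feature in a spam filter.

```python
# Hypothetical numbers chosen for illustration only.
p_spam = 0.2                # prior P(C = spam)
p_word_given_spam = 0.6     # likelihood P(X = "offer" | spam)
p_word_given_ham = 0.05     # likelihood P(X = "offer" | ham)

# Evidence P(X) via the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior P(spam | "offer") from Bayes' theorem.
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_word, 4), round(p_spam_given_word, 4))  # 0.16 0.75
```

Seeing the word raises the probability of spam from the prior of 0.2 to a posterior of 0.75.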


### Q14. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.

**Answer (Variants):**

- **Gaussian Naive Bayes:** Assumes continuous features follow a Gaussian distribution within each class. Likelihood P(x_j|C) modeled as Normal(μ_{Cj}, σ_{Cj}^2).

- **Multinomial Naive Bayes:** Designed for count features (e.g., term frequencies). Likelihood is multinomial over term counts; common for document classification.

- **Bernoulli Naive Bayes:** Models binary/boolean features (presence/absence). Suitable when features are binary indicators of token presence.
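
For the Gaussian variant, the posterior can be written out by hand and compared with scikit-learn. The sketch below is a minimal re-derivation on the Iris data, not scikit-learn's actual implementation: `GaussianNB` additionally adds a tiny `var_smoothing` term to the variances, so the two results agree only up to a small tolerance.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Per-class prior, mean and (maximum-likelihood) variance of every feature.
priors = np.array([(y == c).mean() for c in classes])
mu = np.array([X[y == c].mean(axis=0) for c in classes])
var = np.array([X[y == c].var(axis=0) for c in classes])

# log P(C) + sum_j log Normal(x_j | mu_Cj, var_Cj) for every sample and class.
sq_err = (X[:, None, :] - mu[None]) ** 2 / var[None]
log_lik = -0.5 * (np.log(2 * np.pi * var)[None] + sq_err).sum(axis=-1)
log_joint = np.log(priors)[None] + log_lik

# Normalise over classes to get the posteriors P(C | x).
log_joint -= log_joint.max(axis=1, keepdims=True)
posterior = np.exp(log_joint)
posterior /= posterior.sum(axis=1, keepdims=True)

sk_posterior = GaussianNB().fit(X, y).predict_proba(X)
print("max abs difference vs. GaussianNB:", np.abs(posterior - sk_posterior).max())
```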


### Q15. When should you use Gaussian Naïve Bayes over other variants?

**Answer:**

Use Gaussian NB when your features are continuous and reasonably approximated by Gaussian distributions (e.g., measurements, lab values). If features are counts or binary indicators, prefer Multinomial or Bernoulli respectively.


### Q16. What are the key assumptions made by Naïve Bayes?

**Answer (Key assumptions):**

- Conditional independence: features are independent given the class label.
- Feature distributions fit the assumed family (e.g., Gaussian for Gaussian NB).
- Class priors are known or estimated from training data.


### Q17. What are the advantages and disadvantages of Naïve Bayes?

**Answer (Advantages & Disadvantages):**

Advantages:
- Fast and memory-efficient.
- Performs well on high-dimensional data (like text).
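- Requires relatively little training data.
- Simple to implement and interpret.

Disadvantages:
- Strong independence assumption often violated.
- Can give poorly calibrated probability estimates: correlated features are effectively counted more than once, which pushes posteriors toward 0 or 1.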
- Less flexible than discriminative methods when feature interactions matter.


### Q18. Why is Naïve Bayes a good choice for text classification?

**Answer:**

Naïve Bayes suits text classification because bag-of-words representations are high-dimensional and sparse, and classification only needs the correct class to score highest, so violations of the independence assumption often do little harm to accuracy. Multinomial NB models word counts directly and trains extremely fast, making it a strong baseline for NLP tasks.


### Q19. Compare SVM and Naïve Bayes for classification tasks.

**Answer (Compare SVM vs Naïve Bayes):**

- **SVM:** Discriminative; often reaches higher classification accuracy; can use kernels for complex boundaries. Kernelized training scales poorly with the number of training samples, and the model needs careful tuning (C, gamma).
- **Naïve Bayes:** Generative; extremely fast and workable with small data; works well for text; needs little tuning but is often outperformed by SVMs. NB outputs class probabilities directly (though they can be poorly calibrated); SVM outputs margins and can provide probability estimates via calibration.


### Q20. How does Laplace Smoothing help in Naïve Bayes?

**Answer (Laplace smoothing):**

Laplace smoothing (additive smoothing) adds a small constant (usually 1) to observed feature counts to avoid zero probabilities for unseen features in test data. For Multinomial NB, it prevents P(word|class)=0 when a word not present in training for that class appears in test.
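
The arithmetic behind additive smoothing is short enough to write out. The word counts below are hypothetical, and `word_probs` is an illustrative helper rather than a library function. It computes P(word|class) = (count + α) / (total + α·V), where V is the vocabulary size.

```python
# Hypothetical word counts observed in the "spam" class of a tiny corpus.
counts = {"spam": 3, "offer": 2, "cheap": 1, "hello": 0}

def word_probs(counts, alpha):
    """P(word | class) with additive smoothing: (count + alpha) / (total + alpha * V)."""
    total = sum(counts.values())
    vocab_size = len(counts)
    return {w: (c + alpha) / (total + alpha * vocab_size) for w, c in counts.items()}

unsmoothed = word_probs(counts, alpha=0.0)
smoothed = word_probs(counts, alpha=1.0)

print(unsmoothed["hello"], smoothed["hello"])  # 0.0 0.1
```

Without smoothing, a single occurrence of "hello" would drive the whole product of likelihoods for the spam class to zero. With α = 1 the word only lowers the score.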


## Part B — Practical (Code question-by-question)

Each practical question is followed by a runnable code cell. Run the cells in order (e.g., in Colab), starting with the shared imports cell.

In [None]:
# Shared imports for practical tasks
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score, StratifiedKFold
from sklearn.svm import SVC, SVR
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix,
    mean_squared_error, mean_absolute_error, roc_auc_score, log_loss,
    precision_recall_curve, average_precision_score,
    precision_score, recall_score, f1_score,
)
print('Practical imports ready')


### P1. Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy

In [None]:
# P1: SVM Classifier on Iris dataset
from sklearn.svm import SVC
iris = datasets.load_iris()
X, y = iris.data, iris.target
# Use all three classes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='rbf', probability=True, random_state=0))])
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
print('Accuracy (Iris, 3-class):', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))


### P2. Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies

In [None]:
# P2: Linear vs RBF on Wine dataset
wine = datasets.load_wine()
Xw, yw = wine.data, wine.target
Xw_train, Xw_test, yw_train, yw_test = train_test_split(Xw, yw, test_size=0.2, random_state=1, stratify=yw)
pipe_lin = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='linear', random_state=1))])
pipe_rbf = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='rbf', random_state=1))])
pipe_lin.fit(Xw_train, yw_train)
pipe_rbf.fit(Xw_train, yw_train)
print('Linear SVM accuracy:', accuracy_score(yw_test, pipe_lin.predict(Xw_test)))
print('RBF SVM accuracy:   ', accuracy_score(yw_test, pipe_rbf.predict(Xw_test)))


### P3. Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean Squared Error (MSE)

In [None]:
# P3: SVR on California Housing
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
Xh, yh = housing.data, housing.target
Xh_train, Xh_test, yh_train, yh_test = train_test_split(Xh, yh, test_size=0.2, random_state=0)
pipe_svr = Pipeline([('scaler', StandardScaler()), ('svr', SVR(kernel='rbf'))])
pipe_svr.fit(Xh_train, yh_train)
y_pred = pipe_svr.predict(Xh_test)
print('SVR MSE:', mean_squared_error(yh_test, y_pred))


### P4. Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision boundary

In [None]:
# P4: Polynomial kernel SVM decision boundary (2D toy data)
from sklearn.datasets import make_circles
Xc, yc = make_circles(noise=0.1, factor=0.2, random_state=1)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(Xc, yc, test_size=0.25, random_state=1)
clf_poly = SVC(kernel='poly', degree=3, coef0=1, C=1.0, probability=True, random_state=0)
clf_poly.fit(Xc_train, yc_train)
# visualize
xx, yy = np.meshgrid(np.linspace(Xc[:,0].min()-0.5, Xc[:,0].max()+0.5, 300), np.linspace(Xc[:,1].min()-0.5, Xc[:,1].max()+0.5, 300))
Z = clf_poly.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.figure(figsize=(7,5))
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(Xc[:,0], Xc[:,1], c=yc, s=30, edgecolor='k')
plt.title('Polynomial Kernel SVM Decision Boundary')
plt.show()
print('Train acc:', clf_poly.score(Xc_train, yc_train), 'Test acc:', clf_poly.score(Xc_test, yc_test))


### P5. Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy

In [None]:
# P5: Gaussian Naive Bayes on Breast Cancer dataset
cancer = datasets.load_breast_cancer()
Xc, yc = cancer.data, cancer.target
Xct_train, Xct_test, yct_train, yct_test = train_test_split(Xc, yc, test_size=0.2, random_state=2, stratify=yc)
gnb = GaussianNB()
gnb.fit(Xct_train, yct_train)
yp = gnb.predict(Xct_test)
print('Gaussian NB accuracy (breast cancer):', accuracy_score(yct_test, yp))
print(classification_report(yct_test, yp))


### P6. Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20 Newsgroups dataset

In [None]:
# P6: Multinomial NB on 20 Newsgroups (subset for speed)
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import make_pipeline
categories = ['alt.atheism', 'sci.space', 'comp.graphics', 'rec.sport.baseball']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories, remove=('headers','footers','quotes'))
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories, remove=('headers','footers','quotes'))
pipe_text = make_pipeline(CountVectorizer(), TfidfTransformer(), MultinomialNB())
pipe_text.fit(newsgroups_train.data, newsgroups_train.target)
pred = pipe_text.predict(newsgroups_test.data)
print('Multinomial NB accuracy (20 Newsgroups subset):', accuracy_score(newsgroups_test.target, pred))


### P7. Train an SVM Classifier with different C values and compare the decision boundaries visually

In [None]:
# P7: SVM with different C values and visualize decision boundary (2D toy)
from sklearn.datasets import make_classification
X2, y2 = make_classification(n_samples=200, n_features=2, n_redundant=0, n_clusters_per_class=1, random_state=4)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(X2, y2, test_size=0.3, random_state=0)
fig, axes = plt.subplots(1, 3, figsize=(15,4))
Cs = [0.01, 1, 100]
for ax, C in zip(axes, Cs):
    clf = SVC(kernel='rbf', C=C, gamma='scale')
    clf.fit(Xs_train, ys_train)
    xx, yy = np.meshgrid(np.linspace(X2[:,0].min()-1, X2[:,0].max()+1, 300), np.linspace(X2[:,1].min()-1, X2[:,1].max()+1, 300))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3)
    ax.scatter(X2[:,0], X2[:,1], c=y2, edgecolor='k')
    ax.set_title(f'C={C} | Train acc={clf.score(Xs_train, ys_train):.2f} | Test acc={clf.score(Xs_test, ys_test):.2f}')
plt.show()


### P8. Train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with binary features

In [None]:
# P8: BernoulliNB on synthetic binary dataset
from sklearn.datasets import make_classification
Xb, yb = make_classification(n_samples=300, n_features=10, n_informative=5, n_redundant=0, random_state=0)
# binarize features to simulate presence/absence
Xb_bin = (Xb > np.median(Xb, axis=0)).astype(int)
Xb_train, Xb_test, yb_train, yb_test = train_test_split(Xb_bin, yb, test_size=0.2, random_state=0)
bnb = BernoulliNB()
bnb.fit(Xb_train, yb_train)
print('BernoulliNB accuracy (binary features):', accuracy_score(yb_test, bnb.predict(Xb_test)))
print(classification_report(yb_test, bnb.predict(Xb_test)))


### P9. Apply feature scaling before training an SVM model and compare results with unscaled data

In [None]:
# P9: Effect of feature scaling on SVM
wine = datasets.load_wine()
Xw, yw = wine.data, wine.target
Xw_train, Xw_test, yw_train, yw_test = train_test_split(Xw, yw, test_size=0.2, random_state=1, stratify=yw)
# Without scaling
clf_noscale = SVC(kernel='rbf', random_state=0)
clf_noscale.fit(Xw_train, yw_train)
acc_noscale = accuracy_score(yw_test, clf_noscale.predict(Xw_test))
# With scaling
clf_scale = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='rbf', random_state=0))])
clf_scale.fit(Xw_train, yw_train)
acc_scale = accuracy_score(yw_test, clf_scale.predict(Xw_test))
print('Wine accuracy without scaling:', acc_noscale)
print('Wine accuracy with scaling:   ', acc_scale)


### P10. Train a Gaussian Naïve Bayes model and compare the predictions before and after Laplace Smoothing

In [None]:
# P10: Effect of Laplace smoothing. GaussianNB has no Laplace smoothing (its var_smoothing
# parameter only stabilizes the variances), so the effect is demonstrated with MultinomialNB on counts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
docs = ['spam spam offer', 'spam cheap offer', 'hello how are you', 'hello friend']
y_docs = [1,1,0,0]
cv = CountVectorizer()
Xdocs = cv.fit_transform(docs)
# alpha close to 0 (effectively no smoothing) vs alpha=1 (Laplace smoothing)
mnb_no = MultinomialNB(alpha=1e-9)
mnb_yes = MultinomialNB(alpha=1.0)
mnb_no.fit(Xdocs, y_docs)
mnb_yes.fit(Xdocs, y_docs)
# New document mixing words seen only in class 1 ('cheap', 'offer') and only in class 0 ('friend')
X_new = cv.transform(['cheap offer friend'])
print('Feature names:', cv.get_feature_names_out())
print('Probs without smoothing:', mnb_no.predict_proba(X_new))
print('Probs with smoothing   :', mnb_yes.predict_proba(X_new))


### P11. Train an SVM Classifier and use GridSearchCV to tune the hyperparameters (C, gamma, kernel)

In [None]:
# P11: GridSearchCV for SVM hyperparameters
wine = datasets.load_wine()
Xw, yw = wine.data, wine.target
Xw_train, Xw_test, yw_train, yw_test = train_test_split(Xw, yw, test_size=0.2, random_state=1, stratify=yw)
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC(probability=True, random_state=0))])
param_grid = {'svc__C': [0.1, 1, 10], 'svc__gamma': ['scale', 0.1, 1], 'svc__kernel': ['rbf', 'linear']}
gs = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
gs.fit(Xw_train, yw_train)
print('Best params:', gs.best_params_)
print('Best CV score:', gs.best_score_)
print('Test accuracy with best estimator:', accuracy_score(yw_test, gs.best_estimator_.predict(Xw_test)))


### P12. Train an SVM Classifier on an imbalanced dataset, apply class weighting, and check whether it improves accuracy

In [None]:
# P12: SVM on imbalanced dataset with class weighting
from sklearn.datasets import make_classification
Ximb, yimb = make_classification(n_samples=500, weights=[0.9,0.1], flip_y=0, random_state=0)
Ximb_train, Ximb_test, yimb_train, yimb_test = train_test_split(Ximb, yimb, test_size=0.2, random_state=0)
clf_unweighted = SVC(kernel='rbf', class_weight=None)
clf_weighted = SVC(kernel='rbf', class_weight='balanced')
clf_unweighted.fit(Ximb_train, yimb_train)
clf_weighted.fit(Ximb_train, yimb_train)
print('Unweighted recall (minority):', recall_score(yimb_test, clf_unweighted.predict(Ximb_test)))
print('Weighted recall (minority):', recall_score(yimb_test, clf_weighted.predict(Ximb_test)))
print('Unweighted accuracy:', accuracy_score(yimb_test, clf_unweighted.predict(Ximb_test)))
print('Weighted accuracy:', accuracy_score(yimb_test, clf_weighted.predict(Ximb_test)))


### P13. Implement a Naïve Bayes classifier for spam detection using email data (simple demo)

In [None]:
# P13: Simple Naive Bayes spam detection demo (toy data)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
texts = ['win money now', 'cheap meds', 'hello how are you', 'meeting schedule', 'exclusive offer win']
labels = [1,1,0,0,1]
cv = CountVectorizer()
Xtxt = cv.fit_transform(texts)
Xtr, Xte, ytr, yte = train_test_split(Xtxt, labels, test_size=0.4, random_state=0)
mnb = MultinomialNB(alpha=1.0)
mnb.fit(Xtr, ytr)
print('Vocabulary:', cv.get_feature_names_out())
print('Test predictions:', mnb.predict(Xte))
print('Test probs:', mnb.predict_proba(Xte))
print('Accuracy:', accuracy_score(yte, mnb.predict(Xte)))


### P14. Train an SVM Classifier and a Naïve Bayes Classifier on the same dataset and compare their accuracy

In [None]:
# P14: Compare SVM and Naive Bayes on the same dataset (Breast Cancer)
cancer = datasets.load_breast_cancer()
Xc, yc = cancer.data, cancer.target
Xtr, Xte, ytr, yte = train_test_split(Xc, yc, test_size=0.2, random_state=2, stratify=yc)
pipe_svm = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='rbf', random_state=0))])
pipe_gnb = Pipeline([('scaler', StandardScaler()), ('gnb', GaussianNB())])
pipe_svm.fit(Xtr, ytr)
pipe_gnb.fit(Xtr, ytr)
print('SVM accuracy:', accuracy_score(yte, pipe_svm.predict(Xte)))
print('GaussianNB accuracy:', accuracy_score(yte, pipe_gnb.predict(Xte)))


### P15. Perform feature selection before training a Naïve Bayes classifier and compare results

In [None]:
# P15: Univariate feature selection before Naive Bayes, compared with using all features
from sklearn.feature_selection import SelectKBest, f_classif
cancer = datasets.load_breast_cancer()
Xc, yc = cancer.data, cancer.target
# Split first so the selector is fitted on training data only (avoids test-set leakage)
Xtr, Xte, ytr, yte = train_test_split(Xc, yc, test_size=0.2, random_state=2, stratify=yc)
clf_all = GaussianNB()
clf_all.fit(Xtr, ytr)
clf_sel = Pipeline([('select', SelectKBest(f_classif, k=10)), ('gnb', GaussianNB())])
clf_sel.fit(Xtr, ytr)
print('Accuracy with all 30 features           :', accuracy_score(yte, clf_all.predict(Xte)))
print('Accuracy after selecting top-10 features:', accuracy_score(yte, clf_sel.predict(Xte)))


### P16. Train an SVM Classifier using One-vs-Rest (OvR) and One-vs-One (OvO) strategies on the Wine dataset and compare their accuracy

In [None]:
# P16: One-vs-Rest vs One-vs-One on Wine dataset
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
wine = datasets.load_wine()
Xw, yw = wine.data, wine.target
Xtr, Xte, ytr, yte = train_test_split(Xw, yw, test_size=0.2, random_state=1, stratify=yw)
ovr = OneVsRestClassifier(SVC(kernel='linear', probability=True))
ovo = OneVsOneClassifier(SVC(kernel='linear', probability=True))
ovr.fit(Xtr, ytr)
ovo.fit(Xtr, ytr)
print('OvR accuracy:', accuracy_score(yte, ovr.predict(Xte)))
print('OvO accuracy:', accuracy_score(yte, ovo.predict(Xte)))


### P17. Train an SVM Classifier using Linear, Polynomial, and RBF kernels on the Breast Cancer dataset and compare their accuracy

In [None]:
# P17: Compare Linear, Poly, RBF SVM on Breast Cancer dataset
cancer = datasets.load_breast_cancer()
Xc, yc = cancer.data, cancer.target
Xtr, Xte, ytr, yte = train_test_split(Xc, yc, test_size=0.2, random_state=2, stratify=yc)
for kernel in ['linear', 'poly', 'rbf']:
    pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel=kernel, probability=True, random_state=0))])
    pipe.fit(Xtr, ytr)
    print(f'Kernel={kernel} | Test acc={accuracy_score(yte, pipe.predict(Xte)):.4f}')


### P18. Train an SVM Classifier using Stratified K-Fold Cross-Validation and compute the average accuracy

In [None]:
# P18: Stratified K-Fold Cross-Validation average accuracy (SVM on Wine)
from sklearn.model_selection import StratifiedKFold, cross_val_score
wine = datasets.load_wine()
Xw, yw = wine.data, wine.target
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='rbf', random_state=0))])
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, Xw, yw, cv=skf, scoring='accuracy')
print('CV accuracies:', scores)
print('Mean CV accuracy:', np.mean(scores))


### P19. Train a Naïve Bayes classifier using different prior probabilities and compare performance

In [None]:
# P19: Naive Bayes with different prior probabilities
from sklearn.naive_bayes import GaussianNB
cancer = datasets.load_breast_cancer()
Xc, yc = cancer.data, cancer.target
Xtr, Xte, ytr, yte = train_test_split(Xc, yc, test_size=0.2, random_state=2, stratify=yc)
# Default priors
gnb_default = GaussianNB()
gnb_default.fit(Xtr, ytr)
# Custom priors (e.g., favor class 0)
gnb_custom = GaussianNB(priors=[0.7, 0.3])
gnb_custom.fit(Xtr, ytr)
print('Default priors acc:', accuracy_score(yte, gnb_default.predict(Xte)))
print('Custom priors acc :', accuracy_score(yte, gnb_custom.predict(Xte)))


### P20. Perform Recursive Feature Elimination (RFE) before training an SVM Classifier and compare accuracy

In [None]:
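# P20: Recursive Feature Elimination (RFE) before SVM, compared with using all features
from sklearn.feature_selection import RFE
wine = datasets.load_wine()
Xw, yw = wine.data, wine.target
# Split first so scaling and RFE are fitted on training data only (avoids test-set leakage)
Xtr, Xte, ytr, yte = train_test_split(Xw, yw, test_size=0.2, random_state=1, stratify=yw)
pipe_all = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='rbf'))])
pipe_rfe = Pipeline([('scaler', StandardScaler()),
                     ('rfe', RFE(SVC(kernel='linear'), n_features_to_select=8)),
                     ('svc', SVC(kernel='rbf'))])
pipe_all.fit(Xtr, ytr)
pipe_rfe.fit(Xtr, ytr)
print('Accuracy with all features (SVM):', accuracy_score(yte, pipe_all.predict(Xte)))
print('Accuracy after RFE (SVM)        :', accuracy_score(yte, pipe_rfe.predict(Xte)))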


### P21. Train an SVM Classifier and evaluate its performance using Precision, Recall, and F1-Score instead of accuracy

In [None]:
# P21: Evaluate SVM using Precision, Recall, F1
cancer = datasets.load_breast_cancer()
Xc, yc = cancer.data, cancer.target
Xtr, Xte, ytr, yte = train_test_split(Xc, yc, test_size=0.2, random_state=2, stratify=yc)
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='rbf', random_state=0))])
pipe.fit(Xtr, ytr)
y_pred = pipe.predict(Xte)
print('Precision:', precision_score(yte, y_pred))
print('Recall   :', recall_score(yte, y_pred))
print('F1-score :', f1_score(yte, y_pred))
print('\nClassification report:\n', classification_report(yte, y_pred))


### P22. Train a Naïve Bayes Classifier and evaluate its performance using Log Loss (Cross-Entropy Loss)

In [None]:
# P22: Naive Bayes evaluated using Log Loss (Cross-Entropy)
from sklearn.naive_bayes import GaussianNB
cancer = datasets.load_breast_cancer()
Xc, yc = cancer.data, cancer.target
Xtr, Xte, ytr, yte = train_test_split(Xc, yc, test_size=0.2, random_state=2, stratify=yc)
clf = GaussianNB()
clf.fit(Xtr, ytr)
probs = clf.predict_proba(Xte)
print('Log loss:', log_loss(yte, probs))


### P23. Train an SVM Classifier and visualize the Confusion Matrix using seaborn

In [None]:
# P23: Confusion matrix visualization using seaborn (SVM on Iris)
iris = datasets.load_iris()
X, y = iris.data, iris.target
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='rbf', random_state=0))])
pipe.fit(Xtr, ytr)
yp = pipe.predict(Xte)
cm = confusion_matrix(yte, yp)
plt.figure(figsize=(6,4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix (Iris)')
plt.show()


### P24. Train an SVM Regressor (SVR) and evaluate its performance using Mean Absolute Error (MAE) instead of MSE

In [None]:
# P24: SVR evaluated with MAE instead of MSE
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
Xh, yh = housing.data, housing.target
Xh_tr, Xh_te, yh_tr, yh_te = train_test_split(Xh, yh, test_size=0.2, random_state=0)
pipe = Pipeline([('scaler', StandardScaler()), ('svr', SVR(kernel='rbf'))])
pipe.fit(Xh_tr, yh_tr)
yp = pipe.predict(Xh_te)
print('SVR MAE:', mean_absolute_error(yh_te, yp))


### P25. Train a Naïve Bayes classifier and evaluate its performance using the ROC-AUC score

In [None]:
# P25: Naive Bayes ROC-AUC (requires predict_proba)
cancer = datasets.load_breast_cancer()
Xc, yc = cancer.data, cancer.target
Xtr, Xte, ytr, yte = train_test_split(Xc, yc, test_size=0.2, random_state=2, stratify=yc)
clf = GaussianNB()
clf.fit(Xtr, ytr)
probs = clf.predict_proba(Xte)[:,1]
print('ROC-AUC:', roc_auc_score(yte, probs))


### P26. Train an SVM Classifier and visualize the Precision-Recall Curve

In [None]:
# P26: Precision-Recall Curve for SVM (Breast Cancer)
from sklearn.metrics import precision_recall_curve, average_precision_score
cancer = datasets.load_breast_cancer()
Xc, yc = cancer.data, cancer.target
Xtr, Xte, ytr, yte = train_test_split(Xc, yc, test_size=0.2, random_state=2, stratify=yc)
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='rbf', probability=True, random_state=0))])
pipe.fit(Xtr, ytr)
probs = pipe.predict_proba(Xte)[:,1]
precision, recall, thresholds = precision_recall_curve(yte, probs)
avg_prec = average_precision_score(yte, probs)
plt.figure(figsize=(6,5))
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title(f'Precision-Recall curve (AP={avg_prec:.3f})')
plt.show()
