# **SVM & Naïve Bayes**

**Question 1:** What is Information Gain, and how is it used in Decision Trees?

**Answer:-** Information Gain is a metric used in decision trees to decide which feature to split on at each node. It measures how much uncertainty (entropy) is removed when the dataset is split on a particular attribute: Information Gain = Entropy(parent) minus the weighted average Entropy of the child nodes, where each child is weighted by its share of the samples. At each node the tree greedily picks the attribute with the highest Information Gain, which produces purer child nodes and usually a smaller, more accurate tree.
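To make the definition concrete, here is a minimal NumPy sketch that computes entropy and Information Gain for a hypothetical 8-sample node with two candidate splits. The helper names `entropy` and `information_gain` are illustrative, not part of any library.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node: 4 "yes" and 4 "no" labels, so its entropy is 1 bit
parent = np.array(["yes"] * 4 + ["no"] * 4)

# Split A separates the classes perfectly; split B leaves both children as mixed as the parent
split_a = [parent[:4], parent[4:]]
split_b = [np.array(["yes", "yes", "no", "no"]), np.array(["yes", "yes", "no", "no"])]

print(information_gain(parent, split_a))  # 1.0
print(information_gain(parent, split_b))  # 0.0
```

A perfect split removes all uncertainty (a gain of 1 bit here), while a split whose children are as mixed as the parent gains nothing, so the tree would choose split A.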




**Question 2:** What is the difference between Gini Impurity and Entropy?

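**Answer:-** Both Gini Impurity and Entropy measure how mixed the classes are in a node, and both are 0 for a pure node. Entropy comes from information theory and measures disorder using logarithms, $-\sum_k p_k \log_2 p_k$, while Gini Impurity, $1 - \sum_k p_k^2$, is the probability of misclassifying a randomly drawn sample if it were labelled at random according to the node's class distribution (here $p_k$ is the proportion of class $k$ in the node). Gini is slightly faster to compute because it avoids logarithms; Entropy is sometimes said to produce slightly more balanced trees, but in practice the two criteria usually yield very similar splits.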
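A short sketch comparing the two impurity measures on a few hypothetical two-class node distributions. The helpers `gini` and `entropy` are illustrative, written directly from the formulas above.

```python
import numpy as np

def gini(p):
    """Gini impurity: 1 - sum(p_k^2)."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))

def entropy(p):
    """Entropy in bits: sum(p_k * log2(1 / p_k)), skipping zero-probability classes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

# Class proportions in three hypothetical nodes: 50/50, 90/10, and pure
for dist in ([0.5, 0.5], [0.9, 0.1], [1.0, 0.0]):
    print(dist, f"gini={gini(dist):.3f}", f"entropy={entropy(dist):.3f}")
```

Both are 0 for a pure node and largest for a 50/50 node (Gini 0.5, entropy 1 bit), so they tend to rank candidate splits in a very similar order.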





**Question 3:** What is Pre-Pruning in Decision Trees?

**Answer:-** Pre-pruning (also called early stopping) is a technique in decision trees where the growth of the tree is stopped early, before it is fully grown, to prevent overfitting. It sets constraints (such as maximum depth, minimum samples per node, or minimum information gain) so that a node is not split further once a split would violate them; the resulting simpler tree tends to generalize better to unseen data.
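In scikit-learn, pre-pruning is expressed through `DecisionTreeClassifier` hyperparameters. The sketch below compares an unconstrained tree with a pre-pruned one on Iris; the specific values (`max_depth=3`, `min_samples_split=10`, and so on) are arbitrary illustrative choices, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fully grown tree: no growth constraints, splits until leaves are pure
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: a node stops splitting as soon as any constraint would be violated
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # limit the depth of the tree
    min_samples_split=10,        # a node needs at least 10 samples to be split
    min_samples_leaf=5,          # every leaf must keep at least 5 samples
    min_impurity_decrease=0.01,  # skip splits that reduce impurity by less than this
    random_state=42,
).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("pre-pruned", pruned_tree)]:
    print(f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
          f"test accuracy={tree.score(X_test, y_test):.2f}")
```

In practice these limits are usually tuned with cross-validation rather than fixed by hand.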


**Question 4:** Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).


In [None]:
#Answers:-

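from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Load the Iris dataset (150 samples, 4 numeric features, 3 classes)
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names

# Train a decision tree that chooses its splits by Gini Impurity
clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X, y)

# Impurity-based importances: each feature's total weighted Gini reduction,
# normalised so that all importances sum to 1
importances = clf.feature_importances_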

# Display results in a neat table
importance_df = pd.DataFrame({
    "Feature": feature_names,
    "Importance": importances
}).sort_values(by="Importance", ascending=False)

print("Feature Importances (using Gini Impurity):")
print(importance_df)




Feature Importances (using Gini Impurity):
             Feature  Importance
2  petal length (cm)    0.564056
3   petal width (cm)    0.422611
0  sepal length (cm)    0.013333
1   sepal width (cm)    0.000000


**Question 5:** What is a Support Vector Machine (SVM)?

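**Answer:-** A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression. For classification, it finds the optimal boundary (called a hyperplane) that separates the classes while maximizing the margin, i.e. the distance between the hyperplane and the nearest training points of each class. Those nearest points are the support vectors, and they alone determine where the hyperplane lies.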
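A minimal sketch, assuming a hypothetical six-point, linearly separable 2-D dataset, showing the hyperplane, margin, and support vectors that a linear `SVC` finds. A very large `C` is used to approximate a hard-margin SVM; for a linear kernel, `coef_` holds the weight vector w, and the margin width is 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data: class 0 bottom-left, class 1 top-right
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane w . x + b = 0
b = clf.intercept_[0]   # offset of the hyperplane
margin = 2 / np.linalg.norm(w)  # distance between the two margin boundaries

print("support vectors:\n", clf.support_vectors_)
print("w =", w, "b =", b)
print("margin width =", round(margin, 3))
```

For this toy data, only the three points nearest the boundary, (2, 1), (1, 2) and (4, 4), should come back as support vectors; moving any other point (without crossing the margin) leaves the hyperplane unchanged.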


**Question 6:** What is the Kernel Trick in SVM?

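**Answer:-** The Kernel Trick in SVM allows us to solve non-linear classification problems by implicitly mapping data into a higher-dimensional space without explicitly computing the transformation. Because the SVM's dual formulation only needs dot products between samples, a kernel function $K(x, z) = \phi(x) \cdot \phi(z)$ can compute the dot product in the mapped space directly from the original features. The SVM then finds a linear decision boundary in the mapped space, which corresponds to a non-linear boundary in the original space. Common kernels include linear, polynomial, and RBF (Gaussian).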
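The trick can be checked numerically. For 2-D inputs, the degree-2 polynomial kernel K(x, z) = (x · z)² equals the dot product of the explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²). The sketch below, with hypothetical vectors `x` and `z` and illustrative helpers `phi` and `poly_kernel`, computes the value both ways.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # map both vectors to 3-D, then take the dot product
implicit = poly_kernel(x, z)       # same value without ever building phi(x) or phi(z)

print(explicit, implicit)
```

Both evaluate to 121 (up to floating-point rounding). In scikit-learn this kernel corresponds to `SVC(kernel="poly", degree=2, gamma=1, coef0=0)`, since its polynomial kernel is (gamma·⟨x, z⟩ + coef0)^degree.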




**Question 7:** Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.

**Answer:-**

In [3]:
#Answers:-

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Wine dataset (178 samples, 13 numeric features, 3 classes)
wine = datasets.load_wine()
X, y = wine.data, wine.target

# Hold out 30% for testing; stratify keeps the class proportions equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# SVMs are sensitive to feature scale: fit the scaler on the training data only,
# then apply the same transformation to the test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# SVM with a linear kernel
svm_linear = SVC(kernel='linear', random_state=42)
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
acc_linear = accuracy_score(y_test, y_pred_linear)

# SVM with an RBF (Gaussian) kernel; gamma='scale' is scikit-learn's default
svm_rbf = SVC(kernel='rbf', gamma='scale', random_state=42)
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
acc_rbf = accuracy_score(y_test, y_pred_rbf)

# Print both test accuracies
print("Accuracy with Linear Kernel: {:.2f}".format(acc_linear))
print("Accuracy with RBF Kernel: {:.2f}".format(acc_rbf))

# Compare the two kernels
if acc_linear > acc_rbf:
    print("Linear kernel performed better.")
elif acc_rbf > acc_linear:
    print("RBF kernel performed better.")
else:
    print("Both kernels performed equally well.")

Accuracy with Linear Kernel: 0.96
Accuracy with RBF Kernel: 0.98
RBF kernel performed better.


**Question 8:** What is the Naïve Bayes classifier, and why is it called "Naïve"?

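**Answer:-** The Naïve Bayes classifier is a simple yet powerful probabilistic machine learning algorithm based on Bayes' Theorem. It is widely used for classification tasks such as spam detection, sentiment analysis, and text categorization. For each class $y$ it computes $P(y \mid x_1, \dots, x_n) \propto P(y) \prod_i P(x_i \mid y)$ and predicts the class with the highest posterior. It is called "naïve" because it assumes that all features are conditionally independent of each other given the class. This assumption rarely holds exactly in real data, but it makes the model fast to train and often surprisingly accurate.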
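A from-scratch sketch of the prediction rule on a hypothetical spam example. The priors and word likelihoods are made-up numbers, the function name `naive_bayes_posterior` is illustrative, and the model is simplified to multiply only the likelihoods of words that appear; the inner loop is where the naïve independence assumption enters.

```python
# Hypothetical probabilities, as if estimated from an imaginary training set
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.05},
    "ham": {"free": 0.02, "meeting": 0.20},
}

def naive_bayes_posterior(words):
    """Return P(class | words): P(class) * product of P(word | class), then normalised."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for w in words:
            # Naive step: each word's likelihood is multiplied in independently of the others
            score *= likelihoods[cls][w]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

print(naive_bayes_posterior(["free"]))
print(naive_bayes_posterior(["free", "meeting"]))
```

Seeing "free" alone makes spam about 91% likely; adding "meeting" lowers it to about 71%, because that word is far more common in ham.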


**Question 9:** Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.

**Answer:-**
1. Gaussian Naïve Bayes
- Assumption: Features are continuous and follow a normal (Gaussian) distribution within each class.
- Use Case: Works well with datasets where features are real-valued (e.g., height, weight, sensor readings).

2. Multinomial Naïve Bayes
- Assumption: Features are discrete counts (non-negative integers).
- Use Case: Commonly used in text classification (spam detection, sentiment analysis) where features represent word counts or term frequencies.
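- Likelihood (with additive smoothing $\alpha$): $P(x_i \mid y) = \frac{N_{iy} + \alpha}{N_y + \alpha n}$, where $N_{iy}$ is the count of feature $i$ in class $y$, $N_y = \sum_i N_{iy}$ is the total count of all features in class $y$, and $n$ is the number of features.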
- Example: Predicting whether an email is spam based on word frequency.

3. Bernoulli Naïve Bayes
- Assumption: Features are binary (0 or 1), indicating presence/absence of a feature.
- Use Case: Useful when only the existence of a feature matters, not its frequency.

- Likelihood: $P(x_i \mid y) = p_y^{x_i} (1 - p_y)^{1 - x_i}$, where $x_i \in \{0, 1\}$ and $p_y$ is the probability that feature $i$ is present in class $y$.
- Example: Classifying documents based on whether certain keywords appear at all.
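A sketch contrasting the three scikit-learn variants, each fed the kind of features it assumes. The data are synthetic (random Gaussian, Poisson-count, and thresholded binary features for two hypothetical classes), so the numbers only illustrate that each model fits its matching data type; they are training accuracies, not a benchmark.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)

# Two hypothetical classes with 50 samples each
y = np.array([0] * 50 + [1] * 50)

# Continuous, roughly normal features -> GaussianNB
X_cont = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(2.0, 1.0, (50, 3))])

# Non-negative integer "word counts" -> MultinomialNB
X_counts = np.vstack([rng.poisson([5, 1, 1], (50, 3)), rng.poisson([1, 1, 5], (50, 3))])

# Presence/absence (0/1) features derived from the counts -> BernoulliNB
X_binary = (X_counts > 2).astype(int)

train_acc = {}
for name, model, X in [
    ("GaussianNB", GaussianNB(), X_cont),
    ("MultinomialNB", MultinomialNB(alpha=1.0), X_counts),
    ("BernoulliNB", BernoulliNB(alpha=1.0), X_binary),
]:
    model.fit(X, y)
    train_acc[name] = model.score(X, y)
    print(f"{name}: training accuracy = {train_acc[name]:.2f}")
```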



**Question 10: Breast Cancer Dataset**

Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer
dataset and evaluate accuracy.


In [4]:
#Answers:-

# Import libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

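# Standardize features (optional for GaussianNB: it fits a separate mean and variance
# per feature and class, so rescaling barely changes predictions; only var_smoothing depends on scale)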
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Gaussian Naïve Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict on test set
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy of Gaussian Naïve Bayes on Breast Cancer dataset: {:.2f}".format(accuracy))

Accuracy of Gaussian Naïve Bayes on Breast Cancer dataset: 0.94
