# SVM and Naïve Bayes

# Question 1: What is Information Gain, and how is it used in Decision Trees?
 -  Information Gain is a metric used in Decision Tree algorithms to select the best feature for splitting the dataset.
It measures the reduction in uncertainty (entropy) after splitting the data on a particular feature.

Entropy represents the level of impurity or randomness in the dataset.
When a dataset is split using a feature, Information Gain calculates how much the entropy decreases.

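Formula:
Information Gain(S, A) = Entropy(S) − Σ_v (|Sv| / |S|) × Entropy(Sv)

Where:
S = the dataset at the current node
A = the attribute (feature) being evaluated
Sv = the subset of S in which attribute A takes the value v
The sum runs over every distinct value v of A.
Entropy(S) = − Σ_i p_i × log2(p_i), where p_i is the proportion of class i in S.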

In Decision Trees, the feature with the highest Information Gain is chosen for splitting at each node because it provides the most useful information for classification.
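
The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function names `entropy` and `information_gain` and the toy data are hypothetical, chosen only to show how the weighted subset entropies are subtracted from the parent entropy.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # sum of p * log2(1 / p) is the same as -sum of p * log2(p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy(S) minus the size-weighted entropy of each subset Sv."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


# Hypothetical toy data: six samples, two classes.
labels = ["yes", "yes", "no", "no", "yes", "no"]
perfect = ["a", "a", "b", "b", "a", "b"]  # each value maps to a single class
weak = ["x", "y", "x", "y", "y", "x"]     # each value still mixes both classes

print("Gain of 'perfect':", information_gain(perfect, labels))
print("Gain of 'weak':   ", information_gain(weak, labels))
```

A tree builder would evaluate every candidate feature this way and split on the one with the larger gain, here `perfect`.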


# Question 2: What is the difference between Gini Impurity and Entropy? Hint: Directly compares the two main impurity measures, highlighting strengths, weaknesses, and appropriate use cases.
-  Gini Impurity and Entropy are both impurity measures used in Decision Tree algorithms to evaluate the quality of a split.

Entropy measures the level of uncertainty or randomness in the dataset and is based on information theory.
It uses logarithmic calculations, which makes it computationally more expensive but more sensitive to changes in class probabilities.

Gini Impurity measures the probability that a randomly chosen data point would be misclassified if it were labeled at random according to the class distribution in the node.
It is computed as 1 − Σ p_i², so it needs no logarithm and is slightly faster to compute than Entropy. It is also less sensitive to small probability changes.

Entropy is mainly used in ID3 and C4.5 algorithms, while Gini Impurity is used in the CART algorithm.
In practice, both usually produce very similar trees. Gini is a common choice when training speed matters, while Entropy is the natural choice when splits are to be interpreted in terms of information gain.
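
To make the comparison concrete, the sketch below evaluates both measures on a few two-class distributions. The helper names `gini` and `entropy` are hypothetical and written only for this illustration; they take a tuple of class probabilities rather than raw labels.

```python
import math


def gini(probs):
    """Gini impurity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in probs)


def entropy(probs):
    """Shannon entropy in bits, skipping classes with zero probability."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


# Two-class nodes ranging from perfectly mixed (p = 0.5) to pure (p = 1.0).
for p in (0.5, 0.7, 0.9, 1.0):
    dist = (p, 1.0 - p)
    print(f"p={p:.1f}  gini={gini(dist):.3f}  entropy={entropy(dist):.3f}")
```

Both measures peak for a perfectly mixed node and fall to zero for a pure node; for two classes the maximum is 0.5 for Gini and 1.0 for Entropy.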


# Question 3: What is Pre-Pruning in Decision Trees?
-  Pre-Pruning is a technique used in Decision Trees to prevent overfitting by stopping the growth of the tree at an early stage.
Instead of allowing the tree to grow fully, certain conditions are applied before splitting a node.

Common pre-pruning criteria include setting a maximum tree depth, minimum number of samples required to split a node,
minimum information gain, or minimum samples in a leaf node.

By stopping unnecessary splits, pre-pruning reduces model complexity, improves generalization, and decreases training time.
However, excessive pre-pruning may lead to underfitting if the tree stops growing too early.
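
In scikit-learn, these criteria correspond to constructor parameters of `DecisionTreeClassifier`. The following is a minimal sketch on the Iris dataset; the specific threshold values are arbitrary assumptions picked for illustration, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unrestricted tree: grows until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned tree: splitting stops as soon as any limit is reached.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # maximum tree depth
    min_samples_split=10,        # minimum samples required to split a node
    min_samples_leaf=5,          # minimum samples allowed in a leaf
    min_impurity_decrease=0.01,  # minimum impurity reduction for a split
    random_state=42,
).fit(X, y)

print("Full tree:   depth", full_tree.get_depth(), "leaves", full_tree.get_n_leaves())
print("Pruned tree: depth", pruned_tree.get_depth(), "leaves", pruned_tree.get_n_leaves())
```

Comparing the depth and leaf counts of the two trees shows how the limits cap model complexity; whether that helps generalization would need to be checked on held-out data.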


# Question 4: Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical). Hint: Use criterion='gini' in DecisionTreeClassifier and access .feature_importances_. (Include your Python code and output in the code box below.)

```python
# Import required libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Train Decision Tree Classifier using Gini Impurity
model = DecisionTreeClassifier(criterion='gini', random_state=42)
model.fit(X, y)

# Print feature importances
print("Feature Importances:")
for feature, importance in zip(data.feature_names, model.feature_importances_):
    print(f"{feature}: {importance:.4f}")
```

Output:

```
Feature Importances:
sepal length (cm): 0.0133
sepal width (cm): 0.0000
petal length (cm): 0.5641
petal width (cm): 0.4226
```


# Question 5: What is a Support Vector Machine (SVM)?
-  A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks.
It works by finding an optimal decision boundary called a hyperplane that best separates the data points of different classes.

SVM aims to maximize the margin, which is the distance between the hyperplane and the nearest data points from each class,
known as support vectors. These support vectors play a key role in defining the position of the hyperplane.

SVM can handle both linear and non-linear data. A linear kernel is used when the classes are (approximately) linearly separable. For non-linear classification,
SVM uses kernel functions such as polynomial and radial basis function (RBF), which implicitly map the data into a higher-dimensional space.

Due to its ability to create robust decision boundaries, SVM is effective in high-dimensional spaces and is less prone
to overfitting when properly tuned.
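
The ideas of hyperplane, margin, and support vectors can be inspected directly on a fitted model. The sketch below uses a small, hypothetical 2D dataset invented for this illustration and scikit-learn's `SVC` with a linear kernel; for a linear SVM the margin width equals 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data: two clusters in 2D.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("Support vectors:\n", clf.support_vectors_)
print("Hyperplane: w =", w, "b =", b)
print("Margin width:", margin_width)
```

Only the points listed in `support_vectors_` determine the boundary; removing any of the other training points would leave the hyperplane unchanged.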


# Question 6: What is the Kernel Trick in SVM?
-   The Kernel Trick is a technique used in Support Vector Machines (SVM) to handle non-linear data.
It allows SVM to separate data that is not linearly separable by implicitly mapping it into a higher-dimensional feature space.

Instead of explicitly computing the transformation, the kernel trick uses kernel functions to calculate the inner products
between data points in the higher-dimensional space directly. This makes the computation efficient.

Common kernel functions include Linear, Polynomial, and Radial Basis Function (RBF).
By using an appropriate kernel, SVM can create a linear decision boundary in the transformed space that corresponds to a
non-linear boundary in the original space.

Thus, the kernel trick enables SVM to solve complex non-linear classification problems without the cost of explicitly computing the high-dimensional transformation.
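
A small numeric check illustrates the equivalence between a kernel value and an inner product in the transformed space. The sketch assumes the homogeneous degree-2 polynomial kernel K(x, z) = (x · z)² on 2D inputs, whose explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²); the function names `phi` and `poly_kernel` are hypothetical.

```python
import numpy as np


def phi(x):
    """Explicit degree-2 feature map for a 2D input."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return np.dot(x, z) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # inner product in the 3D feature space
implicit = poly_kernel(x, z)       # same quantity, computed in the original 2D space

print("Explicit mapping:", explicit)
print("Kernel function: ", implicit)
```

Both computations give the same number, but the kernel never builds the 3D vectors. For kernels such as RBF, whose feature space is infinite-dimensional, the explicit route is not available at all.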


# Question 7: Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies. Hint: Use SVC(kernel='linear') and SVC(kernel='rbf'), then compare accuracy scores after fitting on the same dataset. (Include your Python code and output in the code box below.)

```python
# Import required libraries
from sklearn.datasets import load_wine
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load Wine dataset
data = load_wine()
X = data.data
y = data.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train SVM with Linear Kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
linear_accuracy = accuracy_score(y_test, y_pred_linear)

# Train SVM with RBF Kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
rbf_accuracy = accuracy_score(y_test, y_pred_rbf)

# Print accuracy results
print("Accuracy using Linear Kernel SVM:", linear_accuracy)
print("Accuracy using RBF Kernel SVM:", rbf_accuracy)
```

Output:

```
Accuracy using Linear Kernel SVM: 0.9814814814814815
Accuracy using RBF Kernel SVM: 0.7592592592592593
```
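
One plausible reason for the lower RBF score is that the Wine features are on very different scales, and the RBF kernel depends on Euclidean distances between samples. The sketch below is an optional follow-up, not part of the original question: it assumes the same split as above and standardizes the features with `StandardScaler` inside a pipeline before fitting the RBF model.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The scaler is fitted on the training data only, then applied to the test data.
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scaled_rbf.fit(X_train, y_train)
scaled_rbf_accuracy = scaled_rbf.score(X_test, y_test)

print("Accuracy using RBF Kernel SVM with scaling:", scaled_rbf_accuracy)
```

Running this alongside the unscaled version gives a fairer comparison of the two kernels, since both then see features of comparable magnitude.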


# Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?
- The Naïve Bayes classifier is a supervised machine learning algorithm based on Bayes’ Theorem.
It is mainly used for classification tasks such as text classification, spam detection, and sentiment analysis.

It is called “Naïve” because it assumes that all features are independent of each other given the class label.
This independence assumption is usually unrealistic in real-world data, but it simplifies the computation.

Despite this strong assumption, Naïve Bayes performs well in many practical applications, especially on high-dimensional data such as text.
It is fast to train and to predict with, and it often works reasonably well even with limited training data.
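
The independence assumption means the class score is simply the prior multiplied by one likelihood per feature. The sketch below works this out by hand for a hypothetical spam filter; the priors, word likelihoods, and the `posterior` helper are all invented numbers and names used only for illustration.

```python
# Hypothetical priors and per-class word likelihoods (illustrative numbers only).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham": {"free": 0.03, "meeting": 0.20},
}


def posterior(words):
    """P(class | words) under the naive independence assumption."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            score *= likelihoods[label][word]  # multiply in P(word | class)
        scores[label] = score
    evidence = sum(scores.values())  # normalizing constant P(words)
    return {label: score / evidence for label, score in scores.items()}


print("Message containing 'free':   ", posterior(["free"]))
print("Message containing 'meeting':", posterior(["meeting"]))
```

Each word contributes its own factor regardless of which other words are present, which is exactly the simplification that makes the classifier "naïve".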


# Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.
-  Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes are different variants of the Naïve Bayes classifier,
each designed for different types of data.

Gaussian Naïve Bayes is used when the features are continuous and are assumed to follow a normal (Gaussian) distribution.
It is commonly applied in problems involving numerical data such as measurements and sensor values.

Multinomial Naïve Bayes is used for discrete count-based features.
It is widely used in text classification problems where features represent word counts or term frequencies.

Bernoulli Naïve Bayes is used for binary features, where each feature represents the presence or absence of a characteristic.
It is suitable for binary text features such as whether a word appears in a document or not.

Thus, the main difference between these Naïve Bayes variants lies in the type of data they are designed to handle.
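
The pairing of variant and data type can be sketched with scikit-learn's `GaussianNB`, `MultinomialNB`, and `BernoulliNB`. The tiny arrays below are hypothetical and far too small for real training; they only show which kind of feature matrix each class expects.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# Continuous measurements -> GaussianNB
X_continuous = np.array([[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.5]])
# Non-negative word counts -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
# Presence/absence flags derived from the counts -> BernoulliNB
X_binary = (X_counts > 0).astype(int)

experiments = [
    ("GaussianNB", GaussianNB(), X_continuous),
    ("MultinomialNB", MultinomialNB(), X_counts),
    ("BernoulliNB", BernoulliNB(), X_binary),
]

predictions = {}
for name, model, X in experiments:
    model.fit(X, y)
    predictions[name] = model.predict(X)
    print(name, "predictions on training data:", predictions[name])
```

Feeding a variant the wrong kind of data, such as negative values to `MultinomialNB`, either raises an error or violates the distributional assumption the variant relies on.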


# Question 10: Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy. Hint: Use GaussianNB() from sklearn.naive_bayes and the Breast Cancer dataset from sklearn.datasets. (Include your Python code and output in the code box below.)

```python
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train Gaussian Naïve Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)

print("Accuracy of Gaussian Naïve Bayes on Breast Cancer dataset:", accuracy)
```

Output:

```
Accuracy of Gaussian Naïve Bayes on Breast Cancer dataset: 0.9415204678362573
```
