#**Question 1: What is Information Gain, and how is it used in Decision Trees?**

Answer:-Information Gain measures how much uncertainty (impurity) is reduced after splitting the data using a feature.

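A Decision Tree always tries to make the data as pure as possible, meaning that each node should contain samples from mostly one class. The split that reduces impurity the most has the highest Information Gain.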

**How is it used in Decision Trees**

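1. Calculate the entropy of the parent node

2. Split the data using each candidate feature

3. Calculate the weighted average entropy of the child nodes for each split

4. Compute Information Gain as the parent entropy minus the weighted child entropy

5. Choose the feature with the highest Information Gain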
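The steps above can be sketched in a few lines of NumPy. This is only an illustration: the helper names `entropy` and `information_gain` and the toy labels are assumptions made for this example, not part of any library.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(np.sum(probs * np.log2(1.0 / probs)))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical parent node: 5 positive and 5 negative samples
parent = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Candidate split A separates the two classes perfectly
split_a = [parent[:5], parent[5:]]
# Candidate split B leaves both children mixed
split_b = [np.array([1, 1, 1, 0, 0]), np.array([1, 1, 0, 0, 0])]

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)
print(f"Gain for split A: {gain_a:.3f}")
print(f"Gain for split B: {gain_b:.3f}")
```

Split A removes all uncertainty, so its gain equals the parent entropy, while split B barely changes the class mix and gets a gain close to zero. A tree following step 5 would pick split A.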

#**Question 2: What is the difference between Gini Impurity and Entropy?**
Hint: Directly compares the two main impurity measures, highlighting strengths,
weaknesses, and appropriate use cases.

Answer:-**Gini Impurity** measures the probability that a randomly chosen data point would be incorrectly classified if it were labeled according to the class distribution of the node. It is computationally faster because it does not involve logarithmic calculations. Gini Impurity is mainly used in the CART algorithm and works well for large datasets.

**Entropy** measures the amount of uncertainty or randomness in the data. It is based on information theory and uses logarithmic calculations. Entropy is more sensitive to changes in class probabilities and is used in algorithms like ID3 and C4.5: ID3 selects splits by Information Gain, while C4.5 uses its normalized form, Gain Ratio.

In practice, both measures often produce similar results. Gini Impurity is preferred when speed is important, while Entropy is chosen when a more informative and theoretically grounded measure is required.
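To see how the two measures behave, the sketch below evaluates both on a few two-class distributions. The function names `gini` and `entropy` and the chosen probabilities are illustrative assumptions.

```python
import numpy as np

def gini(probs):
    """Gini impurity: 1 - sum(p_i^2)."""
    probs = np.asarray(probs, dtype=float)
    return float(1.0 - np.sum(probs ** 2))

def entropy(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    probs = np.asarray(probs, dtype=float)
    nonzero = probs[probs > 0]
    return float(np.sum(nonzero * np.log2(1.0 / nonzero)))

# Probability of the first class in a two-class node
for p in (0.5, 0.7, 0.9, 1.0):
    dist = [p, 1 - p]
    print(f"p={p:.1f}  gini={gini(dist):.3f}  entropy={entropy(dist):.3f}")
```

Both measures peak for an evenly mixed node (Gini 0.5, entropy 1 bit in the two-class case) and fall to zero for a pure node, which is why they usually rank candidate splits in a similar order.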


#**Question 3: What is Pre-Pruning in Decision Trees?**
Answer:-Pre-Pruning is a technique used in Decision Trees to stop the tree from growing too deep during training. The main purpose of pre-pruning is to prevent overfitting and improve the model’s ability to generalize to new, unseen data.

In pre-pruning, the algorithm applies stopping criteria while building the tree, such as:

-  Maximum depth of the tree

-  Minimum number of samples required to split a node

-  Minimum Information Gain or Gini reduction

-  Maximum number of leaf nodes
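In scikit-learn, each of these stopping criteria corresponds to a constructor parameter of `DecisionTreeClassifier`. The sketch below is a minimal example; the Iris dataset and the specific threshold values are assumptions chosen for illustration, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Iris is used here only as a convenient built-in dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fully grown tree with no stopping criteria
unpruned = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: growth stops as soon as any criterion is hit
pruned = DecisionTreeClassifier(
    max_depth=3,                 # maximum depth of the tree
    min_samples_split=10,        # minimum samples required to split a node
    min_impurity_decrease=0.01,  # minimum impurity reduction for a split
    max_leaf_nodes=8,            # maximum number of leaf nodes
    random_state=42,
).fit(X_train, y_train)

for name, tree in (("unpruned", unpruned), ("pre-pruned", pruned)):
    print(
        f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
        f"test accuracy={tree.score(X_test, y_test):.3f}"
    )
```

Comparing the depth, leaf count, and test accuracy of the two trees shows how much model complexity the stopping criteria remove.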


#**Question 4: Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).**


In [None]:
from sklearn.tree import DecisionTreeClassifier
import numpy as np

# Sample dataset
# Features: [Age, Income]
X = np.array([
    [25, 40000],
    [30, 50000],
    [45, 80000],
    [35, 65000],
    [22, 30000],
    [50, 90000]
])

# Target labels (0 = No, 1 = Yes)
y = np.array([0, 0, 1, 1, 0, 1])

# Create Decision Tree model with Gini Impurity
model = DecisionTreeClassifier(criterion='gini', random_state=42)

# Train the model
model.fit(X, y)

# Print feature importances
print("Feature Importances:")
print(model.feature_importances_)


Feature Importances:
[1. 0.]


#**Question 5: What is a Support Vector Machine (SVM)?**
Answer:-A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. Its main objective is to find an optimal hyperplane that separates data points of different classes with the maximum margin.

SVM focuses on the data points that lie closest to the decision boundary, known as support vectors. These support vectors are the most important points because they directly influence the position and orientation of the hyperplane.

SVM can handle both linearly separable and non-linearly separable data. For linearly separable data a linear kernel is sufficient. For non-linear data, it uses kernel functions (such as polynomial and RBF) that implicitly map the data into a higher-dimensional space where separation becomes possible.
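A small sketch makes the role of support vectors visible. The six 2-D points below are hypothetical, and the large `C` value is an assumption used to approximate a hard margin on this separable toy data.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable 2-D points
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A large C leaves almost no tolerance for margin violations
clf = SVC(kernel="linear", C=1e3)
clf.fit(X, y)

# Only the points closest to the boundary are kept as support vectors
print("Support vectors:")
print(clf.support_vectors_)

# For a linear kernel the hyperplane is w . x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]
print("w =", w, "b =", b)
print("Margin width:", 2.0 / np.linalg.norm(w))

print("Predictions:", clf.predict([[0, 0], [6, 6]]))
```

Points far from the boundary could be removed from the training data without changing the learned hyperplane, which is what makes the support vectors the decisive samples.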


#**Question 6: What is the Kernel Trick in SVM?**
Answer:-The Kernel Trick is a technique used in Support Vector Machines (SVM) to handle non-linearly separable data. Instead of explicitly transforming the data into a higher-dimensional space, the kernel trick allows SVM to compute inner products in that space directly, making the computation efficient.

In simple terms, the kernel trick enables SVM to draw a non-linear decision boundary in the original feature space by implicitly mapping data to a higher dimension where the classes become linearly separable.

Commonly used kernel functions include:

-  Linear Kernel – used when data is linearly separable

-  Polynomial Kernel – captures polynomial relationships

-  RBF Kernel – handles complex, non-linear patterns

-  Sigmoid Kernel – based on the tanh function, similar to a neural-network activation
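The kernel trick can be checked numerically for a simple case. The sketch below assumes a homogeneous polynomial kernel of degree 2 on 2-D inputs; the names `phi` and `poly_kernel` are illustrative helpers, not library functions.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (maps to 3-D)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: (a . b)^2."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# Inner product computed after explicitly mapping to the higher dimension
explicit = np.dot(phi(x), phi(z))
# Same quantity computed directly in the original 2-D space
implicit = poly_kernel(x, z)

print("Explicit mapping:", explicit)
print("Kernel function: ", implicit)
```

Both values agree (up to floating-point rounding), but the kernel function never builds the higher-dimensional vectors. For kernels such as RBF the implied feature space is infinite-dimensional, so the explicit route is not even available.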

#**Question 7: Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.**
Answer:-

In [None]:
# SVM with Linear and RBF kernels on Wine dataset

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Wine dataset
data = load_wine()
X = data.data
y = data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train SVM with Linear kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)

# Train SVM with RBF kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)

# Calculate accuracies
linear_accuracy = accuracy_score(y_test, y_pred_linear)
rbf_accuracy = accuracy_score(y_test, y_pred_rbf)

# Print results
print("Linear Kernel Accuracy:", linear_accuracy)
print("RBF Kernel Accuracy:", rbf_accuracy)


Linear Kernel Accuracy: 0.9814814814814815
RBF Kernel Accuracy: 0.7592592592592593
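The gap between the two kernels above is largely a scaling effect: the RBF kernel depends on distances between samples, and the Wine features have very different ranges. The sketch below repeats the RBF experiment with standardized features. Using `StandardScaler` inside a pipeline is an addition made for this comparison, not part of the original program.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# RBF kernel on raw features, as in the program above
raw_rbf = SVC(kernel="rbf").fit(X_train, y_train)

# RBF kernel on standardized features; the scaler is fit on training data only
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)

raw_acc = raw_rbf.score(X_test, y_test)
scaled_acc = scaled_rbf.score(X_test, y_test)
print("RBF accuracy without scaling:", raw_acc)
print("RBF accuracy with scaling:   ", scaled_acc)
```

A fair comparison of kernels on this dataset should therefore standardize the features for both models before comparing accuracies.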


#**Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?**
Answer:-The Naïve Bayes classifier is a supervised machine learning algorithm based on Bayes’ Theorem. It is mainly used for classification tasks, especially in text classification, spam detection, and sentiment analysis.

Naïve Bayes calculates the probability of a class given the input features and predicts the class with the highest posterior probability.

It is called “Naïve” because it makes a strong assumption that all features are conditionally independent of each other given the class label. In real-world data, this assumption is usually not true, but the algorithm still performs well in many cases.

**Key Points:**

-  Based on Bayes’ Theorem

-  Assumes feature independence

-  Simple, fast, and efficient

-  Works well with large datasets
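The independence assumption can be illustrated with a hand calculation. All probabilities below are hypothetical values chosen for illustration, and `posterior` is an illustrative helper rather than a library function.

```python
# Hypothetical class priors and per-word likelihoods P(word | class)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham": {"free": 0.03, "meeting": 0.20},
}

def posterior(words):
    """Posterior P(class | words) under the naive independence assumption."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            # Independence: the joint likelihood is a product of per-word terms
            score *= likelihoods[label][word]
        scores[label] = score
    total = sum(scores.values())  # normalizing constant P(words)
    return {label: score / total for label, score in scores.items()}

result = posterior(["free"])
prediction = max(result, key=result.get)
print(result, "->", prediction)

result_both = posterior(["free", "meeting"])
prediction_both = max(result_both, key=result_both.get)
print(result_both, "->", prediction_both)
```

With only the word "free" the message is classified as spam, and adding "meeting" shifts the prediction to ham. Each word contributes its own factor, without modeling how the words relate to each other.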

#**Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes**
Answer:-**1. Gaussian Naïve Bayes**

-  **Features:** Continuous (numeric)

-  **Assumption:** Within each class, every feature follows a normal (Gaussian) distribution

-  **Use Case:** Height, weight, blood pressure, etc.

**2. Multinomial Naïve Bayes**

-  **Features:** Discrete counts

-  **Assumption:** Feature counts follow a multinomial distribution, so how often a feature occurs matters

-  **Use Case:** Text classification, spam detection (word counts)

**3. Bernoulli Naïve Bayes**

-  **Features:** Binary (0 or 1)

-  **Assumption:** Each feature follows a Bernoulli distribution, so only its presence or absence matters

-  **Use Case:** Spam detection, text with binary word occurrence
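The three variants are available in `sklearn.naive_bayes` and differ mainly in the kind of input they expect. The tiny datasets below are hypothetical and exist only to show which data type goes with which class.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# Continuous measurements (hypothetical height in cm, weight in kg)
X_continuous = np.array(
    [[150.0, 50.0], [155.0, 54.0], [180.0, 80.0], [185.0, 85.0]]
)
# Word counts per document (hypothetical three-word vocabulary)
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
# Presence (1) or absence (0) of the same words
X_binary = (X_counts > 0).astype(int)

gaussian = GaussianNB().fit(X_continuous, y)
multinomial = MultinomialNB().fit(X_counts, y)
bernoulli = BernoulliNB().fit(X_binary, y)

pred_gaussian = gaussian.predict([[152.0, 52.0]])[0]
pred_multinomial = multinomial.predict([[4, 0, 1]])[0]
pred_bernoulli = bernoulli.predict([[0, 1, 1]])[0]

print("GaussianNB:   ", pred_gaussian)
print("MultinomialNB:", pred_multinomial)
print("BernoulliNB:  ", pred_bernoulli)
```

Note that `X_binary` is derived from `X_counts`: the same text data can feed either the Multinomial or the Bernoulli variant, depending on whether word frequency or only word occurrence is kept.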

#**Question 10: Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.**
Answer:-


In [None]:
# Gaussian Naïve Bayes on Breast Cancer Dataset

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create Gaussian Naive Bayes model
gnb = GaussianNB()

# Train the model
gnb.fit(X_train, y_train)

# Predict on test set
y_pred = gnb.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print accuracy
print("Accuracy of Gaussian Naive Bayes:", accuracy)


Accuracy of Gaussian Naive Bayes: 0.9415204678362573
