
Question 1 : What is Information Gain, and how is it used in Decision Trees?

Answer
- Entropy: A measure of impurity or disorder in a dataset (high entropy = mixed classes, low entropy = pure classes).

- Information Gain (IG): The decrease in entropy achieved by splitting the data on a specific attribute.

- IG quantifies the relevance of a feature for classification: a higher IG means the feature is more useful for distinguishing between classes.

**How it's used in Decision Trees**
- **Calculate Initial Entropy**: Determine the entropy of the entire dataset (root node).
- **Evaluate Each Feature**: For every potential attribute (feature) to split on:
  - Calculate the entropy of each subset created by the split.
  - Calculate the weighted average entropy of the subsets, weighting each subset by its share of the samples.
  - Subtract the weighted average entropy from the initial entropy to get the Information Gain for that attribute.
- **Select the Best Split**: Choose the attribute with the highest Information Gain to be the node (or split) for that level of the tree.
- **Repeat**: Continue this process recursively for each branch (subset) until the leaf nodes are pure (zero entropy) or no further useful split exists. The result is a tree that classifies data by asking the most informative questions first.
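
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a full tree builder: the helper names `entropy` and `information_gain` and the toy `outlook`/`windy` data are assumptions made up for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(labels, feature_values):
    """Parent entropy minus the weighted entropy of the subsets
    obtained by grouping the labels on each distinct feature value."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


# Hypothetical toy data: 4 "yes" and 4 "no" labels.
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
outlook = ["sun", "sun", "sun", "sun", "rain", "rain", "rain", "rain"]  # separates the classes perfectly
windy = ["t", "f", "t", "f", "t", "f", "t", "f"]  # tells us nothing about the class

gains = {
    "outlook": information_gain(labels, outlook),
    "windy": information_gain(labels, windy),
}
best_feature = max(gains, key=gains.get)  # the attribute a tree would split on first

print(gains)         # {'outlook': 1.0, 'windy': 0.0}
print(best_feature)  # outlook
```

Here `outlook` removes all uncertainty (gain of 1 bit), while `windy` leaves each subset as mixed as the parent (gain of 0), so the tree would split on `outlook` first.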

Question 2: What is the difference between Gini Impurity and Entropy?
Hint: Directly compares the two main impurity measures, highlighting strengths,
weaknesses, and appropriate use cases.

Answer
- Gini Impurity and Entropy both measure the impurity (class mix) of a node in a decision tree. In practice they differ mainly in computational cost and, occasionally, in which split they select. Gini impurity is often preferred for its speed, while entropy is grounded in information theory.

| **Feature**             | **Gini Impurity**                                                     | **Entropy (Information Gain)**                                                           |
| ----------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| **Calculation**         | Uses squared probability terms: $1 - \sum_i p_i^2$                    | Uses logarithmic terms: $-\sum_i p_i \log_2 p_i$                                         |
| **Range (Binary Case)** | $[0, 0.5]$ (0 = pure, 0.5 = maximally impure)                         | $[0, 1]$ (0 = pure, 1 = maximally impure)                                                |
| **Behavior**            | Tends to isolate the most frequent class in its own branch            | May produce slightly more balanced splits; differences are usually small                 |
| **Computational Cost**  | Faster, no log computations                                           | Slightly slower due to $\log$ operations                                                 |
| **Origin / Usage**      | Core impurity measure in **CART** (Classification & Regression Trees) | Derived from **Information Theory** for measuring uncertainty; used in **ID3/C4.5/C5.0** |

**Strengths and Weaknesses**
- Gini Impurity Strengths:
  - Faster training: Its simpler calculation (no logarithms) makes it a common default when training speed matters, especially on large datasets.
  - Robustness: It is sometimes reported to be more robust to noise in high-dimensional data.
  - Similar accuracy: In practice, it usually yields results very similar to entropy.

- Gini Impurity Weaknesses:
  - Bias: It can be slightly biased toward splits that isolate the most frequent class.
  - Less sensitive: It is less sensitive than entropy to small changes in class probabilities.
- Entropy Strengths:
  - Theoretically grounded: It comes from information theory and directly measures the reduction in uncertainty (information gain).
  - Balanced splits: It may lead to more balanced and potentially deeper trees that capture subtle data structures.
  - Slightly better results: While results are often similar, some studies suggest it can offer slightly better performance in specific cases.

- Entropy Weaknesses:
  - Slower training: The logarithm makes each impurity evaluation slightly more expensive, which can add up on very large datasets.
  - Potential overfitting: Deeper, more complex trees may increase the risk of overfitting.
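
To make the comparison concrete, here is a small sketch that evaluates both measures on a few binary class distributions. The function names `gini` and `entropy` are chosen for this illustration only.

```python
import math


def gini(probs):
    """Gini impurity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in probs)


def entropy(probs):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)), skipping zero probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


# p is the probability of the positive class in a binary node.
for p in (0.0, 0.1, 0.3, 0.5):
    dist = (p, 1.0 - p)
    print(f"p={p:.1f}  gini={gini(dist):.3f}  entropy={entropy(dist):.3f}")
```

Both measures are 0 for a pure node and peak at p = 0.5, where Gini reaches 0.5 and entropy reaches 1.0. They rank most candidate splits the same way, which is why the resulting trees are usually similar.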

Question 3: What is Pre-Pruning in Decision Trees?

Answer
- Pre-pruning, also known as early stopping, halts the growth of a decision tree during construction by imposing limits such as a maximum depth, a minimum number of samples per leaf, or a minimum information gain per split. This keeps the tree from overfitting the training data and improves generalization to new data. Unlike post-pruning, which trims a fully grown tree, pre-pruning is a preventative measure: it builds a smaller, simpler tree from the start, which is computationally cheaper than growing a full tree first.
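
As a rough illustration of pre-pruning in scikit-learn, the sketch below trains one unrestricted tree and one tree with growth limits on the Breast Cancer dataset. The specific limits (`max_depth=3`, `min_samples_leaf=5`, `min_impurity_decrease=0.001`) are arbitrary values picked for this example, not tuned recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# No limits: the tree grows until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned: growth stops early once any of these limits is hit.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                  # never grow deeper than 3 levels
    min_samples_leaf=5,           # every leaf must keep at least 5 samples
    min_impurity_decrease=0.001,  # skip splits that barely reduce impurity
    random_state=42,
).fit(X_train, y_train)

for name, tree in [("unrestricted", full_tree), ("pre-pruned", pruned_tree)]:
    print(
        f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
        f"test accuracy={tree.score(X_test, y_test):.4f}"
    )
```

The pre-pruned tree is guaranteed to be no deeper than 3 levels; whether it also generalizes better depends on the data, so the limits are normally chosen with cross-validation.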




In [1]:
'''
Question 4: Write a Python program to train a Decision Tree Classifier using Gini
Impurity as the criterion and print the feature importances (practical).
Hint: Use criterion='gini' in DecisionTreeClassifier and access
.feature_importances_.
(Include your Python code and output in the code box below.)
'''


from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

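# Load the Iris dataset (4 numeric features, 3 classes)
data = load_iris()
X = data.data
y = data.target

# Train a decision tree that scores candidate splits with Gini impurity
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X, y)

# Pair each importance score with its feature name for readable output
feature_importances = pd.Series(clf.feature_importances_, index=data.feature_names)

print("Feature Importances:")
print(feature_importances)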


Feature Importances:
sepal length (cm)    0.013333
sepal width (cm)     0.000000
petal length (cm)    0.564056
petal width (cm)     0.422611
dtype: float64


Question 5: What is a Support Vector Machine (SVM)?

Answer
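- A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. For classification, it finds the decision boundary (hyperplane) that separates the classes with the largest possible margin; the training points closest to that boundary are called support vectors.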
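
A linear SVM classifier looks for the separating hyperplane with the widest margin. The sketch below fits one on six hand-made 2-D points (hypothetical data for illustration) and reads back the support vectors and the margin width, which for a linear SVM is 2 / ||w||. The large `C` value is an assumption used here to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, clearly separated clusters (made-up data).
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C penalizes margin violations heavily, approximating a hard margin.
clf = SVC(kernel="linear", C=1e3).fit(X, y)

w = clf.coef_[0]                             # normal vector of the hyperplane
b = clf.intercept_[0]                        # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)       # distance between the two margin lines

print("support vectors:\n", clf.support_vectors_)
print("w =", w, "b =", b)
print("margin width =", margin_width)
```

Only the points on the edge of the margin end up in `support_vectors_`; moving any other training point slightly would leave the fitted boundary unchanged.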

Question 6: What is the Kernel Trick in SVM?

Answer
- The kernel trick lets a Support Vector Machine classify complex, non-linearly separable data by implicitly mapping it into a higher-dimensional space where it can become linearly separable. A kernel function computes the inner products in that space directly from the original inputs, which avoids the large computational cost of actually performing the transformation.
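
A small worked example may help. For the degree-2 polynomial kernel k(x, z) = (x · z)² on 2-D inputs, there is an explicit 3-D feature map whose ordinary dot product gives the same value. The sketch below checks this numerically; the names `poly_kernel`, `phi`, and `dot` are illustrative helpers, not library functions.

```python
import math


def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit map into 3-D: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Plain dot product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))


x, z = (1.0, 2.0), (3.0, 0.5)

implicit = poly_kernel(x, z)     # never leaves 2-D
explicit = dot(phi(x), phi(z))   # maps to 3-D first, then takes the dot product

print(implicit)                          # 16.0
print(math.isclose(implicit, explicit))  # True
```

The kernel gets the higher-dimensional inner product without ever building `phi(x)`. For kernels such as RBF the corresponding feature space is infinite-dimensional, so the explicit route is not even available.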



In [2]:
'''
Question 7: Write a Python program to train two SVM classifiers with Linear
and RBF kernels on the Wine dataset, then compare their accuracies.
Hint: Use SVC(kernel='linear') and SVC(kernel='rbf'), then compare accuracy
scores after fitting on the same dataset.
(Include your Python code and output in the code box below.)
'''
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

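# Load the Wine dataset (13 numeric features, 3 classes)
data = load_wine()
X = data.data
y = data.target

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# SVM with a linear kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)

# SVM with an RBF kernel (default hyperparameters)
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)

# Predict on the same held-out test set with both models
y_pred_linear = svm_linear.predict(X_test)
y_pred_rbf = svm_rbf.predict(X_test)

# Fraction of test samples classified correctly
acc_linear = accuracy_score(y_test, y_pred_linear)
acc_rbf = accuracy_score(y_test, y_pred_rbf)

print(f"Linear Kernel Accuracy: {acc_linear:.4f}")
print(f"RBF Kernel Accuracy: {acc_rbf:.4f}")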



Linear Kernel Accuracy: 1.0000
RBF Kernel Accuracy: 0.8056
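
One likely reason for the gap is that the Wine features are on very different scales (for example, proline is in the hundreds while hue is close to 1), and the RBF kernel depends on Euclidean distances between samples. The sketch below is an optional follow-up, not part of the original exercise: it repeats the comparison with standardized features using `StandardScaler` in a pipeline.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scores = {}
for kernel in ("linear", "rbf"):
    # The scaler is fitted on the training split only, then applied to the test split.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)
    print(f"{kernel} kernel with scaling: {scores[kernel]:.4f}")
```

If the RBF score rises noticeably after scaling, the earlier difference says more about preprocessing than about the kernels themselves.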


Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?

Answer
- The Naïve Bayes classifier is a probabilistic machine learning algorithm based on Bayes’ Theorem. It is commonly used for classification tasks, especially text classification (e.g., spam detection, sentiment analysis).

**Why is it called "Naïve"?**
 - The "Naïve" part (the method is sometimes also called "simple" or "idiot's" Bayes) comes from its core assumption: conditional independence of features.
 - It assumes that, once the class is known (e.g., spam), the presence of one feature (like the word "free") tells you nothing about the presence of another feature (like the word "money").
 - This assumption replaces the joint probability of all features with a product of per-feature probabilities, which makes the algorithm computationally efficient.
 - While this independence is unrealistic for most real data (features often correlate), the algorithm still performs surprisingly well in practice, for example in spam detection and sentiment analysis.
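
The independence assumption is easiest to see in a hand calculation. In the sketch below, every number is a made-up probability for illustration (not estimated from any dataset): the score of each class is its prior multiplied by one likelihood per observed word.

```python
# Hypothetical, hand-picked probabilities for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "money": 0.20},  # P(word | spam)
    "ham": {"free": 0.02, "money": 0.05},   # P(word | ham)
}
words = ["free", "money"]  # words observed in the message

# Naive step: multiply the per-word likelihoods as if the words were
# independent given the class.
unnormalized = {}
for label, prior in priors.items():
    score = prior
    for word in words:
        score *= likelihoods[label][word]
    unnormalized[label] = score

# Normalize so the two posteriors sum to 1.
total = sum(unnormalized.values())
posterior = {label: score / total for label, score in unnormalized.items()}

prediction = max(posterior, key=posterior.get)
print(posterior)
print(prediction)  # spam
```

With these numbers the spam score is 0.4 × 0.30 × 0.20 = 0.024 and the ham score is 0.6 × 0.02 × 0.05 = 0.0006, so the message is classified as spam with a posterior of about 0.976.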

Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes

Answer
- The core difference between the Naïve Bayes variants lies in the assumptions they make about the distribution and type of the input features.

- Gaussian Naïve Bayes:
  - Data type: Used for continuous data, such as height, weight, or sensor measurements.
  - Distribution assumption: It assumes that the continuous features follow a Gaussian (normal) distribution.
- Multinomial Naïve Bayes:
  - Data type: Designed for discrete data that represents counts, such as the frequency of words in a document.
  - Distribution assumption: It assumes that the features are generated from a multinomial distribution.
  - Key feature: It considers the count or frequency of occurrence of a feature.
- Bernoulli Naïve Bayes:
  - Data type: Suited for binary or boolean features, where each feature is either present (1) or absent (0).
  - Distribution assumption: It uses the Bernoulli distribution.
  - Key feature: It only cares about the presence or absence of a feature, not its frequency.

| **Feature**                 | **Gaussian Naïve Bayes**                                   | **Multinomial Naïve Bayes**               | **Bernoulli Naïve Bayes**                     |
| --------------------------- | ---------------------------------------------------------- | ----------------------------------------- | --------------------------------------------- |
| **Data Type**               | Continuous numerical features                              | Discrete count data (integer frequencies) | Binary / Boolean values (0 or 1)              |
| **Distribution Assumption** | Gaussian (Normal distribution)                             | Multinomial distribution                  | Bernoulli distribution                        |
| **Typical Use Case**        | Classifying continuous measurements (e.g., height, weight) | Text classification based on word counts  | Spam detection based on word presence/absence |
| **Common Domain**           | Medical / sensor data                                      | NLP text classification                   | Binary feature tasks (e.g., sentiment flags)  |
| **Example Input**           | `[5.6, 73.2, 22.1]`                                        | `{word: count}`                           | `{word: present/not present}`                 |
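
The three variants map onto three scikit-learn classes. The sketch below fits each one on a tiny hand-made array of the matching type; all the data is hypothetical and only meant to show which input shape goes with which class.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])  # class labels shared by all three toy datasets

# Continuous measurements -> GaussianNB
X_continuous = np.array([[5.1, 70.2], [4.9, 68.5], [6.3, 85.0], [6.5, 88.1]])

# Word counts per document -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])

# Word present (1) or absent (0) -> BernoulliNB
X_binary = np.clip(X_counts, 0, 1)

models = {
    "GaussianNB": (GaussianNB(), X_continuous),
    "MultinomialNB": (MultinomialNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_binary),
}

predictions = {}
for name, (model, X) in models.items():
    predictions[name] = model.fit(X, y).predict(X)
    print(name, predictions[name])
```

Note that `BernoulliNB` also penalizes the absence of a word that is typical for a class, whereas `MultinomialNB` only uses the words that actually occur.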



In [3]:
'''
Question 10:
Breast Cancer Dataset
Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast
Cancer dataset and evaluate accuracy.
Hint: Use GaussianNB() from sklearn.naive_bayes and the Breast Cancer dataset
from sklearn.datasets.
(Include your Python code and output in the code box below.)
'''
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

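# Load the Breast Cancer dataset (30 numeric features, binary target)
data = load_breast_cancer()
X = data.data
y = data.target

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a Gaussian Naïve Bayes model on the training split
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict labels for the held-out test split
y_pred = gnb.predict(X_test)

# Fraction of test samples classified correctly
accuracy = accuracy_score(y_test, y_pred)

print(f"Gaussian Naïve Bayes Accuracy: {accuracy:.4f}")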



Gaussian Naïve Bayes Accuracy: 0.9737
