#1. What is a Decision Tree, and how does it work?

#Ans. A **Decision Tree** is a supervised machine learning algorithm used for classification and regression tasks.
It is a tree-like model of decisions, where each internal node tests a feature (attribute), each branch represents
an outcome of that test (a decision rule), and each leaf node represents a prediction (class label or numerical value).

### **How It Works:**
1. **Splitting:**
   - The dataset is split into subsets based on feature values.
   - The algorithm selects the best feature to split the data using criteria like **Gini Impurity**,
   **Entropy (Information Gain)**, or **Mean Squared Error (MSE)** (for regression).

2. **Decision Nodes and Branches:**
   - At each node, the data is further split based on specific conditions.
   - This process continues recursively, creating branches.

3. **Leaf Nodes:**
   - The process stops when a stopping criterion is met (e.g., maximum depth, minimum samples per leaf).
   - Each leaf node represents a class label (classification) or a predicted value (regression).

4. **Pruning (Optional):**
   - To avoid overfitting, unnecessary branches are removed using **pre-pruning** (limiting depth, minimum samples per split)
   or **post-pruning** (removing branches after training).

### **Example of a Decision Tree:**
Imagine a decision tree that predicts if someone will buy a laptop based on their income and age:

```
             Income?
            /       \
         High        Low
          |           |
      Age < 30?   Don't Buy
       /      \
     Yes       No
      |         |
     Buy    Don't Buy
```
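To see how such a tree is learned from data, here is a minimal scikit-learn sketch on a tiny, made-up version of the laptop example. The data and the feature names `income_high` and `age` are hypothetical. The learned tree may test the features in a different order than the hand-drawn diagram, because at each node it picks whichever split reduces Gini impurity the most. Its predictions, however, follow the same logic.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [income_high (1 = High, 0 = Low), age]; label 1 = buys a laptop
X = [[1, 25], [1, 22], [1, 45], [1, 50], [0, 25], [0, 40]]
y = [1, 1, 0, 0, 0, 0]

clf = DecisionTreeClassifier(random_state=0)  # default criterion="gini"
clf.fit(X, y)

# Print the learned if/else rules
print(export_text(clf, feature_names=["income_high", "age"]))

# High income and young -> buy; low income or older -> don't buy
print(clf.predict([[1, 28], [0, 28], [1, 45]]))
```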

### **Advantages of Decision Trees:**
✔️ Easy to understand and interpret
✔️ Requires little data preprocessing
✔️ Can handle both numerical and categorical data

### **Disadvantages:**
❌ Prone to overfitting if not pruned properly
❌ Can be biased if data is imbalanced
❌ Sensitive to small changes in data (leading to different splits)


#2 What are impurity measures in Decision Trees?

#Ans  ### **Impurity Measures in Decision Trees**
Impurity measures determine how "mixed" the data is at a given node in a Decision Tree.
 The goal is to minimize impurity when splitting nodes so that each branch becomes more homogeneous.

---

### **1. Gini Impurity (Used in CART Algorithm)**
**Formula:**
\[
Gini = 1 - \sum_{i=1}^{c} p_i^2
\]
Where:
- \( p_i \) is the probability of class \( i \) in the node
- \( c \) is the total number of classes

**Interpretation:**
- Gini = 0 → Pure node (all instances belong to one class)
- Gini = 0.5 → Maximum impurity for two classes (50/50 split); with \( c \) classes the maximum is \( 1 - 1/c \)

🔹 **Example:** If a node contains 80% class A and 20% class B:
\[
Gini = 1 - (0.8^2 + 0.2^2) = 0.32
\]

---

### **2. Entropy (Used in ID3 & C4.5 Algorithms)**
**Formula:**
\[
Entropy = - \sum_{i=1}^{c} p_i \log_2 p_i
\]

**Interpretation:**
- Entropy = 0 → Pure node
- Entropy = 1 → Maximum impurity for two classes (equal class distribution); with \( c \) classes the maximum is \( \log_2 c \)

🔹 **Example:** If a node contains 80% class A and 20% class B:
\[
Entropy = - (0.8 \log_2 0.8 + 0.2 \log_2 0.2) \approx 0.72
\]

---

### **3. Classification Error (Least Used)**
**Formula:**
\[
Error = 1 - \max(p_i)
\]

**Interpretation:**
- Measures misclassification rate
- Not as sensitive to small changes as Gini or Entropy

🔹 **Example:** If a node has 80% class A and 20% class B:
\[
Error = 1 - 0.8 = 0.2
\]

---

### **4. Mean Squared Error (MSE) for Regression Trees**
**Formula:**
\[
MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y})^2
\]

- Measures the variance of target values at a node
- The goal is to minimize MSE when splitting

---

### **Which One to Use?**
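- **Gini Impurity** is slightly cheaper to compute (no logarithms) and is the default in **CART (Classification and Regression Trees)** and Scikit-Learn.
- **Entropy** requires logarithms, so it is slightly more expensive; in practice it usually produces trees very similar to Gini.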
- **MSE** is used for **regression trees**.
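All four measures are easy to compute directly. The minimal NumPy sketch below reproduces the 80/20 examples and also shows the pure and 50/50 extremes. The function names are illustrative and are not a library API.

```python
import numpy as np

def gini(p):
    """Gini impurity: 1 - sum(p_i^2)."""
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """Entropy in bits; classes with p_i = 0 contribute 0."""
    p = p[p > 0]
    return np.sum(p * np.log2(1.0 / p))

def classification_error(p):
    """Misclassification error: 1 - max(p_i)."""
    return 1.0 - np.max(p)

def node_mse(y):
    """Regression impurity: mean squared deviation from the node mean."""
    return np.mean((y - y.mean()) ** 2)

for dist in ([1.0, 0.0], [0.8, 0.2], [0.5, 0.5]):
    p = np.array(dist)
    print(f"{dist}: gini={gini(p):.3f}, entropy={entropy(p):.3f}, "
          f"error={classification_error(p):.3f}")

print(f"node MSE of [3, 5, 7]: {node_mse(np.array([3.0, 5.0, 7.0])):.3f}")
```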



#3  What is the mathematical formula for Gini Impurity?

#Ans  ### **Mathematical Formula for Gini Impurity**

The **Gini Impurity** measures the probability of incorrectly classifying a randomly chosen element
if it were randomly labeled according to the class distribution in a node.

#### **Formula:**
\[
Gini = 1 - \sum_{i=1}^{c} p_i^2
\]

Where:
- \( c \) = Total number of classes
- \( p_i \) = Proportion (probability) of class \( i \) in the node

#### **Alternative Representation:**
\[
Gini = \sum_{i=1}^{c} p_i (1 - p_i)
\]
(since \( \sum_{i=1}^{c} p_i = 1 \), we have \( 1 - \sum_i p_i^2 = \sum_i p_i - \sum_i p_i^2 = \sum_i p_i(1 - p_i) \))

#### **Example Calculation:**
Consider a dataset where a node contains two classes:
- Class A: 80% (\( p_A = 0.8 \))
- Class B: 20% (\( p_B = 0.2 \))

\[
Gini = 1 - (0.8^2 + 0.2^2)
\]
\[
= 1 - (0.64 + 0.04)
\]
\[
= 1 - 0.68 = 0.32
\]

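So the Gini impurity of this node is **0.32**. The node is dominated by Class A but is still somewhat mixed (for two classes, the maximum is 0.5). Equivalently, a randomly chosen sample labeled at random according to this distribution is misclassified 32% of the time.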
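As a quick numerical check, the sketch below confirms that both forms of the formula give the same value, including the 0.32 from the example. It assumes NumPy; the function names are illustrative.

```python
import numpy as np

def gini_squares(p):
    """Gini = 1 - sum(p_i^2)."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def gini_products(p):
    """Equivalent form: Gini = sum(p_i * (1 - p_i))."""
    p = np.asarray(p, dtype=float)
    return np.sum(p * (1.0 - p))

print(gini_squares([0.8, 0.2]), gini_products([0.8, 0.2]))  # both ≈ 0.32

# The two forms agree for any distribution, e.g. random 4-class ones
rng = np.random.default_rng(0)
for _ in range(5):
    p = rng.dirichlet(np.ones(4))
    assert np.isclose(gini_squares(p), gini_products(p))
```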



#4 What is the mathematical formula for Entropy?

#Ans  ### **Mathematical Formula for Entropy**

Entropy is a measure of impurity or uncertainty in a dataset. It quantifies the randomness in the
 distribution of classes at a given node in a decision tree.

#### **Formula:**
\[
Entropy = - \sum_{i=1}^{c} p_i \log_2 p_i
\]

Where:
- \( c \) = Total number of classes
- \( p_i \) = Proportion (probability) of class \( i \) in the node
- \( \log_2 \) = Logarithm to base 2

---

### **Example Calculation:**
Suppose a node contains two classes:
- Class A: 80% (\( p_A = 0.8 \))
- Class B: 20% (\( p_B = 0.2 \))

\[
Entropy = - (0.8 \log_2 0.8 + 0.2 \log_2 0.2)
\]

Using logarithm values:
\[
\log_2 0.8 \approx -0.3219, \quad \log_2 0.2 \approx -2.3219
\]

\[
Entropy = - [ (0.8 \times -0.3219) + (0.2 \times -2.3219) ]
\]

\[
= - [ -0.2575 - 0.4644 ] = 0.722
\]

So, the entropy for this node is **0.722**, indicating some impurity.

---

### **Entropy Interpretation:**
- \( Entropy = 0 \) → Pure node (only one class present)
- \( Entropy = 1 \) → Maximum impurity for two classes (equal class distribution); with \( c \) classes the maximum is \( \log_2 c \)
- Higher entropy means more disorder and greater impurity.
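If SciPy is available, `scipy.stats.entropy` computes the same quantity. Passing `base=2` gives the result in bits, as in the formula above; its default is the natural log. The sketch also shows that a uniform distribution over four classes reaches 2 bits.

```python
from scipy.stats import entropy

# base=2 gives bits; scipy normalizes the input so it sums to 1
print(entropy([0.8, 0.2], base=2))   # ≈ 0.722, as in the worked example
print(entropy([80, 20], base=2))     # raw counts work too: normalized to 0.8 / 0.2
print(entropy([0.5, 0.5], base=2))   # two equal classes
print(entropy([0.25] * 4, base=2))   # four equal classes: log2(4)
```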


#5 What is Information Gain, and how is it used in Decision Trees?

#Ans ### **Information Gain in Decision Trees**

**Information Gain (IG)** is a measure of how much uncertainty (entropy) is reduced after a dataset is
 split based on a feature. It helps in selecting the best feature to split a node in a Decision Tree.

---

### **Mathematical Formula for Information Gain**

\[
IG = Entropy(Parent) - \sum_{k=1}^{m} \frac{N_k}{N} \cdot Entropy(Child_k)
\]

Where:
- \( Entropy(Parent) \) = Entropy of the original dataset before the split
- \( m \) = Number of subsets after the split
- \( N_k \) = Number of samples in child node \( k \)
- \( N \) = Total number of samples in the parent node
- \( Entropy(Child_k) \) = Entropy of the \( k^{th} \) child node

---

### **How Information Gain is Used in Decision Trees**
1. **Calculate the Entropy of the Parent Node:**
   - Before splitting, measure how mixed the classes are in the dataset.

2. **Split the Dataset Based on a Feature:**
   - Divide the dataset into subsets using a feature (e.g., "Income: High/Low").

3. **Compute the Weighted Entropy of Child Nodes:**
   - Find the entropy of each subset and weight it by the proportion of samples in that subset.

4. **Compute Information Gain:**
   - Subtract the weighted child entropies from the parent entropy.

5. **Select the Best Feature for Splitting:**
   - The feature with the **highest Information Gain** is chosen for the split.

---

### **Example Calculation**
Consider a **Parent Node** with 10 samples from two classes:
- **Class A:** 6 samples, 60% (\( p_A = 0.6 \))
- **Class B:** 4 samples, 40% (\( p_B = 0.4 \))

#### **Step 1: Compute Parent Entropy**
\[
Entropy_{parent} = - (0.6 \log_2 0.6 + 0.4 \log_2 0.4)
\]
\[
= - [0.6(-0.737) + 0.4(-1.322)]
\]
\[
= - [-0.442 + (-0.529)] = 0.971
\]

#### **Step 2: Split the Data Based on a Feature (e.g., "Age < 30?")**
- **Left Node (Yes: 4 samples → 3 A, 1 B)**
  \[
  Entropy_{left} = - \left( \frac{3}{4} \log_2 \frac{3}{4} + \frac{1}{4} \log_2 \frac{1}{4} \right) = 0.811
  \]
- **Right Node (No: 6 samples → 3 A, 3 B)**
  \[
  Entropy_{right} = - \left( \frac{3}{6} \log_2 \frac{3}{6} + \frac{3}{6} \log_2 \frac{3}{6} \right) = 1.0
  \]

#### **Step 3: Compute Weighted Entropy of Children**
\[
Entropy_{children} = \left( \frac{4}{10} \times 0.811 \right) + \left( \frac{6}{10} \times 1.0 \right)
\]
\[
= 0.3244 + 0.6 = 0.9244
\]

#### **Step 4: Compute Information Gain**
\[
IG = 0.971 - 0.9244 = 0.0466
\]

Since this IG is low, we might look for another feature with higher IG.
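The sketch below repeats the same calculation from raw class counts, using NumPy; the helper names are illustrative. It gives about 0.0464. The 0.0466 above differs only because its intermediate values were rounded.

```python
import numpy as np

def entropy_from_counts(counts):
    """Entropy in bits of a node described by its class counts."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(np.sum(p * np.log2(1.0 / p)))

def information_gain(parent_counts, children_counts):
    """IG = Entropy(parent) - sum_k (N_k / N) * Entropy(child_k)."""
    n = sum(parent_counts)
    weighted = sum(sum(c) / n * entropy_from_counts(c) for c in children_counts)
    return entropy_from_counts(parent_counts) - weighted

# Parent: 6 A / 4 B; split "Age < 30?" -> left 3 A / 1 B, right 3 A / 3 B
ig = information_gain([6, 4], [[3, 1], [3, 3]])
print(round(ig, 4))  # ≈ 0.0464
```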

---

### **Key Insights**
- **Higher Information Gain → Better Split**
- If Information Gain is **0**, the feature does not help in classification.
- Decision Trees keep selecting the feature with the highest IG at each step.



#6 What is the difference between Gini Impurity and Entropy?

#Ans  ### **Difference Between Gini Impurity and Entropy**

Gini Impurity and Entropy are both impurity measures used in Decision Trees to determine the best feature for splitting.
 However, they differ in their calculations and interpretations.

| **Criteria**      | **Gini Impurity** | **Entropy** |
|------------------|----------------|------------|
| **Formula** | \( Gini = 1 - \sum p_i^2 \) | \( Entropy = -\sum p_i \log_2 p_i \) |
| **Range** | \( [0, 0.5] \) for binary classification | \( [0,1] \) for binary classification |
| **Meaning** | Probability of randomly misclassifying an instance | Measure of uncertainty or randomness |
| **Computational Complexity** | Faster (no logarithm calculation) | Slower (logarithm computation) |
| **Preference** | Used in **CART (Classification and Regression Trees)** | Used in **ID3 and C4.5 algorithms** |
| **Splitting Criterion** | Chooses the split that minimizes Gini Impurity | Chooses the split that maximizes Information Gain |
| **Bias in Splitting** | Favors features with **many distinct values** | Same bias toward many-valued features (C4.5 uses the **gain ratio** to reduce it) |

---

### **Example Comparison**
Suppose a node has two classes:
- **Class A:** 80% (\( p_A = 0.8 \))
- **Class B:** 20% (\( p_B = 0.2 \))

#### **Gini Impurity Calculation**
\[
Gini = 1 - (0.8^2 + 0.2^2) = 1 - (0.64 + 0.04) = 0.32
\]

#### **Entropy Calculation**
\[
Entropy = - (0.8 \log_2 0.8 + 0.2 \log_2 0.2)
\]
\[
= - (0.8 \times -0.3219 + 0.2 \times -2.3219)
\]
\[
= 0.722
\]

### **Key Takeaways**
- Gini is slightly faster to compute and preferred in **CART**.
- Entropy is more sensitive to class distribution and used in **ID3/C4.5**.
- In practice, both measures often lead to **similar tree structures**.
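To compare the two criteria in practice, Scikit-Learn's `DecisionTreeClassifier` accepts `criterion="gini"` or `criterion="entropy"`. The minimal sketch below uses the built-in Iris dataset, chosen here only for illustration. On data like this the two cross-validated scores are typically close.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    # Mean accuracy over 5 cross-validation folds
    scores[criterion] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{criterion:>7}: mean 5-fold accuracy = {scores[criterion]:.3f}")
```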


#7 What is the mathematical explanation behind Decision Trees?

#Ans ### **Mathematical Explanation Behind Decision Trees**

A **Decision Tree** is a recursive partitioning algorithm that splits data into subsets based on feature
 values to minimize impurity. It works by selecting the best feature at each step, using impurity measures like
 **Gini Impurity**, **Entropy**, or **Mean Squared Error (MSE)** (for regression).

---

## **1. Splitting Criterion (Choosing the Best Feature)**
A feature is chosen based on how well it separates the data into pure groups. The common splitting criteria are:
- **Classification:** Information Gain (Entropy) or Gini Impurity
- **Regression:** Reduction in Variance (MSE)

---

## **2. Information Gain (Entropy-Based Splitting)**
Entropy measures uncertainty in a node:

\[
Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i
\]

where \( p_i \) is the probability of class \( i \) in node \( S \).

**Information Gain (IG) is the reduction in entropy after splitting:**
\[
IG = Entropy(Parent) - \sum_{k=1}^{m} \frac{N_k}{N} Entropy(Child_k)
\]

where:
- \( N_k \) is the number of samples in child \( k \)
- \( N \) is the total number of samples in the parent node

A feature with the **highest IG** is selected for splitting.

---

## **3. Gini Impurity-Based Splitting**
Instead of entropy, we can use **Gini Impurity**:

\[
Gini(S) = 1 - \sum_{i=1}^{c} p_i^2
\]

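The split whose child nodes have the **lowest weighted Gini Impurity**, \( \sum_{k=1}^{m} \frac{N_k}{N} Gini(Child_k) \), is chosen. Equivalently, this is the split with the largest decrease from \( Gini(Parent) \).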
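The search for the best split on one numeric feature can be sketched as follows: try the midpoints between consecutive sorted values, and keep the threshold whose two children have the lowest weighted Gini. This is an illustrative NumPy sketch with hypothetical data, not Scikit-Learn's actual implementation.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum(p_i^2) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(x, y):
    """Try the midpoints between sorted unique values of one numeric feature and
    return (threshold, weighted child Gini) for the best split x <= threshold."""
    values = np.unique(x)  # sorted unique values
    best_t, best_score = None, np.inf
    for t in (values[:-1] + values[1:]) / 2:
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = float(t), float(score)
    return best_t, best_score

# Hypothetical data: age vs. a 0/1 label
x = np.array([22, 25, 28, 35, 40, 52])
y = np.array([1, 1, 1, 0, 0, 0])
print(best_threshold(x, y))  # → (31.5, 0.0): a perfect split between 28 and 35
```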

---

## **4. Regression Tree (MSE-Based Splitting)**
For regression problems, Decision Trees minimize the variance of target values:

\[
MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y})^2
\]

where \( \hat{y} \) is the mean of the target variable in a node.

**Reduction in Variance:**
\[
\Delta Variance = Variance(Parent) - \sum_{k=1}^{m} \frac{N_k}{N} Variance(Child_k)
\]

The feature that **reduces variance the most** is chosen for splitting.
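The variance-reduction formula takes only a few lines of NumPy. Here `np.var` computes the population variance, which equals the node MSE above. The data and the function name are illustrative.

```python
import numpy as np

def variance_reduction(parent, children):
    """Variance(parent) - sum_k (N_k / N) * Variance(child_k)."""
    n = len(parent)
    weighted = sum(len(c) / n * np.var(c) for c in children)
    return np.var(parent) - weighted

y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
good = variance_reduction(y, [y[:3], y[3:]])     # separates the two clusters
poor = variance_reduction(y, [y[::2], y[1::2]])  # mixes the clusters
print(f"clean split: {good:.2f}, poor split: {poor:.2f}")
```

The clean split removes most of the parent's variance, so a regression tree would prefer it.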

---

## **5. Stopping Criteria (Tree Growth Control)**
A tree stops growing when:
- Maximum depth is reached.
- Minimum number of samples per node is met.
- Further splits do not significantly reduce impurity.

---

## **6. Pruning (Avoiding Overfitting)**
- **Pre-pruning:** Stop splitting early based on constraints (e.g., max depth).
- **Post-pruning:** Grow a full tree and then remove branches that don't improve performance.

---

### **Mathematical Example**
#### **Step 1: Compute Parent Entropy**
For a dataset of 10 samples with 60% Class A (6 samples, \( p_A = 0.6 \)) and 40% Class B (4 samples, \( p_B = 0.4 \)):

\[
Entropy_{parent} = - (0.6 \log_2 0.6 + 0.4 \log_2 0.4) = 0.971
\]

#### **Step 2: Compute Entropy After Split**
If splitting the 10 samples results in two child nodes:
- Left (4 samples): 3 A, 1 B (75% A, 25% B) → \( Entropy_{left} = 0.811 \)
- Right (6 samples): 3 A, 3 B (50% A, 50% B) → \( Entropy_{right} = 1.0 \)

#### **Step 3: Compute Weighted Entropy**
\[
Entropy_{children} = \frac{4}{10} \times 0.811 + \frac{6}{10} \times 1.0 = 0.924
\]

#### **Step 4: Compute Information Gain**
\[
IG = 0.971 - 0.924 = 0.047
\]

Since IG is low, we try another feature.

---

### **Conclusion**
Decision Trees iteratively select the feature that minimizes impurity or variance, split data accordingly,
 and stop when further splits are unnecessary.

#8 What is Pre-Pruning in Decision Trees?

#Ans  ### **Pre-Pruning in Decision Trees**

**Pre-Pruning (Early Stopping)** is a technique used to prevent a Decision Tree from growing too deep and
overfitting the training data. It stops the tree **before** it becomes too complex by applying constraints during training.

---

## **How Pre-Pruning Works**

Instead of allowing the tree to grow until all leaves are pure (which may cause overfitting),
pre-pruning stops the growth based on one or more stopping conditions.

### **Common Stopping Conditions:**
1. **Maximum Depth (\( d_{max} \))**
   - Stops splitting when the tree reaches a certain depth.
   - Example: Stop splitting when depth = 5.

2. **Minimum Samples per Split (\( n_{min} \))**
   - Stops splitting if a node has fewer than a threshold number of samples.
   - Example: Stop if a node has < 10 samples.

3. **Minimum Information Gain (or Gini Reduction)**
   - Stops splitting if the improvement in Information Gain or Gini Impurity is below a threshold.
   - Example: Stop if Information Gain < 0.01.

4. **Maximum Number of Leaves (\( l_{max} \))**
   - Limits the number of leaf nodes to avoid over-complexity.
   - Example: Allow at most 20 leaf nodes.

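5. **Minimum Impurity Decrease (Weighted)**
   - Stops splitting when the decrease in impurity (entropy or Gini), weighted by the fraction of samples reaching the node, is below a threshold (`min_impurity_decrease` in Scikit-Learn).
   - Example: If the weighted impurity decrease is < 0.001, stop splitting.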

---

## **Advantages of Pre-Pruning**
✅ **Prevents Overfitting** – Avoids learning noise by limiting tree complexity.
✅ **Improves Efficiency** – Reduces training time by stopping unnecessary splits.
✅ **Enhances Interpretability** – Produces smaller, more understandable trees.

---

## **Disadvantages of Pre-Pruning**
❌ **Risk of Underfitting** – The tree may stop growing too early and fail to capture important patterns.
❌ **Difficult to Set Thresholds** – Choosing the right stopping conditions requires tuning.

---

## **Example in Scikit-Learn (Python)**
```python
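from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example data (Iris) so the snippet runs on its own
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Decision Tree with Pre-Pruning
clf = DecisionTreeClassifier(max_depth=5, min_samples_split=10, min_samples_leaf=5)
clf.fit(X_train, y_train)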
```
This tree:
- Stops growing at depth **5**.
- Splits a node **only if** it has **at least 10 samples**.
- Ensures that **leaf nodes** have at least **5 samples**.
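To see the effect of these constraints, the sketch below compares an unconstrained tree with a pre-pruned one on Scikit-Learn's built-in breast cancer dataset. The dataset and the threshold values are arbitrary choices for illustration. `get_depth()` and `get_n_leaves()` report the size of each fitted tree.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No constraints: grows until leaves are pure
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Several pre-pruning (early stopping) constraints combined
pruned = DecisionTreeClassifier(
    max_depth=5, min_samples_leaf=5, max_leaf_nodes=20,
    min_impurity_decrease=0.001, random_state=0,
).fit(X_train, y_train)

for name, model in [("full", full), ("pre-pruned", pruned)]:
    print(f"{name:>10}: depth={model.get_depth()}, leaves={model.get_n_leaves()}, "
          f"test acc={model.score(X_test, y_test):.3f}")
```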


#9 What is Post-Pruning in Decision Trees?

#Ans ### **Post-Pruning in Decision Trees**

**Post-Pruning (or Pruning after Training)** is a technique used to simplify an **overgrown**
 Decision Tree by removing unnecessary branches **after** it has been fully grown. This helps reduce **overfitting** and improves generalization.

---

## **How Post-Pruning Works**
1. **Grow a Fully Expanded Decision Tree**
   - The tree is allowed to grow **until all nodes are pure** or meet the stopping condition.

2. **Evaluate Each Subtree**
   - Remove branches that do **not significantly** improve accuracy.
   - The pruning is based on a validation dataset or a statistical test.

3. **Replace Pruned Nodes with Leaf Nodes**
   - The subtree is replaced with a single leaf node representing the most frequent class
    (for classification) or the mean value (for regression).

---

## **Methods of Post-Pruning**

### **1. Cost Complexity Pruning (CCP) – Used in CART Algorithm**
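- Introduces a **complexity parameter \( \alpha \)** that trades training error against tree size:
  \[
  R_\alpha(T) = R(T) + \alpha \times |T|
  \]
  Where:
  - \( R_\alpha(T) \) = Cost-complexity of subtree \( T \)
  - \( R(T) \) = Training error of subtree \( T \) (misclassification rate; total weighted leaf impurity in Scikit-Learn)
  - \( |T| \) = Number of leaf nodes of \( T \)
  - \( \alpha \) = Complexity parameter (higher values prune more aggressively)
- For each \( \alpha \), the subtree of the full tree that minimizes \( R_\alpha(T) \) is kept.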

- The best \( \alpha \) is chosen using **cross-validation**.

### **2. Reduced Error Pruning (REP) – Used in ID3 Algorithm**
- Working bottom-up, each subtree is replaced by a leaf (its majority class) if this **does not increase error** on a held-out validation set.
- Pruning continues until any further replacement would reduce validation accuracy.
- Simple and fast, but setting aside a validation set costs training data, which hurts most on **small datasets**.

---

## **Advantages of Post-Pruning**
✅ **Prevents Overfitting** – Reduces tree complexity after training.
✅ **Less Prone to Stopping Too Early** – Pruning decisions are made after seeing the whole tree, so a weak split that enables strong splits below it is not cut off prematurely.
✅ **Improves Generalization** – Produces a model that works well on unseen data.

---

## **Disadvantages of Post-Pruning**
❌ **Computationally Expensive** – Requires growing a full tree first and then pruning it.
❌ **Held-Out Data Needed** – Choosing what to prune requires a validation set or cross-validation.

---

## **Example in Scikit-Learn (Python)**
```python
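from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example data (Iris) so the snippet runs on its own
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The tree is grown fully, then pruned with Cost Complexity Pruning
clf = DecisionTreeClassifier(ccp_alpha=0.01)
clf.fit(X_train, y_train)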
```
- **`ccp_alpha=0.01`** controls pruning strength.
- **Higher values** of `ccp_alpha` prune more aggressively.
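You do not have to guess `ccp_alpha`. Scikit-Learn can list the effective alphas of the fully grown tree via `cost_complexity_pruning_path`, and cross-validation can then pick among them. This is a minimal sketch; the dataset and `cv=5` are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Effective alphas at which subtrees of the fully grown tree get pruned away
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alphas = path.ccp_alphas[:-1]  # the last alpha prunes the tree down to the root

# Pick the alpha with the best 5-fold cross-validated accuracy
cv_scores = [
    cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a),
                    X_train, y_train, cv=5).mean()
    for a in alphas
]
best_alpha = alphas[int(np.argmax(cv_scores))]

final = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(f"best alpha={best_alpha:.4f}, leaves: full={full.get_n_leaves()}, "
      f"pruned={final.get_n_leaves()}")
print(f"test accuracy (pruned): {final.score(X_test, y_test):.3f}")
```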


#10. What is the difference between Pre-Pruning and Post-Pruning?

#Ans  ### **Difference Between Pre-Pruning and Post-Pruning**

Pre-Pruning and Post-Pruning are techniques used in **Decision Trees** to prevent overfitting by controlling tree complexity.
 The key difference is **when** the pruning occurs.

| Feature        | **Pre-Pruning (Early Stopping)** | **Post-Pruning (Prune After Training)** |
|--------------|--------------------------------|--------------------------------|
| **When It Happens** | During tree construction | After the tree is fully grown |
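| **How It Works** | Stops splitting early based on conditions like max depth, min samples per node, or minimum information gain | Grows the full tree, then removes branches that do not improve validation or cost-complexity performance |
| **Stopping Criteria** | - Maximum tree depth<br>- Minimum samples per split/leaf<br>- Minimum impurity reduction | No early stopping; pruning uses cost complexity (\( \alpha \)) or validation error |
| **Risk** | May **underfit** by stopping too early | May **overfit initially**, but pruning corrects it |
| **Computational Cost** | Faster (stops early) | Slower (tree is fully grown first) |
| **Common Algorithms** | Stopping rules in most implementations (e.g., Scikit-Learn's `max_depth`, `min_samples_leaf`; CHAID's significance tests) | Used in **CART (Cost Complexity Pruning)**, **C4.5 (error-based pruning)**, and **Reduced Error Pruning** |
| **Example in Python (Scikit-Learn)** | `DecisionTreeClassifier(max_depth=5, min_samples_split=10)` | `DecisionTreeClassifier(ccp_alpha=0.01)` |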

---

### **Which One to Use?**
- **Pre-Pruning** is better when you need **faster training** and can tune stopping conditions.
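- **Post-Pruning** is better when you can afford to grow the full tree and want pruning decisions based on validation or cross-validation performance, rather than thresholds fixed in advance.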

#11 What is a Decision Tree Regressor?

#Ans  ### **Decision Tree Regressor**

A **Decision Tree Regressor** is a type of **Decision Tree** used for **regression tasks**, where the
 target variable is continuous (numerical), rather than categorical (as in classification). Instead of
 predicting classes, it predicts **numeric values** by recursively splitting the data and computing the average of
  target values in each leaf node.

---

## **1. How Decision Tree Regression Works**
1. **Select the Best Feature to Split:**
   - Uses a **variance reduction** criterion instead of Gini or Entropy.
2. **Recursively Split Data:**
   - Creates branches where data is split to minimize error.
3. **Stopping Condition:**
   - Stops when a predefined depth is reached, or when further splitting does not significantly reduce error.
4. **Prediction:**
   - For a given input, it follows the decision rules down to a leaf node and returns the **mean target value** of that leaf.

---

## **2. Splitting Criterion: Mean Squared Error (MSE)**
Decision Tree Regression uses **MSE** to determine the best splits:

\[
MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2
\]

where:
- \( y_i \) are actual values,
- \( \bar{y} \) is the mean of target values in that node,
- \( N \) is the number of samples in the node.

The **feature that minimizes the weighted MSE** across child nodes is chosen for splitting.
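A depth-1 tree (a "stump") makes both ideas visible. Its threshold is chosen to minimize the weighted MSE, and each leaf predicts the mean of its training targets. The sketch below uses made-up data.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up data: two clusters of target values
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])

stump = DecisionTreeRegressor(max_depth=1).fit(X, y)

# Split point chosen by minimizing the weighted MSE of the two children
print("threshold:", stump.tree_.threshold[0])
# Each leaf predicts the mean of the training targets that reached it
print("predictions:", stump.predict([[2], [5]]))
print("left/right means:", y[:3].mean(), y[3:].mean())
```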

---

## **3. Example in Python (Scikit-Learn)**
```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Sample dataset
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([2.5, 3.0, 3.7, 4.5, 5.1, 5.9, 6.8, 7.4, 8.0, 9.2])

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree Regressor
regressor = DecisionTreeRegressor(max_depth=3)
regressor.fit(X_train, y_train)

# Predict
y_pred = regressor.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
```

---

## **4. Advantages of Decision Tree Regressor**
✅ **Handles Non-Linear Data** – Works well when relationships are non-linear.
✅ **Easy to Interpret** – The tree structure is readable.
✅ **Can Handle Missing Data** – Some implementations support missing values natively (e.g., CART's surrogate splits; Scikit-Learn from version 1.3).

## **5. Disadvantages**
❌ **Prone to Overfitting** – Needs pruning or constraints.
❌ **Not Smooth Predictions** – Step-like predictions instead of continuous curves.
❌ **Sensitive to Small Changes in Data** – Small variations can lead to a different tree structure.

---

### **Conclusion**
- If your target variable is **continuous**, use a **Decision Tree Regressor**.
- It splits data using **variance reduction (MSE)** instead of entropy/Gini.
- To prevent overfitting, use **max_depth, min_samples_split, or post-pruning**.


#12 What are the advantages and disadvantages of Decision Trees?

#Ans  ### **Advantages and Disadvantages of Decision Trees**

Decision Trees are a popular machine learning model due to their simplicity and interpretability.
 However, they also have some limitations.

---

## **Advantages of Decision Trees** ✅

### **1. Easy to Understand and Interpret**
- The tree structure makes it intuitive and **visually interpretable**.
- Even non-technical users can follow the decision-making process.

### **2. Handles Both Classification and Regression**
- Works for both **categorical** (classification) and **numerical** (regression) problems.

### **3. No Need for Feature Scaling**
- Unlike models like **SVM** or **Logistic Regression**, Decision Trees do **not** require normalization or standardization.

### **4. Handles Non-Linear Data Well**
- Can capture **complex relationships** and interactions between features.
- Works well for **non-linear** datasets.

### **5. Can Handle Missing Values**
- Some implementations (e.g., CART as implemented in R's `rpart`) handle missing data with **surrogate splits**; others support missing values natively or require imputation.

### **6. Works with Both Small and Large Datasets**
- Can be used effectively on **small datasets** where other models might overcomplicate things.

### **7. Feature Selection is Automatic**
- The algorithm **automatically selects the most important features** for splitting.

### **8. Can Handle Multi-Class Classification**
- Decision Trees classify into multiple categories directly, whereas logistic regression needs a multinomial (softmax) or one-vs-rest extension beyond the binary case.

---

## **Disadvantages of Decision Trees** ❌

### **1. Prone to Overfitting**
- If **fully grown**, the tree **memorizes training data**, leading to poor generalization.
- **Solution:** Use **pre-pruning** (max depth, min samples per split) or **post-pruning** (Cost Complexity Pruning).

### **2. High Variance (Unstable Predictions)**
- Small changes in the data can lead to a **completely different tree structure**.
- **Solution:** Use **ensemble methods** like **Random Forest** or **Gradient Boosting**.

### **3. Not Ideal for Continuous Data Prediction**
- In **regression tasks**, the predictions are **step-like** instead of smooth.
- **Solution:** Use ensembles such as **Random Forest** or **Gradient Boosting**, whose averaged predictions are smoother (though still piecewise constant).

### **4. Biased Splitting (Favours Dominant Features)**
- If a feature has **more unique values**, the tree may **prefer it** for splitting (even if it’s not the most important).
- **Solution:** Use a criterion that corrects for this bias (e.g., C4.5's **gain ratio**), or drop or group high-cardinality, ID-like features.

### **5. Computational Cost for Deep Trees**
- If the dataset is large and the tree is deep, training can be slow.
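- **Solution:** Limit growth with `max_depth` or `min_samples_leaf`, or use histogram-based implementations (e.g., LightGBM or Scikit-Learn's `HistGradientBoostingClassifier`) that bin feature values.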
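You can check overfitting and the ensemble remedy directly. The sketch below uses Scikit-Learn's built-in breast cancer dataset, an arbitrary choice. A fully grown tree fits the training set perfectly. A depth-limited tree and a random forest give up some training accuracy in exchange for typically better test behavior. Exact scores depend on the split and the library version.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "full tree": DecisionTreeClassifier(random_state=0),
    "depth-3 tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    # A large train/test gap signals overfitting
    print(f"{name:>13}: train={model.score(X_train, y_train):.3f} "
          f"test={model.score(X_test, y_test):.3f}")
```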

---

## **Summary Table: Pros & Cons**

| **Feature**               | **Advantage** ✅  | **Disadvantage** ❌  |
|----------------------|------------------|------------------|
| **Interpretability** | Easy to understand | Can be too complex when deep |
| **Feature Scaling** | Not needed | - |
| **Handling Non-Linearity** | Works well | Overfits without pruning |
| **Overfitting** | - | Prone to overfitting |
| **Computation Time** | Fast for small data | Slow for deep trees |
| **Data Sensitivity** | Captures interactions well | High variance (unstable predictions) |
| **Handling Missing Data** | Can handle missing values | - |
| **Continuous Data Prediction** | Works, but step-like | Not smooth |

---

### **When to Use Decision Trees?**
✅ When **interpretability** is important.
✅ When the dataset has **non-linear relationships**.
✅ When you need **feature selection automatically**.
✅ When working with **small to medium-sized datasets**.

🚫 Avoid Decision Trees if:
❌ You need a **highly stable model** → Use **Random Forest**.
❌ You have **very large datasets** → Use **Gradient Boosting**.
❌ You need smooth predictions for regression → Use **Linear Regression or Neural Networks**.

---



#13 How does a Decision Tree handle missing values?

#Ans   ### **How Decision Trees Handle Missing Values**

Decision Trees can handle missing values in both **features (input data)** and **target variables (output data)** in several ways.

---

## **1. Handling Missing Values in Features (Input Data)**
When some feature values are missing, Decision Trees handle them in the following ways:

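### **A. Ignoring Missing Values When Scoring a Split**
- C4.5 evaluates a split using only the rows where that feature is known, then sends rows with a missing value down every branch with fractional weights.
- Scikit-Learn (version 1.3 and later, with `splitter="best"`) instead tries sending the rows with missing values to each child and keeps whichever side gives the better split; older versions raise an error on `NaN`.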

### **B. Surrogate Splitting (Used in CART Algorithm)**
- If a feature has missing values, the algorithm finds an **alternative (surrogate) feature** that best mimics
the original feature’s split.
- The missing values are then assigned based on this surrogate feature.

### **C. Imputing Before Training (Mode or Mean/Median Imputation)**
- For **categorical features**, missing values are replaced with the **most frequent category**.
- For **numerical features**, missing values are replaced with the **mean or median** of the feature.

### **D. Using "Missing" as a Separate Category (for Categorical Features)**
- If a categorical feature has missing values, some implementations create a new category called **"Missing"**
and treat it as a separate class.

---

## **2. Handling Missing Values in Target Variables (Output Data)**
- If the target variable (**Y**) has missing values, those rows are usually **dropped** during training.
- Alternatively, missing target values can be imputed using techniques like **mean imputation** (for regression)
 or **mode imputation** (for classification).

---

## **Example in Scikit-Learn**
```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Create dataset with missing values
data = pd.DataFrame({
    'Feature1': [1, 2, np.nan, 4, 5],
    'Feature2': [3, np.nan, 1, 2, 5],
    'Target': [0, 1, 0, 1, 0]
})

# Split into X (features) and y (target)
X = data[['Feature1', 'Feature2']]
y = data['Target']

# Impute missing values using mean strategy
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)

# Train Decision Tree
clf = DecisionTreeClassifier()
clf.fit(X_imputed, y)
```
**Here’s what happens:**
- We replace missing values with the **mean** before training the model.
- The Decision Tree then learns from the cleaned dataset.

---

## **Advantages of Decision Trees in Handling Missing Data**
✅ **Can work with missing values without imputation** (e.g., surrogate splits).
✅ **Does not require feature scaling**, making it simpler to use.
✅ **Robust to missing values**, unlike algorithms like SVM or Logistic Regression.

## **Disadvantages**
❌ **Surrogate splits may not always be reliable**, especially in small datasets.
❌ **Imputation may introduce bias**, especially if missing data is **not random**.

---

### **When to Use Different Methods?**
- **If missing values are few** → Use **mean/median imputation**.
- **If many missing values exist** → Use **surrogate splitting** (available in some libraries).
- **For categorical features** → Treat "missing" as a separate category.


#14 How does a Decision Tree handle categorical features?

#Ans  ### **How Decision Trees Handle Categorical Features**

Decision Tree algorithms such as ID3, C4.5, and CART can split on categorical features directly. Whether you must first convert them into
numerical values depends on the implementation: some libraries handle categories natively, while others (such as Scikit-Learn) require encoding.

---

## **1. Methods for Handling Categorical Features**

### **A. Using One-Hot Encoding (Most Common in Scikit-Learn)**
- Converts each category into a separate binary column (0 or 1).
- Works well when there are **few categories** but can cause high-dimensional data for many categories.

#### **Example: One-Hot Encoding in Scikit-Learn**
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import OneHotEncoder
import numpy as np

# Example dataset
X = np.array([['Red'], ['Blue'], ['Green'], ['Red'], ['Green']])
y = np.array([0, 1, 0, 1, 0])

# Apply One-Hot Encoding
encoder = OneHotEncoder(sparse_output=False)  # `sparse` was renamed `sparse_output` in scikit-learn 1.2
X_encoded = encoder.fit_transform(X)

# Train Decision Tree
clf = DecisionTreeClassifier()
clf.fit(X_encoded, y)
```
✅ **Pros:** Works well for small categorical feature sets.
❌ **Cons:** Can lead to the **curse of dimensionality** if too many categories exist.

---

### **B. Using Label Encoding (For Ordered Categories)**
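- Assigns a unique number to each category.
- **Works best when categories have a meaningful order** (e.g., **Low = 0, Medium = 1, High = 2**).
- If there’s **no natural order**, label encoding can mislead the tree.
- Scikit-Learn's `LabelEncoder` is meant for target labels and numbers categories alphabetically (High = 0, Low = 1, Medium = 2). For features, use `OrdinalEncoder` with an explicit `categories` order instead.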

#### **Example: Label Encoding in Scikit-Learn**
```python
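import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Example ordinal feature (one column)
X = np.array([['Low'], ['High'], ['Medium'], ['Low'], ['High']])

# Apply ordinal (label) encoding with an explicit order: Low -> 0, Medium -> 1, High -> 2
encoder = OrdinalEncoder(categories=[['Low', 'Medium', 'High']])
X_encoded = encoder.fit_transform(X)

print(X_encoded.ravel())  # Output: [0. 2. 1. 0. 2.]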
```
✅ **Pros:** Simple and works for ordinal categories.
❌ **Cons:** Can mislead the model if categories have no order.
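Because `LabelEncoder` sorts categories alphabetically, it does not respect a Low/Medium/High order on its own. A minimal sketch of the ordered alternative, using scikit-learn's `OrdinalEncoder` (not part of the original example), which works on 2-D feature matrices and accepts the category order explicitly:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Ordinal feature whose categories have a natural order
X = np.array([["Medium"], ["Low"], ["High"], ["Medium"]])

# Pass the order explicitly (one list per column): Low -> 0, Medium -> 1, High -> 2
encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
X_encoded = encoder.fit_transform(X)

print(X_encoded.ravel())  # [1. 0. 2. 1.]
```

The output is a float array with the same shape as the input, so it can be passed straight to `DecisionTreeClassifier.fit`.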

---

### **C. Using Libraries That Natively Handle Categorical Data (e.g., LightGBM, H2O, XGBoost with `enable_categorical=True`)**
- Some libraries can **directly handle categorical features** without encoding.
- **LightGBM** and **H2O.ai** are optimized for categorical variables.

#### **Example: LightGBM Handling Categorical Features Natively**
```python
import lightgbm as lgb
import pandas as pd

# Example categorical dataset
df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Red', 'Green'], 'Target': [0, 1, 0, 1, 0]})
df['Color'] = df['Color'].astype('category')  # Convert to categorical type

# Train LightGBM Model
model = lgb.LGBMClassifier()
model.fit(df[['Color']], df['Target'])
```
✅ **Pros:** Faster and more efficient than one-hot encoding.
❌ **Cons:** Not available in basic Decision Tree implementations like Scikit-Learn.

---

## **2. How Splitting Works on Categorical Features**

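- **For binary splits (CART algorithm)**:
  - In the CART formulation, the tree partitions the categories into **two subsets** (e.g., `{Red, Green}` vs. `{Blue}`) and keeps the partition with the lowest impurity.
  - scikit-learn's CART-based trees accept only numeric input, so after encoding they split on thresholds instead (a single one-hot column, or `code <= t` for ordinal codes).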

- **For multi-way splits (ID3, C4.5, and CHAID Algorithms)**:
  - The tree creates a separate branch for **each category**.
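The CART-style binary split described above can be sketched by brute force: try every way of dividing the categories into two non-empty groups and keep the one with the lowest weighted Gini impurity. The helpers `gini` and `best_categorical_split` are hypothetical names for illustration, not a library API, and the search grows as 2^(k-1) - 1 partitions for k categories, so real implementations use shortcuts.

```python
from itertools import combinations

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_categorical_split(values, labels):
    """Try every two-subset partition of the categories; return the lowest weighted Gini."""
    categories = sorted(set(values))  # assumes at least two distinct categories
    first, rest = categories[0], categories[1:]
    n = len(labels)
    best = None
    # Keeping `first` on the left avoids evaluating mirrored partitions twice;
    # r stops at len(rest) - 1 so the right subset is never empty.
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            left_set = {first, *combo}
            left = [y for v, y in zip(values, labels) if v in left_set]
            right = [y for v, y in zip(values, labels) if v not in left_set]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, sorted(left_set), sorted(set(categories) - left_set))
    return best

# Same toy data as the encoding examples above
colors = ["Red", "Blue", "Green", "Red", "Green"]
target = [0, 1, 0, 1, 0]
score, left, right = best_categorical_split(colors, target)
print(f"Best split: {left} vs {right}, weighted Gini = {score:.3f}")
# Best split: ['Blue', 'Red'] vs ['Green'], weighted Gini = 0.267
```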

---

## **3. Best Practices for Handling Categorical Features in Decision Trees**
| **Scenario** | **Suggested Encoding Method** |
|-------------|-------------------------|
| Few categories (roughly 10 or fewer) | One-Hot Encoding |
| Many categories (more than ~10) | Ordinal Encoding, or a library with native support (LightGBM) |
| Ordinal categories (Low, Medium, High) | Ordinal Encoding with an explicit category order |
| Large datasets with categorical data | LightGBM or H2O.ai (native support) |

---

### **Conclusion**
- **Scikit-Learn’s Decision Trees require numeric input, so categorical features must be encoded (one-hot or ordinal/label encoding).**
- **Libraries such as LightGBM and H2O handle categorical data natively**, which is often faster and more memory-efficient.
- **Choose encoding wisely:** one-hot for features with few categories, ordinal encoding for ordered categories, and native handling for high-cardinality features or large datasets.


In [None]:
#15   What are some real-world applications of Decision Trees?

#Ans  ### **Real-World Applications of Decision Trees** 🌍🌳

Decision Trees are widely used in various fields due to their **interpretability**, **efficiency**,
 and ability to handle both **classification** and **regression** problems. Here are some key real-world applications:

---

## **1. Healthcare & Medical Diagnosis 🏥**
🔹 **Disease Prediction & Diagnosis**
   - Used to classify **patients as high-risk or low-risk** for diseases (e.g., diabetes, heart disease, cancer).
   - Example: A Decision Tree can analyze **symptoms, test results, and medical history** to diagnose a disease.

🔹 **Treatment Recommendation**
   - Helps doctors **choose the best treatment** based on a patient's symptoms and history.

🔹 **Predicting Patient Readmission**
   - Hospitals use Decision Trees to predict **which patients are likely to be readmitted** based on past records.

---

## **2. Banking & Finance 💰**
🔹 **Credit Risk Assessment**
   - Banks use Decision Trees to **approve or reject loan applications** based on income, credit score, and financial history.

🔹 **Fraud Detection**
   - Identifies fraudulent transactions by analyzing patterns in **transaction history, location, and spending behavior**.

🔹 **Stock Market Prediction**
   - Used in algorithmic trading to decide **buy/sell strategies** based on market indicators.

---

## **3. E-Commerce & Retail 🛒**
🔹 **Customer Segmentation & Recommendation Systems**
   - Helps classify customers based on purchasing behavior to recommend **personalized products**.
   - Example: an online retailer could use a Decision Tree to **suggest products** based on a customer’s purchase history.

🔹 **Churn Prediction**
   - Identifies customers likely to stop using a service (churn) based on past interactions and purchase history.
   - Example: a subscription service could use a Decision Tree to predict **which users might cancel** their subscriptions.

🔹 **Pricing Optimization**
   - Helps businesses decide **optimal prices** based on demand, competition, and seasonal trends.

---

## **4. Manufacturing & Quality Control 🏭**
🔹 **Defect Detection**
   - Identifies defective products based on sensor data and quality checks.

🔹 **Supply Chain Optimization**
   - Predicts delays and optimizes logistics by analyzing past supply chain data.

---

## **5. Education & Student Performance 📚**
🔹 **Predicting Student Performance**
   - Schools use Decision Trees to predict **which students are at risk of failing** based on attendance, test scores, and study habits.

🔹 **Personalized Learning Plans**
   - Adaptive learning platforms use Decision Trees to recommend **custom study materials** based on student weaknesses.

---

## **6. Marketing & Advertising 📢**
🔹 **Targeted Advertising**
   - Helps classify customers into different **buyer personas** for personalized marketing campaigns.

🔹 **Lead Scoring**
   - Companies use Decision Trees to predict **which leads are most likely to convert** into customers.

🔹 **Email Marketing Optimization**
   - Predicts **which email campaigns** will have higher engagement based on user behavior.

---

## **7. Human Resources & Employee Management 👥**
🔹 **Employee Attrition Prediction**
   - Predicts which employees are likely to leave the company based on work satisfaction, salary, and tenure.

🔹 **Hiring Decisions**
   - Helps HR teams shortlist candidates based on skills, experience, and interview performance.

---

## **8. Energy & Utilities ⚡**
🔹 **Energy Consumption Prediction**
   - Predicts **electricity or water consumption** for better resource management.

🔹 **Fault Detection in Power Grids**
   - Helps detect anomalies and **prevent power outages** by analyzing grid performance data.

---

## **9. Transportation & Logistics 🚚**
🔹 **Route Optimization**
   - Decision Trees can predict **delivery times or delays** from traffic, weather, and past data; these predictions can then feed into route-planning systems.

🔹 **Predicting Flight Delays**
   - Decision Trees can be used to predict **flight delays** from weather conditions and past schedules.

---

## **10. Agriculture & Environmental Science 🌱**
🔹 **Crop Disease Detection**
   - Decision Trees can classify crops as healthy or diseased using **features extracted from leaf images** (e.g., color and texture measurements), helping identify **diseases** early.

🔹 **Weather Forecasting**
   - Helps predict **rainfall, temperature trends, and natural disasters**.

🔹 **Soil Quality Assessment**
   - Analyzes soil samples to recommend **best crops** for farming.

---

### **Conclusion: Why Use Decision Trees?**
✅ **Easy to interpret** – Clear decision-making process.
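✅ **Captures non-linear relationships** – Can model complex decision boundaries without feature transformations.
✅ **Can work with missing values** – Some implementations (e.g., CART with surrogate splits) can still make predictions with incomplete data.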
✅ **Versatile** – Used in **finance, healthcare, e-commerce, marketing, and many more fields**.



In [None]:
#PRACTICAL

In [None]:
#16 Write a Python program to train a Decision Tree Classifier on the Iris dataset and print the model accuracy

#ans Here’s a Python program to train a **Decision Tree Classifier** on the **Iris dataset**
 and print the **model accuracy**. 🚀

### **Steps in the Code:**
1️⃣ Load the **Iris dataset** using `sklearn.datasets.load_iris`.
2️⃣ Split the dataset into **training** and **testing** sets.
3️⃣ Train a **Decision Tree Classifier** using `sklearn.tree.DecisionTreeClassifier`.
4️⃣ Predict the labels on the test set and **calculate accuracy**.

---

### **Python Code:**
```python
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features (Sepal length, Sepal width, Petal length, Petal width)
y = iris.target  # Labels (Setosa, Versicolor, Virginica)

# Split dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)  # Create model
clf.fit(X_train, y_train)  # Train model

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")  # Print accuracy
```

---

### **Expected Output:**
🔹 Accuracy is typically **high (often above 0.90)** because the **Iris classes are well separated**; the exact value depends on the train/test split. Example output:

```
Model Accuracy: 0.97
```

---

### **Enhancements You Can Try:**
✅ **Tune Hyperparameters** (e.g., `max_depth`, `min_samples_split`).
✅ **Visualize the Decision Tree** using `plot_tree()`.
✅ **Use Cross-Validation** (`cross_val_score`) for more reliable accuracy.
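A minimal sketch of the cross-validation suggestion above, assuming 5 folds (an illustrative choice). Instead of one 80/20 split, `cross_val_score` trains on four folds and scores on the held-out fold five times, so the mean is less sensitive to a lucky or unlucky split:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42)

# 5-fold cross-validation (stratified by default for classifiers)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")

print(f"Fold accuracies: {scores.round(2)}")
print(f"Mean accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")
```

Reporting the standard deviation alongside the mean shows how much the score moves between folds.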


In [None]:
#17 Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances

#Ans  Here’s a **Python program** to train a **Decision Tree Classifier** using **Gini Impurity** as the criterion
 and print the **feature importances**. 🚀

---

### **Steps in the Code:**
1️⃣ Load the **Iris dataset** using `sklearn.datasets.load_iris`.
2️⃣ Split the dataset into **training** and **testing** sets.
3️⃣ Train a **Decision Tree Classifier** with `criterion="gini"`.
4️⃣ Print **feature importances** to see which features influence the model the most.

---

### **Python Code:**
```python
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features (Sepal length, Sepal width, Petal length, Petal width)
y = iris.target  # Labels (Setosa, Versicolor, Virginica)

# Split dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Decision Tree Classifier with Gini Impurity
clf = DecisionTreeClassifier(criterion="gini", random_state=42)  # Use "gini" as the criterion
clf.fit(X_train, y_train)  # Train model

# Print feature importances
print("Feature Importances:")
for feature, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{feature}: {importance:.4f}")
```

---

### **Expected Output:**
🔹 The feature importances indicate how much each feature contributes to the Decision Tree’s splits.

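Example output (illustrative values rounded to two decimals; the code prints four decimals and exact numbers depend on the split):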
```
Feature Importances:
sepal length (cm): 0.01
sepal width (cm): 0.00
petal length (cm): 0.56
petal width (cm): 0.43
```
🚀 **Petal length & width** are usually the most important features for classifying **Iris species**.

---

### **Enhancements You Can Try:**
✅ **Change the `criterion` to `"entropy"`** and compare results.
✅ **Plot the Decision Tree** using `plot_tree()`.
✅ **Tune Hyperparameters** like `max_depth` for better performance.
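To see where `feature_importances_` comes from, here is a sketch that recomputes it from the fitted tree's internal arrays (`clf.tree_`). For every internal node, it adds the sample-weighted impurity decrease of that split to the node's feature, then normalizes the totals to sum to 1. This mirrors the mean-decrease-in-impurity definition; treat it as an illustration rather than scikit-learn's exact source code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
clf = DecisionTreeClassifier(criterion="gini", random_state=42).fit(X_train, y_train)

tree = clf.tree_
manual = np.zeros(X_train.shape[1])
for node in range(tree.node_count):
    left, right = tree.children_left[node], tree.children_right[node]
    if left == right:  # leaf: both children are -1
        continue
    # Weighted impurity decrease produced by this split
    manual[tree.feature[node]] += (
        tree.weighted_n_node_samples[node] * tree.impurity[node]
        - tree.weighted_n_node_samples[left] * tree.impurity[left]
        - tree.weighted_n_node_samples[right] * tree.impurity[right]
    )
manual /= manual.sum()  # normalize so importances sum to 1

print(np.round(manual, 4))
print(np.allclose(manual, clf.feature_importances_))  # expected True
```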


In [None]:
#18  Write a Python program to train a Decision Tree Classifier using Entropy as the splitting criterion and print the model accuracy

#Ans   Here’s a **Python program** to train a **Decision Tree Classifier** using **Entropy** as the splitting
 criterion and print the **model accuracy**. 🚀

---

### **Steps in the Code:**
1️⃣ Load the **Iris dataset** using `sklearn.datasets.load_iris`.
2️⃣ Split the dataset into **training** and **testing** sets.
3️⃣ Train a **Decision Tree Classifier** with `criterion="entropy"`.
4️⃣ Make predictions and compute **model accuracy**.

---

### **Python Code:**
```python
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features (Sepal length, Sepal width, Petal length, Petal width)
y = iris.target  # Labels (Setosa, Versicolor, Virginica)

# Split dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Decision Tree Classifier using Entropy as the criterion
clf = DecisionTreeClassifier(criterion="entropy", random_state=42)  # Use "entropy" as the splitting criterion
clf.fit(X_train, y_train)  # Train model

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")  # Print accuracy
```

---

### **Expected Output:**
🔹 The model should achieve **high accuracy (~95-100%)** on the **Iris dataset**, as it's a well-separated dataset.

```
Model Accuracy: 0.97
```

---

### **Enhancements You Can Try:**
✅ **Compare with Gini Impurity (`criterion="gini"`)** to see performance differences.
✅ **Visualize the Decision Tree** using `plot_tree()`.
✅ **Use Cross-Validation** (`cross_val_score`) for more reliable accuracy.



In [None]:
#19  Write a Python program to train a Decision Tree Regressor on a housing dataset and evaluate using Mean Squared Error (MSE)

#Ans   Here’s a **Python program** to train a **Decision Tree Regressor** on the **California Housing dataset**
and evaluate it using **Mean Squared Error (MSE)**. 🚀

---

### **Steps in the Code:**
1️⃣ Load the **California Housing dataset** using `sklearn.datasets.fetch_california_housing`.
2️⃣ Split the dataset into **training** and **testing** sets.
3️⃣ Train a **Decision Tree Regressor** using `sklearn.tree.DecisionTreeRegressor`.
4️⃣ Predict house prices and compute **Mean Squared Error (MSE)**.

---

### **Python Code:**
```python
# Import necessary libraries
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Load the California Housing dataset
housing = fetch_california_housing()
X = housing.data    # Features (e.g., median income, average rooms, population, latitude/longitude)
y = housing.target  # Target (median house value, in units of $100,000)

# Split dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Decision Tree Regressor
regressor = DecisionTreeRegressor(random_state=42)  # Create model
regressor.fit(X_train, y_train)  # Train model

# Make predictions on the test set
y_pred = regressor.predict(X_test)

# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.4f}")  # Print MSE
```

---

### **Expected Output:**
🔹 The **MSE value** depends on the **depth of the tree** and the data split. The target is in units of $100,000, so, for example, an MSE of 0.5 corresponds to an RMSE of about 0.71 (roughly $71,000).

Example output (illustrative):
```
Mean Squared Error (MSE): 0.4123
```

---

### **Enhancements You Can Try:**
✅ **Tune Hyperparameters** (`max_depth`, `min_samples_split`) for better performance.
✅ **Compare with Linear Regression** to see which performs better.
✅ **Use Feature Importance (`feature_importances_`)** to analyze key factors.


In [None]:
#20 Write a Python program to train a Decision Tree Classifier and visualize the tree using graphviz

#ans   Here’s a **Python program** to train a **Decision Tree Classifier** and visualize the tree using **Graphviz**. 🚀

---

### **Steps in the Code:**
1️⃣ Load the **Iris dataset** using `sklearn.datasets.load_iris`.
2️⃣ Train a **Decision Tree Classifier**.
3️⃣ Export the tree using `export_graphviz` and visualize it using **Graphviz**.

---

### **Python Code:**
```python
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import graphviz

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features (Sepal length, Sepal width, Petal length, Petal width)
y = iris.target  # Labels (Setosa, Versicolor, Virginica)

# Train the Decision Tree Classifier
clf = DecisionTreeClassifier(criterion="gini", random_state=42)  # Use "gini" as the criterion
clf.fit(X, y)  # Train model

# Export the tree to Graphviz format
dot_data = export_graphviz(
    clf, out_file=None, feature_names=iris.feature_names,
    class_names=iris.target_names, filled=True, rounded=True, special_characters=True
)

# Visualize the decision tree using graphviz
graph = graphviz.Source(dot_data)
graph.render("decision_tree")  # Writes the DOT source to "decision_tree" and renders "decision_tree.pdf"
graph.view()  # Opens the visualization
```

---

### **How It Works:**
✅ The rendered tree is saved as **"decision_tree.pdf"** (alongside the DOT source file **"decision_tree"**) in the current directory.
✅ The `graph.view()` command **opens the tree visualization** automatically.

---

### **Prerequisites:**
You need both the **`graphviz` Python package** and the **Graphviz system binaries** (the Python package calls the `dot` executable to render). Install them using:
🔹 **For Python:**
```bash
pip install graphviz
```
🔹 **For System:**
- **Windows:** Download & install from [Graphviz Official Site](https://graphviz.gitlab.io/download/).
- **Mac (Homebrew):**
  ```bash
  brew install graphviz
  ```
- **Linux (Ubuntu/Debian):**
  ```bash
  sudo apt install graphviz
  ```

---

### **Expected Output:**
🚀 A **colorful decision tree** with labeled nodes, showing how features split the dataset.
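If installing the Graphviz binaries is not an option, scikit-learn's `export_text` prints the same tree as indented text rules with no system dependency. A minimal sketch; the `max_depth=3` limit is an assumption added only to keep the printout short:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(iris.data, iris.target)

# Each line is one split condition; leaves show the predicted class index
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

`sklearn.tree.plot_tree` (with matplotlib) is another Graphviz-free option when a graphical view is preferred.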



In [None]:
#21 Write a Python program to train a Decision Tree Classifier with a maximum depth of 3 and compare its accuracy with a fully grown tree

#Ans   Here’s a **Python program** to train a **Decision Tree Classifier** with a maximum depth of **3** and
compare its accuracy with a **fully grown tree**. 🚀

---

### **Steps in the Code:**
1️⃣ Load the **Iris dataset** using `sklearn.datasets.load_iris`.
2️⃣ Split the dataset into **training** and **testing** sets.
3️⃣ Train two **Decision Tree Classifiers**:
   - One with `max_depth=3`.
   - One fully grown (default settings).
4️⃣ Compare their accuracies.

---

### **Python Code:**
```python
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features (Sepal length, Sepal width, Petal length, Petal width)
y = iris.target  # Labels (Setosa, Versicolor, Virginica)

# Split dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree Classifier with max_depth = 3
clf_limited = DecisionTreeClassifier(max_depth=3, random_state=42)
clf_limited.fit(X_train, y_train)

# Train fully grown Decision Tree Classifier
clf_full = DecisionTreeClassifier(random_state=42)
clf_full.fit(X_train, y_train)

# Make predictions on the test set
y_pred_limited = clf_limited.predict(X_test)
y_pred_full = clf_full.predict(X_test)

# Calculate accuracy
accuracy_limited = accuracy_score(y_test, y_pred_limited)
accuracy_full = accuracy_score(y_test, y_pred_full)

# Print accuracy results
print(f"Accuracy with max_depth=3: {accuracy_limited:.2f}")
print(f"Accuracy with fully grown tree: {accuracy_full:.2f}")
```

---

### **Expected Output:**
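🔹 On Iris, the **max_depth=3 tree** often scores the same as or slightly lower than a fully grown tree; its main benefit is a simpler model that is less prone to overfitting on noisier data. Example output (illustrative; exact values depend on the split):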

```
Accuracy with max_depth=3: 0.97
Accuracy with fully grown tree: 1.00
```



### **Enhancements You Can Try:**
✅ **Plot both trees** using `plot_tree()` or `graphviz`.
✅ **Try different depths (`max_depth=2, 4, 5`)** to see accuracy changes.
✅ **Use Cross-Validation** for more robust evaluation.
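The depth and cross-validation suggestions above can be combined in one loop. This sketch assumes 5-fold cross-validation and an illustrative list of depths; `None` means no depth limit (the fully grown tree):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

depth_scores = {}
for depth in [1, 2, 3, 4, 5, None]:  # None = grow until leaves are pure
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    depth_scores[depth] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"max_depth={depth}: mean CV accuracy = {depth_scores[depth]:.3f}")
```

A depth-1 tree has only two leaves and so cannot separate all three species, which shows up as clearly lower accuracy; beyond a small depth the scores tend to plateau.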


In [None]:
#22   Write a Python program to train a Decision Tree Classifier using min_samples_split=5 and compare its accuracy with a default tree

#Ans   Here’s a **Python program** to train a **Decision Tree Classifier** using `min_samples_split=5` and
compare its accuracy with a **default tree**. 🚀

---

### **Steps in the Code:**
1️⃣ Load the **Iris dataset** using `sklearn.datasets.load_iris`.
2️⃣ Split the dataset into **training** and **testing** sets.
3️⃣ Train two **Decision Tree Classifiers**:
   - One with `min_samples_split=5` (to reduce overfitting).
   - One with **default settings**.
4️⃣ Compare their accuracies.

---

### **Python Code:**
```python
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features (Sepal length, Sepal width, Petal length, Petal width)
y = iris.target  # Labels (Setosa, Versicolor, Virginica)

# Split dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree Classifier with min_samples_split=5
clf_limited = DecisionTreeClassifier(min_samples_split=5, random_state=42)
clf_limited.fit(X_train, y_train)

# Train default Decision Tree Classifier
clf_default = DecisionTreeClassifier(random_state=42)
clf_default.fit(X_train, y_train)

# Make predictions on the test set
y_pred_limited = clf_limited.predict(X_test)
y_pred_default = clf_default.predict(X_test)

# Calculate accuracy
accuracy_limited = accuracy_score(y_test, y_pred_limited)
accuracy_default = accuracy_score(y_test, y_pred_default)

# Print accuracy results
print(f"Accuracy with min_samples_split=5: {accuracy_limited:.2f}")
print(f"Accuracy with default tree: {accuracy_default:.2f}")
```

---

### **Expected Output:**
🔹 On Iris, the **tree with `min_samples_split=5`** may score the same as or slightly lower than the default tree, but it is simpler and may generalize better. Example output (illustrative):

```
Accuracy with min_samples_split=5: 0.97
Accuracy with default tree: 1.00
```
🚀 The default tree can **overfit** by splitting all the way down to pure leaves, while `min_samples_split=5` **stops splitting any node with fewer than 5 samples**.

---

### **Enhancements You Can Try:**
✅ **Plot both trees** using `plot_tree()` or `graphviz`.
✅ **Try different values (`min_samples_split=2, 10, 20`)** to analyze its effect.
✅ **Use Cross-Validation (`cross_val_score`)** for more reliable accuracy.
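Accuracy alone can hide what `min_samples_split` does; its direct effect is on tree size. This sketch, using the same split as above and an illustrative list of values, prints depth and leaf count with `get_depth()` and `get_n_leaves()`:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

sizes = {}
for mss in [2, 5, 10, 20]:  # 2 is scikit-learn's default
    clf = DecisionTreeClassifier(min_samples_split=mss, random_state=42).fit(X_train, y_train)
    sizes[mss] = (clf.get_depth(), clf.get_n_leaves())
    print(f"min_samples_split={mss}: depth={sizes[mss][0]}, leaves={sizes[mss][1]}, "
          f"test accuracy={clf.score(X_test, y_test):.2f}")
```

Larger values generally yield shallower trees with fewer leaves, which is the regularization effect described above.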


In [None]:
#23  Write a Python program to apply feature scaling before training a Decision Tree Classifier and compare its accuracy with unscaled data

#Ans  Here’s a **Python program** to apply **feature scaling** before training a **Decision Tree Classifier**
 and compare its accuracy with **unscaled data**. 🚀

---

### **Why Apply Feature Scaling?**
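Decision Trees **don’t require feature scaling**: each split compares a single feature against a threshold, and standardization is a monotonic per-feature transformation, so the same partitions remain available. Scaling matters for distance-based models (k-NN, SVM) and for pipelines that mix such models with trees. This experiment checks that empirically.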

---

### **Steps in the Code:**
1️⃣ Load the **Iris dataset** using `sklearn.datasets.load_iris`.
2️⃣ Split the dataset into **training** and **testing** sets.
3️⃣ Apply **feature scaling** using `StandardScaler`.
4️⃣ Train two **Decision Tree Classifiers**:
   - One on **unscaled data**.
   - One on **scaled data**.
5️⃣ Compare their **accuracies**.

---

### **Python Code:**
```python
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features (Sepal length, Sepal width, Petal length, Petal width)
y = iris.target  # Labels (Setosa, Versicolor, Virginica)

# Split dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree Classifier on unscaled data
clf_unscaled = DecisionTreeClassifier(random_state=42)
clf_unscaled.fit(X_train, y_train)
y_pred_unscaled = clf_unscaled.predict(X_test)
accuracy_unscaled = accuracy_score(y_test, y_pred_unscaled)

# Apply Feature Scaling (Standardization)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Decision Tree Classifier on scaled data
clf_scaled = DecisionTreeClassifier(random_state=42)
clf_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = clf_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# Print accuracy results
print(f"Accuracy without feature scaling: {accuracy_unscaled:.2f}")
print(f"Accuracy with feature scaling: {accuracy_scaled:.2f}")
```

---

### **Expected Output:**
🔹 **Accuracy should stay the same** (barring rare floating-point edge cases), because the scaled tree can find exactly the same partitions as the unscaled one.

```
Accuracy without feature scaling: 1.00
Accuracy with feature scaling: 1.00
```
🚀 Feature scaling usually impacts **distance-based models** (like SVM, k-NN), but **not Decision Trees**.

---

### **Enhancements You Can Try:**
✅ **Try different scalers (`MinMaxScaler`, `RobustScaler`)** and see if accuracy changes.
✅ **Compare with distance-based models like `KNeighborsClassifier`** to see the effect of scaling.
✅ **Test on a dataset where features have very different scales** (e.g., Wine via `load_wine`; the Boston Housing dataset was removed from scikit-learn in version 1.2).
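A sketch of the comparison suggested above, assuming the Wine dataset (its features range from fractions to over a thousand) and `KNeighborsClassifier` with 5 neighbors as the distance-based model. Scaling goes inside a `Pipeline` so the scaler is fit on training data only:

```python
from sklearn.base import clone
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)  # features span very different ranges
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

results = {}
for name, model in [("tree", DecisionTreeClassifier(random_state=42)),
                    ("knn", KNeighborsClassifier(n_neighbors=5))]:
    unscaled = clone(model).fit(X_train, y_train).score(X_test, y_test)
    scaled = (make_pipeline(StandardScaler(), clone(model))
              .fit(X_train, y_train).score(X_test, y_test))
    results[name] = (unscaled, scaled)
    print(f"{name}: unscaled={unscaled:.2f}, scaled={scaled:.2f}")
```

The tree's two scores should match, while k-NN typically improves markedly after scaling because its distances are otherwise dominated by the largest-valued features.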


In [None]:
#24 Write a Python program to train a Decision Tree Classifier using One-vs-Rest (OvR) strategy for multiclass classification

#Ans  Here’s a **Python program** to train a **Decision Tree Classifier** using the **One-vs-Rest (OvR)
strategy** for multiclass classification. 🚀

---

### **What is One-vs-Rest (OvR)?**
🔹 **One-vs-Rest (OvR)** is a strategy where the classifier trains one model per class, treating it as the
 **positive class** while all other classes are combined as a **single negative class**.
🔹 This is useful for classifiers that are inherently **binary**, but since Decision Trees natively support
**multiclass classification**, OvR is not required—but we can still explicitly apply it.

---

### **Steps in the Code:**
1️⃣ Load the **Iris dataset** using `sklearn.datasets.load_iris`.
2️⃣ Split the dataset into **training** and **testing** sets.
3️⃣ Apply **One-vs-Rest (OvR) strategy** using `OneVsRestClassifier`.
4️⃣ Train the **Decision Tree Classifier**.
5️⃣ Predict and evaluate using **accuracy score**.

---

### **Python Code:**
```python
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features (Sepal length, Sepal width, Petal length, Petal width)
y = iris.target  # Labels (Setosa, Versicolor, Virginica)

# Split dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree Classifier using One-vs-Rest (OvR) strategy
ovr_clf = OneVsRestClassifier(DecisionTreeClassifier(random_state=42))
ovr_clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = ovr_clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy with One-vs-Rest: {accuracy:.2f}")
```

---

### **Expected Output:**
🔹 The accuracy should be **~95-100%** on the **Iris dataset**, as it's well-separated.

```
Model Accuracy with One-vs-Rest: 0.97
```

---

### **Why Use One-vs-Rest with Decision Trees?**
✅ **Doesn't change much for Decision Trees**, since they already handle multiclass classification.
✅ Can be useful when you want **one binary tree per class** to inspect separately, or for consistency with other OvR-based models in the same pipeline.
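A short sketch that makes the OvR structure visible: after fitting, `OneVsRestClassifier.estimators_` holds one binary tree per class, which can be compared with a single native multiclass tree on the same split:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

ovr_clf = OneVsRestClassifier(DecisionTreeClassifier(random_state=42)).fit(X_train, y_train)
native_clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# One binary tree per class: each answers "is it this class or not?"
for class_name, tree in zip(iris.target_names, ovr_clf.estimators_):
    print(f"{class_name} vs rest: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}")

print(f"OvR accuracy:    {ovr_clf.score(X_test, y_test):.2f}")
print(f"Native accuracy: {native_clf.score(X_test, y_test):.2f}")
```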



In [None]:
#25  Write a Python program to train a Decision Tree Classifier and display the feature importance scores

#Ans  Here’s a **Python program** to train a **Decision Tree Classifier** and display the **feature importance scores**. 🚀

---

### **What is Feature Importance?**
🔹 **Feature importance** tells us which features are most useful in making decisions within the tree.
🔹 Higher values indicate **more important features**.
🔹 Helps in **feature selection** and **model interpretability**.

---

### **Steps in the Code:**
1️⃣ Load the **Iris dataset** using `sklearn.datasets.load_iris`.
2️⃣ Train a **Decision Tree Classifier**.
3️⃣ Extract and display **feature importance scores**.

---

### **Python Code:**
```python
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features (Sepal length, Sepal width, Petal length, Petal width)
y = iris.target  # Labels (Setosa, Versicolor, Virginica)

# Split dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Get feature importance scores
feature_importances = clf.feature_importances_

# Create a DataFrame for better visualization
feature_importance_df = pd.DataFrame({
    'Feature': iris.feature_names,
    'Importance': feature_importances
}).sort_values(by='Importance', ascending=False)

# Display feature importances
print("Feature Importance Scores:")
print(feature_importance_df)
```

---

### **Expected Output:**
🔹 The output will show **which features are most important** in making decisions.
Example output (values may vary):

```
Feature Importance Scores:
           Feature  Importance
2  petal length (cm)      0.67
3   petal width (cm)      0.29
0  sepal length (cm)      0.04
1   sepal width (cm)      0.00
```

---

### **Key Insights:**
✅ **Petal length & petal width** are the most important features in classifying Iris species.
✅ **Feature selection** can be used to remove less important features.
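Impurity-based importances are computed from training data and can favor features with many distinct values. A common cross-check is permutation importance: shuffle one feature at a time on held-out data and measure the drop in accuracy. A minimal sketch with `sklearn.inspection.permutation_importance`, where `n_repeats=10` is an illustrative choice:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Shuffle each feature 10 times on the test set; record the mean accuracy drop
result = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=42)

perm_df = pd.DataFrame({
    "Feature": iris.feature_names,
    "Impurity importance": clf.feature_importances_,
    "Permutation importance": result.importances_mean,
}).sort_values(by="Permutation importance", ascending=False)
print(perm_df)
```

When the two columns disagree, the permutation scores usually say more about what the model relies on for unseen data.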



In [None]:
#26 Write a Python program to train a Decision Tree Regressor with max_depth=5 and compare its performance with an unrestricted tree

#Ans Here’s a **Python program** to train a **Decision Tree Regressor** with `max_depth=5` and
compare its performance with an **unrestricted tree** using **Mean Squared Error (MSE)**. 🚀

---

### **Steps in the Code:**
1️⃣ Load the **California Housing dataset** (or any regression dataset).
2️⃣ Split the dataset into **training** and **testing** sets.
3️⃣ Train two **Decision Tree Regressors**:
   - One with `max_depth=5`.
   - One **fully grown** (default settings).
4️⃣ Compare their **MSE (Mean Squared Error)** to evaluate performance.

---

### **Python Code:**
```python
# Import necessary libraries
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Load the California Housing dataset
housing = fetch_california_housing()
X = housing.data    # Features
y = housing.target  # Target variable (house price)

# Split dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree Regressor with max_depth = 5
reg_limited = DecisionTreeRegressor(max_depth=5, random_state=42)
reg_limited.fit(X_train, y_train)

# Train fully grown Decision Tree Regressor
reg_full = DecisionTreeRegressor(random_state=42)
reg_full.fit(X_train, y_train)

# Make predictions on the test set
y_pred_limited = reg_limited.predict(X_test)
y_pred_full = reg_full.predict(X_test)

# Calculate Mean Squared Error (MSE)
mse_limited = mean_squared_error(y_test, y_pred_limited)
mse_full = mean_squared_error(y_test, y_pred_full)

# Print performance results
print(f"MSE with max_depth=5: {mse_limited:.4f}")
print(f"MSE with fully grown tree: {mse_full:.4f}")
```

---

### **Expected Output:**
🔹 The **limited-depth tree** will have a **higher training error** (it cannot memorize the data); on the test set it may do better or worse than the full tree, so compare the test MSE values directly.
🔹 The **fully grown tree** tends to **overfit**, reaching near-zero training error, which does not guarantee a low test error.

```
MSE with max_depth=5: 0.4321
MSE with fully grown tree: 0.2954
```
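*(Illustrative values only: with `random_state=42` the run is reproducible, but the exact numbers depend on the data and the scikit-learn version, so run the code to obtain the actual figures.)*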
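A direct way to see overfitting is to compare training and test MSE for several depths. The sketch below is an illustrative addition, not part of the original program: it swaps in scikit-learn's bundled diabetes dataset (`load_diabetes`) so it runs without downloading California Housing, and the helper `train_test_mse` and the list of depths are hypothetical choices.

```python
# Compare training vs. test MSE across tree depths to expose overfitting.
# Assumption: uses the bundled diabetes dataset instead of California Housing.
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def train_test_mse(max_depth, X_train, X_test, y_train, y_test):
    """Fit a tree with the given max_depth and return (train MSE, test MSE)."""
    reg = DecisionTreeRegressor(max_depth=max_depth, random_state=42)
    reg.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, reg.predict(X_train))
    test_mse = mean_squared_error(y_test, reg.predict(X_test))
    return train_mse, test_mse


X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for depth in [2, 3, 5, 8, None]:  # None = unrestricted (fully grown) tree
    train_mse, test_mse = train_test_mse(depth, X_train, X_test, y_train, y_test)
    results[depth] = (train_mse, test_mse)
    label = "None" if depth is None else str(depth)
    print(f"max_depth={label:>4}: train MSE={train_mse:8.1f}  "
          f"test MSE={test_mse:8.1f}  gap={test_mse - train_mse:8.1f}")
```

A training MSE near zero paired with a much larger test MSE is the overfitting signature; the depth with the lowest test MSE is a reasonable starting point, although cross-validation is a more robust way to pick it.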

---

### **Key Insights:**
✅ **Limiting `max_depth`** can reduce overfitting and improve generalization.
✅ **Fully grown trees may memorize the training data**, which can increase test error.
✅ **Hyperparameter tuning (`max_depth`, `min_samples_split`)**, ideally with cross-validation, helps optimize performance.



In [None]:
#27 Write a Python program to train a Decision Tree Classifier, apply Cost Complexity Pruning (CCP), and
visualize its effect on accuracy

#Ans  Here’s a **Python program** to train a **Decision Tree Classifier**, apply **Cost Complexity Pruning (CCP)**,
and visualize its effect on accuracy. 🚀

---

### **What is Cost Complexity Pruning (CCP)?**
🔹 **CCP (Post-Pruning)** reduces model complexity by **removing the subtrees that contribute least** to lowering the tree's total (sample-weighted) leaf impurity, relative to the number of leaves they add.
🔹 Controlled by **ccp_alpha**:
   - **Lower α** → More splits (complex tree).
   - **Higher α** → Fewer splits (simpler tree).
🔹 Helps prevent **overfitting** and improves **generalization**.
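For reference, the criterion behind `ccp_alpha`, as described in scikit-learn's user guide on minimal cost-complexity pruning, can be written as follows. Here $R(T)$ is the total sample-weighted impurity of the leaves of tree $T$, $|\widetilde{T}|$ is its number of leaves, $R(t)$ is the impurity of node $t$ treated as a leaf, and $T_t$ is the subtree rooted at $t$:

$$R_\alpha(T) = R(T) + \alpha\,|\widetilde{T}|$$

$$\alpha_{\text{eff}}(t) = \frac{R(t) - R(T_t)}{|\widetilde{T}_t| - 1}$$

Nodes with the smallest $\alpha_{\text{eff}}$ (the "weakest links") are pruned first, and pruning stops once the smallest remaining $\alpha_{\text{eff}}$ exceeds `ccp_alpha`.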

---

### **Steps in the Code:**
1️⃣ Load the **Iris dataset** and split into training/testing sets.
2️⃣ Train an **unpruned Decision Tree** and compute accuracy.
3️⃣ Extract **cost complexity pruning path** (`ccp_alphas`).
4️⃣ Train multiple trees with different **ccp_alpha** values and measure accuracy.
5️⃣ **Plot accuracy vs ccp_alpha** to visualize the effect of pruning.

---

### **Python Code:**
```python
# Import necessary libraries
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features
y = iris.target  # Labels

# Split dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an initial Decision Tree Classifier (Unpruned)
clf_unpruned = DecisionTreeClassifier(random_state=42)
clf_unpruned.fit(X_train, y_train)

# Compute accuracy for unpruned tree
y_pred_unpruned = clf_unpruned.predict(X_test)
accuracy_unpruned = accuracy_score(y_test, y_pred_unpruned)

# Get cost complexity pruning path
path = clf_unpruned.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = path.ccp_alphas[:-1]  # Exclude the last alpha (fully pruned tree)

# Train Decision Trees for different values of ccp_alpha
accuracy_scores = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=42, ccp_alpha=ccp_alpha)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy_scores.append(accuracy_score(y_test, y_pred))

# Plot Accuracy vs. ccp_alpha
plt.figure(figsize=(8, 5))
plt.plot(ccp_alphas, accuracy_scores, marker="o", linestyle="dashed", label="Pruned Tree")
plt.axhline(y=accuracy_unpruned, color='r', linestyle='--', label="Unpruned Accuracy")
plt.xlabel("CCP Alpha (Pruning Strength)")
plt.ylabel("Accuracy")
plt.title("Effect of Cost Complexity Pruning on Accuracy")
plt.legend()
plt.grid()
plt.show()
```
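To connect the accuracy curve to tree size, the sketch below (an addition, not part of the original program) refits one tree per α on the full pruning path and records `tree_.node_count`, alongside the total leaf impurity that `cost_complexity_pruning_path` returns in `path.impurities`. It reuses the same Iris split; the variable names are illustrative.

```python
# Track how tree size and total leaf impurity change along the CCP path.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=42)

# cost_complexity_pruning_path fits a full tree internally and returns the
# effective alphas and the total leaf impurity after each pruning step.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)

node_counts = []
for alpha in path.ccp_alphas:  # keep the last alpha here: it leaves only the root node
    clf = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha).fit(X_train, y_train)
    node_counts.append(clf.tree_.node_count)

for alpha, impurity, n_nodes in zip(path.ccp_alphas, path.impurities, node_counts):
    print(f"alpha={alpha:.4f}  total leaf impurity={impurity:.4f}  nodes={n_nodes}")
```

As α grows the node count shrinks and the total leaf impurity rises, which is the trade-off the accuracy plot reflects.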

---

### **Expected Output & Insights:**
🔹 The **accuracy vs. pruning strength (ccp_alpha)** plot will typically show:
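- **Small α (left side)** → Complex tree that fits the training data closely and may overfit; test accuracy is not necessarily highest here.
- **Moderate α (middle)** → Often the best balance between fit and generalization.
- **Large α (right side)** → Underfitting (pruned too much), so accuracy drops.

On a small, easy dataset such as Iris, the test-accuracy curve may stay flat over a range of α before dropping, so a clear "sweet spot" is not always visible.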

---

### **Key Takeaways:**
✅ **Unpruned trees** can **overfit** the training data.
✅ **Moderate pruning** can improve **generalization**.
✅ **Too much pruning (high ccp_alpha)** can lead to **underfitting**.
✅ **Choosing the right α** is crucial—try **cross-validation** for tuning.
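A minimal sketch of choosing α with cross-validation, assuming 5-fold `cross_val_score` on the training split only (for a classifier, an integer `cv` uses stratified, unshuffled folds), so the test set is touched once at the end. The candidate alphas come from the same pruning path; the variable names are illustrative.

```python
# Pick ccp_alpha by 5-fold cross-validation on the training data only.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = path.ccp_alphas[:-1]  # drop the alpha that prunes down to the root

cv_means = []
for alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    cv_means.append(cross_val_score(clf, X_train, y_train, cv=5).mean())

best_idx = int(np.argmax(cv_means))  # first (smallest) alpha among ties
best_alpha = ccp_alphas[best_idx]

final_clf = DecisionTreeClassifier(random_state=42, ccp_alpha=best_alpha).fit(X_train, y_train)
test_acc = final_clf.score(X_test, y_test)
print(f"best ccp_alpha={best_alpha:.4f}, CV accuracy={cv_means[best_idx]:.4f}, "
      f"test accuracy={test_acc:.4f}")
```

`np.argmax` returns the smallest α among tied scores; choosing the largest tied α instead would give a simpler tree with the same cross-validated accuracy.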


In [None]:
#28 Write a Python program to train a Decision Tree Classifier and evaluate its performance using Precision,
Recall, and F1-Score

#Ans  Here’s a **Python program** to train a **Decision Tree Classifier** and evaluate its performance
 using **Precision, Recall, and F1-Score**. 🚀

---

### **What are Precision, Recall, and F1-Score?**
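🔹 **Precision** → The fraction of predicted positives that are actually positive. (**TP / (TP + FP)**)
🔹 **Recall** → The fraction of actual positives that the model correctly predicts. (**TP / (TP + FN)**)
🔹 **F1-Score** → Harmonic mean of Precision and Recall. **(2 × Precision × Recall) / (Precision + Recall)**
🔹 These are crucial when dealing with **imbalanced datasets**, where accuracy alone can look high even if the minority class is mostly missed.
🔹 For multiclass data such as Iris, each class is treated in turn as the positive class (one-vs-rest), giving one set of scores per class.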
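The formulas above can be checked by hand on a tiny binary example. The labels below are made up for illustration; the sketch counts TP, FP and FN directly and compares the results with scikit-learn's `precision_score`, `recall_score` and `f1_score`.

```python
# Compute precision, recall and F1 by hand and compare with scikit-learn.
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical binary labels (1 = positive class, 0 = negative class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted 1, actually 1
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted 1, actually 0
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted 0, actually 1

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp}, FP={fp}, FN={fn}")  # TP=2, FP=1, FN=2
print(f"manual : precision={precision:.4f}, recall={recall:.4f}, f1={f1:.4f}")
print(f"sklearn: precision={precision_score(y_true, y_pred):.4f}, "
      f"recall={recall_score(y_true, y_pred):.4f}, f1={f1_score(y_true, y_pred):.4f}")
# Both lines show precision=0.6667, recall=0.5000, f1=0.5714
```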

---

### **Steps in the Code:**
1️⃣ Load the **Iris dataset** and split it into **training/testing sets**.
2️⃣ Train a **Decision Tree Classifier**.
3️⃣ Predict test labels and compute **Precision, Recall, and F1-Score**.

---

### **Python Code:**
```python
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features
y = iris.target  # Labels

# Split dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Print classification report (Precision, Recall, F1-Score)
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

---

### **Expected Output:**
🔹 The **classification report** will display **Precision, Recall, and F1-Score** for each class, plus their **macro** (unweighted) and **weighted** (by support) averages.

```
Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00         9
  versicolor       1.00      0.91      0.95        11
   virginica       0.92      1.00      0.96        10

    accuracy                           0.97        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30
```
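*(Illustrative output: the exact scores and per-class support counts depend on the split (`random_state`) and the scikit-learn version, so run the code to obtain the actual figures.)*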
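The `macro avg` and `weighted avg` rows are plain averages of the per-class scores: macro weights every class equally, while weighted uses each class's support (its number of true samples). The sketch below checks this with `f1_score(average=...)` on a small, made-up three-class example rather than the Iris predictions.

```python
# Relate per-class F1 to the macro and weighted averages in classification_report.
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical 3-class labels (not taken from the Iris run above)
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 2, 0, 2])

per_class = f1_score(y_true, y_pred, average=None)  # one F1 per class
support = np.bincount(y_true)                       # true samples per class: [3, 2, 5]

macro = f1_score(y_true, y_pred, average="macro")
weighted = f1_score(y_true, y_pred, average="weighted")

print("per-class F1:", np.round(per_class, 4))
print(f"macro    = {macro:.4f}  (plain mean: {per_class.mean():.4f})")
print(f"weighted = {weighted:.4f}  "
      f"(support-weighted mean: {np.average(per_class, weights=support):.4f})")
```

When classes are imbalanced, a gap between the two averages signals that the smaller classes score differently from the larger ones.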

---

### **Key Takeaways:**
✅ **Accuracy isn’t always enough**; **Precision, Recall, and F1-Score** give deeper insights.
✅ Useful for **imbalanced datasets** (e.g., fraud detection, medical diagnosis).
✅ **F1-Score balances Precision & Recall**, making it a good overall metric.


In [None]:
#29 Write a Python program to train a Decision Tree Classifier and visualize the confusion matrix using seaborn

#Ans  Here’s a **Python program** to train a **Decision Tree Classifier** and visualize the **confusion matrix** using **Seaborn**.

---

### **What is a Confusion Matrix?**
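A **confusion matrix** evaluates classification performance by counting how the samples of each true class were predicted. In the binary case it contains:
- **True Positives (TP)** – actually positive, predicted positive.
- **True Negatives (TN)** – actually negative, predicted negative.
- **False Positives (FP)** – actually negative, but predicted positive.
- **False Negatives (FN)** – actually positive, but predicted negative.

For a multiclass problem such as Iris, scikit-learn's `confusion_matrix` returns a square matrix whose entry in row *i*, column *j* counts samples of true class *i* predicted as class *j*, so correct predictions lie on the diagonal. It’s useful for seeing exactly **which classes are confused** with each other.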
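For a multiclass matrix, the binary counts can be recovered per class in a one-vs-rest way: the diagonal gives TP, the rest of each column gives FP, and the rest of each row gives FN. A small sketch on made-up three-class labels (not the Iris predictions), using NumPy for the row and column sums:

```python
# Derive per-class TP, FP, FN, TN from a multiclass confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for classes 0, 1, 2
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class
print(cm)
# [[2 1 0]
#  [0 2 0]
#  [1 0 4]]

tp = np.diag(cm)                # correctly predicted samples of each class
fp = cm.sum(axis=0) - tp        # predicted as the class, but belonging to another
fn = cm.sum(axis=1) - tp        # belonging to the class, but predicted as another
tn = cm.sum() - (tp + fp + fn)  # everything else

for cls in range(cm.shape[0]):
    print(f"class {cls}: TP={tp[cls]} FP={fp[cls]} FN={fn[cls]} TN={tn[cls]}")
```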

---

### **Steps in the Code:**
1️⃣ Load the **Iris dataset** and split into **training/testing sets**.
2️⃣ Train a **Decision Tree Classifier**.
3️⃣ Predict test labels and compute the **confusion matrix**.
4️⃣ Visualize the confusion matrix using **Seaborn heatmap**.

---

### **Python Code:**
```python
# Import necessary libraries
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features
y = iris.target  # Labels (Setosa, Versicolor, Virginica)

# Split dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Compute the confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)

# Plot the confusion matrix using seaborn
plt.figure(figsize=(6, 4))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues",
            xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
plt.show()
```

---

### **Expected Output:**
🔹 A **heatmap** of the confusion matrix where:
- Diagonal values → Correct classifications ✅
- Off-diagonal values → Misclassifications ❌

🔹 **Example Confusion Matrix (illustrative):**
📊 *(Here one Versicolor sample is misclassified as Virginica and every other sample lies on the diagonal. Your actual counts depend on the split.)*
```
                         Predicted
                    Setosa  Versicolor  Virginica
True  Setosa           9         0          0
Label Versicolor       0        10          1
      Virginica        0         0         10
```

---

### **Key Insights:**
✅ **Confusion matrix** helps diagnose classification errors.
✅ **Seaborn heatmaps** make it easy to visualize misclassifications.
✅ Use **Precision, Recall, and F1-Score** for deeper insights.


In [None]:
#30  Write a Python program to train a Decision Tree Classifier and use GridSearchCV to find the optimal values
for max_depth and min_samples_split.

#Ans  Here’s a **Python program** to train a **Decision Tree Classifier** and use **GridSearchCV** to
find the **optimal values** for `max_depth` and `min_samples_split`. 🚀

---

### **What is GridSearchCV?**
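🔹 **GridSearchCV** is a **hyperparameter tuning** tool that:
- **Evaluates every combination** of the hyperparameter values listed in the grid.
- Scores each combination with **k-fold cross-validation** on the training data and selects the one with the **best mean score**.
- By default (`refit=True`), refits that best combination on the whole training set. Selecting by cross-validated score helps **reduce overfitting** to any single split, although it does not by itself guarantee good generalization.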

---

### **Steps in the Code:**
1️⃣ Load the **Iris dataset** and split it into **training/testing sets**.
2️⃣ Define a **Decision Tree Classifier**.
3️⃣ Set up a **hyperparameter grid** for `max_depth` and `min_samples_split`.
4️⃣ Use **GridSearchCV** to find the best combination.
5️⃣ Train the best model and evaluate its accuracy.

---

### **Python Code:**
```python
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features
y = iris.target  # Labels (Setosa, Versicolor, Virginica)

# Split dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define a Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)

# Define the hyperparameter grid
param_grid = {
    'max_depth': [3, 5, 10, None],  # Test different depths
    'min_samples_split': [2, 5, 10]  # Minimum samples required to split a node
}

# Use GridSearchCV to find the best parameters
grid_search = GridSearchCV(clf, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)

# Get the best model and its parameters
best_clf = grid_search.best_estimator_
best_params = grid_search.best_params_

# Make predictions using the best model
y_pred = best_clf.predict(X_test)

# Compute accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print the best parameters and accuracy
print("Best Hyperparameters:", best_params)
print(f"Best Model Accuracy: {accuracy:.4f}")
```

---

### **Expected Output:**
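🔹 **Illustrative output** (the selected hyperparameters and accuracy depend on the split, `random_state`, and scikit-learn version; run the code for actual values):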
```
Best Hyperparameters: {'max_depth': 5, 'min_samples_split': 2}
Best Model Accuracy: 0.9667
```
🔹 In this illustrative output, the selected decision tree:
- Has a **max_depth of 5**.
- Uses `min_samples_split=2`, scikit-learn's default (a node needs **at least 2 samples** to be split).

---

### **Key Takeaways:**
✅ **GridSearchCV automates hyperparameter tuning**.
✅ **Cross-validation gives a more reliable estimate** than a single train/test split.
✅ Choosing the right `max_depth` & `min_samples_split` can **improve test accuracy** and reduce **overfitting**.
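To see how the winner was chosen, the fitted search object exposes every candidate's cross-validation scores in `cv_results_`. The sketch below repeats the same grid and prints all 4 × 3 = 12 candidates by rank (with `cv=5`, that is 60 fits plus one final refit on the training set). `n_jobs` is left at its default here, and the printing format is only illustrative.

```python
# Inspect every hyperparameter combination evaluated by GridSearchCV.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {"max_depth": [3, 5, 10, None], "min_samples_split": [2, 5, 10]}
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid,
                           cv=5, scoring="accuracy")
grid_search.fit(X_train, y_train)

res = grid_search.cv_results_
# Sort candidates by rank (1 = best mean CV accuracy); tied candidates share a rank.
for i in np.argsort(res["rank_test_score"], kind="stable"):
    print(f"rank {res['rank_test_score'][i]:>2}  {res['params'][i]}  "
          f"mean={res['mean_test_score'][i]:.4f}  std={res['std_test_score'][i]:.4f}")

print("Best CV accuracy:", round(grid_search.best_score_, 4))
print("Held-out test accuracy:", round(grid_search.score(X_test, y_test), 4))
```

On a small dataset like Iris several candidates often tie; a table like this makes such ties visible, and in that case the simpler setting (smaller `max_depth`) can be a sensible pick.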
