Q1. What is a Decision Tree, and how does it work?
Ans. A Decision Tree is a supervised machine learning algorithm used for classification and regression tasks. It is a tree-like model where decisions are made by splitting data based on feature values.

How Does a Decision Tree Work?
1. Start with the Root Node: The entire dataset is considered at the root.
2. Splitting Based on Features: The algorithm selects the best feature (and, for numerical features, a threshold) to split the data using criteria like Gini Impurity or Entropy (Information Gain).
3. Creating Internal Nodes and Branches: Each split creates new internal nodes, leading to branches based on feature values.
4. Reaching Leaf Nodes: The tree keeps splitting until stopping criteria are met (e.g., max depth, minimum samples per split).
5. Making Predictions: A new data point is passed through the tree, following the decision rules, until it reaches a leaf node, where the final prediction is made.
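The prediction step can be sketched in plain Python. The nested-dict tree below is hand-written for illustration (its feature names, thresholds, and labels are assumptions, not a fitted model), and `predict` is a hypothetical helper rather than a scikit-learn function.

```python
# Hypothetical hand-written tree; thresholds and labels are illustrative, not learned.
tree = {
    "feature": "petal_length", "threshold": 2.5,
    "left": {"label": "setosa"},
    "right": {
        "feature": "petal_width", "threshold": 1.7,
        "left": {"label": "versicolor"},
        "right": {"label": "virginica"},
    },
}

def predict(node, sample):
    """Follow the decision rules from the root until a leaf is reached."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

prediction = predict(tree, {"petal_length": 4.8, "petal_width": 1.5})
print(prediction)  # versicolor
```

Each loop iteration corresponds to one internal node; the loop ends at a leaf, which holds the prediction.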

In [None]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Create Decision Tree classifier
clf = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)

# Train the model
clf.fit(X, y)

# Visualize the Decision Tree
plt.figure(figsize=(10,6))
plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()


Q2. What are impurity measures in Decision Trees?
Ans. Impurity measures quantify how mixed the target values are at a node; the tree uses them to judge how well a candidate split separates the data. Lower impurity means purer nodes. The three main impurity measures are:

1. Gini Impurity (Default in Scikit-learn)
Measures the probability of incorrectly classifying a randomly chosen element.

Formula:
$$\text{Gini} = 1 - \sum_{i} p_i^2$$

where $p_i$ is the probability of class $i$.
A lower Gini score means a purer node.
Example in Python:
python

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(criterion='gini')  # Using Gini Impurity
2. Entropy (Information Gain)
Measures the disorder in a dataset. Lower entropy means purer splits.

Formula:
$$\text{Entropy} = -\sum_{i} p_i \log_2(p_i)$$
The algorithm chooses the split with the highest Information Gain, which is:
$$\text{Information Gain} = \text{Entropy}_{\text{parent}} - \sum_{i} \left( w_i \times \text{Entropy}_{\text{child}_i} \right)$$
Example in Python:
python

clf = DecisionTreeClassifier(criterion='entropy')  # Using Entropy
3. Mean Squared Error (MSE) (For Regression Trees)
Used for regression trees, MSE measures variance within nodes.

Formula:
$$\text{MSE} = \frac{1}{N} \sum_{i} (y_i - \bar{y})^2$$

The lower the MSE, the better the split.
Example in Python:
python

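from sklearn.tree import DecisionTreeRegressor

# 'squared_error' is the MSE criterion ('mse' was its name in older scikit-learn releases)
reg = DecisionTreeRegressor(criterion='squared_error')  # Using MSE for regression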
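To make the "variance within nodes" idea concrete, here is a minimal sketch in plain Python. The helper names `node_mse` and `split_mse` and the target values are made up for illustration; this is not how scikit-learn computes the criterion internally.

```python
def node_mse(targets):
    """MSE of a node: mean squared deviation of the targets from the node mean."""
    mean = sum(targets) / len(targets)
    return sum((y - mean) ** 2 for y in targets) / len(targets)

def split_mse(left, right):
    """Sample-weighted MSE of the two children produced by a split."""
    n = len(left) + len(right)
    return (len(left) / n) * node_mse(left) + (len(right) / n) * node_mse(right)

parent = [10, 12, 30, 32]              # made-up target values
print(node_mse(parent))                # 101.0
print(split_mse([10, 12], [30, 32]))   # 1.0
```

The split drops the weighted MSE from 101.0 to 1.0, which is why a regression tree would favor it.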


Q3. What is the mathematical formula for Gini Impurity?
Ans. The Gini Impurity measures how often a randomly chosen element from a set would be incorrectly classified if it were randomly labeled according to the distribution of labels in the set.

The formula for Gini Impurity is:

$$\text{Gini} = 1 - \sum_{i=1}^{c} p_i^2$$

where:

$c$ = Number of classes
$p_i$ = Probability of class $i$ in the current node
Example Calculation
Suppose we have a dataset with two classes:

Class A: 60% (0.6 probability)
Class B: 40% (0.4 probability)
The Gini Impurity is calculated as:

$$\text{Gini} = 1 - (0.6^2 + 0.4^2) = 1 - (0.36 + 0.16) = 1 - 0.52 = 0.48$$
Python Implementation
python
def gini_impurity(probabilities):
    return 1 - sum(p**2 for p in probabilities)

# Example: Two-class dataset (60% and 40%)
gini = gini_impurity([0.6, 0.4])
print("Gini Impurity:", gini)  # Output: approximately 0.48

Q4. What is the mathematical formula for Entropy?
Ans. Entropy measures the disorder (impurity) in a dataset. It is used in Decision Trees to decide the best feature to split on.

The formula for Entropy is:

$$\text{Entropy} = -\sum_{i=1}^{c} p_i \log_2(p_i)$$
where:

$c$ = Number of classes
$p_i$ = Probability of class $i$ in the current node
$\log_2$ = Logarithm base 2
Example Calculation
Suppose we have a dataset with two classes:

Class A: 60% (0.6 probability)
Class B: 40% (0.4 probability)
The Entropy is calculated as:

$$\begin{aligned}
\text{Entropy} &= -(0.6 \log_2 0.6 + 0.4 \log_2 0.4) \\
&= -(0.6 \times -0.737) - (0.4 \times -1.322) \\
&= 0.442 + 0.529 \\
&\approx 0.97
\end{aligned}$$
Python Implementation
python

import numpy as np

def entropy(probabilities):
    return -sum(p * np.log2(p) for p in probabilities if p > 0)

# Example: Two-class dataset (60% and 40%)
entropy_value = entropy([0.6, 0.4])
print("Entropy:", entropy_value)  # Output: approximately 0.97

Q5. What is Information Gain, and how is it used in Decision Trees?
Ans. Information Gain (IG) measures how much uncertainty (entropy) is reduced after splitting a dataset based on a feature. It is used in Decision Trees to determine the best feature for splitting at each node.

The formula for Information Gain is:

$$IG = \text{Entropy}_{\text{parent}} - \sum_{i=1}^{k} \left( \frac{N_i}{N} \times \text{Entropy}_{\text{child}_i} \right)$$
where:

$\text{Entropy}_{\text{parent}}$ = Entropy before splitting
$k$ = Number of child nodes after the split
$N_i$ = Number of samples in child node $i$
$N$ = Total number of samples in the parent node
$\text{Entropy}_{\text{child}_i}$ = Entropy of each child node
How is Information Gain Used in Decision Trees?
1. Calculate the Entropy of the Parent Node: Before splitting, find the entropy of the entire dataset.
2. Split the Dataset on a Feature: Partition the data based on different values of the feature.
3. Calculate the Weighted Entropy of Child Nodes: Compute the entropy for each child node and weight it by the proportion of samples in that node.
4. Compute Information Gain: Subtract the weighted sum of child entropies from the parent entropy.
5. Choose the Feature with the Highest Information Gain: The feature that maximizes Information Gain is selected for splitting.
Example Calculation
Suppose we have a dataset with 10 samples:

Before Splitting:

6 belong to Class A, 4 belong to Class B
Entropy = 0.97
After Splitting on Feature X:

Left Child (4 samples) → 3A, 1B, Entropy = 0.81
Right Child (6 samples) → 3A, 3B, Entropy = 1.0
$$IG = 0.97 - \left( \frac{4}{10} \times 0.81 + \frac{6}{10} \times 1.0 \right) = 0.97 - (0.324 + 0.6) = 0.97 - 0.924 = 0.046$$
Since Information Gain is low, Feature X is not a good split.

Python Implementation of Information Gain
python

import numpy as np

def entropy(probabilities):
    return -sum(p * np.log2(p) for p in probabilities if p > 0)

def information_gain(parent_probs, child_groups):
    parent_entropy = entropy(parent_probs)
    total_samples = sum(sum(group) for group in child_groups)
    
    weighted_entropy = sum(
        (sum(group) / total_samples) * entropy([x / sum(group) for x in group if sum(group) > 0])
        for group in child_groups
    )

    return parent_entropy - weighted_entropy

# Example: Parent node with (6A, 4B)
parent_probs = [6/10, 4/10]

# Split into two child nodes: (3A, 1B) and (3A, 3B)
child_groups = [[3, 1], [3, 3]]

ig = information_gain(parent_probs, child_groups)
print("Information Gain:", ig)  # Output: approximately 0.046
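The last step, choosing the feature with the highest Information Gain, can be sketched as a comparison across candidate features. The toy rows, the feature names `outlook` and `windy`, and the helper `entropy_of` are invented for this example.

```python
import math
from collections import Counter

def entropy_of(labels):
    """Entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(rows, feature, target="label"):
    """Parent entropy minus the sample-weighted entropy of the groups formed by `feature`."""
    parent = [r[target] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    weighted = sum(len(g) / len(rows) * entropy_of(g) for g in groups.values())
    return entropy_of(parent) - weighted

# Toy data invented for illustration
rows = [
    {"outlook": "sunny", "windy": "no", "label": "A"},
    {"outlook": "sunny", "windy": "yes", "label": "A"},
    {"outlook": "rainy", "windy": "no", "label": "B"},
    {"outlook": "rainy", "windy": "yes", "label": "B"},
]
gains = {f: information_gain(rows, f) for f in ("outlook", "windy")}
best_feature = max(gains, key=gains.get)
print(gains, best_feature)
```

Here `outlook` separates the classes perfectly (gain of 1 bit) while `windy` separates nothing (gain of 0), so `outlook` is selected.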


Q6. What is the difference between Gini Impurity and Entropy?
Ans. Both Gini Impurity and Entropy are impurity measures used in Decision Trees to determine the best feature for splitting data. However, they have key differences in calculation, interpretation, and behavior.

1. Formula Comparison
Measure	Formula	Interpretation
Gini Impurity	$\text{Gini} = 1 - \sum_i p_i^2$	Probability of incorrectly classifying a randomly chosen element.
Entropy	$\text{Entropy} = -\sum_i p_i \log_2(p_i)$	Measure of disorder in the dataset.
2. Interpretation
Gini Impurity:
Measures how often a randomly chosen element would be misclassified if randomly labeled.
Lower Gini = Purer node.
Entropy:
Measures the uncertainty in a dataset.
Higher entropy means the data is more disordered, and lower entropy means it is more pure.
3. Range of Values
Measure	Minimum Value (Pure Split)	Maximum Value (Highest Impurity)
Gini Impurity	0 (Pure class)	0.5 (Two equal classes)
Entropy	0 (Pure class)	1 (Two equal classes)
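The maxima in this table are for two classes. As a small check under the assumption of $c$ equally likely classes, the sketch below shows the maximum Gini value growing as $1 - 1/c$ and the maximum entropy as $\log_2 c$.

```python
import math

def gini_impurity(probabilities):
    return 1 - sum(p ** 2 for p in probabilities)

def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Uniform class distributions give the highest impurity for a given number of classes
for c in (2, 3, 4):
    uniform = [1 / c] * c
    print(c, round(gini_impurity(uniform), 3), round(entropy(uniform), 3))
```

So entropy can exceed 1 when there are more than two classes, while Gini always stays below 1.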
4. Computational Efficiency
Gini is slightly cheaper to compute than Entropy because it does not require logarithms.
Entropy has a direct information-theoretic interpretation; in practice, the two criteria usually produce very similar trees.
5. Example Calculation
Example: Two-Class Dataset (60% A, 40% B)
Gini Impurity Calculation:
$$\text{Gini} = 1 - (0.6^2 + 0.4^2) = 1 - (0.36 + 0.16) = 0.48$$
Entropy Calculation:
$$\text{Entropy} = -(0.6 \log_2 0.6 + 0.4 \log_2 0.4) = 0.97$$
6. Python Implementation
python

import numpy as np

def gini_impurity(probabilities):
    return 1 - sum(p**2 for p in probabilities)

def entropy(probabilities):
    return -sum(p * np.log2(p) for p in probabilities if p > 0)

# Example: Two-class dataset (60% and 40%)
probabilities = [0.6, 0.4]

gini_value = gini_impurity(probabilities)
entropy_value = entropy(probabilities)

print("Gini Impurity:", gini_value)  # Output: approximately 0.48
print("Entropy:", entropy_value)  # Output: approximately 0.97
7. When to Use Gini or Entropy?
Use Case	Recommended Measure
Faster computation	Gini Impurity
Information-theoretic interpretation	Entropy
Similar classification performance	Both work well
In scikit-learn, DecisionTreeClassifier defaults to Gini Impurity (criterion='gini').

Q7. What is the mathematical explanation behind Decision Trees?
Ans. A Decision Tree is a hierarchical model used for classification and regression tasks. It works by recursively splitting the dataset based on impurity measures such as Gini Impurity or Entropy (Information Gain).

1. Splitting Criterion
At each node, the algorithm chooses the best feature to split on by minimizing impurity.

1.1 Entropy & Information Gain
Entropy measures disorder (impurity) in a dataset:

$$\text{Entropy} = -\sum_{i=1}^{c} p_i \log_2(p_i)$$
where $p_i$ is the probability of class $i$.

Information Gain (IG):

$$IG = \text{Entropy}_{\text{parent}} - \sum_{i=1}^{k} \left( \frac{N_i}{N} \times \text{Entropy}_{\text{child}_i} \right)$$
A feature with higher IG is chosen for splitting.

1.2 Gini Impurity
Gini Impurity measures the probability of incorrect classification:

$$\text{Gini} = 1 - \sum_{i=1}^{c} p_i^2$$

A lower Gini value indicates a better split.

2. Recursive Partitioning (Splitting Process)
The dataset is recursively divided into subsets:

1. Calculate impurity (Entropy or Gini) at the current node.
2. Choose the feature that minimizes impurity (maximizes Information Gain).
3. Split the data based on the chosen feature.
4. Repeat until a stopping condition is met (e.g., max depth, min samples per node).
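The recursive procedure can be sketched as a small tree builder in plain Python. This is a simplified illustration (numeric features, Gini, midpoint thresholds, a depth limit), not scikit-learn's implementation; `best_split`, `build`, and the data are hypothetical.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1 - sum((k / n) ** 2 for k in Counter(labels).values())

def best_split(X, y):
    """Return (feature_index, threshold, weighted_gini) of the best split, or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between neighbors
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

def build(X, y, depth=0, max_depth=3):
    split = best_split(X, y)
    # Stop when the node is pure, the depth limit is hit, or no split reduces impurity
    if gini(y) == 0 or depth == max_depth or split is None or split[2] >= gini(y):
        return {"label": Counter(y).most_common(1)[0][0]}
    j, t, _ = split
    left = [(r, l) for r, l in zip(X, y) if r[j] <= t]
    right = [(r, l) for r, l in zip(X, y) if r[j] > t]
    return {
        "feature": j, "threshold": t,
        "left": build([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        "right": build([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    }

X = [[1.0], [2.0], [3.0], [4.0]]   # made-up one-feature data
y = ["A", "A", "B", "B"]
tree = build(X, y)
print(tree)
```

On this toy data a single split at 2.5 yields two pure leaves, so the recursion stops after one level.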
3. Stopping Criteria (Preventing Overfitting)
Max Depth ($d$): Stops splitting after a fixed depth.
Min Samples per Leaf ($n$): A split is allowed only if each resulting leaf keeps at least $n$ samples.
Min Information Gain: Stop if IG is too low.
Pruning: Removes unnecessary branches.
4. Decision Rule (Making Predictions)
In classification, the final leaf node assigns a majority class label.
In regression, the leaf node outputs the mean value of the target variable.
$$y_{\text{leaf}} = \frac{1}{N} \sum_{i=1}^{N} y_i$$
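Both leaf rules are short enough to sketch directly. The helper names `classification_leaf` and `regression_leaf` are made up for this example.

```python
from collections import Counter
from statistics import mean

def classification_leaf(labels):
    """Majority class among the training labels that reached the leaf."""
    return Counter(labels).most_common(1)[0][0]

def regression_leaf(targets):
    """Mean of the training targets that reached the leaf."""
    return mean(targets)

print(classification_leaf(["A", "B", "A"]))  # A
print(regression_leaf([200, 250, 300]))      # 250
```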
​

5. Python Implementation of Decision Trees
python

from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Create Decision Tree classifier
clf = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)
clf.fit(X, y)

# Visualize the Decision Tree
plt.figure(figsize=(10, 6))
plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()
6. Summary Table
Concept	Formula	Purpose
Entropy	$-\sum_i p_i \log_2 p_i$	Measures disorder in a dataset
Gini Impurity	$1 - \sum_i p_i^2$	Measures probability of misclassification
Information Gain	$IG = \text{Entropy}_{\text{parent}} - \text{Weighted Sum of Child Entropies}$	Determines the best feature for splitting
Stopping Condition	Max depth, min samples per leaf, pruning	Prevents overfitting

Q8. What is Pre-Pruning in Decision Trees?
Ans. Pre-pruning (Early Stopping) is a technique used to stop the growth of a Decision Tree early to prevent overfitting. It imposes constraints before the tree fully grows.

Why Pre-Pruning?
If a Decision Tree grows too deep:
 It fits the training data perfectly.
 But it may overfit and perform poorly on unseen data.

Pre-pruning limits tree complexity by stopping splits based on predefined conditions.

Pre-Pruning Techniques in Decision Trees
Pre-Pruning Method	Description	Effect
Max Depth ($d$)	Limits the tree's depth	Prevents excessive branching
Min Samples Split ($n$)	Minimum samples required to split a node	Avoids splitting small datasets
Min Samples Leaf ($n$)	Minimum samples required in a leaf node	Prevents creation of small, biased leaves
Min Impurity Decrease	Stops splitting if impurity reduction is too small	Prevents unnecessary splits
Mathematical Explanation
A split occurs only if:

$$IG = \text{Entropy}_{\text{parent}} - \sum_{i} \left( \frac{N_i}{N} \times \text{Entropy}_{\text{child}_i} \right) > \text{Threshold}$$
Where:

$IG$ = Information Gain
$N_i$ = Samples in child node $i$
$N$ = Total samples in parent node
In scikit-learn, the threshold is set by min_impurity_decrease, which is compared against the impurity decrease for the chosen criterion, weighted by the fraction of all training samples that reach the node.
If IG is too low, the tree stops growing at that point.
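As a rough sketch of that check, the function below follows the weighted impurity decrease described in the scikit-learn documentation for `min_impurity_decrease`. The function name, its arguments, and the threshold value are assumptions for illustration; the numbers reuse the (6A, 4B) example with Gini impurities.

```python
def weighted_impurity_decrease(n_total, n_node, impurity,
                               n_left, left_impurity, n_right, right_impurity):
    """Impurity decrease of a split, scaled by the share of samples reaching the node."""
    return (n_node / n_total) * (
        impurity
        - (n_right / n_node) * right_impurity
        - (n_left / n_node) * left_impurity
    )

# Root node with (6A, 4B): Gini 0.48; children (3A, 1B): Gini 0.375 and (3A, 3B): Gini 0.5
decrease = weighted_impurity_decrease(
    n_total=10, n_node=10, impurity=0.48,
    n_left=4, left_impurity=0.375, n_right=6, right_impurity=0.5,
)
min_impurity_decrease = 0.01   # hypothetical threshold
print(round(decrease, 4), decrease >= min_impurity_decrease)
```

With these numbers the decrease is about 0.03, so the split would be kept for a threshold of 0.01 and rejected for, say, 0.05.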

Python Implementation of Pre-Pruning
python

from sklearn.tree import DecisionTreeClassifier

# Create Decision Tree with Pre-Pruning
clf = DecisionTreeClassifier(
    criterion='gini',   # Use Gini Impurity
    max_depth=3,        # Limit tree depth
    min_samples_split=5, # Minimum 5 samples to split
    min_samples_leaf=2,  # Minimum 2 samples per leaf
    random_state=42
)

# Fit the model
clf.fit(X, y)
Advantages of Pre-Pruning
 Reduces Overfitting – Prevents learning noise from training data.
 Improves Generalization – Works better on unseen data.
 Faster Training – Limits tree complexity.

Disadvantages of Pre-Pruning
 May stop splitting too early, missing important patterns.
Needs careful tuning of parameters.

Pre-Pruning vs Post-Pruning
Feature	Pre-Pruning	Post-Pruning
When applied?	Before growing the tree	After full tree growth
Purpose	Prevents overfitting early	Removes unnecessary branches
Risk	May underfit if too restrictive	May still overfit before pruning

Q9. What is Post-Pruning in Decision Trees?
Ans. Post-pruning (Pruning after Training) is a technique used to reduce overfitting by removing unnecessary branches from a fully grown Decision Tree after it has been trained.

Why Post-Pruning?
A fully grown Decision Tree can:
 Fit the training data perfectly
 Capture noise and generalize poorly to unseen data

Post-pruning removes weak branches, making the model more generalized and improving accuracy on test data.

Post-Pruning Techniques
Post-Pruning Method	Description	Effect
Cost Complexity Pruning (CCP)	Removes nodes that don’t significantly reduce error	Simplifies the tree
Reduced Error Pruning	Prunes a node if removing it does not increase error on a held-out validation set	Prevents overfitting
Minimal Error Pruning	Uses cross-validation to decide pruning	Finds optimal tree depth
Mathematical Explanation
1. Cost Complexity Pruning (CCP)
The idea is to minimize:

$$R(T) + \alpha \times |T|$$
Where:

$R(T)$ = Classification error of the tree $T$
$|T|$ = Number of terminal nodes (leaves)
$\alpha$ = Pruning parameter (controls tree complexity)
A node is removed if pruning reduces the cost function.
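To see how $\alpha$ trades error against size, the sketch below evaluates the cost for three nested subtrees. The error values and leaf counts are invented; `cost_complexity` and `best_subtree` are hypothetical helpers.

```python
def cost_complexity(error, n_leaves, alpha):
    """R(T) + alpha * |T| for one subtree."""
    return error + alpha * n_leaves

# Hypothetical (training error, number of leaves) for nested subtrees of one tree
candidates = {"full": (0.00, 8), "medium": (0.04, 3), "stump": (0.20, 1)}

def best_subtree(alpha):
    """Name of the candidate subtree with the lowest cost for this alpha."""
    return min(candidates, key=lambda name: cost_complexity(*candidates[name], alpha))

for alpha in (0.0, 0.01, 0.1):
    print(alpha, best_subtree(alpha))
```

At $\alpha = 0$ the full tree wins; as $\alpha$ grows, the penalty on leaves makes smaller subtrees cheaper.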

Python Implementation of Post-Pruning
Step 1: Train a Full Decision Tree
python

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Sample dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train full tree
clf = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.0)  # No pruning initially
clf.fit(X_train, y_train)
Step 2: Find the Best Pruning Parameter ($\alpha$)
python

path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = path.ccp_alphas  # Get a list of possible alpha values

# Try different pruning levels (ideally score on a validation set or with cross-validation, not the final test set)
for alpha in ccp_alphas:
    clf_pruned = DecisionTreeClassifier(ccp_alpha=alpha)
    clf_pruned.fit(X_train, y_train)
    print(f"Alpha: {alpha}, Accuracy: {clf_pruned.score(X_test, y_test)}")
Step 3: Train the Pruned Tree
python

best_alpha = 0.01  # Choose the best alpha from the above step
pruned_clf = DecisionTreeClassifier(ccp_alpha=best_alpha)
pruned_clf.fit(X_train, y_train)

print("Pruned Tree Accuracy:", pruned_clf.score(X_test, y_test))
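One way to pick the pruning parameter without touching the test set is cross-validation on the training data. The following is a sketch under that assumption, using the Iris data for illustration; the 5-fold setting and the clipping of tiny negative alphas are choices made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate alphas from the pruning path of the unpruned tree
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = np.unique(np.clip(path.ccp_alphas, 0.0, None))  # guard against tiny negative values

# Score each alpha with 5-fold cross-validation on the training data only
cv_scores = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=42), X_train, y_train, cv=5).mean()
    for a in ccp_alphas
]
best_alpha = float(ccp_alphas[int(np.argmax(cv_scores))])

# Refit with the chosen alpha and evaluate once on the held-out test set
final_clf = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=42).fit(X_train, y_train)
test_accuracy = final_clf.score(X_test, y_test)
print("Best alpha:", best_alpha, "Test accuracy:", test_accuracy)
```

The test set is used only once at the end, so the reported accuracy is not biased by the alpha search.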
Advantages of Post-Pruning
Better Generalization – Reduces overfitting by removing unnecessary splits.
 More Reliable – Uses held-out validation data to decide pruning.
Improves Performance – Creates smaller, interpretable trees.

Disadvantages of Post-Pruning
 Computationally Expensive – Requires extra validation/testing.
 Choosing $\alpha$ is tricky – Needs cross-validation for best results.

Q10. What is the difference between Pre-Pruning and Post-Pruning?
Ans. Both Pre-Pruning and Post-Pruning are techniques to prevent overfitting in Decision Trees by controlling tree growth. However, they differ in when and how pruning is applied.

1. Key Differences
Feature	Pre-Pruning (Early Stopping)	Post-Pruning (Pruning After Training)
When is it applied?	During tree construction	After the tree is fully grown
How it works?	Stops splitting if conditions are met	Grows full tree, then removes unnecessary nodes
Stopping Criteria	Max depth, min samples per leaf, min impurity decrease	Cost complexity pruning (CCP), validation set pruning
Risk	Underfitting if tree stops growing too early	Can remain overfit if too little is pruned
Computational Cost	Faster, requires fewer calculations	More expensive, requires cross-validation
Accuracy Impact	Can reduce accuracy if too strict	Typically leads to better accuracy
2. When to Use Which?
Scenario	Recommended Pruning Method
Large dataset, need faster training	Pre-Pruning (limits depth)
Overfitting on training data	Post-Pruning (removes unnecessary nodes)
Uncertain about tree depth	Post-Pruning (allows full growth, then prunes)
3. Python Implementation Example
Pre-Pruning (Limit Tree Growth)
python

from sklearn.tree import DecisionTreeClassifier

# Apply Pre-Pruning with max depth and min samples per split
clf_pre = DecisionTreeClassifier(max_depth=3, min_samples_split=5, min_samples_leaf=2)
clf_pre.fit(X_train, y_train)
Post-Pruning (Cost Complexity Pruning)
python

# Train a full tree first
clf_full = DecisionTreeClassifier(random_state=42)
clf_full.fit(X_train, y_train)

# Get pruning path
path = clf_full.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = path.ccp_alphas

# Placeholder alpha for illustration; in practice, pick it from ccp_alphas via cross-validation
best_alpha = 0.01
clf_post = DecisionTreeClassifier(ccp_alpha=best_alpha)
clf_post.fit(X_train, y_train)

Q11. What is a Decision Tree Regressor?
Ans. A Decision Tree Regressor is a type of Decision Tree model used for regression tasks, meaning it predicts continuous numerical values rather than discrete classes. Instead of classifying data, it splits the dataset into regions and assigns an average value for each region.

1. How Does a Decision Tree Regressor Work?
Splitting the Data

The algorithm recursively splits the dataset into subsets using a chosen feature.
The split is chosen to minimize variance (impurity) in the target variable.
Stopping Criteria

The recursion stops when a stopping criterion is met (e.g., max depth, min samples per leaf).
Prediction

Each leaf node stores the mean of the target values in that region.
2. Impurity Measure: Mean Squared Error (MSE)
Unlike classification trees, which use Gini Impurity or Entropy, regression trees use Mean Squared Error (MSE) to determine the best split:

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y})^2$$

where:

$y_i$ is the actual value
$\hat{y}$ is the predicted mean value of the node
$N$ is the number of samples in the node
The tree selects the feature and split that minimizes the total MSE.
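For a single numeric feature, the split search can be sketched as a scan over midpoints between sorted values. This is a simplified illustration, not scikit-learn's optimized implementation; `sse`, `best_threshold`, and the house-size numbers are assumptions for this example.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_threshold(xs, ys):
    """Scan midpoints between sorted feature values; return (threshold, total MSE)."""
    pairs = sorted(zip(xs, ys))
    best_t, best_mse = None, float("inf")
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate identical feature values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        mse = (sse(left) + sse(right)) / len(pairs)
        if mse < best_mse:
            best_t, best_mse = (pairs[k - 1][0] + pairs[k][0]) / 2, mse
    return best_t, best_mse

sizes = [600, 800, 1000, 1200, 1500, 1800, 2000]    # house sizes (sq ft)
prices = [150, 200, 250, 300, 350, 400, 450]        # prices in $1000s
threshold, mse = best_threshold(sizes, prices)
print(threshold, mse)
```

On this evenly spaced data two candidate thresholds tie; the scan keeps the first one it finds.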

3. Python Implementation
Example: Predicting House Prices
python


from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Sample dataset (House size vs Price)
X = np.array([[600], [800], [1000], [1200], [1500], [1800], [2000]])
y = np.array([150, 200, 250, 300, 350, 400, 450])  # Prices in $1000s

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree Regressor
regressor = DecisionTreeRegressor(max_depth=3)
regressor.fit(X_train, y_train)

# Make predictions
y_pred = regressor.predict(X_test)

# Calculate MSE
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
4. Visualization of Decision Tree Regressor
Decision Trees create stepwise constant approximations of the target function. Below is an example of how a Decision Tree regressor might approximate a function:

python


import matplotlib.pyplot as plt

# Plot Decision Tree Predictions
X_range = np.linspace(500, 2100, 100).reshape(-1, 1)
y_pred_range = regressor.predict(X_range)

plt.scatter(X, y, color="red", label="Actual Data")
plt.plot(X_range, y_pred_range, color="blue", label="Decision Tree Prediction")
plt.xlabel("House Size (sq ft)")
plt.ylabel("Price ($1000s)")
plt.legend()
plt.show()
5. Advantages of Decision Tree Regression
 Non-Linear Relationships – Can model non-linear dependencies between features and target variables.
 No Assumptions Needed – Unlike linear regression, it doesn't assume a specific relationship.
 Handles Categorical & Numerical Data – Can work with both feature types (scikit-learn requires categorical features to be encoded first).

6. Disadvantages
 Overfitting – Trees can grow too deep, capturing noise rather than trends.
 Not Smooth – Predictions are piecewise constant, which may not capture trends well.
 Sensitive to Small Changes – Small variations in data can lead to different trees.

7. Pruning for Decision Tree Regressors
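To prevent overfitting, we can use pre-pruning (shown here) or post-pruning (via ccp_alpha):

python

# min_samples_leaf=2 suits the 5-sample training set above; a value of 5 would leave only the root node
regressor = DecisionTreeRegressor(max_depth=3, min_samples_leaf=2)
regressor.fit(X_train, y_train)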

Q12. What are the advantages and disadvantages of Decision Trees?
Ans. Decision Trees are widely used for classification and regression tasks due to their simplicity and interpretability. However, they also have limitations.

1. Advantages of Decision Trees
 1. Easy to Understand & Interpret
Decision Trees mimic human decision-making, making them easy to explain.
You can visualize the tree structure.
 2. Handles Both Numerical & Categorical Data
Can be used for classification (categorical outputs) and regression (continuous outputs).
Some implementations split on categorical features directly; scikit-learn requires them to be encoded first.
 3. No Need for Feature Scaling
Unlike algorithms like SVM or k-NN, no need for normalization or standardization.
 4. Handles Missing Values
Some implementations can split data even with missing values, unlike models that always require imputation.
 5. Works Well with Non-Linear Relationships
Captures non-linear patterns better than linear models.
 6. Automatic Feature Selection
Chooses the most important features automatically during splitting.
 7. Fast Training & Prediction
Computationally efficient for small datasets.
2. Disadvantages of Decision Trees
 1. Overfitting (High Variance)
Deep trees can memorize training data, leading to poor generalization on unseen data.
Solution: Use pruning (pre-pruning or post-pruning) or limit tree depth.
 2. Sensitive to Small Changes in Data (Unstable Model)
Small variations in data can lead to completely different trees.
Solution: Use ensemble methods like Random Forest or Gradient Boosting.
 3. Greedy Algorithm (Locally Optimal, Not Globally Optimal)
The algorithm chooses the best split at each step but may not find the best overall tree.
Solution: Use Random Forest to reduce dependency on single splits.
 4. Biased with Imbalanced Data
If some classes are more frequent than others, the tree may favor those classes.
Solution: Use class weighting or resampling techniques.
5. Large Trees Become Hard to Interpret
A very deep tree can have hundreds of nodes, making it difficult to understand.
Solution: Use feature selection or pruning.

Q13. How does a Decision Tree handle missing values?
Ans. Decision Trees can handle missing values in various ways during training and prediction. Depending on the implementation, they can split data even if some feature values are missing, unlike models such as linear regression that always need complete inputs.

1. Handling Missing Values During Training
 1. Ignoring Missing Values in Splitting
Older scikit-learn releases (before version 1.3) do NOT handle missing values in DecisionTreeClassifier, so missing values need to be imputed first; newer releases accept NaN with the default splitter.
If using a Decision Tree from other libraries (e.g., XGBoost, LightGBM), the tree can find the best split even if some values are missing.
Solution:

Fill missing values with the most frequent category (categorical data) or mean/median (numerical data).
Use the SimpleImputer class from scikit-learn.
Example: Imputing Missing Values Before Training

python


from sklearn.tree import DecisionTreeClassifier
from sklearn.impute import SimpleImputer
import numpy as np

# Sample dataset with missing values (NaN)
X = np.array([[1, 2], [3, np.nan], [5, 6], [np.nan, 8]])
y = np.array([0, 1, 0, 1])

# Fill missing values using mean imputation
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)

# Train Decision Tree
clf = DecisionTreeClassifier()
clf.fit(X_imputed, y)
 2. Surrogate Splits (Used in CART Implementations such as R's rpart)
If a value is missing for the primary splitting feature, the tree can use an alternative (surrogate) feature that correlates with the primary feature.
This avoids discarding the sample entirely.
 Scikit-learn does NOT use surrogate splits. C4.5 (J48 in Weka) takes a different approach: it sends a sample with a missing value down every branch with fractional weights.

2. Handling Missing Values During Prediction
 1. Assigning the Most Frequent Class (Classification) or Mean Value (Regression)
If a missing value appears in a test sample and the implementation cannot route it down a branch, a simple fallback is to:

Assign the most frequent class (for classification).
Assign the average target value (for regression).
 2. Dropping Features with High Missing Rates
If a feature has too many missing values, it might not be useful for decision-making and can be excluded from training.
3. Practical Considerations
Method	Pros	Cons
Mean/Median Imputation	Simple, widely used	Can introduce bias
Most Frequent Value	Works well for categorical data	Can distort class distribution
Surrogate Splits	Keeps missing data without discarding samples	Not supported in scikit-learn
Dropping Missing Features	Reduces complexity	Risk of losing useful information
4. Alternative: Using Random Forest or XGBoost
Random Forest averages many trees, which can dampen the effect of imperfect imputation, but averaging alone does not fill in missing values.
XGBoost automatically learns the best direction for missing values instead of filling them manually.
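The idea of learning a default direction can be illustrated with a simplified sketch: try sending the missing values to each branch and keep the direction with the lower impurity. This uses Gini on class labels as a stand-in and is not XGBoost's actual gain computation; the function names and data are made up.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1 - sum((k / n) ** 2 for k in Counter(labels).values()) if n else 0.0

def weighted_gini(left, right):
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

def default_direction(values, labels, threshold):
    """Pick the branch for missing values (None) that gives the lower weighted Gini."""
    left = [l for v, l in zip(values, labels) if v is not None and v <= threshold]
    right = [l for v, l in zip(values, labels) if v is not None and v > threshold]
    missing = [l for v, l in zip(values, labels) if v is None]
    send_left = weighted_gini(left + missing, right)
    send_right = weighted_gini(left, right + missing)
    return "left" if send_left <= send_right else "right"

values = [1.0, 2.0, None, 8.0, 9.0, None]   # made-up feature with gaps
labels = [0, 0, 0, 1, 1, 0]
direction = default_direction(values, labels, threshold=5.0)
print(direction)  # left
```

Here the samples with missing values share the label of the left branch, so routing them left keeps both children pure.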

Q14. How does a Decision Tree handle categorical features?
Ans. Decision Trees can handle categorical features (like "Color = Red, Blue, Green") differently depending on the implementation.

1. Methods to Handle Categorical Features
 1. Label Encoding (Ordinal Encoding)
Assigns numerical labels to categories (e.g., "Blue" → 0, "Green" → 1, "Red" → 2; scikit-learn's LabelEncoder numbers classes in sorted order).
Works well if the categories have an inherent order (e.g., "Low" < "Medium" < "High").
Risk: Can introduce incorrect numerical relationships if no natural order exists.
Example in Python:

python

from sklearn.preprocessing import LabelEncoder

data = ["Red", "Blue", "Green", "Blue", "Red"]
encoder = LabelEncoder()
encoded_data = encoder.fit_transform(data)
print(encoded_data)  # Output: [2 0 1 0 2]
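For categories that do have a natural order, alphabetical numbering may not match it. A minimal sketch is to spell the order out explicitly; the `order` mapping below is an assumption chosen for this example.

```python
# Explicit order chosen by the analyst (an assumption for this example)
order = {"Low": 0, "Medium": 1, "High": 2}

levels = ["Medium", "Low", "High", "Low"]
encoded = [order[level] for level in levels]
print(encoded)  # [1, 0, 2, 0]
```

With this mapping, a split such as `encoded <= 0` separates "Low" from the higher levels, which is the kind of ordered rule a tree can exploit.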
 2. One-Hot Encoding (Dummy Variables)
Creates binary columns for each category (e.g., "Red" → [1,0,0], "Blue" → [0,1,0], "Green" → [0,0,1]).
Prevents false ordinal relationships introduced by Label Encoding.
Risk: Increases the number of features (high dimensionality) if many categories exist.
Example in Python:

python

from sklearn.preprocessing import OneHotEncoder
import numpy as np

data = np.array([["Red"], ["Blue"], ["Green"], ["Blue"], ["Red"]])
encoder = OneHotEncoder(sparse_output=False)
encoded_data = encoder.fit_transform(data)
print(encoded_data)
Output:

[[0. 0. 1.]  # Red
 [1. 0. 0.]  # Blue
 [0. 1. 0.]  # Green
 [1. 0. 0.]  # Blue
 [0. 0. 1.]] # Red
 3. Decision Trees with Native Categorical Support
Some implementations (e.g., LightGBM, CatBoost) directly handle categorical variables without encoding.
Avoids creating unnecessary splits.
Scikit-learn does NOT support categorical features natively (must use encoding first).
Example (LightGBM):

python

import lightgbm as lgb
import pandas as pd

data = pd.DataFrame({"Color": ["Red", "Blue", "Green", "Blue", "Red"], "Size": [10, 15, 20, 15, 10]})
data["Color"] = data["Color"].astype("category")  # Specify categorical column

model = lgb.LGBMClassifier()
model.fit(data[["Color", "Size"]], [0, 1, 0, 1, 0])  # Train model
2. Which Encoding Method Should You Use?
Method	When to Use	Pros	Cons
Label Encoding	When categories have an order	Simple, efficient	Implies false ranking if no order exists
One-Hot Encoding	When categories have no order	Avoids false ranking	Can create too many features
Native Categorical Support (LightGBM, CatBoost)	Large datasets with categorical features	Automatically handles categories	Not supported in scikit-learn
3. Example: Using Categorical Features in a Decision Tree (Scikit-learn)
Since scikit-learn does not support categorical features directly, you must encode them first:

python

from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import OneHotEncoder
import numpy as np

# Sample dataset
X = np.array([["Red"], ["Blue"], ["Green"], ["Blue"], ["Red"]])
y = np.array([0, 1, 0, 1, 0])

# One-Hot Encoding
encoder = OneHotEncoder(sparse_output=False)
X_encoded = encoder.fit_transform(X)

# Train Decision Tree
clf = DecisionTreeClassifier()
clf.fit(X_encoded, y)


Q15. What are some real-world applications of Decision Trees?
Ans. Decision Trees are widely used in various industries for classification and regression problems. Their ability to handle both numerical and categorical data makes them highly versatile.

1. Healthcare
 Disease Diagnosis
Used to classify whether a patient has a disease based on symptoms.
Example: Predicting diabetes or heart disease based on patient records.
Example:

python

from sklearn.tree import DecisionTreeClassifier

# Features: Age, Blood Pressure, Sugar Level
X = [[45, 120, 85], [50, 140, 95], [60, 130, 110], [30, 110, 75]]
y = [1, 1, 1, 0]  # 1 = Diabetic, 0 = Non-Diabetic

clf = DecisionTreeClassifier()
clf.fit(X, y)
2. Finance & Banking
 Credit Scoring & Loan Approval
Determines whether a person is eligible for a loan based on factors like income, credit score, and loan history.
Example: Banks use decision trees to classify loan defaulters vs. non-defaulters.
 Fraud Detection
Detects unusual transactions to flag potential fraud.
Example: Credit card fraud detection based on spending patterns.
3. Marketing & Sales
 Customer Segmentation
Classifies customers based on their purchasing behavior.
Example: Retail companies identify high-value customers for targeted ads.
 Recommendation Systems
Predicts which products a user is likely to buy based on past purchases.
Example: E-commerce sites can rank product suggestions with tree-based ensemble models such as XGBoost.
4. Manufacturing & Quality Control
 Defect Detection in Products
Used in factories to classify defective vs. non-defective products.
 Predictive Maintenance
Predicts when a machine is likely to fail based on sensor data.
5. Human Resources (HR)
 Employee Attrition Prediction
Predicts whether an employee is likely to leave a company based on factors like salary, work hours, and promotions.
6. Agriculture
 Crop Disease Prediction
Predicts crop diseases based on environmental conditions.
 Yield Prediction
Helps farmers estimate crop yield based on soil quality and weather data.
7. Autonomous Vehicles
 Traffic Sign Recognition
Can classify traffic signs from extracted image features, although production self-driving systems mostly rely on neural networks for this task.
8. Education
Student Performance Prediction
Predicts whether a student will pass/fail based on attendance and study time.
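Example (a small tree can be printed as plain if/else rules, which makes this kind of prediction easy to explain; the student records below are made up, and the sketch assumes scikit-learn's export_text helper):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features: attendance (%), weekly study hours
X = [[95, 10], [60, 2], [80, 6], [50, 1], [90, 8], [65, 3]]
y = [1, 0, 1, 0, 1, 0]  # 1 = Pass, 0 = Fail

clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(X, y)

# Print the learned decision rules as text
rules = export_text(clf, feature_names=["attendance", "study_hours"])
print(rules)

# Predict for a new (made-up) student
new_student_pred = clf.predict([[85, 7]])
print(new_student_pred)
```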


**Practical**

Q16.Write a Python program to train a Decision Tree Classifier on the Iris dataset and print the model accuracy
Ans.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split data into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Decision Tree Classifier
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Predict on test data
y_pred = clf.predict(X_test)

# Calculate and print accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")


Q17.Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the
feature importances
Ans.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import numpy as np

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split data into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Decision Tree Classifier with Gini Impurity
clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X_train, y_train)

# Print Feature Importances
print("Feature Importances:")
for feature, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{feature}: {importance:.4f}")


Q18.Write a Python program to train a Decision Tree Classifier using Entropy as the splitting criterion and print the
model accuracy
Ans.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split data into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Decision Tree Classifier with Entropy
clf = DecisionTreeClassifier(criterion="entropy", random_state=42)
clf.fit(X_train, y_train)

# Predict on test data
y_pred = clf.predict(X_test)

# Calculate and print accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")


Q19.Write a Python program to train a Decision Tree Regressor on a housing dataset and evaluate using Mean
Squared Error (MSE)
Ans.

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Load the California Housing dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target  # Features and target (median house value, in units of $100,000)

# Split data into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Decision Tree Regressor
regressor = DecisionTreeRegressor(random_state=42)
regressor.fit(X_train, y_train)

# Predict on test data
y_pred = regressor.predict(X_test)

# Calculate and print Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.2f}")


Q20.Write a Python program to train a Decision Tree Classifier and visualize the tree using graphviz
Ans.

In [None]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import graphviz

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target  # Features and labels

# Train Decision Tree Classifier
clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X, y)

# Generate Graphviz visualization
dot_data = export_graphviz(
    clf,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True, rounded=True, special_characters=True
)

# Render and display the tree
graph = graphviz.Source(dot_data)
graph.view()  # Opens the tree visualization


Q21.Write a Python program to train a Decision Tree Classifier with a maximum depth of 3 and compare its
accuracy with a fully grown tree
Ans.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split data into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree with max depth = 3
clf_limited = DecisionTreeClassifier(max_depth=3, random_state=42)
clf_limited.fit(X_train, y_train)

# Train Fully Grown Decision Tree (no depth limit)
clf_full = DecisionTreeClassifier(random_state=42)  # No max_depth
clf_full.fit(X_train, y_train)

# Predict on test data
y_pred_limited = clf_limited.predict(X_test)
y_pred_full = clf_full.predict(X_test)

# Calculate accuracy
accuracy_limited = accuracy_score(y_test, y_pred_limited)
accuracy_full = accuracy_score(y_test, y_pred_full)

# Print accuracy comparison
print(f"Accuracy with max_depth=3: {accuracy_limited:.2f}")
print(f"Accuracy with fully grown tree: {accuracy_full:.2f}")
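
To see why the two accuracies can differ (or not), it can help to compare the size of the two trees as well. The following sketch reuses the same split and reports depth and leaf count through get_depth() and get_n_leaves(); the results dictionary is just a helper name used for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for name, depth in [("max_depth=3", 3), ("fully grown", None)]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_train, y_train)
    results[name] = {
        "depth": tree.get_depth(),           # actual depth reached
        "leaves": tree.get_n_leaves(),       # number of leaf nodes
        "train_acc": tree.score(X_train, y_train),
        "test_acc": tree.score(X_test, y_test),
    }
    print(name, results[name])
```

A large gap between training and test accuracy for the fully grown tree would point to overfitting; on a small, clean dataset like Iris the gap may be small.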


Q22.Write a Python program to train a Decision Tree Classifier using min_samples_split=5 and compare its
accuracy with a default tree
Ans.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split data into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree with min_samples_split=5
clf_limited = DecisionTreeClassifier(min_samples_split=5, random_state=42)
clf_limited.fit(X_train, y_train)

# Train Default Decision Tree
clf_default = DecisionTreeClassifier(random_state=42)  # Default parameters
clf_default.fit(X_train, y_train)

# Predict on test data
y_pred_limited = clf_limited.predict(X_test)
y_pred_default = clf_default.predict(X_test)

# Calculate accuracy
accuracy_limited = accuracy_score(y_test, y_pred_limited)
accuracy_default = accuracy_score(y_test, y_pred_default)

# Print accuracy comparison
print(f"Accuracy with min_samples_split=5: {accuracy_limited:.2f}")
print(f"Accuracy with default tree: {accuracy_default:.2f}")


Q23.Write a Python program to apply feature scaling before training a Decision Tree Classifier and compare its
accuracy with unscaled data
Ans.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split data into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree without feature scaling
clf_unscaled = DecisionTreeClassifier(random_state=42)
clf_unscaled.fit(X_train, y_train)
y_pred_unscaled = clf_unscaled.predict(X_test)
accuracy_unscaled = accuracy_score(y_test, y_pred_unscaled)

# Apply feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Decision Tree with feature scaling
clf_scaled = DecisionTreeClassifier(random_state=42)
clf_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = clf_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# Print accuracy comparison
print(f"Accuracy without feature scaling: {accuracy_unscaled:.2f}")
print(f"Accuracy with feature scaling: {accuracy_scaled:.2f}")
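
Decision trees split on thresholds, so a monotonic rescaling such as standardization moves the thresholds but should not change which samples go left or right. The two accuracies above are therefore expected to match. A quick check, offered as a sketch, compares the predictions directly; rare floating-point edge cases could still cause a mismatch, so the agreement rate is reported instead of assumed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on training data only
scaler = StandardScaler().fit(X_train)

# Same model settings, raw vs. standardized features
pred_raw = DecisionTreeClassifier(random_state=42).fit(X_train, y_train).predict(X_test)
pred_scaled = (
    DecisionTreeClassifier(random_state=42)
    .fit(scaler.transform(X_train), y_train)
    .predict(scaler.transform(X_test))
)

# Fraction of test samples where both models predict the same class
agreement = np.mean(pred_raw == pred_scaled)
print(f"Prediction agreement: {agreement:.2f}")
```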


Q24.Write a Python program to train a Decision Tree Classifier using One-vs-Rest (OvR) strategy for multiclass
classification
Ans.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split data into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Use One-vs-Rest (OvR) with Decision Tree Classifier
clf_ovr = OneVsRestClassifier(DecisionTreeClassifier(random_state=42))
clf_ovr.fit(X_train, y_train)

# Predict on test data
y_pred = clf_ovr.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print accuracy
print(f"Model Accuracy using One-vs-Rest (OvR): {accuracy:.2f}")


Q25.Write a Python program to train a Decision Tree Classifier and display the feature importance scores
Ans.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target  # Features and labels

# Train a Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X, y)

# Get feature importance scores
feature_importance = clf.feature_importances_

# Display feature importance
for feature, importance in zip(iris.feature_names, feature_importance):
    print(f"{feature}: {importance:.4f}")

# Plot feature importance
plt.figure(figsize=(8, 5))
plt.bar(iris.feature_names, feature_importance, color='skyblue')
plt.xlabel("Features")
plt.ylabel("Importance Score")
plt.title("Feature Importance in Decision Tree")
plt.show()


Q26.Write a Python program to train a Decision Tree Regressor with max_depth=5 and compare its performance
with an unrestricted tree
Ans.

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Load the California Housing dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target  # Features and labels

# Split data into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree Regressor with max_depth=5
reg_limited = DecisionTreeRegressor(max_depth=5, random_state=42)
reg_limited.fit(X_train, y_train)

# Train Fully Grown Decision Tree Regressor (no depth limit)
reg_full = DecisionTreeRegressor(random_state=42)  # No max_depth
reg_full.fit(X_train, y_train)

# Predict on test data
y_pred_limited = reg_limited.predict(X_test)
y_pred_full = reg_full.predict(X_test)

# Calculate Mean Squared Error (MSE)
mse_limited = mean_squared_error(y_test, y_pred_limited)
mse_full = mean_squared_error(y_test, y_pred_full)

# Print MSE comparison
print(f"MSE with max_depth=5: {mse_limited:.4f}")
print(f"MSE with fully grown tree: {mse_full:.4f}")


Q27.Write a Python program to train a Decision Tree Classifier, apply Cost Complexity Pruning (CCP), and
visualize its effect on accuracy
Ans.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split data into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Decision Tree without pruning
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Get Cost Complexity Pruning path (alpha values)
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = path.ccp_alphas  # List of alpha values

# Store accuracy for different alpha values
train_scores = []
test_scores = []

# Train Decision Tree Classifier for each alpha value
for alpha in ccp_alphas:
    clf_pruned = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    clf_pruned.fit(X_train, y_train)

    # Predict on train & test data
    y_train_pred = clf_pruned.predict(X_train)
    y_test_pred = clf_pruned.predict(X_test)

    # Calculate accuracy
    train_scores.append(accuracy_score(y_train, y_train_pred))
    test_scores.append(accuracy_score(y_test, y_test_pred))

# Plot accuracy vs. ccp_alpha
plt.figure(figsize=(8, 5))
plt.plot(ccp_alphas, train_scores, marker='o', label="Training Accuracy", color='blue')
plt.plot(ccp_alphas, test_scores, marker='o', label="Testing Accuracy", color='red')
plt.xlabel("Cost Complexity Pruning (ccp_alpha)")
plt.ylabel("Accuracy")
plt.title("Effect of Cost Complexity Pruning on Decision Tree Accuracy")
plt.legend()
# Keep a linear x-axis: ccp_alphas starts at 0.0, which a log scale cannot show
plt.show()
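
The plot shows the trend, but picking ccp_alpha from test accuracy would leak test data into model selection. One possible approach, sketched below, is to choose alpha by 5-fold cross-validation on the training set only. The names cv_means and best_alpha are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate alphas from the pruning path of the training data
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)
# Drop the last alpha (it prunes the tree to a single node) and
# guard against tiny negative values caused by floating-point error
ccp_alphas = np.clip(path.ccp_alphas[:-1], 0.0, None)

# Mean 5-fold cross-validated accuracy for each alpha (training set only)
cv_means = [
    cross_val_score(
        DecisionTreeClassifier(random_state=42, ccp_alpha=alpha),
        X_train, y_train, cv=5
    ).mean()
    for alpha in ccp_alphas
]
best_alpha = ccp_alphas[int(np.argmax(cv_means))]

# Refit on the full training set and evaluate once on the test set
final_clf = DecisionTreeClassifier(random_state=42, ccp_alpha=best_alpha)
final_clf.fit(X_train, y_train)
test_acc = final_clf.score(X_test, y_test)

print(f"Best ccp_alpha: {best_alpha:.4f}")
print(f"Test Accuracy: {test_acc:.2f}")
```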


Q28.Write a Python program to train a Decision Tree Classifier and evaluate its performance using Precision,
Recall, and F1-Score
Ans.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split data into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Predict on test data
y_pred = clf.predict(X_test)

# Compute Precision, Recall, and F1-Score
report = classification_report(y_test, y_pred, target_names=iris.target_names)

# Print the evaluation metrics
print("Classification Report:\n")
print(report)


Q29.Write a Python program to train a Decision Tree Classifier and visualize the confusion matrix using seaborn
Ans.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, classification_report

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split data into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Predict on test data
y_pred = clf.predict(X_test)

# Compute Confusion Matrix
cm = confusion_matrix(y_test, y_pred)

# Plot Confusion Matrix using Seaborn
plt.figure(figsize=(6,5))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix - Decision Tree Classifier")
plt.show()

# Print Classification Report
print("Classification Report:\n")
print(classification_report(y_test, y_pred, target_names=iris.target_names))


Q30.Write a Python program to train a Decision Tree Classifier and use GridSearchCV to find the optimal values
for max_depth and min_samples_split.
Ans.

In [None]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target  # Features and labels

# Split data into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define parameter grid for max_depth and min_samples_split
param_grid = {
    'max_depth': [3, 5, 10, None],  # Different tree depths
    'min_samples_split': [2, 5, 10]  # Minimum samples required to split a node
}

# Perform Grid Search with cross-validation
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Get the best model parameters
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_

# Predict on test data
y_pred = best_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Display results
print(f"Best Parameters: {best_params}")
print(f"Test Accuracy: {accuracy:.4f}")
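
Besides best_params_, the fitted search keeps the cross-validated score of every parameter combination in cv_results_. A short sketch for listing the top candidates follows; it repeats the same grid so that it runs on its own, and the variable ranked is just an illustrative name.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {"max_depth": [3, 5, 10, None], "min_samples_split": [2, 5, 10]}
grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring="accuracy"
)
grid_search.fit(X_train, y_train)

# Pair each parameter combination with its mean and std CV accuracy
cv = grid_search.cv_results_
ranked = sorted(
    zip(cv["mean_test_score"], cv["std_test_score"], cv["params"]),
    key=lambda row: row[0],
    reverse=True,
)

# Show the three best combinations
for mean, std, params in ranked[:3]:
    print(f"{mean:.4f} (+/- {std:.4f})  {params}")
```

When several combinations score almost the same, the simpler tree (smaller max_depth, larger min_samples_split) is often a reasonable choice.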
