### **1. What is a Decision Tree, and how does it work?**
A **Decision Tree** is a supervised learning algorithm used for classification and regression tasks. It is a tree-like model where:
- Each internal node represents a decision based on a feature.
- Each branch represents an outcome of the decision.
- Each leaf node represents a class label (classification) or a continuous value (regression).

**How it works:**
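1. At each node, every candidate split (a feature plus a threshold or category grouping) is scored with an impurity measure such as **Gini Impurity** or **Entropy**.
2. The split that reduces impurity the most is applied, partitioning the node's samples into subsets.
3. Splitting continues recursively on each subset until a stopping condition is met (e.g., minimum samples per node, maximum tree depth, or a pure node).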

---

### **2. What are impurity measures in Decision Trees?**
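Impurity measures quantify how mixed the class labels within a node are, and therefore how good a candidate split is. The two main impurity measures are:
- **Gini Impurity**: The probability of misclassifying a randomly chosen sample if it were labeled at random according to the node's class distribution.
- **Entropy**: The average uncertainty (in bits) about the class of a sample in the node.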

A pure node has an impurity measure of zero, meaning all samples belong to one class.

---

### **3. What is the mathematical formula for Gini Impurity?**
The **Gini Impurity** for a node is given by:

\[
Gini = 1 - \sum_{i=1}^{c} p_i^2
\]

Where:
- \( p_i \) is the probability of a sample belonging to class \( i \).
- \( c \) is the number of classes.

A lower Gini value indicates a purer node.
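
As a quick numeric check of the formula, the following minimal sketch computes Gini impurity from a list of class labels. The helper name `gini_impurity` and the toy labels are illustrative assumptions, not part of any library.

```python
from collections import Counter

def gini_impurity(labels):
    """Return 1 - sum(p_i^2) over the class proportions in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: treat an empty node as pure
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

pure = gini_impurity(["a", "a", "a", "a"])   # one class only
mixed = gini_impurity(["a", "a", "b", "b"])  # two classes, evenly split
print(pure, mixed)  # 0.0 0.5
```

For \( c \) classes the maximum value is \( 1 - 1/c \), reached when all classes are equally likely (0.5 for two classes).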

---

### **4. What is the mathematical formula for Entropy?**
The **Entropy** for a node is calculated as:

\[
Entropy = - \sum_{i=1}^{c} p_i \log_2 p_i
\]

Where:
- \( p_i \) is the probability of a sample belonging to class \( i \).
- \( c \) is the number of classes.

Entropy is highest when classes are evenly distributed and lowest when nodes are pure.
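
The same kind of check works for entropy. This is a minimal sketch; the helper name `entropy` and the toy labels are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Return -sum(p_i * log2(p_i)) over the class proportions in `labels`."""
    n = len(labels)
    # p * log2(1 / p) equals -p * log2(p); classes with zero count never
    # appear in the Counter, so log2(0) is never evaluated.
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

pure = entropy(["a", "a", "a", "a"])
even = entropy(["a", "a", "b", "b"])
print(pure, even)  # 0.0 1.0
```

With \( c \) equally likely classes the entropy reaches its maximum of \( \log_2 c \) bits (1 bit for two classes).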

---

### **5. What is Information Gain, and how is it used in Decision Trees?**
**Information Gain (IG)** measures the reduction in entropy achieved by splitting a node into child nodes. It is the parent's entropy minus the size-weighted average entropy of the children:

\[
IG = Entropy(parent) - \sum \left(\frac{|child|}{|parent|} \times Entropy(child)\right)
\]

A split with higher **Information Gain** is preferred, as it reduces uncertainty in the dataset.
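
To make the formula concrete, the sketch below compares a perfect split with a weak one on eight toy labels. The function names and the data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 4 + ["no"] * 4

# Perfect split: each child contains a single class
perfect = information_gain(parent, [["yes"] * 4, ["no"] * 4])

# Weak split: each child is still a 3-to-1 mixture
weak = information_gain(parent, [["yes", "yes", "yes", "no"], ["no", "no", "no", "yes"]])

print(perfect, round(weak, 4))
```

The perfect split removes all uncertainty (gain of 1 bit), while the weak split removes only about 0.19 bits, so the tree would prefer the first.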

---

### **6. What is the difference between Gini Impurity and Entropy?**
| **Gini Impurity** | **Entropy** |
|------------------|-----------|
| Measures misclassification probability | Measures randomness (uncertainty) |
| Computationally faster (no logarithm) | Slightly slower to compute because of the logarithm |
| Ranges from 0 to \( 1 - 1/c \) (0.5 for two classes) | Ranges from 0 to \( \log_2 c \) (1 for two classes) |

In practice both usually lead to very similar decision trees, but **Gini is slightly cheaper to compute**.

---

### **7. What is the mathematical explanation behind Decision Trees?**
A Decision Tree recursively splits data using:
1. **Impurity Measures** (Gini or Entropy).
2. **Best Split Selection**:
   - The feature and threshold that maximize **Information Gain** (or, when Gini is used, the weighted decrease in Gini impurity).
3. **Stopping Conditions**:
   - Minimum samples per node, maximum depth, etc.
4. **Pruning Techniques**:
   - Pre-Pruning and Post-Pruning to prevent overfitting.

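Mathematically, the tree is built greedily: at each node it selects the locally best split, with no guarantee that the resulting tree is globally optimal: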

\[
Split_{best} = \arg \max_{split} \text{Information Gain}
\]
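
The arg max above can be sketched as a brute-force scan over candidate thresholds for a single numerical feature. This is an illustrative sketch, not how any particular library implements it; `best_threshold` and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) maximizing information gain for `value <= threshold`."""
    parent_entropy = entropy(labels)
    n = len(labels)
    best = (None, 0.0)
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct feature values
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        gain = (parent_entropy
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Hypothetical toy data: one feature, two classes
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9]
species = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]

threshold, gain = best_threshold(petal_length, species)
print(threshold, gain)  # 3.0 1.0
```

A full tree builder would repeat this scan for every feature, apply the winning split, and recurse on the two child nodes until a stopping condition is met.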

---

### **8. What is Pre-Pruning in Decision Trees?**
**Pre-Pruning** (Early Stopping) limits tree growth **before** fully developing it by:
- Setting a **maximum depth**.
- Restricting the **minimum samples per split**.
- Requiring a **minimum impurity decrease** before a split is made.

This reduces **overfitting**, but stopping too early may lead to **underfitting**.

---

### **9. What is Post-Pruning in Decision Trees?**
**Post-Pruning** removes unnecessary branches **after** the tree is fully grown. It works by:
1. Growing the tree completely.
2. Removing branches with little contribution using **Cost Complexity Pruning (CCP)**.

Post-pruning improves **generalization**, but the pruning strength has to be chosen with a validation set or cross-validation.
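
Cost Complexity Pruning is usually stated through the following objective, shown here in the standard CART form. The symbols \( R \), \( \alpha \), and \( |\tilde{T}| \) are introduced only for this illustration:

\[
R_\alpha(T) = R(T) + \alpha \, |\tilde{T}|
\]

Where:
- \( R(T) \) is the total impurity (or misclassification cost) of the leaves of tree \( T \).
- \( |\tilde{T}| \) is the number of leaf nodes in \( T \).
- \( \alpha \geq 0 \) is the complexity parameter (exposed as `ccp_alpha` in scikit-learn).

With \( \alpha = 0 \) the fully grown tree is kept; larger values of \( \alpha \) penalize each extra leaf more heavily and therefore yield smaller trees.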

---

### **10. What is the difference between Pre-Pruning and Post-Pruning?**
| **Pre-Pruning** | **Post-Pruning** |
|---------------|---------------|
| Stops tree growth early | Prunes after full growth |
| May cause underfitting | Reduces overfitting |
| Computationally efficient | Requires additional validation |

Post-pruning is generally more effective but computationally expensive.

---

### **11. What is a Decision Tree Regressor?**
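A **Decision Tree Regressor** predicts **continuous values** instead of class labels. It is built like a classification tree, but splits are chosen to minimize the **Mean Squared Error (MSE)** of the targets within the child nodes rather than a class impurity measure, and each leaf predicts the mean target value of its training samples.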
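
As a small worked example of the MSE criterion, the sketch below measures how much one candidate split lowers the error when each node predicts the mean of its targets. The prices and the split are made-up values for illustration only.

```python
def mse(targets):
    """MSE of predicting the mean of `targets` for every sample in the node."""
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets) / len(targets)

# Hypothetical house prices (in thousands) reaching one node
prices = [100, 110, 120, 300, 310, 320]

# Candidate split, e.g. "area <= some threshold" separating cheap from expensive homes
left, right = prices[:3], prices[3:]

parent_mse = mse(prices)
weighted_child_mse = (len(left) * mse(left) + len(right) * mse(right)) / len(prices)
print(round(parent_mse, 2), round(weighted_child_mse, 2))
```

The split cuts the error from roughly 10067 to roughly 67, so a regressor would rank it highly among the candidates.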

---

### **12. What are the advantages and disadvantages of Decision Trees?**
✅ **Advantages:**
- Simple and easy to interpret.
- No need for feature scaling.
- Handles both categorical and numerical data.
- Can model non-linear relationships.

❌ **Disadvantages:**
- Prone to **overfitting** (especially deep trees).
- Sensitive to **noisy data**.
- High variance (small data changes may alter tree structure).

---

### **13. How does a Decision Tree handle missing values?**
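How missing values are handled depends on the implementation. Common strategies are:
- Ignoring samples whose value for a feature is missing when evaluating splits on that feature.
- Using surrogate splits (backup splits on other features that mimic the primary split, as in CART).
- Imputing missing feature values before training, e.g., with the most frequent value (categorical features) or the mean or median (numerical features).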
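
The imputation strategy can be sketched with scikit-learn's `SimpleImputer` placed in front of the tree. The toy data below is hypothetical, and mean imputation is only one of several reasonable choices.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data with missing entries encoded as NaN
X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [8.0, 9.0]])
y = np.array([0, 0, 1, 1])

# Replace each NaN with its column mean, then fit the tree on the completed data
model = make_pipeline(SimpleImputer(strategy="mean"),
                      DecisionTreeClassifier(random_state=0))
model.fit(X, y)

predictions = model.predict(X)
print(predictions)
```

Keeping the imputer inside the pipeline means the column means are learned from the training data only and then reused unchanged at prediction time.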

---

### **14. How does a Decision Tree handle categorical features?**
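Categorical features are handled by:
- **One-Hot Encoding** (for nominal data when the implementation accepts only numeric input, as scikit-learn's trees do).
- **Label (ordinal) Encoding** (when the categories have a natural order).
- **Binary splits** on subsets of categories (supported natively by algorithms such as CART).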
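
A minimal sketch of the one-hot route with scikit-learn follows. The color feature and labels are hypothetical toy data.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: a single nominal feature
colors = np.array([["red"], ["green"], ["blue"], ["green"], ["red"], ["blue"]])
labels = np.array([1, 0, 0, 0, 1, 0])

# One column per category; categories are sorted, so the order is blue, green, red
encoder = OneHotEncoder(handle_unknown="ignore")
X_encoded = encoder.fit_transform(colors).toarray()

tree_clf = DecisionTreeClassifier(random_state=0).fit(X_encoded, labels)

# New samples must pass through the same fitted encoder before prediction
prediction = tree_clf.predict(encoder.transform([["red"]]).toarray())
print(encoder.categories_[0], prediction)
```

With `handle_unknown="ignore"`, a category that was not seen during fitting is encoded as all zeros instead of raising an error.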

---

### **15. What are some real-world applications of Decision Trees?**
✅ **Real-world applications:**
- **Healthcare**: Disease diagnosis.
- **Finance**: Loan approval predictions.
- **Marketing**: Customer segmentation.
- **Fraud Detection**: Identifying fraudulent transactions.
- **Manufacturing**: Predicting machine failures.



### **Practical Questions**



### **16. Train a Decision Tree Classifier on the Iris dataset and print the model accuracy**
```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Predict and print accuracy
y_pred = clf.predict(X_test)
print("Model Accuracy:", accuracy_score(y_test, y_pred))
```

---

### **17. Train a Decision Tree Classifier using Gini Impurity and print feature importances**
```python
clf = DecisionTreeClassifier(criterion="gini")
clf.fit(X_train, y_train)

# Print feature importances
print("Feature Importances:", clf.feature_importances_)
```

---

### **18. Train a Decision Tree Classifier using Entropy and print the model accuracy**
```python
clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Model Accuracy:", accuracy_score(y_test, y_pred))
```

---

### **19. Train a Decision Tree Regressor on a housing dataset and evaluate using MSE**
```python
from sklearn.datasets import fetch_california_housing
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Load dataset (downloaded on first use)
housing = fetch_california_housing()
Xh, yh = housing.data, housing.target

# Split under separate names so the Iris split from question 16 stays available
Xh_train, Xh_test, yh_train, yh_test = train_test_split(Xh, yh, test_size=0.2, random_state=42)

# Train Decision Tree Regressor
regressor = DecisionTreeRegressor(random_state=42)
regressor.fit(Xh_train, yh_train)

# Predict and evaluate
y_pred_reg = regressor.predict(Xh_test)
print("Mean Squared Error:", mean_squared_error(yh_test, y_pred_reg))
```

---

### **20. Train a Decision Tree Classifier and visualize it using Graphviz**
```python
from sklearn.tree import export_graphviz
import graphviz

dot_data = export_graphviz(clf, out_file=None, feature_names=iris.feature_names, class_names=iris.target_names,
                           filled=True, rounded=True, special_characters=True)
graph = graphviz.Source(dot_data)
graph.render("decision_tree")  # Saves tree as a file
graph  # Display the tree
```

---

### **21. Train a Decision Tree Classifier with max depth = 3 and compare accuracy with a fully grown tree**
```python
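# Assumes X_train, X_test, y_train, y_test hold the Iris split from question 16
clf_full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
clf_limited = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

print("Accuracy with max depth = 3:", accuracy_score(y_test, clf_limited.predict(X_test)))
print("Accuracy with fully grown tree:", accuracy_score(y_test, clf_full.predict(X_test)))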
```

---

### **22. Train a Decision Tree Classifier using min_samples_split=5 and compare with default tree**
```python
# Fit the default tree explicitly so the comparison does not rely on an earlier y_pred
clf_default = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
clf_min_samples = DecisionTreeClassifier(min_samples_split=5, random_state=42).fit(X_train, y_train)

print("Accuracy with min_samples_split=5:", accuracy_score(y_test, clf_min_samples.predict(X_test)))
print("Accuracy with default tree:", accuracy_score(y_test, clf_default.predict(X_test)))
```

---

### **23. Apply feature scaling before training a Decision Tree Classifier and compare accuracy**
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

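# Same random_state for both trees so that any difference comes from scaling alone
clf_unscaled = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
clf_scaled = DecisionTreeClassifier(random_state=42).fit(X_train_scaled, y_train)

# Trees split on thresholds, so standardizing features should typically leave accuracy unchanged
print("Accuracy with scaling:", accuracy_score(y_test, clf_scaled.predict(X_test_scaled)))
print("Accuracy without scaling:", accuracy_score(y_test, clf_unscaled.predict(X_test)))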
```

---

### **24. Train a Decision Tree Classifier using One-vs-Rest (OvR) strategy for multiclass classification**
```python
from sklearn.multiclass import OneVsRestClassifier

ovr_clf = OneVsRestClassifier(DecisionTreeClassifier())
ovr_clf.fit(X_train, y_train)

y_pred_ovr = ovr_clf.predict(X_test)
print("Accuracy with OvR:", accuracy_score(y_test, y_pred_ovr))
```

---

### **25. Train a Decision Tree Classifier and display the feature importance scores**
```python
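# Pair each importance score with its feature name (clf is the classifier fitted on Iris)
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")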
```

---

### **26. Train a Decision Tree Regressor with max_depth=5 and compare with an unrestricted tree**
```python
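# Rebuild the housing split under its own names so it cannot be confused with the Iris split
Xh, yh = fetch_california_housing(return_X_y=True)
Xh_train, Xh_test, yh_train, yh_test = train_test_split(Xh, yh, test_size=0.2, random_state=42)

regressor_full = DecisionTreeRegressor(random_state=42).fit(Xh_train, yh_train)
regressor_limited = DecisionTreeRegressor(max_depth=5, random_state=42).fit(Xh_train, yh_train)

print("MSE with max_depth=5:", mean_squared_error(yh_test, regressor_limited.predict(Xh_test)))
print("MSE with unrestricted tree:", mean_squared_error(yh_test, regressor_full.predict(Xh_test)))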
```

---

### **27. Train a Decision Tree Classifier, apply Cost Complexity Pruning (CCP), and visualize its effect on accuracy**
```python
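import matplotlib.pyplot as plt

# Effective alphas along the pruning path of a default tree (same settings as the pruned trees below)
ccp_alphas = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train).ccp_alphas

accuracies = []
for alpha in ccp_alphas:
    pruned_clf = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42).fit(X_train, y_train)
    accuracies.append(accuracy_score(y_test, pruned_clf.predict(X_test)))

# Plot test accuracy against pruning strength
plt.plot(ccp_alphas, accuracies, marker="o", drawstyle="steps-post")
plt.xlabel("ccp_alpha")
plt.ylabel("Test accuracy")
plt.show()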
```

---

### **28. Train a Decision Tree Classifier and evaluate performance using Precision, Recall, and F1-Score**
```python
from sklearn.metrics import precision_score, recall_score, f1_score

print("Precision:", precision_score(y_test, y_pred, average='weighted'))
print("Recall:", recall_score(y_test, y_pred, average='weighted'))
print("F1 Score:", f1_score(y_test, y_pred, average='weighted'))
```

---

### **29. Train a Decision Tree Classifier and visualize the confusion matrix using seaborn**
```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, cmap="Blues", fmt="d", xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
```

---

### **30. Train a Decision Tree Classifier and use GridSearchCV to find optimal values for max_depth and min_samples_split**
```python
from sklearn.model_selection import GridSearchCV

param_grid = {
    'max_depth': [3, 5, 10, None],
    'min_samples_split': [2, 5, 10]
}

grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

print("Best Parameters:", grid_search.best_params_)
print("Best Accuracy:", grid_search.best_score_)
```

