
# **Theoretical Questions**

### 1. What is a Support Vector Machine (SVM)?

SVM is a supervised learning algorithm used for both classification and regression. Its primary goal is to find the **optimal hyperplane** in an N-dimensional space that distinctly classifies data points by maximizing the margin between classes.

### 2. What is the difference between Hard Margin and Soft Margin SVM?

* **Hard Margin:** Assumes the data is perfectly linearly separable. It allows zero misclassifications, which makes it very sensitive to outliers.
* **Soft Margin:** Introduces slack variables that allow some points to be misclassified or to fall inside the margin, at a cost controlled by the `C` parameter. This makes the model more robust to noise and overlapping data.

### 3. What is the mathematical intuition behind SVM?

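The goal is to maximize the **margin**: the gap between the hyperplane $\mathbf{w}^\top \mathbf{x} + b = 0$ and the nearest data points of each class. Each nearest point lies at distance $1 / \lVert \mathbf{w} \rVert$ from the hyperplane, so the full margin width is $2 / \lVert \mathbf{w} \rVert$.
Mathematically, we seek to maximize:

$$\max_{\mathbf{w},\, b} \; \frac{2}{\lVert \mathbf{w} \rVert}$$

subject to the constraint that all points are correctly classified (with labels $y_i \in \{-1, +1\}$):

$$y_i \left( \mathbf{w}^\top \mathbf{x}_i + b \right) \ge 1 \quad \text{for all } i$$

Maximizing $2 / \lVert \mathbf{w} \rVert$ is equivalent to minimizing $\tfrac{1}{2} \lVert \mathbf{w} \rVert^2$, which is the form usually solved in practice.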

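As a rough numerical check of the margin idea, the sketch below fits a linear `SVC` on a made-up, linearly separable toy dataset and computes $2 / \lVert \mathbf{w} \rVert$ from the learned weights. The data points and the very large `C` (used here to approximate a hard margin) are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates a hard-margin SVM
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]
margin_width = 2.0 / np.linalg.norm(w)

# Functional margins y_i (w . x_i + b); each should be >= 1 up to solver tolerance
functional_margins = y * (X @ w + b)

print(f"w = {w}, b = {b:.3f}")
print(f"margin width 2/||w|| = {margin_width:.3f}")
print(f"smallest functional margin = {functional_margins.min():.3f}")
```

The points with the smallest functional margin (close to 1) are the ones sitting on the edge of the margin.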

### 4. What is the role of Lagrange Multipliers in SVM?

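Lagrange multipliers convert the constrained primal problem into its dual, in which the training data appear only through dot products $\mathbf{x}_i^\top \mathbf{x}_j$ of the input vectors. The weight vector is then recovered as $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$, and only points with $\alpha_i > 0$ contribute to it. The dual is still constrained, but its constraints are much simpler, and its dot-product form is the foundation for the **Kernel Trick**.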
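
For reference, the standard hard-margin dual problem can be written as below. The symbols are assumptions, since the text does not fix notation: $\alpha_i$ is the Lagrange multiplier for training point $i$, $\mathbf{x}_i$ is its input vector, and $y_i \in \{-1, +1\}$ is its label.

$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j \, y_i y_j \, \mathbf{x}_i^\top \mathbf{x}_j \quad \text{subject to} \quad \alpha_i \ge 0, \;\; \sum_i \alpha_i y_i = 0$$

In the soft-margin case the only change is the box constraint $0 \le \alpha_i \le C$. Replacing $\mathbf{x}_i^\top \mathbf{x}_j$ with a kernel value $K(\mathbf{x}_i, \mathbf{x}_j)$ gives the kernelized version of the same problem.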

### 5. What are Support Vectors in SVM?

Support vectors are the data points that lie closest to the decision boundary (hyperplane). They are the "critical" points; if they were moved, the position of the hyperplane would change.
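
One way to see why these points are "critical" is to refit the model on the support vectors alone and compare the result. The sketch below does this on synthetic blobs; the dataset and `C=1.0` are illustrative assumptions, and the two weight vectors should agree only up to solver tolerance.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data (illustrative only)
X, y = make_blobs(n_samples=50, centers=2, cluster_std=0.6, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
sv_idx = clf.support_  # indices of the support vectors in X
print(f"{len(sv_idx)} of {len(X)} training points are support vectors")

# Refit using only the support vectors; every other point is discarded
clf_sv = SVC(kernel="linear", C=1.0).fit(X[sv_idx], y[sv_idx])

print("w (all points):          ", clf.coef_[0])
print("w (support vectors only):", clf_sv.coef_[0])
```

Dropping the non-support vectors leaves the hyperplane essentially unchanged, whereas removing or moving a support vector generally would not.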

### 6. What is a Support Vector Classifier (SVC) vs. Regressor (SVR)?

* **SVC:** Predicts discrete class labels by finding a hyperplane that maximizes separation.
* **SVR:** Predicts continuous values. Instead of a margin that stays "clear" of points, SVR tries to fit as many points as possible *within* a defined margin (epsilon-tube) around the line.
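
The effect of the epsilon-tube can be sketched by varying `epsilon` on a noisy sine curve. The data, `C=10.0`, and the epsilon values are assumptions picked for illustration. Points that fall strictly inside the tube incur no loss and do not become support vectors.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy sine curve (synthetic, for illustration)
rng = np.random.default_rng(0)
X = np.linspace(0, 5, 40).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)

n_sv = {}
for eps in (0.01, 0.1, 0.5):
    svr = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, y)
    residuals = np.abs(y - svr.predict(X))
    inside = int((residuals <= eps).sum())  # points within the tube
    n_sv[eps] = len(svr.support_)
    print(f"epsilon={eps}: {inside}/40 points inside the tube, "
          f"{n_sv[eps]} support vectors")
```

A wider tube tolerates more deviation, so fewer points end up as support vectors and the fitted function is flatter.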

### 7. What is the Kernel Trick in SVM?

The Kernel Trick allows SVM to solve non-linear problems by mapping low-dimensional input space into a higher-dimensional feature space where a linear hyperplane can separate the data, without actually calculating the coordinates in that high-dimensional space.
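
A small worked example may make this concrete. For 2-D inputs, the degree-2 polynomial kernel $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^\top \mathbf{z})^2$ equals an ordinary dot product in a 3-D feature space. The helper names `phi` and `poly_kernel` below are hypothetical, introduced only for this sketch.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, computed in the input space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # dot product in the 3-D feature space
implicit = poly_kernel(x, z)       # same value without ever building phi

print(explicit, implicit)  # both equal 121.0 (up to floating-point rounding)
```

The kernel evaluates the high-dimensional dot product using only the original 2-D vectors, which is what lets SVMs work in very large (even infinite-dimensional) feature spaces.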

### 8. Compare Linear, Polynomial, and RBF Kernels

| Kernel | Use Case | Complexity |
| --- | --- | --- |
| **Linear** | Linearly separable data; high-dimensional text data. | Simple, fast. |
| **Polynomial** | Images or data where feature interactions matter. | More complex, slower. |
| **RBF (Gaussian)** | Default for non-linear data; corresponds to an infinite-dimensional feature space. | Highly flexible but prone to overfitting when gamma is large. |

### 9. What is the effect of the C parameter?

* **Small C:** Large margin, allows more misclassifications (higher bias, lower variance).
* **Large C:** Small margin, aims for zero misclassifications (lower bias, higher variance/overfitting).
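
The trade-off can be sketched by fitting a linear `SVC` at several values of `C` and reading off the margin width $2 / \lVert \mathbf{w} \rVert$. The synthetic dataset (with deliberately noisy labels) and the specific `C` values are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Overlapping two-class data with 10% label noise (synthetic)
X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, class_sep=0.8, flip_y=0.1, random_state=0,
)

margins = {}
for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margins[C] = 2.0 / np.linalg.norm(clf.coef_[0])
    print(f"C={C}: margin width = {margins[C]:.3f}, "
          f"support vectors = {clf.n_support_.sum()}, "
          f"train accuracy = {clf.score(X, y):.3f}")
```

Smaller `C` yields a wider margin with more points inside it; larger `C` shrinks the margin in an attempt to classify every training point correctly.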

### 10. What is the role of the Gamma parameter in RBF?

* **Small Gamma:** Gaussian kernel has a large reach; even far-away points influence the boundary (smoother boundary).
* **Large Gamma:** Each training point's influence is confined to its immediate neighborhood, so the boundary bends tightly around individual points (wiggly, complex boundary that tends to overfit).
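
A quick way to observe this is to compare training and test accuracy across a few gamma values. The two-moons dataset, the noise level, and the gamma values below are illustrative assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy non-linear data (synthetic)
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for gamma in (0.01, 1, 1000):
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X_tr, y_tr)
    scores[gamma] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
    print(f"gamma={gamma}: train accuracy = {scores[gamma][0]:.3f}, "
          f"test accuracy = {scores[gamma][1]:.3f}")
```

A very large gamma typically fits the training set almost perfectly while test accuracy lags behind, which is the overfitting pattern described above.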

---

## Part 2: Naïve Bayes Theory

### 11. What is Naïve Bayes and why is it "Naïve"?

It is a probabilistic classifier based on **Bayes' Theorem**. It is called "Naïve" because it makes the strong (and often unrealistic) assumption that all features are **independent** of each other given the class label.

### 12. What is Bayes’ Theorem?

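It gives the probability of an event $A$ given observed evidence $B$, in terms of the prior $P(A)$ and the likelihood $P(B \mid A)$:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$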

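A short worked example, using made-up numbers for a spam filter: suppose 20% of emails are spam, the word "free" appears in 60% of spam emails, and it appears in 5% of non-spam emails. All three probabilities are assumptions for illustration.

```python
# Assumed (made-up) probabilities
p_spam = 0.20             # prior P(spam)
p_word_given_spam = 0.60  # likelihood P("free" | spam)
p_word_given_ham = 0.05   # likelihood P("free" | not spam)

# Evidence P("free") via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(spam | "free") = P("free" | spam) * P(spam) / P("free")
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(f"P(free) = {p_word:.2f}")                    # 0.16
print(f"P(spam | free) = {p_spam_given_word:.2f}")  # 0.75
```

Seeing the word raises the probability of spam from the 20% prior to 75%.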

### 13. Explain Naïve Bayes Variants

* **Gaussian:** Used when features follow a normal distribution (continuous data).
* **Multinomial:** Used for discrete counts (e.g., word counts in text classification).
* **Bernoulli:** Used for binary/boolean features (e.g., word presence vs. absence).
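
The three variants map onto three scikit-learn classes. The sketch below pairs each with a matching kind of synthetic feature; the data-generating choices (Gaussian shifts, Poisson rates, the `> 2` threshold) are assumptions made up for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)

# Continuous features: class 1 is shifted by 2 standard deviations
X_cont = rng.normal(loc=y[:, None] * 2.0, scale=1.0, size=(200, 3))
# Count features: Poisson rates differ per class
rates = np.where(y[:, None] == 1, [5, 1, 1], [1, 1, 5])
X_counts = rng.poisson(lam=rates)
# Binary features: whether each count exceeds 2
X_bin = (X_counts > 2).astype(int)

results = {}
for name, model, X in [
    ("GaussianNB (continuous)", GaussianNB(), X_cont),
    ("MultinomialNB (counts)", MultinomialNB(), X_counts),
    ("BernoulliNB (binary)", BernoulliNB(), X_bin),
]:
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {results[name]:.3f}")
```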

### 14. What are the key assumptions?

1. **Feature Independence:** Features are conditionally independent of one another given the class label.
2. **Equal Importance:** Each feature contributes equally to the outcome.

### 15. Advantages and Disadvantages

* **Pros:** Extremely fast, works well with high-dimensional data, requires less training data.
* **Cons:** The independence assumption rarely holds in the real world.

### 16. Why is it good for Text Classification?

Naïve Bayes handles high-dimensional sparse data (like word counts) efficiently. Even though words in a sentence aren't independent, the algorithm still captures the "essence" of the document class effectively.

### 17. How does Laplace Smoothing help?

If a word appears in the test set but not the training set, the probability becomes **zero**, which nullifies the entire calculation. Laplace Smoothing adds a small value (usually 1) to all counts to ensure no probability is ever exactly zero.







### 18. Why is Naïve Bayes a good choice for text classification?

Naïve Bayes is a staple for text classification (like spam detection or sentiment analysis) for several reasons:

* **High Dimensionality:** Text data often has thousands of features (unique words). Naïve Bayes handles high-dimensional sparse data very efficiently.
* **Decoupled Parameters:** It treats each word independently, meaning it only needs to estimate the probability of each word appearing in a class, rather than complex word combinations.
* **Speed:** Since it only involves simple counting and multiplication, it is incredibly fast to train and predict compared to iterative models.
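
A minimal sketch of such a text classifier, using a tiny made-up corpus: `CountVectorizer` turns each document into sparse word counts and `MultinomialNB` models those counts per class. The example sentences and labels are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented corpus
train_texts = [
    "win a free prize now",
    "free cash offer, claim your prize",
    "claim free cash now",
    "meeting agenda for monday",
    "please review the project report",
    "monday project meeting notes",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Bag-of-words counts feeding a multinomial Naive Bayes model
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

preds = model.predict(["free prize offer", "project meeting on monday"])
print(list(preds))  # ['spam', 'ham']
```

Training amounts to counting words per class, which is why this scales easily to vocabularies with thousands of terms.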

### 19. Compare SVM and Naïve Bayes for classification tasks

| Feature | Naïve Bayes | SVM |
| --- | --- | --- |
| **Model Type** | Probabilistic (Generative) | Geometric (Discriminative) |
| **Speed** | Extremely fast | Slower on large datasets |
| **Linearity** | Linear by nature | Linear or Non-linear (via Kernels) |
| **Data Size** | Works well with small/medium data | Works well with small/medium data; needs more tuning, and kernel SVMs train slowly as the sample count grows |
| **Assumptions** | Assumes feature independence | No strong statistical assumptions |

### 20. How does Laplace Smoothing help in Naïve Bayes?

In text data, if a word $w$ appears in the test set but was never seen in the training set for a specific class $c$, the estimated probability $P(w \mid c)$ becomes $0$. Since Naïve Bayes multiplies these probabilities, one zero will make the entire class probability zero.
**Laplace Smoothing** adds a small positive value (usually $\alpha = 1$) to each count in the numerator and adds $\alpha \lvert V \rvert$ to the denominator, where $\lvert V \rvert$ is the vocabulary size, to ensure no probability is ever exactly zero.
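
The smoothing arithmetic can be sketched by hand and compared with `MultinomialNB`. The three-word vocabulary and the count matrix below are made up for illustration; `alpha` is scikit-learn's name for the smoothing value.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Rows = documents, columns = counts for the vocabulary ["free", "meeting", "prize"]
X = np.array([[2, 0, 1],
              [1, 0, 0],
              [0, 3, 0],
              [0, 1, 0]])
y = np.array([1, 1, 0, 0])  # 1 = spam, 0 = ham

alpha = 1.0
vocab_size = X.shape[1]
counts_spam = X[y == 1].sum(axis=0)  # word counts in the spam class: [3, 0, 1]

# Without smoothing, "meeting" gets probability 0 in the spam class
unsmoothed = counts_spam / counts_spam.sum()
# With Laplace smoothing: (count + alpha) / (total + alpha * |V|)
smoothed = (counts_spam + alpha) / (counts_spam.sum() + alpha * vocab_size)

nb = MultinomialNB(alpha=alpha).fit(X, y)
sklearn_probs = np.exp(nb.feature_log_prob_[1])  # row 1 corresponds to class 1 (spam)

print("unsmoothed:", unsmoothed)
print("smoothed:  ", smoothed)
print("sklearn:   ", sklearn_probs)
```

The hand-computed smoothed values (4/7, 1/7, 2/7) match what the fitted model stores, and the unseen word no longer zeroes out the class.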

---

# **Practical Questions**

### SVM Hyperparameter Tuning (GridSearchCV)

```python
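from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
# Fixed seed and stratified split so results are reproducible
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42, stratify=data.target
)

# SVMs are sensitive to feature scale; scaling inside the pipeline
# means each CV fold is standardized using only its own training split
pipe = make_pipeline(StandardScaler(), SVC())

# Defining parameter range (gamma is ignored by the linear kernel)
param_grid = {
    'svc__C': [0.1, 1, 10, 100],
    'svc__gamma': [1, 0.1, 0.01, 0.001],
    'svc__kernel': ['rbf', 'poly', 'linear']
}

grid = GridSearchCV(pipe, param_grid, refit=True, verbose=1)
grid.fit(X_train, y_train)

print(f"Best Parameters: {grid.best_params_}")
print(f"Best CV Score: {grid.best_score_:.4f}")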

```

### Comparing SVM Kernels (Breast Cancer Dataset)

```python
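from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Reuses X_train, X_test, y_train, y_test from the grid-search cell above
kernels = ['linear', 'poly', 'rbf']
for k in kernels:
    # degree only affects the 'poly' kernel; features are standardized first
    model = make_pipeline(StandardScaler(), SVC(kernel=k, degree=3))
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"Accuracy with {k} kernel: {accuracy_score(y_test, pred):.4f}")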

```

### SVM Regressor (SVR) with MAE Evaluation

```python
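from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

housing = fetch_california_housing()
X_h, y_h = housing.data[:1000], housing.target[:1000] # Subset for speed

# Hold out a test set so the MAE reflects unseen data, not the training fit
X_h_train, X_h_test, y_h_train, y_h_test = train_test_split(
    X_h, y_h, test_size=0.2, random_state=42
)

# SVR is scale-sensitive, so standardize features before fitting
svr = make_pipeline(StandardScaler(), SVR(kernel='rbf'))
svr.fit(X_h_train, y_h_train)
y_h_pred = svr.predict(X_h_test)

mae = mean_absolute_error(y_h_test, y_h_pred)
print(f"SVR Mean Absolute Error: {mae:.4f}")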

```

### Visualizing Confusion Matrix and Precision-Recall Curve

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, PrecisionRecallDisplay

# Confusion Matrix
y_pred = grid.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(6,4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('SVM Confusion Matrix')
plt.show()

# Precision-Recall Curve
PrecisionRecallDisplay.from_estimator(grid, X_test, y_test)
plt.title("SVM Precision-Recall Curve")
plt.show()

```

### Handling Imbalanced Data with Class Weighting

```python
# 'balanced' mode automatically adjusts weights inversely proportional to class frequencies
model_weighted = SVC(kernel='linear', class_weight='balanced')
model_weighted.fit(X_train, y_train)
print(f"Weighted SVM Accuracy: {model_weighted.score(X_test, y_test):.4f}")

```

### Recursive Feature Elimination (RFE) for SVM

```python
from sklearn.feature_selection import RFE

svc_linear = SVC(kernel="linear")
selector = RFE(estimator=svc_linear, n_features_to_select=5, step=1)
selector = selector.fit(X_train, y_train)

print(f"Selected Features: {selector.support_}")
print(f"Feature Ranking: {selector.ranking_}")

```

---
