"""
Q1. In order to predict house price based on several characteristics, such as location, square footage,
number of bedrooms, etc., you are developing an SVM regression model. Which regression metric in this
situation would be the best to employ?
"""


###  **Best Regression Metrics to Use:**

#### **1. Mean Absolute Error (MAE)**
- **Definition**: Average of the absolute differences between predicted and actual values.
- **Why it's good**: Easy to interpret (in same units as house price), **less sensitive to outliers** than MSE.
- **Use case**: When you want to know the average prediction error in real currency terms.

```python
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_true, y_pred)
```

---

#### **2. Mean Squared Error (MSE)**
- **Definition**: Average of squared differences between predictions and actual values.
- **Why it's good**: Penalizes large errors more (useful if large errors are costly).
- **Use case**: When large prediction errors are especially undesirable.

```python
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_true, y_pred)
```

---

#### **3. Root Mean Squared Error (RMSE)**
- **Definition**: Square root of MSE.
- **Why it's good**: Keeps units consistent with the target (house price), still penalizes large errors.
- **Use case**: When you want an error figure in price units that still weights large misses heavily; a common default for price models.

```python
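from sklearn.metrics import root_mean_squared_error  # scikit-learn >= 1.4 (replaces the deprecated squared=False)
rmse = root_mean_squared_error(y_true, y_pred)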
```

---

#### **4. R-squared (Coefficient of Determination)**
- **Definition**: Proportion of variance in the dependent variable that is predictable from the features.
- **Why it's good**: Tells how well your model explains variability in house prices.
- **Use case**: Good for understanding overall fit, but it **does not report the size of prediction errors** in price units.

```python
from sklearn.metrics import r2_score
r2 = r2_score(y_true, y_pred)
```

---

###  **Recommended Metric for House Price Prediction**
- **MAE** is generally the best choice for house price prediction when you want an interpretable average error in currency units.
- Use **RMSE** when large pricing errors are disproportionately costly (e.g., badly over- or under-pricing a house).
- **R²** is useful for comparing models on the same data, but it does not report error size in price units, so don't rely on it alone.
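
To make the four definitions concrete, here is a minimal sketch that computes them by hand in plain Python. The function name `regression_metrics` and the price values are hypothetical, chosen only for illustration; in practice the `sklearn.metrics` functions shown above do the same job.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R² from their textbook definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    mse = sum(e * e for e in errors) / n       # mean squared error
    rmse = math.sqrt(mse)                      # back in the target's units
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Hypothetical house prices in dollars (illustrative only)
y_true = [200_000, 250_000, 300_000, 350_000]
y_pred = [210_000, 240_000, 320_000, 350_000]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

On these made-up numbers MAE is $10,000 while RMSE is roughly $12,247, because the single $20,000 miss is weighted more heavily once it is squared.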

### Q2. You have built an SVM regression model and are trying to decide between using MSE or R-squared as your evaluation metric. Which metric would be more appropriate if your goal is to predict the actual price of a house as accurately as possible?
Ans: \
###  **Use Mean Squared Error (MSE)**

####  Why MSE is more appropriate:
- MSE measures the **average squared difference** between the predicted prices and the actual prices.
- A lower MSE means predictions sit **closer to the true prices**; squaring emphasizes larger errors, and taking the square root (RMSE) expresses the result in price units.
- Since you're focused on **accuracy of actual price values**, this is exactly what MSE captures.

---

###  Why not just R-squared:
- R² measures **how well the model explains variance**, but it **doesn’t tell you how close** your predictions are to the actual price.
- You can have a high R² and still have large errors in currency terms: when prices vary widely, misses of tens of thousands of dollars are small relative to the total variance.
- It's better suited to **model comparison** on the same dataset than to evaluating **absolute accuracy**.
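
A small sketch of this point, using hypothetical prices that span a wide range. Every prediction below misses by $50,000, yet R² stays very close to 1 because the spread in the prices themselves is far larger. The helper names `mae` and `r2` are illustrative, not library functions.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical prices from $100k to $2M; each prediction is off by $50k
y_true = [100_000, 500_000, 1_000_000, 1_500_000, 2_000_000]
y_pred = [150_000, 450_000, 1_050_000, 1_450_000, 2_050_000]

mae_value = mae(y_true, y_pred)
r2_value = r2(y_true, y_pred)
print(f"MAE = {mae_value:,.0f}  R² = {r2_value:.4f}")
```

Here the R² is above 0.99 even though the average miss is $50,000, which is why an error metric in price units is the better guide to price accuracy.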

---

###  Conclusion:
>  **Use MSE (or RMSE)** if your goal is **price accuracy in real numbers**.  
> Use R² **in addition** to see how well your model explains the data, but **not alone** for price accuracy.

### Q3. You have a dataset with a significant number of outliers and are trying to select an appropriate regression metric to use with your SVM model. Which metric would be the most appropriate in this scenario?
Ans: \

###  **Mean Absolute Error (MAE)**

---

###  Why MAE is the best choice:
- **MAE** calculates the average of the **absolute errors** between predicted and actual values.
- It weights each error **in proportion to its size**, meaning **outliers won't be disproportionately penalized**.
- Unlike **MSE**, it does **not square** the errors, so extreme values (outliers) have **less impact** on the overall error.

---

### Why not MSE or RMSE:
- **MSE** and **RMSE** **square the errors**, which gives **extra weight** to large errors caused by outliers.
- These metrics tend to **overemphasize outliers**, potentially distorting your model evaluation.
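
The effect can be seen with a tiny sketch on hypothetical residuals (in thousands of dollars). Replacing one ordinary residual with a single large outlier inflates RMSE much more than MAE. The function names and values are illustrative assumptions.

```python
import math

def mae(residuals):
    """Mean absolute error computed directly from residuals."""
    return sum(abs(r) for r in residuals) / len(residuals)

def rmse(residuals):
    """Root mean squared error computed directly from residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

clean = [5, -5, 5, -5, 5]           # hypothetical residuals, no outlier
with_outlier = [5, -5, 5, -5, 100]  # same, but one extreme miss

mae_ratio = mae(with_outlier) / mae(clean)     # growth in MAE
rmse_ratio = rmse(with_outlier) / rmse(clean)  # growth in RMSE
print(f"MAE grows {mae_ratio:.1f}x, RMSE grows {rmse_ratio:.1f}x")
```

In this example one outlier multiplies MAE by a little under 5 but multiplies RMSE by almost 9, so a handful of extreme points can dominate a squared-error metric.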

---

###  Summary of Metric Behavior with Outliers:

| Metric | Sensitive to Outliers? | Suitable for Outlier-Rich Data? |
|--------|------------------------|---------------------------------|
| MAE    | ❌ Only mildly (linear penalty) | ✅ Yes |
| MSE    | ✅ Yes (squared penalty) | ❌ No |
| RMSE   | ✅ Yes (but in original units) | ❌ No |
| R²     | ✅ Yes (indirectly) | ❌ No (may drop sharply) |

---

###  Conclusion:
>  **Use MAE** when your dataset contains outliers, and you want a **robust, balanced evaluation** of your regression model.

### Q4. You have built an SVM regression model using a polynomial kernel and are trying to select the best metric to evaluate its performance. You have calculated both MSE and RMSE and found that both values are very close. Which metric should you choose to use in this case?
Ans: \

### **Choose RMSE (Root Mean Squared Error)**

####  Why?
- **RMSE is in the same unit** as the target variable (e.g., dollars for house prices).
- It's **easier to interpret** than MSE, which is in **squared units**.
- Because RMSE is the square root of MSE, the two can only be numerically close when both are near 1 (or near 0), which usually points to a scaled or normalized target. They always rank models identically, so RMSE is preferred for its clearer, more human-readable number.
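
Since RMSE is just the square root of MSE, a quick sketch shows when the two numbers are close and when they are not. The MSE values below are hypothetical; the last one corresponds to a target measured in raw dollars.

```python
import math

# Hypothetical MSE values: three near 1 (e.g., a scaled target), one in dollars²
mse_values = [0.81, 1.0, 1.21, 1.6e7]

# Pair each MSE with its RMSE
pairs = [(mse, math.sqrt(mse)) for mse in mse_values]

for mse, rmse in pairs:
    print(f"MSE = {mse:g}  ->  RMSE = {rmse:g}")
```

Near 1 the two values almost coincide, while an MSE of 1.6e7 dollars² corresponds to an RMSE of $4,000, a far easier number to communicate.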

---

###  Quick Comparison:

| Metric | Unit         | Interpretability | Penalizes Large Errors |
|--------|--------------|------------------|-------------------------|
| MSE    | Squared unit | ❌ Hard to read   | ✅ Strongly              |
| RMSE   | Original unit| ✅ Easy to read   | ✅ Strongly              |

---

###  Conclusion:
> Use **RMSE** when both values are close, because it's **more intuitive** and helps **communicate results better** to stakeholders (e.g., "our model's typical prediction error (RMSE) is about $4,000").

### Q5. You are comparing the performance of different SVM regression models using different kernels (linear, polynomial, and RBF) and are trying to select the best evaluation metric. Which metric would be most appropriate if your goal is to measure how well the model explains the variance in the target variable?
Ans: \

###  **R-squared (R² Score)** – also known as the **coefficient of determination**

---

###  Why R² is the right choice:
- R² tells you the **proportion of the variance** in the target variable that is **explained by your model**.
- It is a **relative, unit-free measure**, which makes it well suited to **comparing models** with different kernels on the same data.
- R² ranges from:
  - **1.0** → perfect prediction
  - **0.0** → model does no better than predicting the mean
  - **< 0.0** → model is worse than predicting the mean
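
The three reference points can be checked with a minimal sketch on toy values. The helper `r2` is a hand-written illustration of the standard formula, and the data are made up.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1, 2, 3, 4, 5]

perfect = r2(y_true, [1, 2, 3, 4, 5])   # predictions match exactly
baseline = r2(y_true, [3, 3, 3, 3, 3])  # always predict the mean of y_true
worse = r2(y_true, [5, 4, 3, 2, 1])     # predictions run the wrong way

print(perfect, baseline, worse)
```

The perfect predictor scores 1.0, the constant-mean predictor scores 0.0, and the reversed predictions score -3.0, which shows that R² has no lower bound.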

---

###  Example:
Let’s say you're comparing models:

| Kernel      | R² Score |
|-------------|----------|
| Linear      | 0.72     |
| Polynomial  | 0.84     |
| RBF         | 0.87     |

Here, the **RBF kernel** explains the most variance in the target, so it's the best fit **in terms of capturing the pattern in the data**, provided the scores were computed on held-out data rather than the training set.
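
A comparison like this could be produced along the following lines. This is a sketch under stated assumptions, not the procedure behind the table above: the data come from scikit-learn's synthetic `make_regression` generator rather than real housing data, and `C=100.0` and the split sizes are arbitrary choices.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for a housing dataset (assumption, for illustration only)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for kernel in ("linear", "poly", "rbf"):
    # SVR is sensitive to feature scale, so standardize inside a pipeline
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel, C=100.0))
    model.fit(X_train, y_train)
    # Score on held-out data so R² reflects generalization, not memorization
    scores[kernel] = r2_score(y_test, model.predict(X_test))

best_kernel = max(scores, key=scores.get)
print(scores, "best:", best_kernel)
```

Which kernel wins depends entirely on the data and hyperparameters, so the ranking from this synthetic example should not be read as a general result.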

---

###  When **not** to use R²:
- If you're concerned with **absolute accuracy** (e.g., exact price predictions), then **MAE or RMSE** would be better.
- R² doesn't tell you how **far off** the predictions are — only how **well the model captures variance**.

---

###  Conclusion:
> If your **goal is to measure explanatory power**, go with **R² score** — it’s ideal for comparing different kernel models.