R-squared (Coefficient of Determination) measures the proportion of the variance in the dependent variable that is explained by the independent variables in the model.

📏 Formula:
\[
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
\]

Where:

- \( SS_{\text{res}} \) = Sum of Squares of Residuals = \( \sum (y_i - \hat{y}_i)^2 \)
- \( SS_{\text{tot}} \) = Total Sum of Squares = \( \sum (y_i - \bar{y})^2 \)

🎯 Interpretation:
- \( R^2 = 0 \): Model explains none of the variance.
- \( R^2 = 1 \): Model explains all the variance.

Example: \( R^2 = 0.85 \) → 85% of the variability in \( y \) is explained by the model.
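
As a minimal sketch of the formula above, the snippet below computes \( R^2 \) directly from the two sums of squares. The function name `r_squared` and the numbers are illustrative assumptions, not taken from any library.

```python
def r_squared(y_true, y_pred):
    """Return R^2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


# Hypothetical observations and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

# Here SS_res = 0.5 and SS_tot = 20, so R^2 is about 0.975.
print(r_squared(y_true, y_pred))

# Always predicting the mean of y gives SS_res = SS_tot, so R^2 = 0.
print(r_squared(y_true, [6.0] * 4))
```

Predicting worse than the mean would make \( SS_{\text{res}} \) exceed \( SS_{\text{tot}} \), so this function can return negative values on held-out data.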

⚠️ Note:
With ordinary least squares, R-squared never decreases when predictors are added, even if they aren't meaningful.
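
The note above can be illustrated with a small sketch: a target is regressed on a predictor that is pure noise, and the training R-squared still does not drop below the 0 obtained by predicting the mean. The helper `simple_ols_r2`, the seed, and the data are all assumptions made for this example.

```python
import random


def simple_ols_r2(x, y):
    """Fit y = a + b*x by least squares and return the training R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


rng = random.Random(0)                           # fixed seed, arbitrary choice
y = [rng.gauss(10, 2) for _ in range(30)]        # target with no real drivers
noise = [rng.gauss(0, 1) for _ in range(30)]     # predictor unrelated to y

r2_intercept_only = 0.0                          # predicting the mean gives R^2 = 0
r2_with_noise = simple_ols_r2(noise, y)
print(f"intercept only: {r2_intercept_only:.4f}")
print(f"with a noise predictor: {r2_with_noise:.4f}")
```

Any gain reported here is an artifact of fitting the training sample, which is the motivation for the adjusted version discussed next.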

### **Q2. What Is Adjusted R-squared and How Is It Different?**
Adjusted R-squared compensates for the fact that R-squared never decreases when variables are added. It penalizes predictors that do not improve the fit enough to justify the extra parameter.

🧮 Formula:
\[
\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}
\]

Where:

- \( n \) = number of observations
- \( k \) = number of predictors

🚀 Why It's Better:
- Gives a fairer estimate of explanatory power because it accounts for the number of predictors.
- Can decrease when irrelevant variables are added.
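
A short sketch of the adjusted R-squared formula follows. The function name `adjusted_r_squared` and the two model summaries are hypothetical, chosen only to show a case where a slightly higher R-squared comes with a lower adjusted value.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical models fitted on the same 50 observations.
small = adjusted_r_squared(r2=0.800, n=50, k=3)   # about 0.787
large = adjusted_r_squared(r2=0.802, n=50, k=6)   # about 0.774

print(f"3 predictors: {small:.3f}")
print(f"6 predictors: {large:.3f}")
```

In this made-up comparison the three extra predictors raise R-squared by only 0.002, which is not enough to offset the penalty, so the smaller model scores higher.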

### **Q3. When Should You Use Adjusted R-squared?**
Use Adjusted R-squared when:

- You are working with multiple predictors.
- You're comparing models with different numbers of predictors.
- You want a measure of fit that is not inflated simply by adding predictors.

R-squared is fine for simple linear regression (1 predictor). But when you have multiple variables, adjusted R-squared gives a truer picture.

### **Q4. RMSE, MSE, MAE: What Do They Mean in Regression?**
These are error metrics to evaluate how well your regression model is performing.

1. MSE (Mean Squared Error):
\[
\text{MSE} = \frac{1}{n} \sum (y_i - \hat{y}_i)^2
\]
 
Squares the errors — penalizes large errors more heavily.

Used for optimization (e.g., in gradient descent).

2. RMSE (Root Mean Squared Error):
\[
\text{RMSE} = \sqrt{\text{MSE}}
\]
 
Same units as the target variable → easier to interpret.

Popular for evaluating regression models.

3. MAE (Mean Absolute Error):
\[
\text{MAE} = \frac{1}{n} \sum |y_i - \hat{y}_i|
\]

Measures average magnitude of errors, without squaring.

Less sensitive to outliers than MSE/RMSE.
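
The three definitions above translate almost line for line into code. This is a minimal sketch with hypothetical data; the function names are assumptions for the example rather than a library API.

```python
import math


def mse(y_true, y_pred):
    """Mean of squared errors."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean of absolute errors."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions; the errors are -1, 1, -1 and 4.
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 11.0, 15.0, 12.0]

print("MSE :", mse(y_true, y_pred))              # (1 + 1 + 1 + 16) / 4 = 4.75
print("RMSE:", round(rmse(y_true, y_pred), 3))   # sqrt(4.75), about 2.179
print("MAE :", mae(y_true, y_pred))              # (1 + 1 + 1 + 4) / 4 = 1.75
```

The single error of 4 contributes 16 of the 19 squared-error units, which is why RMSE lands above MAE here.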

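🧠 Quick Summary Table:

| Metric | Penalizes Large Errors? | Same Units as Target? | Outlier Sensitivity |
|--------|-------------------------|-----------------------|---------------------|
| MSE | Yes (squared error) | No | High |
| RMSE | Yes | Yes | High |
| MAE | No | Yes | Lower |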


---

### **Q5. Advantages and Disadvantages of RMSE, MSE, and MAE**

| Metric | Advantages | Disadvantages |
|--------|------------|---------------|
| **MSE** | - Useful for penalizing large errors<br>- Mathematically convenient (used in optimization algorithms)<br>- Differentiable (good for gradient descent) | - Not in same unit as target<br>- Heavily influenced by outliers |
| **RMSE** | - Same unit as target variable (more interpretable)<br>- Penalizes large errors (more sensitive) | - Still very sensitive to outliers<br>- Less convenient gradient than MSE, although both share the same minimizer |
| **MAE** | - More robust to outliers than MSE/RMSE<br>- Same unit as target<br>- Easy to interpret as average error | - Not differentiable at zero (can be an issue in optimization)<br>- Doesn’t emphasize large errors (which might matter in some domains) |

✅ **Use MAE** when:
- You want a **robust** metric.
- You care equally about all errors.

✅ **Use RMSE or MSE** when:
- You want to **emphasize large errors** (e.g., in medical or financial predictions).
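
One way to see the robustness difference is to ask which single constant prediction each metric prefers: squared error is minimized by the mean, absolute error by the median. The sketch below checks this on a hypothetical sample with one outlier; the helper names are assumptions for the example.

```python
import statistics


def mse_const(values, c):
    """MSE of predicting the constant c for every value."""
    return sum((v - c) ** 2 for v in values) / len(values)


def mae_const(values, c):
    """MAE of predicting the constant c for every value."""
    return sum(abs(v - c) for v in values) / len(values)


targets = [10.0, 11.0, 12.0, 13.0, 100.0]   # hypothetical data; 100.0 is the outlier
mean_y = statistics.mean(targets)           # 29.2, pulled toward the outlier
median_y = statistics.median(targets)       # 12.0, unaffected by its size

print("MSE at mean / median:", mse_const(targets, mean_y), mse_const(targets, median_y))
print("MAE at mean / median:", mae_const(targets, mean_y), mae_const(targets, median_y))
```

MSE is lower at the mean (roughly 1254 versus 1550), while MAE is lower at the median (18.4 versus roughly 28.3). A model trained on squared error is therefore dragged toward the outlier, and one trained on absolute error is not.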

---

### **Q6. What Is Lasso Regularization? How Does It Differ from Ridge?**

**Lasso (Least Absolute Shrinkage and Selection Operator)** adds an **L1 penalty** to the loss function, which can shrink coefficients to **exactly zero**.

#### 🔢 Lasso Cost Function:
\[
\text{Loss} = \text{RSS} + \lambda \sum |\beta_i|
\]

**Ridge Regression** adds an **L2 penalty** — it shrinks coefficients **toward zero**, but not exactly zero.

#### 🔢 Ridge Cost Function:
\[
\text{Loss} = \text{RSS} + \lambda \sum \beta_i^2
\]

#### 🎯 Key Difference:
- **Lasso**: Can **select variables** by shrinking some coefficients to zero → **feature selection**.
- **Ridge**: Keeps all variables but reduces their influence.
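
The difference can be sketched with a small coordinate-descent solver for the two cost functions above. Everything here is an assumption made for illustration: the function names, the synthetic data, the choice of \( \lambda = 80 \), and the omission of an intercept. It is not how any particular library implements these models.

```python
import random


def soft_threshold(value, threshold):
    """Move value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_penalized(X, y, lam, penalty, n_sweeps=100):
    """Coordinate descent for RSS + lam * penalty(beta), with no intercept.

    X is a list of rows; penalty is "l1" (lasso) or "l2" (ridge).
    """
    p = len(X[0])
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # rho = x_j . (y - prediction made without feature j)
            rho = 0.0
            for row, target in zip(X, y):
                partial = sum(row[k] * beta[k] for k in range(p) if k != j)
                rho += row[j] * (target - partial)
            if penalty == "l1":
                beta[j] = soft_threshold(rho, lam / 2) / col_sq[j]
            elif penalty == "l2":
                beta[j] = rho / (col_sq[j] + lam)
            else:
                raise ValueError("penalty must be 'l1' or 'l2'")
    return beta


# Synthetic data (hypothetical): only features 0 and 3 carry signal.
rng = random.Random(0)
true_beta = [3.0, 0.0, 0.0, 1.5, 0.0]
X = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(100)]
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 1) for row in X]

lasso_beta = fit_penalized(X, y, lam=80.0, penalty="l1")
ridge_beta = fit_penalized(X, y, lam=80.0, penalty="l2")
print("lasso:", [round(b, 3) for b in lasso_beta])
print("ridge:", [round(b, 3) for b in ridge_beta])
```

The only difference between the two fits is the per-coordinate update. The L1 update passes through `soft_threshold`, which returns exactly 0.0 whenever a feature's correlation with the residual is below \( \lambda / 2 \). The L2 update only divides by a larger number, so it shrinks coefficients without zeroing them.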

---

### **Q7. How Do Regularized Linear Models Prevent Overfitting?**

Overfitting happens when a model learns noise in the training data. Regularization **constrains model complexity** by penalizing large coefficients.

#### 🔁 Example:
Suppose you're predicting house prices using 50 features.

- **Linear Regression**: May overfit by assigning large weights to unimportant variables.
- **Ridge/Lasso**: Shrinks coefficients → forces model to generalize better.

**Visual Analogy**: Think of regularization as adding “friction” so the model can’t make wild swings (huge weights) just to fit every single point.
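
The "friction" can be made concrete in the simplest possible setting: one feature, no intercept, and an L2 penalty, where the ridge solution has the closed form \( \beta = \sum x_i y_i / (\sum x_i^2 + \lambda) \). The data and the function name `ridge_slope` below are assumptions for this sketch.

```python
def ridge_slope(x, y, lam):
    """One-feature, no-intercept ridge: minimizes sum((y - b*x)^2) + lam * b^2."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


# Hypothetical data lying close to y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

for lam in [0.0, 10.0, 100.0]:
    print(f"lambda={lam:>5}: slope={ridge_slope(x, y, lam):.4f}")
```

With \( \lambda = 0 \) the slope is the ordinary least-squares value of about 1.99, and it falls steadily as \( \lambda \) grows. A larger penalty trades some fit on the training data for smaller, more stable weights.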

---

### **Q8. Limitations of Regularized Linear Models**

#### ❌ Limitations:
1. **Linearity Assumption**: They still assume a linear relationship.
2. **Feature Engineering Needed**: Can’t capture complex interactions or nonlinear patterns on their own.
3. **Hyperparameter Tuning**: The regularization strength \( \lambda \) must be carefully chosen (e.g., via cross-validation).
4. **Lasso Limit**: When predictors are highly correlated, Lasso may arbitrarily pick one and drop the others.
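
For the tuning point in item 3, a bare-bones k-fold cross-validation loop might look like the sketch below. It reuses the one-feature ridge closed form to stay short; the candidate grid, the fold scheme, the seed, and the function names are all assumptions, and a real project would more likely rely on a library routine.

```python
import random


def ridge_slope(x, y, lam):
    """One-feature, no-intercept ridge slope for penalty strength lam."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def cv_mse(x, y, lam, n_folds=5):
    """Average validation MSE over n_folds interleaved folds."""
    n = len(x)
    fold_scores = []
    for fold in range(n_folds):
        val = set(range(fold, n, n_folds))            # held-out indices
        train = [i for i in range(n) if i not in val]
        slope = ridge_slope([x[i] for i in train], [y[i] for i in train], lam)
        errors = [(y[i] - slope * x[i]) ** 2 for i in val]
        fold_scores.append(sum(errors) / len(errors))
    return sum(fold_scores) / n_folds


# Hypothetical noisy data: y is roughly 2 * x.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(40)]
y = [2 * xi + rng.gauss(0, 3) for xi in x]

candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_mse(x, y, lam) for lam in candidates}
best_lam = min(scores, key=scores.get)
print(scores)
print("selected lambda:", best_lam)
```

Each candidate is scored only on data it was not fitted to, so the selected value reflects generalization rather than training fit.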

#### 🧠 When Not to Use:
- If your data has strong **non-linear relationships** → consider tree-based models (e.g., Random Forest, XGBoost).
- If interpretability isn't a priority and performance is more important → use more complex models like neural networks.

---
