
---

### **Q1. What Is Lasso Regression and How Does It Differ from Other Regression Techniques?**

**Lasso Regression (Least Absolute Shrinkage and Selection Operator)** is a **linear regression model** that uses **L1 regularization** to penalize the absolute size of the coefficients.

#### 🧮 Loss Function:
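\[
\text{Loss} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
\]

Here *i* runs over the *n* observations, *j* runs over the *p* coefficients, and λ ≥ 0 sets the penalty strength.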
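
To make the two terms of the loss concrete, here is a minimal NumPy sketch that evaluates it by hand. The function name `lasso_loss`, the optional `intercept` argument, and the tiny toy arrays are illustrative assumptions, not part of any library.

```python
import numpy as np

def lasso_loss(X, y, beta, lam, intercept=0.0):
    """Residual sum of squares plus lam times the L1 norm of beta.

    Hypothetical helper for illustration only.
    """
    residuals = y - (X @ beta + intercept)
    rss = np.sum(residuals ** 2)          # data-fit term
    penalty = lam * np.sum(np.abs(beta))  # L1 penalty term
    return float(rss + penalty)

# Toy data (made up): two observations, two features.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
beta = np.array([1.0, 1.0])

loss = lasso_loss(X, y, beta, lam=0.5)
print(loss)  # RSS = 1.0 and penalty = 0.5 * 2 = 1.0, so this prints 2.0
```

Note that `scikit-learn`'s `Lasso` minimizes a rescaled version, `(1 / (2 * n_samples)) * RSS + alpha * L1`, so its `alpha` is not numerically identical to the λ in the formula above.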

#### 🔍 Key Differences:
- **Lasso vs. OLS**: Lasso adds an L1 penalty term to the least-squares loss; OLS has no penalty.
- **Lasso vs. Ridge**:
  - Ridge uses an L2 penalty (squares of coefficients) → shrinks coefficients toward zero, but generally not to exactly zero.
  - **Lasso can shrink coefficients to exactly zero**, effectively **removing features**.
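
The Lasso vs. Ridge difference can be checked on synthetic data. The sketch below assumes a made-up dataset in which only the first two of ten features affect the target, and an arbitrary penalty of `alpha=0.5` for both models; exact counts will vary with the data and the penalty.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): 10 features, only the first two matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

# Count coefficients that are exactly zero in each model.
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))

print("Lasso coefficients set to exactly 0:", n_zero_lasso)
print("Ridge coefficients set to exactly 0:", n_zero_ridge)
```

Ridge leaves every coefficient non-zero (just smaller), while Lasso zeroes out at least some of the irrelevant ones.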

---

### **Q2. What Is the Main Advantage of Using Lasso Regression in Feature Selection?**

✅ **Automatic feature selection.**

- Lasso can set **some coefficients exactly to 0**, which means those features are **excluded** from the model.
- Helps simplify models and reduce overfitting, especially when:
  - You have **many predictors**.
  - You suspect some predictors are **irrelevant or redundant**.

> 🧠 Think of Lasso as a model that **learns which features to ignore** — very useful when working with high-dimensional datasets.
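
As a rough illustration of reading off which features survive, the sketch below fits Lasso and splits the feature names by whether their coefficient is zero. The feature names, the synthetic data, and `alpha=0.5` are all hypothetical choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical feature names; only the first two drive the target below.
feature_names = ["age", "income", "noise_a", "noise_b", "noise_c"]

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 4.0 * X[:, 0] + 2.5 * X[:, 1] + rng.normal(scale=1.0, size=300)

lasso = Lasso(alpha=0.5).fit(X, y)

# A feature is "kept" if its coefficient is non-zero.
kept = [name for name, c in zip(feature_names, lasso.coef_) if c != 0]
dropped = [name for name, c in zip(feature_names, lasso.coef_) if c == 0]

print("Kept:", kept)
print("Dropped:", dropped)
```

Which features are dropped depends on the penalty strength, so the split should be read together with the chosen `alpha`.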

---

### **Q3. How Do You Interpret the Coefficients of a Lasso Regression Model?**

- **Non-zero coefficient**: That feature is contributing to the prediction. You can read it much like an OLS coefficient (change in target per 1-unit change in the feature, holding the other features fixed).
- **Zero coefficient**: At the chosen λ, that feature did not add enough predictive value and was **excluded** from the model.
- **Smaller magnitude**: Feature has weaker influence (but is still in the model). Magnitudes are only comparable when the features are on the same scale.

⚠️ Reminder: Since Lasso adds regularization, the coefficients are **biased** (shrunken) and shouldn’t be interpreted the same way as OLS for statistical inference.
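
Coefficient magnitudes also depend on the units of each feature, so comparing them is safer after standardizing. The sketch below is one way to see this, assuming two made-up features with very different scales and an arbitrary `alpha=0.1`.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): one feature with std ~1, one with std ~100.
rng = np.random.default_rng(2)
n = 300
x_small = rng.normal(scale=1.0, size=n)
x_large = rng.normal(scale=100.0, size=n)
X = np.column_stack([x_small, x_large])
y = 2.0 * x_small + 0.05 * x_large + rng.normal(scale=0.5, size=n)

# Fit once on raw features and once on standardized features.
raw = Lasso(alpha=0.1).fit(X, y)
scaled = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)

raw_coefs = raw.coef_
scaled_coefs = scaled.named_steps["lasso"].coef_

print("Raw coefficients:         ", np.round(raw_coefs, 3))
print("Standardized coefficients:", np.round(scaled_coefs, 3))
```

On the raw scale the second feature looks negligible; per standard deviation it is the stronger of the two. Standardizing also makes the penalty treat all features evenly.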

---

### **Q4. What Are the Tuning Parameters in Lasso Regression and How Do They Affect Performance?**

The primary tuning parameter in Lasso is:

#### 🔧 **Lambda (λ, called `alpha` in `scikit-learn`)**:
Controls the strength of the L1 penalty.

- **Small λ (close to 0)** → Less regularization → More features included.
- **Large λ** → Stronger penalty → More coefficients shrink to zero → More feature selection.
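
The effect of the penalty strength can be sketched by refitting Lasso over a few values and counting the surviving coefficients. The data and the specific `alphas` below are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumption): only the first two of ten features matter.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=200)

alphas = [0.01, 0.1, 1.0, 10.0]
nonzero_counts = []
for alpha in alphas:
    model = Lasso(alpha=alpha).fit(X, y)
    # Number of features still in the model at this penalty strength.
    nonzero_counts.append(int(np.count_nonzero(model.coef_)))
    print(f"alpha={alpha:<5} non-zero coefficients: {nonzero_counts[-1]}")
```

With a very large penalty every coefficient is driven to zero and the model predicts only the intercept.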

#### 🧪 How to Choose λ:
- Use **cross-validation** (e.g., `LassoCV` in `scikit-learn`).
- Common to try a range of values on a **log scale**.

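```python
from sklearn.linear_model import LassoCV

# X_train, y_train: your training features and target (assumed to exist)
model = LassoCV(cv=5).fit(X_train, y_train)  # 5-fold CV over an automatic alpha grid
print("Best alpha:", model.alpha_)
```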

---

### **Q5. Can Lasso Regression Be Used for Non-Linear Regression Problems?**

✅ **Yes — but not directly.**

Lasso itself is a **linear model**, but you can handle **non-linear relationships** by **transforming your features** before applying Lasso.

#### 🔧 How to make Lasso work for non-linear problems:
1. **Add polynomial features**:
   - E.g., instead of just `x`, include `x²`, `x³`, etc.
   - Use `PolynomialFeatures` from `sklearn.preprocessing`.

2. **Use interaction terms**:
   - Include terms like `x1 * x2` to capture interaction effects.

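3. **Use other basis expansions**:
   - E.g., splines or log/square-root transforms that capture curvature.
   - Lasso needs the transformed features explicitly, so the implicit kernel trick used by SVMs or kernel ridge doesn't apply directly.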

> After transforming, apply **Lasso Regression** on the new feature space. It will still do feature selection, even with polynomial terms.
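
A minimal sketch of this workflow, assuming a made-up quadratic relationship, a degree-5 expansion, and `alpha=0.1`; the pipeline layout is one reasonable choice, not the only one.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic non-linear data (assumption): y depends on x and x^2 only.
rng = np.random.default_rng(4)
x = rng.uniform(-3.0, 3.0, size=300)
y = 0.5 * x**2 - x + rng.normal(scale=0.3, size=300)
X = x.reshape(-1, 1)

model = make_pipeline(
    PolynomialFeatures(degree=5, include_bias=False),  # x, x^2, ..., x^5
    StandardScaler(),                                  # put powers on one scale
    Lasso(alpha=0.1, max_iter=10_000),
)
model.fit(X, y)

r2 = model.score(X, y)  # R^2 on the training data
coefs = model.named_steps["lasso"].coef_

print("Training R^2:", round(r2, 3))
for power, c in enumerate(coefs, start=1):
    print(f"x^{power}: {c:.3f}")
```

Scaling before the penalty matters here because `x^5` has a far larger range than `x`, which would otherwise distort how strongly each term is penalized.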

---

### **Q6. What’s the Difference Between Ridge Regression and Lasso Regression?**

| Feature | **Ridge Regression** | **Lasso Regression** |
|--------|----------------------|----------------------|
| **Penalty** | L2 (squared coefficients) | L1 (absolute values) |
| **Effect** | Shrinks coefficients | Shrinks & sets some to 0 |
| **Feature Selection** | ❌ No (keeps all) | ✅ Yes (automatic selection) |
| **Best for** | Multicollinearity, all features useful | High-dimensional data, feature reduction |
| **Behavior with correlated features** | Distributes weight across them | Tends to keep one, drop the others |

---

### **Q7. Can Lasso Handle Multicollinearity in Input Features?**

✅ Yes, but differently than Ridge.

- **Ridge**: Shares the weight across correlated features (keeps them all with smaller coefficients).
- **Lasso**: Often **picks one** feature and **sets the rest to zero**.

This is helpful for interpretability but can be **unstable** — small data changes might make it choose a different variable from a correlated group.

> For a more balanced approach, **Elastic Net** (L1 + L2) combines both penalties. It tends to keep correlated features together, so it usually handles multicollinearity more stably than Lasso alone.
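
To compare the three penalties on correlated inputs, the sketch below builds two nearly identical features from one hidden signal and prints each model's coefficients. The data and penalty values are assumptions, and whether Lasso actually zeroes one of the pair depends on the sample.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Synthetic data (assumption): x1 and x2 are near-copies of the same signal z.
rng = np.random.default_rng(5)
n = 200
z = rng.normal(size=n)
x1 = z + rng.normal(scale=0.01, size=n)
x2 = z + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = 3.0 * z + rng.normal(scale=0.5, size=n)

models = {
    "Lasso": Lasso(alpha=0.1, max_iter=10_000),
    "Ridge": Ridge(alpha=1.0),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10_000),
}

# Fit each model and collect the pair of coefficients.
coefs = {name: m.fit(X, y).coef_ for name, m in models.items()}
for name, c in coefs.items():
    print(f"{name:<10} coefficients: {np.round(c, 3)}  sum: {c.sum():.3f}")
```

All three models recover roughly the same total effect (the sum is close to 3); they differ in how that total is split between the two correlated features.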

---

### **Q8. How Do You Choose the Optimal λ (Lambda) in Lasso Regression?**

Just like Ridge, the key is **cross-validation**.

#### 🔍 Steps:
1. Use `LassoCV` or `GridSearchCV` to search over a range of lambda values.
2. Evaluate using a performance metric (e.g., RMSE or MAE).
3. Select the λ that **minimizes validation error**.

#### Example in Python:
```python
from sklearn.linear_model import LassoCV

lasso = LassoCV(alphas=[0.01, 0.1, 1, 10, 100], cv=5)
lasso.fit(X_train, y_train)
print("Best lambda (alpha):", lasso.alpha_)
```

✅ **Tip**: Use a **logarithmic scale** for alphas — e.g., from `1e-4` to `1e2`.
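
One way to build such a grid is `np.logspace`. The sketch below assumes synthetic training data and a grid of 50 values; both are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic training data (assumption): only the first two features matter.
rng = np.random.default_rng(6)
X_train = rng.normal(size=(200, 10))
y_train = 3.0 * X_train[:, 0] - 2.0 * X_train[:, 1] + rng.normal(scale=1.0, size=200)

# 50 candidate values from 1e-4 to 1e2, evenly spaced on a log10 scale.
alphas = np.logspace(-4, 2, 50)

lasso_cv = LassoCV(alphas=alphas, cv=5, max_iter=10_000).fit(X_train, y_train)
best_alpha = lasso_cv.alpha_

print("Best alpha:", best_alpha)
print("Non-zero coefficients:", int(np.count_nonzero(lasso_cv.coef_)))
```

If the selected value sits at either end of the grid, it is usually worth widening the range and rerunning.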

---