### **Q1. What is Lasso Regression, and how does it differ from other regression techniques?**

**Lasso Regression** (Least Absolute Shrinkage and Selection Operator) is a type of **linear regression** that includes a **regularization term** to prevent overfitting and improve model interpretability.

The loss function for Lasso is:

$$
\text{Loss} = \text{RSS} + \lambda \sum_{j=1}^{p} |\beta_j|
$$

Where:
- RSS = Residual Sum of Squares, $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
- $\lambda \ge 0$ = regularization parameter (controls the strength of the penalty)
- $\beta_j$ = coefficient of the $j$-th predictor, for $p$ predictors
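The loss above can be written directly in NumPy. This is a minimal sketch: `lasso_loss` is a hypothetical helper name, and it assumes `X` has no intercept column (so every entry of `beta` is penalized, matching the sum over $j = 1, \dots, p$).

```python
import numpy as np

def lasso_loss(X, y, beta, lam):
    """Lasso objective: RSS + lam * sum(|beta_j|).

    Assumes X holds only the p predictors (no intercept column),
    so every coefficient in beta is penalized.
    """
    residuals = y - X @ beta
    rss = np.sum(residuals ** 2)           # Residual Sum of Squares
    penalty = lam * np.sum(np.abs(beta))   # L1 penalty
    return rss + penalty

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
beta = np.array([0.5, -0.25])
loss = lasso_loss(X, y, beta, lam=2.0)
print(loss)  # 4.75  (RSS = 3.25, penalty = 2.0 * 0.75 = 1.5)
```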

---

#### **How it differs from other regression techniques:**

- Unlike **Ordinary Least Squares (OLS)**, Lasso includes a **penalty term** to shrink coefficients.
- Unlike **Ridge Regression**, Lasso uses **L1 regularization** (absolute values), which can shrink some coefficients **exactly to zero**, effectively **eliminating features**.
- This makes Lasso not only a **regression technique** but also a **feature selection method**.
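A quick way to see the difference is to fit all three estimators to the same data. This sketch assumes scikit-learn; the synthetic data (only 3 of 10 features matter), the seed, and the `alpha` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
true_coef = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))  # 7 irrelevant features
y = X @ true_coef + rng.normal(scale=0.5, size=n)

models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),   # L2 penalty: shrinks, but not to exactly zero
    "Lasso": Lasso(alpha=0.1),   # L1 penalty: can set coefficients to exactly zero
}
zero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    zero_counts[name] = int(np.sum(model.coef_ == 0.0))
    print(f"{name:5s}", np.round(model.coef_, 2))
print("exact zeros per model:", zero_counts)
```

Typically only the Lasso row contains exact zeros, and they fall on the irrelevant features.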

---

### **Q2. What is the main advantage of using Lasso Regression in feature selection?**

The key advantage of Lasso Regression is its ability to perform **automatic feature selection**.

- Due to the **L1 penalty**, Lasso can **shrink some coefficients to exactly zero**.
- This means Lasso **removes irrelevant or less important features** from the model entirely.
- As a result, Lasso produces **sparse models**, which are easier to interpret and can generalize better than an unpenalized model, especially with **high-dimensional data** (many features relative to observations).
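Feature selection with Lasso can be automated with scikit-learn's `SelectFromModel`. In this sketch the data are synthetic (only the first 5 of 50 features affect `y`) and `alpha=0.1` is an arbitrary choice; for an L1-penalized estimator such as `Lasso`, `SelectFromModel` uses a default threshold of `1e-5`, so features with (near-)zero coefficients are dropped.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 300, 50
X = rng.normal(size=(n, p))
# Assumed for illustration: only features 0-4 influence the target.
y = X[:, :5] @ np.array([4.0, -3.0, 2.0, 1.5, -1.0]) + rng.normal(scale=0.5, size=n)

selector = SelectFromModel(Lasso(alpha=0.1))  # default threshold for Lasso is 1e-5
selector.fit(X, y)
kept = np.flatnonzero(selector.get_support())  # indices of retained features
X_reduced = selector.transform(X)              # keep only those columns
print("kept feature indices:", kept)
print("shape before/after:", X.shape, X_reduced.shape)
```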

---

### **Q3. How do you interpret the coefficients of a Lasso Regression model?**

In Lasso Regression:

- A **non-zero coefficient** means the corresponding feature contributes to the prediction.
- A **zero coefficient** means the feature has been **excluded** from the model (considered unimportant or redundant).
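- The magnitude and sign (positive or negative) of each **non-zero coefficient** indicate:
  - The **strength** of the relationship: the change in the prediction per one-unit change in that feature, holding the others fixed. Magnitudes are only comparable across features when the features are on the same scale (e.g., standardized).
  - The **direction** (positive or negative effect) on the target variable.

Note: Coefficients are **biased** toward zero by the penalty. The total size of the coefficient vector, $\sum_j |\beta_j|$, is never larger than for OLS, and individual coefficients are usually smaller in absolute value than their OLS counterparts.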
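Where the shrinkage comes from is easiest to see in the special case of an orthonormal design ($X^\top X = I$). There, under the loss defined above, each Lasso coefficient is the OLS estimate soft-thresholded at $\lambda/2$: $\hat\beta_j = \operatorname{sign}(\hat\beta_j^{\text{OLS}}) \max(|\hat\beta_j^{\text{OLS}}| - \lambda/2,\ 0)$. The sketch below (a hypothetical `soft_threshold` helper) checks this against scikit-learn, assuming scikit-learn's objective $\frac{1}{2n}\text{RSS} + \alpha\sum_j|\beta_j|$, which gives `alpha` $= \lambda/(2n)$. Real data are rarely orthonormal, so this is an illustration of the mechanism, not a general formula.

```python
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(beta_ols, lam):
    """Lasso solution for an orthonormal design under Loss = RSS + lam * sum|beta_j|:
    shrink each OLS coefficient toward zero by lam / 2, and set it to exactly
    zero when |beta_ols| <= lam / 2."""
    beta_ols = np.asarray(beta_ols, dtype=float)
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam / 2.0, 0.0)

# Small coefficients are zeroed; large ones are shrunk by lam / 2.
shrunk = soft_threshold([3.0, -1.2, 0.4, -0.1], lam=1.0)
print(shrunk)

# Cross-check against scikit-learn on a design with orthonormal columns.
rng = np.random.default_rng(0)
n, p, lam = 50, 4, 1.0
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))  # Q.T @ Q = I
y = rng.normal(size=n)
beta_ols = Q.T @ y                             # OLS solution when Q.T @ Q = I
sk = Lasso(alpha=lam / (2 * n), fit_intercept=False, tol=1e-10, max_iter=10_000).fit(Q, y)
matches = np.allclose(sk.coef_, soft_threshold(beta_ols, lam), atol=1e-8)
print("matches scikit-learn:", matches)
```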

---

### **Q4. What are the tuning parameters that can be adjusted in Lasso Regression, and how do they affect the model's performance?**

The **main tuning parameter** in Lasso Regression is:

#### **Lambda ($\lambda$)**
- Controls the **amount of regularization**.
- **High $\lambda$**: Strong regularization → more coefficients shrink to zero → simpler model, risk of underfitting.
- **Low $\lambda$**: Less regularization → more features kept → model might overfit.

Other parameters (depending on implementation/library):
- `max_iter`: Maximum number of iterations for optimization.
- `tol`: Tolerance for stopping criteria.
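- `alpha` (e.g., in scikit-learn's `Lasso`): plays the role of $\lambda$. scikit-learn minimizes $\frac{1}{2n}\text{RSS} + \alpha \sum_j |\beta_j|$, so `alpha` $= \lambda / (2n)$ for $n$ training samples.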
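The effect of the regularization strength can be seen by sweeping `alpha` and counting non-zero coefficients. This is a sketch on synthetic data (4 relevant features out of 20); the `alpha` grid is arbitrary, and `max_iter`/`tol` only affect the solver's convergence, not the penalty itself. The count usually falls as `alpha` grows, though it need not be strictly monotone.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 20))
y = X[:, :4] @ np.array([2.0, -1.5, 1.0, 0.5]) + rng.normal(scale=0.3, size=150)

nonzero_by_alpha = {}
for alpha in [0.001, 0.01, 0.1, 0.5, 1.0, 5.0]:
    # max_iter and tol control the coordinate-descent solver, not the penalty.
    model = Lasso(alpha=alpha, max_iter=10_000, tol=1e-4).fit(X, y)
    nonzero_by_alpha[alpha] = int(np.count_nonzero(model.coef_))
    print(f"alpha={alpha:<6} non-zero coefficients: {nonzero_by_alpha[alpha]}")
```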

---

### **Q5. Can Lasso Regression be used for non-linear regression problems? If yes, how?**

Yes, **Lasso Regression can be used for non-linear problems**, but not **directly**.

####  **How:**
- **Transform the data** using **feature engineering** (e.g., polynomial features, interaction terms, splines).
- Use Lasso on the transformed data to:
  - Fit a linear model in the higher-dimensional space.
  - Still benefit from feature selection and regularization.

Alternatively:
- Use **Lasso on kernel-derived features**, or as a **feature-selection step in a pipeline** whose final estimator is a non-linear model.

> Note: Lasso is still a **linear model**, but it can approximate non-linear relationships **after transforming features**.
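The transform-then-Lasso approach can be sketched as a scikit-learn pipeline. The quadratic target, polynomial degree, and `alpha` are assumptions for illustration; features are standardized so the penalty treats $x, x^2, \dots, x^5$ on a common scale.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, size=(200, 1))
# Non-linear target (assumed for illustration): quadratic in x plus noise.
y = 0.5 * x[:, 0] ** 2 - x[:, 0] + rng.normal(scale=0.3, size=200)

model = make_pipeline(
    PolynomialFeatures(degree=5, include_bias=False),  # x, x^2, ..., x^5
    StandardScaler(),                                  # common scale before penalizing
    Lasso(alpha=0.01, max_iter=50_000),                # linear in the expanded features
)
model.fit(x, y)
lasso = model.named_steps["lasso"]
r2 = model.score(x, y)
print("coefficients on x..x^5 (standardized):", np.round(lasso.coef_, 3))
print("R^2 on training data:", round(r2, 3))
```

The model stays linear in its inputs; the curvature comes entirely from the expanded features.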

---

### **Q6. What is the difference between Ridge Regression and Lasso Regression?**

| Aspect                 | Ridge Regression                         | Lasso Regression                         |
|------------------------|-------------------------------------------|------------------------------------------|
| Regularization Type    | L2 (squares of coefficients)              | L1 (absolute values of coefficients)      |
| Feature Selection      | ❌ Keeps all features                     | ✅ Shrinks some coefficients to zero      |
| Model Sparsity         | No (dense model)                         | Yes (sparse model)                        |
| Handles Multicollinearity | ✅ Good                                | ✅ Good, but may arbitrarily choose among correlated features |
| Best For               | Many small/medium effects                | Few strong effects                        |

---

### **Q7. Can Lasso Regression handle multicollinearity in the input features? If yes, how?**

Yes, **Lasso Regression can handle multicollinearity**, but in a **different way** than Ridge:

- Lasso tends to **select only one feature** from a group of **highly correlated features**, and **set the others to zero**.
- This leads to **simpler models**, but the choice of which feature to keep may be **somewhat arbitrary**.
- In contrast, Ridge keeps all correlated features but **distributes the effect** among them.
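An extreme, deliberately constructed case makes the contrast concrete: two perfectly duplicated features. Here the Lasso optimum is not unique (any same-sign split of the total effect gives the same fit and the same L1 penalty), so which column "wins" depends on the solver; scikit-learn's cyclic coordinate descent, starting from zero, tends to put essentially all the weight on one column. Ridge, whose L2 penalty prefers equal splits, divides the effect evenly. Data and `alpha` values are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1.copy()])        # second feature is an exact duplicate
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("Lasso:", np.round(lasso.coef_, 3))   # one coefficient carries (nearly) all the effect
print("Ridge:", np.round(ridge.coef_, 3))   # effect split equally between the copies
```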

---

### **Q8. How do you choose the optimal value of the regularization parameter (lambda) in Lasso Regression?**

You can choose the optimal $\lambda$ using:

####  **1. Cross-Validation (most common)**
- Perform **K-fold cross-validation** to test multiple values of $\lambda$.
- Choose the value that **minimizes the cross-validation error** (e.g., MSE).
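scikit-learn's `LassoCV` bundles this procedure: it builds a grid of `alpha` values (its version of $\lambda$) and picks the one with the lowest mean cross-validated MSE. A sketch on synthetic data; the 5-fold setting is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 15))
y = X[:, :3] @ np.array([1.5, -2.0, 1.0]) + rng.normal(scale=0.5, size=200)

# 5-fold CV over an automatically generated grid of alphas.
cv_model = LassoCV(cv=5).fit(X, y)
mean_mse = cv_model.mse_path_.mean(axis=1)  # shape (n_alphas,): MSE averaged over folds
print("chosen alpha:", cv_model.alpha_)
print("mean CV MSE at chosen alpha:", mean_mse.min())
```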

####  **2. Grid Search or Random Search**
- Use `GridSearchCV` or `RandomizedSearchCV` to test a predefined grid or randomly sampled values of $\lambda$ (the `alpha` parameter in scikit-learn).
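A sketch of both searches with scikit-learn, tuning `Lasso`'s `alpha`. The log-spaced grid, the log-uniform sampling range from `scipy.stats.loguniform`, and the 30 candidates are arbitrary choices; a log scale is used because useful `alpha` values span several orders of magnitude.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 15))
y = X[:, :3] @ np.array([1.5, -2.0, 1.0]) + rng.normal(scale=0.5, size=200)

# Grid search: an explicit, log-spaced list of candidate alphas.
grid = GridSearchCV(
    Lasso(max_iter=10_000),
    param_grid={"alpha": np.logspace(-4, 1, 30)},
    scoring="neg_mean_squared_error",
    cv=5,
).fit(X, y)

# Random search: sample alphas from a log-uniform distribution instead.
rand = RandomizedSearchCV(
    Lasso(max_iter=10_000),
    param_distributions={"alpha": loguniform(1e-4, 10)},
    n_iter=30,
    scoring="neg_mean_squared_error",
    cv=5,
    random_state=0,
).fit(X, y)

print("grid search best alpha:  ", grid.best_params_["alpha"])
print("random search best alpha:", rand.best_params_["alpha"])
```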

####  **3. Information Criteria**
- Use **AIC** or **BIC** to select $\lambda$ based on model complexity and goodness of fit.
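In scikit-learn this is available as `LassoLarsIC`, which computes the Lasso path with LARS and picks the `alpha` minimizing the chosen criterion. A sketch on synthetic data; note that `LassoLarsIC` estimates the noise variance from an OLS fit by default, which requires more samples than features.

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 15))
y = X[:, :3] @ np.array([1.5, -2.0, 1.0]) + rng.normal(scale=0.5, size=200)

results = {}
for criterion in ("aic", "bic"):
    model = LassoLarsIC(criterion=criterion).fit(X, y)
    results[criterion] = model
    print(criterion.upper(), "alpha:", model.alpha_,
          "non-zero coefficients:", np.count_nonzero(model.coef_))
```

BIC penalizes each extra coefficient more heavily than AIC ($\log n$ versus 2), so it tends to pick a sparser model.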

####  **4. Regularization Path (LARS / Coordinate Descent)**
- Analyze how coefficients change as $\lambda$ increases.
- Choose a balance between model sparsity and performance.
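scikit-learn exposes the coordinate-descent path as `lasso_path` (a LARS-based `lars_path(..., method="lasso")` also exists). This sketch computes the path on synthetic data and prints a few points along it; because `lasso_path` fits no intercept, the data are centered first.

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(8)
X = rng.normal(size=(100, 6))
y = X @ np.array([3.0, -2.0, 1.0, 0.0, 0.0, 0.5]) + rng.normal(scale=0.3, size=100)
X = X - X.mean(axis=0)  # lasso_path does not fit an intercept, so center first
y = y - y.mean()

alphas, coefs, _ = lasso_path(X, y)
# coefs has shape (n_features, n_alphas); alphas run from largest to smallest.
for alpha, col in zip(alphas[::20], coefs.T[::20]):
    print(f"alpha={alpha:8.4f}  non-zero={np.count_nonzero(col)}  coefs={np.round(col, 2)}")
```

Reading the path from large to small `alpha`, the strongest features enter first; a reasonable choice is a point where the important coefficients have stabilized before many small ones appear.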

---

### **Conclusion:**

Lasso Regression is powerful for both **prediction and feature selection**, especially in high-dimensional datasets. Its ability to **shrink and eliminate** features makes it well suited to interpretable, efficient modeling, but careful tuning of $\lambda$ is crucial for good performance.