### Q1. Explain the concept of R-squared in linear regression models. How is it calculated, and what does it represent?


R-squared (R², the coefficient of determination) is a statistical measure of the proportion of variance in the dependent variable that is explained by the independent variable(s) in a regression model. In other words, R-squared shows how well the regression model fits the data (the goodness of fit).

The formula for calculating R-squared is:

$$R^2 = \frac{SS_{\text{regression}}}{SS_{\text{total}}}$$

Where:

* $SS_{\text{regression}}$ is the sum of squares due to regression (the explained sum of squares)
* $SS_{\text{total}}$ is the total sum of squares

The most common interpretation of R-squared is how well the regression model explains the observed data. For example, an R-squared of 60% means that 60% of the variability observed in the target variable is explained by the regression model. Generally, a higher R-squared indicates that the model explains more of the variability.
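As a rough sketch of the calculation described above, the snippet below fits a one-feature least-squares line and computes R-squared two ways. The data values and the helper names (`fit_line`, `r_squared`) are made up for illustration. It assumes an ordinary least squares fit with an intercept, which is the setting where the explained-over-total ratio and the `1 - SS_residual / SS_total` form agree.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(y_true, y_pred):
    """R^2 computed as 1 - SS_residual / SS_total."""
    y_bar = mean(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot

# Made-up illustrative data (not from any real dataset)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
y_pred = [intercept + slope * xi for xi in x]

# Explained and total sums of squares, as in the formula above
y_bar = mean(y)
ss_reg = sum((yp - y_bar) ** 2 for yp in y_pred)
ss_tot = sum((yi - y_bar) ** 2 for yi in y)

r2 = r_squared(y, y_pred)
print(f"R^2 via 1 - SS_res/SS_tot: {r2:.4f}")
print(f"R^2 via SS_reg/SS_tot:     {ss_reg / ss_tot:.4f}")
```

Both printed values should match for this kind of fit; for models without an intercept, or for predictions on new data, the two forms can differ.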

### Q2. Define adjusted R-squared and explain how it differs from the regular R-squared.


The adjusted R-squared is a modified version of R-squared that adjusts for the number of predictors in a regression model, penalizing predictors that add little explanatory power.

In other words, the adjusted R-squared shows whether adding predictors actually improves a regression model.

Regular R-squared never decreases when a new independent variable is added. Adjusted R-squared, by contrast, decreases if the new variable does not raise R-squared by enough to offset the penalty for the extra predictor.
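One common way to write the adjustment is $1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$, where $n$ is the number of observations and $p$ the number of predictors. The sketch below applies that formula; the function name `adjusted_r2` and the numbers are hypothetical and chosen only to show the effect of a predictor that barely raises R-squared.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1 - (1 - r2) * (n_samples - 1) / dof

# Hypothetical case: a 4th predictor nudges R^2 from 0.800 to 0.801
before = adjusted_r2(0.800, n_samples=30, n_predictors=3)
after = adjusted_r2(0.801, n_samples=30, n_predictors=4)

print(f"adjusted R^2 with 3 predictors: {before:.4f}")
print(f"adjusted R^2 with 4 predictors: {after:.4f}")
```

Here R-squared went up slightly, but the adjusted value goes down, which is the behavior described above.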

### Q3. When is it more appropriate to use adjusted R-squared?


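It is better to use adjusted R-squared when the regression model has multiple predictors, and especially when comparing models with different numbers of independent variables. Regular R-squared never decreases as predictors are added, so it tends to favor the larger model; adjusted R-squared makes the comparison fairer by penalizing each extra predictor.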

### Q4. What are RMSE, MSE, and MAE in the context of regression analysis? How are these metrics calculated, and what do they represent?


* **MAE** (mean absolute error) measures the absolute distance between each observation (an entry of the dataset) and its prediction, averaged over all observations. Taking the absolute value keeps positive and negative errors from cancelling each other out.

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $n$ is the number of observations.

* Another way to make every error positive is to square it. This is what the **MSE** (mean squared error) does, and as a result larger errors (or distances) weigh more in the metric than smaller ones.

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

* **RMSE** (root mean squared error) takes the square root of the MSE to return the error to the original unit of the target, while keeping the property of penalizing larger errors more heavily.

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
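The three metrics above can be sketched in a few lines of plain Python. The function names and the sample values are illustrative assumptions, not taken from any particular library or dataset.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average of (actual - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

# Made-up observations and predictions
y_true = [3.0, 5.0, 7.5, 10.0]
y_pred = [2.5, 5.0, 8.0, 12.0]

print(f"MAE:  {mae(y_true, y_pred):.4f}")
print(f"MSE:  {mse(y_true, y_pred):.4f}")
print(f"RMSE: {rmse(y_true, y_pred):.4f}")
```

MAE and RMSE are in the same unit as the target, while MSE is in squared units.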

### Q5. Discuss the advantages and disadvantages of using RMSE, MSE, and MAE as evaluation metrics in regression analysis.


`MSE`

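| Advantages | Disadvantages |
|--------|-------|
| 1. Differentiable everywhere, since it is quadratic in the error. | 1. Not robust to outliers, because large errors are squared. |
| 2. Convex, so for linear regression it has a single global minimum. | 2. Expressed in squared units of the target, not the original unit. |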


`MAE`

| Advantages | Disadvantages |
|--------|-------|
| 1. Robust to outliers. | 1. Not differentiable at zero error, so gradient-based optimization usually takes longer to converge. |
| 2. Expressed in the same unit as the target. | |

`RMSE`

| Advantages | Disadvantages |
|--------|-------|
| 1. Expressed in the same unit as the target. | 1. Not robust to outliers, because errors are still squared before averaging. |
| 2. It is differentiable. | |
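To make the outlier point in the tables concrete, the sketch below compares how much MAE and RMSE grow when one error in a small, made-up set becomes large. The error values and helper names are assumptions chosen for illustration.

```python
import math

def mae(errors):
    """Mean absolute error computed from a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error computed from a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

clean = [1.0, -1.0, 1.0, -1.0]          # all errors the same size
with_outlier = [1.0, -1.0, 1.0, -10.0]  # one error is ten times larger

# How many times larger each metric becomes once the outlier appears
mae_ratio = mae(with_outlier) / mae(clean)
rmse_ratio = rmse(with_outlier) / rmse(clean)

print(f"MAE grows by a factor of  {mae_ratio:.2f}")
print(f"RMSE grows by a factor of {rmse_ratio:.2f}")
```

RMSE (and therefore MSE) reacts more strongly to the single large error, which is why the squared-error metrics are listed as not robust to outliers.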

### Q6. Explain the concept of Lasso regularization. How does it differ from Ridge regularization, and when is it more appropriate to use?


Lasso regression is a regularization technique applied to regression models to improve prediction accuracy on unseen data. It relies on shrinkage: the coefficient estimates are pulled toward zero. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters). This type of regression is well-suited for models showing high levels of multicollinearity, or when you want to automate parts of model selection such as variable selection or parameter elimination.

It is especially useful when there are many features, because it performs feature selection automatically.

L1 regularization (Lasso) adds a penalty equal to the sum of the absolute values of the coefficients. This can produce sparse models with few nonzero coefficients: some coefficients become exactly zero and are effectively eliminated from the model. Larger penalties push coefficient values closer to zero, which yields simpler models. L2 regularization (Ridge), on the other hand, penalizes the sum of the squared coefficients; it shrinks coefficients but does not set them to zero, so it does not produce sparse models. As a result, a Lasso model is often easier to interpret than a Ridge model.
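The difference in sparsity can be sketched in a simplified setting. Assume each coefficient can be penalized independently of the others (for example, an orthonormal design), with per-coefficient objectives $\tfrac{1}{2}(\beta - b)^2 + \tfrac{\alpha}{2}\beta^2$ for Ridge and $\tfrac{1}{2}(\beta - b)^2 + \alpha|\beta|$ for Lasso, where $b$ is the unpenalized estimate. Under those assumptions Ridge rescales $b$ and Lasso applies soft-thresholding. The function names and the coefficient values below are hypothetical.

```python
def ridge_shrink(beta_ols, alpha):
    """Ridge solution in the simplified setting: proportional shrinkage."""
    return beta_ols / (1 + alpha)

def lasso_shrink(beta_ols, alpha):
    """Lasso solution in the simplified setting: soft-thresholding."""
    if beta_ols > alpha:
        return beta_ols - alpha
    if beta_ols < -alpha:
        return beta_ols + alpha
    return 0.0  # small coefficients are set exactly to zero

# Hypothetical unpenalized coefficients
ols_coefs = [3.0, -1.5, 0.4, -0.2]
alpha = 0.5

ridge_coefs = [ridge_shrink(b, alpha) for b in ols_coefs]
lasso_coefs = [lasso_shrink(b, alpha) for b in ols_coefs]

print("ridge:", [round(c, 3) for c in ridge_coefs])
print("lasso:", [round(c, 3) for c in lasso_coefs])
```

In this sketch every Ridge coefficient stays nonzero, while Lasso zeroes out the two smallest ones. Real solvers handle correlated features with iterative methods, so this is only an intuition for why L1 produces exact zeros.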

### Q7. How do regularized linear models help to prevent overfitting in machine learning? Provide an example to illustrate.


Regularized linear models, such as Ridge Regression and Lasso Regression, are techniques used to prevent overfitting in machine learning by adding a regularization term to the loss function. The regularization term controls the complexity of the model and helps to constrain the coefficients or weights of the features, thereby reducing overfitting. Here's an example to illustrate how regularized linear models work:

Let's consider a dataset with a single feature (x) and a target variable (y). We want to fit a linear regression model to this data. Without regularization, a simple linear regression model tries to minimize the mean squared error (MSE) between the predicted values and the actual values of the target.

However, if the dataset is small, noisy, or contains many irrelevant features, a simple linear regression model can overfit the data by fitting the noise or capturing spurious patterns. This can lead to poor generalization on unseen data.

To address this, we can apply regularization techniques:

**1. Ridge Regression:** Ridge Regression adds a penalty term that is proportional to the sum of the squared magnitudes of the coefficients. This penalty term is multiplied by a hyperparameter called the regularization parameter (alpha). The higher the alpha value, the stronger the regularization effect. Ridge Regression tries to minimize the following loss function:

Loss = MSE + alpha * (sum of squared coefficients)

By increasing the penalty for large coefficient values, Ridge Regression shrinks the coefficients towards zero, making the model less sensitive to individual features and reducing overfitting.

**2. Lasso Regression:** Lasso Regression, similar to Ridge Regression, adds a penalty term to the loss function. However, instead of using the sum of squared coefficients, Lasso Regression uses the sum of the absolute values of the coefficients multiplied by the regularization parameter (alpha). The loss function for Lasso Regression is:

Loss = MSE + alpha * (sum of absolute coefficients)

Lasso Regression has a unique property: it can shrink some coefficients all the way to zero, effectively performing feature selection. By setting some coefficients to zero, Lasso Regression can automatically eliminate irrelevant features.
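As a small illustration of the Ridge loss written above, consider a single feature and a model without an intercept, `y ≈ b * x`. Minimizing `MSE + alpha * b**2` then has the closed form `b = sum(x*y) / (sum(x*x) + n*alpha)`. The sketch below uses that formula on a tiny made-up sample; the data and the names `ridge_slope` and `train_mse` are assumptions for illustration only.

```python
def ridge_slope(x, y, alpha):
    """Slope b minimizing mean((y - b*x)^2) + alpha * b^2 (no intercept)."""
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + n * alpha)

def train_mse(x, y, b):
    """Mean squared error of the fitted line on the training data."""
    return sum((yi - b * xi) ** 2 for xi, yi in zip(x, y)) / len(x)

# Tiny, noisy, made-up sample
x = [-1.0, 0.0, 1.0]
y = [-1.4, 0.3, 1.1]

alphas = [0.0, 0.1, 1.0, 10.0]
slopes = [ridge_slope(x, y, a) for a in alphas]

for a, b in zip(alphas, slopes):
    print(f"alpha={a:>4}: slope={b:.4f}, training MSE={train_mse(x, y, b):.4f}")
```

With `alpha = 0` this is the ordinary least squares slope. As `alpha` grows the slope shrinks toward zero and the training error rises: the model gives up some fit to the training sample in exchange for being less sensitive to its noise.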

### Q8. Discuss the limitations of regularized linear models and explain why they may not always be the best choice for regression analysis.


While regularized linear models have their benefits in preventing overfitting, they also have limitations that may make them less suitable for certain regression analysis tasks. Here are some limitations of regularized linear models:

**1. Linearity Assumption:** Regularized linear models assume a linear relationship between the features and the target variable. However, in real-world scenarios, the relationship between variables may not be strictly linear. If the underlying relationship is highly non-linear or contains complex interactions, linear models may not capture the nuances effectively, leading to suboptimal performance.

**2. Feature Scaling Sensitivity:** Regularized linear models are sensitive to the scale of the features. If the features have different scales, the regularization term may penalize certain features more than others, affecting the model's performance. It is important to scale the features appropriately before applying regularized linear models.

**3. Feature Importance Interpretation:** Regularized linear models can be less interpretable compared to non-regularized linear models. The regularization process can shrink the coefficients towards zero, making it difficult to determine the relative importance of different features in the model. This can be a drawback if you require clear feature importance insights.

**4. Limited Feature Selection:** While Lasso Regression can perform automatic feature selection by setting some coefficients to zero, it may not always select the "right" features. If the relationship between the features and the target variable is not strictly linear, Lasso Regression may eliminate potentially relevant features, resulting in underfitting or loss of important information.

**5. Hyperparameter Tuning:** Regularized linear models require tuning the regularization parameter (alpha) to achieve the right balance between bias and variance. Selecting the optimal value of alpha can be challenging and may require cross-validation or other techniques. If the alpha value is not chosen carefully, it can lead to an under-regularized or over-regularized model, impacting its performance.
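For the feature-scaling point above, one common remedy is to standardize each feature to zero mean and unit variance before fitting, so the penalty treats all coefficients on a comparable scale. The sketch below is a minimal version of that step; the column names and values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale a feature column to zero mean and unit (population) variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        raise ValueError("a constant column cannot be standardized")
    return [(v - mu) / sigma for v in column]

# Hypothetical features on very different scales
income_usd = [30_000, 45_000, 60_000, 75_000]
age_years = [25, 35, 45, 55]

z_income = standardize(income_usd)
z_age = standardize(age_years)

print("income:", [round(v, 3) for v in z_income])
print("age:   ", [round(v, 3) for v in z_age])
```

In practice the mean and standard deviation should be computed on the training data only and then reused for validation and test data.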

### Q9. You are comparing the performance of two regression models using different evaluation metrics. Model A has an RMSE of 10, while Model B has an MAE of 8. Which model would you choose as the better performer, and why? Are there any limitations to your choice of metric?


The RMSE and MAE are both commonly used evaluation metrics for regression tasks, but they measure different aspects of model performance:

RMSE: The RMSE is the square root of the average of the squared differences between the predicted values and the actual values. It penalizes larger errors more heavily due to the squared term in the calculation. RMSE gives an idea of the average magnitude of the errors in the predicted values.

MAE: The MAE is the average of the absolute differences between the predicted values and the actual values. It weights every error in proportion to its size and ignores its direction, so large errors are not penalized disproportionately. MAE provides a measure of the average absolute deviation between the predicted and actual values.

In the given scenario, the two numbers cannot be compared directly, because they come from different metrics. For any set of predictions, RMSE is greater than or equal to MAE, so Model A's RMSE of 10 is compatible with an MAE well below 8. The fact that 8 is smaller than 10 is therefore not evidence that Model B is the better performer. To choose between the models, the same metric (or both metrics) should be computed for both models on the same test data.

As for the limitations of each metric: RMSE gives more weight to large errors, which is useful if large errors are especially costly and need to be minimized. MAE, on the other hand, weights errors linearly and is more robust to outliers or extreme values.
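The sketch below illustrates why an RMSE from one model and an MAE from another should not be compared. The two residual lists are invented for illustration and do not reproduce the exact values 10 and 8 from the question.

```python
import math

def mae(errors):
    """Mean absolute error computed from a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error computed from a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical residuals: Model A is usually close but has one large miss,
# Model B is consistently off by the same amount.
errors_a = [1.0] * 9 + [30.0]
errors_b = [8.0] * 10

print(f"Model A: MAE={mae(errors_a):.2f}, RMSE={rmse(errors_a):.2f}")
print(f"Model B: MAE={mae(errors_b):.2f}, RMSE={rmse(errors_b):.2f}")
```

Model A's RMSE is larger than Model B's MAE, yet Model A has the lower MAE. Comparing like with like, Model A wins on MAE and Model B wins on RMSE, so the choice depends on how costly large errors are.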

### Q10. You are comparing the performance of two regularized linear models using different types of regularization. Model A uses Ridge regularization with a regularization parameter of 0.1, while Model B uses Lasso regularization with a regularization parameter of 0.5. Which model would you choose as the better performer, and why? Are there any trade-offs or limitations to your choice of regularization method?

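In comparing Model A (Ridge regularization with a regularization parameter of 0.1) and Model B (Lasso regularization with a regularization parameter of 0.5), the regularization parameters alone do not tell us which model performs better. The two penalties are on different scales, so 0.1 and 0.5 are not directly comparable, and neither value says anything about prediction error. Ideally both models would be evaluated with the same metric on the same validation data. Beyond that, the choice depends on the specific requirements of the problem: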

- Ridge Regularization (Model A): Ridge regularization generally performs well when there are many features with small to moderate effect sizes, and it is desired to reduce their collective impact. Ridge regression tends to keep all the features in the model but with smaller coefficients. It can help with reducing the impact of noise and multicollinearity.

- Lasso Regularization (Model B): Lasso regularization is suitable when there is a need for feature selection, as it tends to set some coefficients exactly to zero. Lasso regression can be effective in scenarios where only a subset of features has a significant impact, potentially improving model interpretability by identifying the most relevant features.

Trade-offs and limitations:

* Ridge regularization may not perform well when there are many irrelevant features or a need for feature selection. It shrinks the coefficients towards zero but does not eliminate any features entirely. Thus, it might not be the best choice if you need explicit feature selection.

* Lasso regularization can perform feature selection by setting some coefficients to zero. However, if there is high multicollinearity among features, Lasso may arbitrarily select one feature over another, leading to instability or inconsistent results.

In summary, the choice between Model A (Ridge regularization) and Model B (Lasso regularization) depends on the specific objectives and requirements of the problem. If the focus is on reducing the collective impact of all features, Ridge regularization (Model A) may be preferred. If feature selection is a priority, Lasso regularization (Model B) can be more suitable. It's important to consider the trade-offs and limitations of each regularization method based on the specific characteristics of the dataset and the goals of the analysis.