## Q1. Explain the concept of R-squared in linear regression models. How is it calculated, and what does it represent?

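R-squared (the coefficient of determination) measures how well a linear regression model fits the data: it is the proportion of the variance in the dependent variable that the model explains. For a least-squares model with an intercept, evaluated on the data it was fitted to, it ranges from 0 to 1, with a higher value indicating a better fit. It can be calculated using the following formula: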

- R-squared = 1 - (sum of squared residuals / total sum of squares)

    - The sum of squared residuals is the sum of the squared differences between the predicted values and the actual values.
    - The total sum of squares is the sum of the squared differences between the actual values and the mean of the dependent variable.
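The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function name `r_squared` and the sample values are made up for the example.

```python
def r_squared(actual, predicted):
    """Return 1 - (sum of squared residuals / total sum of squares)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    mean_actual = sum(actual) / len(actual)
    # Sum of squared residuals: squared gaps between actual and predicted values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: squared gaps between actual values and their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Made-up values, for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]
score = r_squared(actual, predicted)
print(round(score, 3))  # 0.95 (ss_res = 1.0, ss_tot = 20.0)
```

Predicting the mean of the actual values for every observation gives a score of 0, which is the baseline the metric compares against.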

## Q2. Define adjusted R-squared and explain how it differs from the regular R-squared.

Adjusted R-squared is a modified version of the regular R-squared that takes into account the number of independent variables in a linear regression model. While R-squared represents the proportion of variance in the dependent variable that is explained by the independent variables, adjusted R-squared adjusts this measure for the number of independent variables in the model.

Adjusted R-squared is calculated using the following formula:

Adjusted R-squared = 1 - [(1 - R-squared) * (n - 1) / (n - k - 1)]

where:

- R-squared is the regular coefficient of determination
- n is the sample size
- k is the number of independent variables
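As a small sketch, the adjustment can be written as a Python function. The name `adjusted_r_squared` and the input values are illustrative assumptions, not part of any particular library.

```python
def adjusted_r_squared(r2, n, k):
    """Adjust R-squared for sample size n and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 for the adjustment to be defined")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Made-up example: R-squared of 0.90 from 50 observations and 5 predictors.
adj = adjusted_r_squared(0.90, n=50, k=5)
print(round(adj, 4))  # 0.8886
```

The adjusted value is slightly below the raw R-squared, and the gap widens as k grows relative to n.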

## Q3. When is it more appropriate to use adjusted R-squared?

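Adjusted R-squared is more appropriate when comparing models with different numbers of independent variables, and especially when a model has many independent variables relative to the sample size. This is because R-squared never decreases when another independent variable is added, even if that variable has no real relationship with the dependent variable. Adjusted R-squared penalizes each additional variable, so it only increases when the new variable improves the fit by more than would be expected by chance. This helps to guard against overfitting through unnecessary variables.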
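To make the penalty concrete, here is a hedged sketch comparing two hypothetical models fitted to the same 30 observations. The R-squared values are invented for illustration; only the formula comes from Q2.

```python
def adjusted_r_squared(r2, n, k):
    # Same formula as in Q2: 1 - (1 - R^2) * (n - 1) / (n - k - 1)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 30  # hypothetical sample size shared by both models
small_model = adjusted_r_squared(0.80, n, k=3)   # fewer predictors
large_model = adjusted_r_squared(0.81, n, k=10)  # more predictors, slightly higher R^2

print(round(small_model, 3))  # 0.777
print(round(large_model, 3))  # 0.71
```

The larger model has the higher raw R-squared, but after adjustment the smaller model scores better, because the seven extra variables bought only a 0.01 gain.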

## Q4. What are RMSE, MSE, and MAE in the context of regression analysis? How are these metrics calculated, and what do they represent?

RMSE, MSE, and MAE are metrics commonly used to evaluate the performance of regression models. 

- MSE (mean squared error) is calculated as the average of the squared differences between the predicted and actual values. A lower MSE indicates a better fit.
- RMSE (root mean squared error) is the square root of the MSE, which gives the error in the same units as the target variable. It is commonly used to compare the performance of different models.
- MAE (mean absolute error) is calculated as the average of the absolute differences between the predicted and actual values. It represents the average magnitude of the errors in the predictions, and is less sensitive to outliers than MSE and RMSE.

In general, RMSE, MSE, and MAE all measure the prediction error of the model. A lower value indicates that the model predicts the target variable more accurately.
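The three definitions translate directly into code. The following is a minimal sketch in plain Python; the function names and the sample data are assumptions made for the example.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up values: the errors are -1, 1, -2, and 0.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 16.0, 16.0]

print(mse(actual, predicted))             # 1.5
print(round(rmse(actual, predicted), 4))  # 1.2247
print(mae(actual, predicted))             # 1.0
```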

## Q5. Discuss the advantages and disadvantages of using RMSE, MSE, and MAE as evaluation metrics in regression analysis.

**RMSE**

Advantages:

- It has the same unit of measurement as the target variable, so it is easier to interpret.
- It is differentiable wherever the error is nonzero, and minimizing it is equivalent to minimizing MSE, so gradient descent can be used to find the minimum.

Disadvantages:

- It is not robust to outliers.

**MSE**

Advantages:

- It is differentiable everywhere, so gradient descent can be used to find the minimum.
- For linear regression it is convex, so any local minimum is also the global minimum.

Disadvantages:

- It is not robust to outliers, because squaring gives large errors disproportionate weight.
- Its unit is the square of the target variable's unit, which makes it harder to interpret.

**MAE**

Advantages:

- It is more robust to outliers than MSE.
- It has the same unit of measurement as the target variable.

Disadvantages:

- It is not differentiable where the error is exactly zero, which makes optimization more complex.
- It may not be appropriate for skewed data, because minimizing MAE targets the median rather than the mean.
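The difference in outlier sensitivity can be seen with a small, made-up set of errors. This sketch works on the errors directly (predicted minus actual) and the helper names are illustrative only.

```python
import math


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical prediction errors; the second list swaps in one large outlier.
clean = [1.0, -1.0, 1.0, -1.0, 1.0]
with_outlier = [1.0, -1.0, 1.0, -1.0, 21.0]

print(rmse(clean), mae(clean))                # 1.0 1.0
print(round(rmse(with_outlier), 3))           # 9.434
print(mae(with_outlier))                      # 5.0
```

A single outlier multiplies the MAE by 5 but the RMSE by more than 9, because squaring amplifies the large error before averaging.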

## Q6. Explain the concept of Lasso regularization. How does it differ from Ridge regularization, and when is it more appropriate to use?

Lasso regularization is a type of shrinkage regularization that shrinks the coefficients of the independent variables towards zero. This is done by adding a penalty term to the loss function that is proportional to the sum of the absolute values of the coefficients. As a result, some of the coefficients may be shrunk to zero, which effectively removes those variables from the model.

Ridge regularization, also called L2 regularization, is another type of shrinkage regularization. It differs from Lasso regularization in that its penalty term is proportional to the sum of the squared coefficients rather than the sum of their absolute values. The squared penalty shrinks large coefficients strongly but puts very little pressure on coefficients that are already small, so coefficients get close to zero without usually reaching exactly zero.

Lasso regularization is more appropriate when feature selection is needed, for example when only a few of many features are expected to matter. This is because it can set the coefficients of the less important features to exactly zero, removing them from the model. Ridge regularization also helps to prevent overfitting, but it keeps every feature in the model, so it does not perform feature selection.
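The contrast is easiest to see in a simplified special case. The sketch below assumes orthonormal features and the objectives ½·RSS + λ·Σ|β| for Lasso and ½·RSS + (λ/2)·Σβ² for Ridge; under those assumptions each penalized coefficient has a closed form in terms of the unpenalized least-squares coefficient `b`. The function names and coefficient values are made up for illustration, and real solvers handle the general case iteratively.

```python
def lasso_shrink(b, lam):
    """Soft-thresholding: subtract lam from |b|, clipping at exactly zero."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def ridge_shrink(b, lam):
    """Proportional shrinkage: scales b down but never reaches zero for b != 0."""
    return b / (1 + lam)


# Hypothetical unpenalized coefficients and penalty strength.
ols_coefficients = [3.0, -1.5, 0.4, -0.2]
lam = 0.5

lasso = [lasso_shrink(b, lam) for b in ols_coefficients]
ridge = [ridge_shrink(b, lam) for b in ols_coefficients]

print(lasso)                          # [2.5, -1.0, 0.0, 0.0]
print([round(r, 4) for r in ridge])   # [2.0, -1.0, 0.2667, -0.1333]
```

Lasso removes the two small coefficients entirely, while Ridge keeps all four at reduced size.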

## Q7. How do regularized linear models help to prevent overfitting in machine learning? Provide an example to illustrate.

Regularized linear models help to prevent overfitting by adding a penalty on the size of the coefficients to the loss function. This discourages the model from relying on large coefficients to fit noise in the training data, and so encourages a simpler model that is less likely to overfit.

For example, consider a linear regression model trained on a dataset of house prices. The model has 100 features, each of which describes the house, such as the number of bedrooms, the square footage, and the location. Without regularization, and especially if the number of houses is small relative to the number of features, the model can fit the training data very well by giving weight to features that only matter by chance. Such a model is likely to overfit, meaning that it will not generalize well to new data. With regularization, large coefficients are penalized, so the model keeps large coefficients only for features that clearly reduce the error. The result is a simpler model that is less likely to overfit the training data.
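As a toy illustration of the penalty at work, the sketch below fits a one-feature Ridge model without an intercept on centered, made-up data. For the objective Σ(y − βx)² + λβ², the minimizing slope is Σxy / (Σx² + λ); the function name and data are assumptions for the example and have nothing to do with the house-price scenario beyond the idea.

```python
def ridge_slope(x, y, lam):
    """Slope minimizing sum((y - b*x)^2) + lam * b^2 for centered data."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


# Made-up, centered data with a true slope of roughly 2.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.8, 0.2, 2.1, 3.6]

slopes = [ridge_slope(x, y, lam) for lam in (0.0, 10.0, 100.0)]
print([round(s, 3) for s in slopes])  # [1.93, 0.965, 0.175]
```

With λ = 0 the result is the ordinary least-squares slope; as λ grows the slope is pulled toward zero, which is how the penalty limits how strongly the model can react to any one feature.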

## Q8. Discuss the limitations of regularized linear models and explain why they may not always be the best choice for regression analysis.

- **They can be computationally expensive to train.** Lasso has no closed-form solution and needs an iterative solver, and the regularization strength usually has to be tuned, for example by cross-validation, which means fitting the model many times.
- **They can be sensitive to the choice of regularization strength.** If the regularization strength is too high, the model will be too simple and will not be able to learn the full complexity of the data. If the regularization strength is too low, the model will be too complex and will overfit the training data.
- **They can be biased.** The penalty deliberately shrinks the coefficients towards zero, so the estimates are biased, and a strongly regularized model may fail to capture the full complexity of the data.
- **They can be difficult to interpret.** This is because the regularization term can shrink the coefficients of the independent variables towards zero, which can make it difficult to understand the relationships between the independent variables and the dependent variable.

## Q9. You are comparing the performance of two regression models using different evaluation metrics. Model A has an RMSE of 10, while Model B has an MAE of 8. Which model would you choose as the better performer, and why? Are there any limitations to your choice of metric?

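I would tentatively choose Model B as the better performer, because its reported error is lower and the MAE is less sensitive to outliers than the RMSE. A few large errors can inflate the RMSE, making a model look less accurate than it is on typical cases, while the MAE weights all errors in proportion to their size.

The main limitation is that the two numbers come from different metrics and are not directly comparable. For the same set of predictions, the RMSE is always at least as large as the MAE, so Model A could still have an MAE below 8. A fair comparison needs the same metric computed for both models on the same data. The MAE also downplays large errors, so if large errors are especially costly, RMSE would be the better basis for the choice.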
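One caveat when comparing an RMSE with an MAE is that, on the same set of errors, the RMSE is never smaller than the MAE. The minimal sketch below uses made-up errors to show that a single model can have an MAE of 8 and an RMSE of 10 at once, so the two figures in the question do not by themselves rank the models.

```python
import math

# Hypothetical prediction errors from one model.
errors = [2.0, -14.0, 14.0, -2.0]

mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(mae)   # 8.0
print(rmse)  # 10.0
```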

## Q10. You are comparing the performance of two regularized linear models using different types of regularization. Model A uses Ridge regularization with a regularization parameter of 0.1, while Model B uses Lasso regularization with a regularization parameter of 0.5. Which model would you choose as the better performer, and why? Are there any trade-offs or limitations to your choice of regularization method?

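Ridge regularization penalizes the sum of the squares of the coefficients, while Lasso regularization penalizes the sum of the absolute values of the coefficients. This means that Ridge shrinks all coefficients towards zero but rarely makes any of them exactly zero, while Lasso shrinks all coefficients and can set some of them exactly to zero. In this case, the regularization parameter for Lasso is 0.5, which is larger than the 0.1 used for Ridge, so Model B is likely to drop some features. The two values are not directly comparable, though, because the penalties are on different scales.

Therefore, I would choose Model B as the better performer if a sparser, more interpretable model is the goal. The trade-off is that the regularization parameters alone do not show which model predicts better; that has to be checked with a metric such as RMSE on held-out data. Lasso may also keep only one of a group of correlated features, somewhat arbitrarily, where Ridge would keep them all with smaller coefficients.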
