![b.jpg](attachment:b.jpg)

**About Dataset**
- This dataset contains information about various vehicles' carbon dioxide (CO2) emissions and fuel consumption. 
- In the context of Machine Learning (ML), this dataset is often used to predict CO2 emissions based on vehicle characteristics or to analyze fuel efficiency of vehicles. 
- The goal could be to predict CO2 emissions or fuel consumption based on the features of the vehicles.
- There are 7385 rows and 12 columns in total.

**The columns in the dataset can be described as follows:**

1. **Make**: The brand of the vehicle.
2. **Model**: The model of the vehicle.
3. **Vehicle Class**: The class of the vehicle (e.g., compact, SUV).
4. **Engine Size(L)**: The engine size in liters.
5. **Cylinders**: The number of cylinders in the engine.
6. **Transmission**: The type of transmission (e.g., automatic, manual).
7. **Fuel Type**: The type of fuel used (e.g., gasoline, diesel).
8. **Fuel Consumption City (L/100 km)**: Fuel consumption in the city (liters per 100 kilometers).
9. **Fuel Consumption Hwy (L/100 km)**: Highway (out-of-city) fuel consumption.
10. **Fuel Consumption Comb (L/100 km)**: Combined (city and highway) fuel consumption.
11. **Fuel Consumption Comb (mpg)**: Combined fuel consumption in miles per gallon.
12. **CO2 Emissions(g/km)**: CO2 emissions in grams per kilometer.

**Model**

**The "Model" column includes terms that identify specific features or configurations of vehicles:**
- `4WD/4X4`: Four-wheel drive. A drive system where all four wheels receive power.
- `AWD`: All-wheel drive. Similar to 4WD but often with more complex mechanisms for power distribution.
- `FFV`: Flexible-fuel vehicle. Vehicles that can use multiple types of fuel, such as both gasoline and ethanol blends.
- `SWB`: Short wheelbase.
- `LWB`: Long wheelbase.
- `EWB`: Extended wheelbase.

**Transmission**

**The "Transmission" column indicates the type of transmission system in the vehicle:**
- `A`: Automatic. A transmission type that operates without the need for the driver to manually change gears.
- `AM`: Automated manual. A version of a manual transmission that is automated.
- `AS`: Automatic with select shift. An automatic transmission that allows for manual intervention.
- `AV`: Continuously variable. A transmission that uses continuously varying ratios instead of fixed gear ratios.
- `M`: Manual. A transmission type that requires the driver to manually change gears.
- `3 - 10`: Number of gears in the transmission.

**Fuel Type**

**The "Fuel Type" column specifies the type of fuel used by the vehicle:**
- `X`: Regular gasoline.
- `Z`: Premium gasoline.
- `D`: Diesel.
- `E`: Ethanol (E85).
- `N`: Natural gas.

**Vehicle Class**

**The "Vehicle Class" column categorizes vehicles by size and type:**
- `COMPACT`: Smaller-sized vehicles.
- `SUV - SMALL`: Smaller-sized sports utility vehicles.
- `MID-SIZE`: Medium-sized vehicles.
- `TWO-SEATER`: Vehicles with two seats.
- `MINICOMPACT`: Very small-sized vehicles.
- `SUBCOMPACT`: Smaller than compact-sized vehicles.
- `FULL-SIZE`: Larger-sized vehicles.
- `STATION WAGON - SMALL`: Smaller-sized station wagons.
- `SUV - STANDARD`: Standard-sized sports utility vehicles.
- `VAN - CARGO`: Vans designed for cargo.
- `VAN - PASSENGER`: Vans designed for passenger transportation.
- `PICKUP TRUCK - STANDARD`: Standard-sized pickup trucks.
- `MINIVAN`: Smaller-sized vans.
- `SPECIAL PURPOSE VEHICLE`: Vehicles designed for special purposes.
- `STATION WAGON - MID-SIZE`: Mid-sized station wagons.
- `PICKUP TRUCK - SMALL`: Smaller-sized pickup trucks.

This dataset can be used to understand the fuel efficiency and environmental impact of vehicles. Machine learning models can use these features to predict CO2 emissions or perform analyses comparing the fuel consumption of different vehicles.

# <font color='green'> <b>EDA and Data Cleaning</b><font color='black'>

## <font color='blue'> <b>Import Library</b><font color='black'>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
 
%matplotlib inline  

import warnings
warnings.filterwarnings('ignore')

**%matplotlib inline:** Allows graphics created with Matplotlib to be displayed directly in the notebook

**import plotly.express as px:** Loads the module required to use the interactive Plotly Express charts.

## <font color='blue'> <b>Read Dataset</b><font color='black'>

In [None]:
co2 = pd.read_csv('co2.csv')
df = co2.copy()

# <font color='green'> <b>Simple Linear Regression Model</b><font color='black'>

**Linear Regression**

Linear regression predicts the dependent variable (target) from the independent variables (features).

It is called simple linear regression if there is only one independent variable and multiple linear regression if there is more than one independent variable.

If there is a significant linear relationship between the feature and the target, the data is suitable for simple linear regression.


**Basic assumptions of linear regression analysis**

1. **Linear Relationship Assumption:** It is assumed that the relationship between the dependent variable and the independent variables can be expressed linearly. That is, the relationship between the regression line and the variables should be linear. For example, as X increases, Y should increase or decrease.

2. **Independence Assumption:** Independence between observations is assumed. That is, the result of one observation should not affect other observations.

3. **Normal Distribution Assumption:** The error terms (residuals) should be normally distributed and should not form any pattern. This is necessary for the estimates of the regression model to be reliable.

4. **Independence of Independent Variables:** There should be no multicollinearity problem among the independent variables. That is, independent variables should not be too close to each other and highly correlated.

- Ŷ = b0 + b1X

- Ŷ (Y-hat) = predicted value

- b0 = intercept (point where the line crosses the y-axis)

- b1 = slope = coefficient = weight

- X = independent variable

- Residual = random error = e = Y - Ŷ

**The important thing is to minimize the error.**

**Best fit line**: the line drawn so as to minimize the errors. For linear regression it can be found with the **Ordinary Least Squares** (OLS) method, which has a closed-form solution whether there is one feature or many. **Gradient descent** is an iterative alternative that is typically used when the dataset is too large for the closed-form solution to be practical.

**Cost - loss function**: squares the difference between each actual and predicted value and averages the results (mean squared error).

With the **gradient descent** optimization algorithm, you gradually change the weight w (the slope, b1) and the bias b (the intercept, b0) to reduce the cost - loss function and bring it towards its minimum value.

It is an iterative algorithm that tries to minimize the error.
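
The update loop described above can be sketched in a few lines of NumPy. This is only an illustration on synthetic data (y = 2x + 1 with no noise); the learning rate and the number of iterations are assumed values, not settings taken from this notebook.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 2x + 1, no noise
X = np.linspace(0, 1, 50)
y = 2.0 * X + 1.0

w, b = 0.0, 0.0        # initial slope (b1) and intercept (b0)
learning_rate = 0.5    # hypothetical step size

for _ in range(5000):
    error = (w * X + b) - y
    # Gradients of the mean squared error with respect to w and b
    grad_w = 2.0 * np.mean(error * X)
    grad_b = 2.0 * np.mean(error)
    # Step against the gradient to reduce the cost
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

cost = np.mean((w * X + b - y) ** 2)
print(f"w = {w:.4f}, b = {b:.4f}, cost = {cost:.8f}")
```

With a learning rate that is too large the updates can diverge, and with one that is too small convergence becomes slow, which is why this value usually needs some tuning.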

**Bias** is the systematic error in a model's predictions. The further a model's predictions are, on average, from the true values, the higher the bias. High bias can occur because the model is too simple to capture the complexity of the data. Underfitting can therefore occur.

**Variance** is the variability of the model's predictions when it is trained on different samples of the data.
A high-variance model gives noticeably different results each time the training data changes.
It is associated with overfitting.
A high variance indicates that the model overfits the data and the patterns it learns cannot be generalized to new data outside the dataset.

**Bias-variance trade-off**:
It is important to strike a balance between bias and variance. Ideally, a model should have both low bias and low variance, but reducing one often increases the other. A well-balanced model captures the complexity of the data and still generalizes to new data.

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

**R-squared, R2 (The Coefficient of Determination)**: How much of the change in the target can be explained with the current features?
In other words, how much information do I have to explain the target or to predict it correctly?

For a reasonable model it takes values between 0 and 1. The closer it is to 1, the better the model explains the target.

A negative R2 score means the model predicts worse than simply using the mean of the target for every observation, which is a sign that the model is performing poorly. In this case, the data may need to be re-modeled with a different model or with a different set of features.

**R-square** = 1 - (SSR/SST)

**SSR (Sum of Squared Residuals)**:
SSR is the sum of the squares of the differences between the actual values and the predicted values: \( SSR = \sum (y - \hat{y})^2 \)

This expression represents the sum of the squares of the difference between the true value and the model's prediction for each observation.

**SST (Total Sum of Squares)**:
SST is the sum of the squares of the differences between the true values and their mean: \( SST = \sum (y - \bar{y})^2 \)

This shows the overall variability of the data. 

- The SSR expresses how "wrong" your model is. A low SSR value means that your model makes predictions close to the true values.

- SST shows the total variability contained in your data. This can be thought of as the total variability that your model is trying to explain.
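
As a small worked example of the formula R2 = 1 - (SSR/SST), the sketch below computes both sums by hand with NumPy. The numbers are made up for illustration; the second set of predictions is deliberately bad to show how R2 becomes negative.

```python
import numpy as np

# Made-up values for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])   # reasonable predictions
y_bad = np.array([9.0, 7.0, 5.0, 3.0])    # predictions that run the wrong way

def r_squared(actual, predicted):
    ssr = np.sum((actual - predicted) ** 2)       # sum of squared residuals
    sst = np.sum((actual - actual.mean()) ** 2)   # total sum of squares
    return 1 - ssr / sst

r2_good = r_squared(y_true, y_pred)
r2_bad = r_squared(y_true, y_bad)
print(r2_good, r2_bad)
```

Here the first model leaves SSR = 1 against SST = 20, while the second has an SSR four times larger than SST, so it does worse than predicting the mean.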

## <font color='blue'> <b>Splitting the dataset into X (independent variables) and y (dependent variable)</b><font color='black'>

## <font color='blue'> <b>Train | Test Split</b><font color='black'>

## <font color='blue'> <b>Training the Model</b><font color='black'>

**Meaning of Numbers**

1. **Positive Coefficient**:
   - If the coefficient of an independent variable is positive, this means that when the value of that variable increases, the target variable (the predicted variable) will also increase.
   - For example, there may be a positive relationship between the size of the house and the price of the house: As the size of the house increases, its price increases.

2. **Negative Coefficient**:
   - If the coefficient of an independent variable is negative, this indicates that when the value of that variable increases, the target variable will decrease.
   - For example, there may be a negative relationship between the age of a vehicle and its value: The older the vehicle, the lower its value.

**Size of Coefficients**

- **High Absolute Value**: The greater the absolute value of a coefficient (regardless of whether it is positive or negative), the greater the effect of the relevant independent variable on the target variable, provided the variables are measured on comparable scales.
- **Low or Near Zero Value**: If a coefficient is low or close to zero, this indicates that the effect of the relevant independent variable on the target variable is weak or insignificant.

In summary, the coefficients in a linear regression model indicate the effect of the independent variables on the target variable. The coefficients express the direction (positive or negative) and magnitude (strength of the effect) of this effect. This information is used to understand how the model works and which variables play an important role in the predictions.

**Intercept**: In a linear regression equation, the estimated value of the target variable when the value of the independent variables is zero. In other words, it is the point where the regression line crosses the y-axis.

Mathematically, a simple linear regression equation looks like this:

\[ y = mx + b \]

Where:
- \( y \) represents the target variable.
- \( x \) is the independent variable.
- \( m \) is the slope or coefficient (representing the change in the target variable with each unit change in the independent variable).
- \( b \) is the intercept (i.e. the value of \( y \) when \( x = 0 \)).

**The Importance of the Intercept**

- **Model Interpretation**: The intercept is the value of the target variable that the model predicts when all independent variables are zero. This is especially important when the zero point of the variables has a practical meaning.

- **Starting Point**: The intercept determines the starting point of the regression line, which indicates the general trend of the model on the target variable.

The intercept shows the general trend of the model and the effect of the variables on the target variable at the zero point. Therefore, it is a critical component for interpreting and understanding the model.
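
To make the slope and intercept concrete, here is a minimal sketch that fits scikit-learn's `LinearRegression` to synthetic data generated from y = 3x + 5. The data is an assumption for illustration and is not taken from the CO2 dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, perfectly linear data: y = 3x + 5
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 3.0 * X.ravel() + 5.0

model = LinearRegression().fit(X, y)

slope = model.coef_[0]        # m: change in y per unit change in x
intercept = model.intercept_  # b: predicted y when x = 0

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The fitted model exposes the learned coefficient(s) in `coef_` and the intercept in `intercept_`, which correspond to m and b in the equation above.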

## <font color='blue'> <b>Predicting Test Data</b><font color='black'>

## <font color='blue'> <b>Evaluating the Model</b><font color='black'>

**What is RMSE (Root Mean Square Error)?**

- **Calculation of RMSE**: RMSE is the square root of the mean squared error (MSE). MSE is the sum of the squared differences (errors or residuals) between the values predicted by the model and the actual values (targets), divided by the number of observations.

- **Meaning and Interpretation**: RMSE gives more weight to large errors by squaring the errors, thus "penalizing" large errors made by the model. The value obtained by taking the square root expresses the errors in the same units as the values the model is trying to predict. This makes RMSE easier to understand and interpret.

**Why RMSE is Preferred**

- **Emphasis on the Magnitude of Errors**: RMSE considers large errors to be more important than small errors. This better reflects the performance of the model in real-world scenarios because large errors can often lead to more critical outcomes.

- **Error Metric in the Same Unit**: RMSE presents the error in the same unit as the predicted values. This makes it possible to evaluate the performance of the model in a more intuitive way.

**Extra Explanation: Ratio of Errors to Mean Values**

- **The Importance of Mean Values**: By comparing the RMSE to the average of the values the model is trying to predict, it is possible to understand how good or bad the overall performance of the model is relative to the overall magnitude of the predicted values. This ratio indicates how significant the model's errors are relative to the overall magnitude.

In summary, RMSE is an error metric used to measure the predictive performance of a model, giving more weight to large errors and expressing errors in the same units as the values predicted by the model. Because of these properties, it is often preferred in regression problems. The ratio of RMSE to the mean of the value we want to predict is an indicator of the overall performance of the model.
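
The calculation, and the ratio of RMSE to the mean of the target, can be sketched as follows. The values are invented for illustration (think of them as emissions in g/km).

```python
import numpy as np

# Invented actual and predicted values, for illustration only
y_true = np.array([200.0, 250.0, 300.0, 350.0])
y_pred = np.array([210.0, 240.0, 310.0, 340.0])

mse = np.mean((y_true - y_pred) ** 2)   # mean of the squared errors
rmse = np.sqrt(mse)                     # back in the units of the target
ratio = rmse / y_true.mean()            # error relative to the typical target size

print(f"RMSE = {rmse:.1f}, RMSE / mean = {ratio:.3f}")
```

An RMSE of 10 means little on its own; relative to a mean target of 275 it corresponds to an error of roughly 3.6%.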

## <font color='blue'> <b>Residuals</b><font color='black'>

## <font color='blue'> <b>Prediction Error for LinearRegression</b><font color='black'>

# <font color='green'> <b>Multiple Linear Regression Model</b><font color='black'>

## <font color='blue'> <b>Splitting the dataset into X(independent variables) and y (dependent variable)</b><font color='black'>

## <font color='blue'> <b>Multicollinearity</b><font color='black'>

Multicollinearity is a problem in linear models such as linear and logistic regression and can be addressed in various ways:

**What is Multicollinearity?**

- **Definition**: Multicollinearity is a high level of correlation between independent variables. That is, it occurs when one or more independent variables are strongly correlated with other independent variables.

- **Effect**: This affects the predictive power and interpretability of the model. The coefficients of the model become difficult to estimate and this may lead to misinterpretation of the effect of some independent variables.

**Multicollinearity Issues**

- **Interpretation Difficulty**: High correlation between independent variables makes it difficult to understand which variable is really influential in the model.

- **Reliability of Coefficients**: Coefficients may be unreliable due to variables that are highly correlated with each other.

**Multicollinearity Removal**

- **Regularization Algorithms**: Regularization techniques such as Lasso and Ridge help to alleviate the multicollinearity problem. These techniques can reduce overfitting while at the same time lessening the impact of high correlation between independent variables.

- **Data Preprocessing**: In some cases, removing or combining highly correlated variables from the data set can reduce Multicollinearity.

In summary, Multicollinearity occurs when there is high correlation between independent variables and affects the accuracy and interpretability of the model. Regularization techniques and appropriate data preprocessing methods can help alleviate this problem.
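
One simple way to spot highly correlated features, and to apply the preprocessing option mentioned above, is to inspect the absolute correlation matrix. The sketch below uses synthetic data with hypothetical column names (`city`, `comb`, `cylinders`) and an assumed threshold of 0.9; both are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
city = rng.uniform(5, 20, size=200)

# Hypothetical features: "comb" is almost a rescaled copy of "city"
data = pd.DataFrame({
    "city": city,
    "comb": 0.9 * city + rng.normal(0, 0.2, size=200),
    "cylinders": rng.integers(3, 9, size=200).astype(float),
})

corr = data.corr().abs()

# Keep only the upper triangle so each pair is checked once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.where(mask)

threshold = 0.9  # assumed cut-off
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = data.drop(columns=to_drop)

print("Dropped:", to_drop)
```

Dropping a column is only one option; combining the correlated features or relying on regularization, as described above, are alternatives.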

## <font color='blue'> <b>Train | Test Split</b><font color='black'>

## <font color='blue'> <b>Training the Model</b><font color='black'>

## <font color='blue'> <b>Predicting Test Data</b><font color='black'>

## <font color='blue'> <b>Evaluating the Model</b><font color='black'>

## <font color='blue'> <b>Cross Validate</b><font color='black'>

**Overfitting Check and Cross Validation**

1. **Overfitting Control**: 
   - Overfitting is when a machine learning model fits the training data too closely, including its noise, and therefore performs poorly on new, unseen data.
   - Overfitting is checked by comparing the scores on the training set and the validation set. If the scores on the training set are much higher than the validation set, this can be a sign of overfitting.

2. **Model Reset before each Cross Validation**:
   - The model should be reset before each cross validation iteration. Otherwise, information from previous iterations may leak into the new iteration (data leakage) and lead to misleading results.

3. **Using return_train_score=True**:
   - In cross validation, the return_train_score=True option also returns the scores of the training set for each iteration. This is useful to better understand overfitting by comparing training and validation scores.
   
**Negative Scoring Metrics**

- **Maximized Scores**: Scikit-learn's model selection tools follow the convention that a higher score is better. However, metrics such as MAE (Mean Absolute Error), MSE (Mean Squared Error) and RMSE (Root Mean Squared Error) are errors that should be minimized.

- **Negative Scoring**: Scikit-learn resolves this by negating these error metrics (for example `neg_mean_absolute_error`, `neg_mean_squared_error`, `neg_root_mean_squared_error`), so that a value closer to zero is better. This should be taken into account when evaluating metrics such as MAE, MSE and RMSE: flip the sign back before reporting them.

 
In summary, the use of cross validation and scoring metrics is critical for assessing the generalization ability of the model and detecting overfitting. These techniques are used to more accurately understand the performance of the model on real-world data.
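
The points above can be sketched with scikit-learn's `cross_validate`. The data here is synthetic and the choice of 5 folds is an assumption; the scorer names are the ones scikit-learn provides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Synthetic regression data, for illustration only
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

model = LinearRegression()  # a fresh, unfitted estimator

scores = cross_validate(
    model, X, y,
    scoring=["r2", "neg_root_mean_squared_error"],
    cv=5,
    return_train_score=True,  # also report scores on the training folds
)

# Error metrics come back negated, so flip the sign before reporting
test_rmse = -scores["test_neg_root_mean_squared_error"]
train_rmse = -scores["train_neg_root_mean_squared_error"]

print("train RMSE:", train_rmse.mean(), "validation RMSE:", test_rmse.mean())
print("train R2:", scores["train_r2"].mean(), "validation R2:", scores["test_r2"].mean())
```

If the training scores are much better than the validation scores, that gap is the sign of overfitting described above.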

## <font color='blue'> <b>Residuals</b><font color='black'>

## <font color='blue'> <b>Prediction Error for LinearRegression</b><font color='black'>

# <font color='green'> <b>Final Model and Prediction</b><font color='black'>

# <font color='green'> <b>Regularization</b><font color='black'>

**Regularization** is a technique that helps reduce or prevent overfitting in a machine learning model. An overfitting model is a model that is very well fitted to training data, but has a poor ability to generalize to new and unseen data.

Regularization is used to control the complexity of the model and prevent overfitting. It is usually applied by adding a penalty term to the loss (cost) function of the model. The most common techniques are **L1 and L2 regularization**.

L1 regularization pushes the coefficients of the model towards zero and can set some of them exactly to zero, which removes those features and makes the model less complex. L2 regularization reduces overfitting by reducing the magnitude of the coefficients without setting them to zero.

As a result, when applied correctly, regularization can reduce overfitting and allow the model to generalize better. However, which regularization techniques to use and how strongly to apply them depends on the characteristics of your dataset and your model.

## <font color='blue'> <b>Polynomial Conversion</b><font color='black'>

**Polynomial regression (polynomial transformation)** is a modeling technique that is particularly useful for modeling complex and non-linear relationships in data. It is applied with real data in many fields such as physics, economics and engineering, and it is also common in academic research and hypothesis testing.

Polynomial regression adds extra features to the data by creating new features (powers and products) from the existing features. While this helps the model to fit the data better, it also increases the complexity of the model.

Polynomial regression can be especially useful when the number of features is small, because the number of generated features grows quickly with both the number of original features and the polynomial degree.

The polynomial degree determines how well the model can capture non-linear relationships. As the degree increases, the model can model more complex relationships, but this can also increase the risk of overfitting. Careful selection of the polynomial degree is therefore important for building a successful model.

## <font color='blue'> <b>Train | Test Split</b><font color='black'>

## <font color='blue'> <b>Scaling the Data</b><font color='black'>

**Why Are We Scaling?**

1. **Standardization**: It allows algorithms to work more effectively by bringing features at different scales to a similar range.

2. **Distance Based Algorithms**: Distance-based algorithms such as KNN or K-means can be affected by unscaled data when calculating the distance between features. Scaling helps such algorithms to provide fairer and more consistent results.

3. **Coefficient Comparison**: In models such as linear regression, scaling is important for comparing the coefficients of features.

**Most Commonly Used Scaling Algorithms**

1. **StandardScaler (Z-Score Normalization)**: Scales each feature to have mean 0 and standard deviation 1.

2. **MinMaxScaler (Normalization)**: Scales all features within a certain range (usually between 0 and 1).

3. **RobustScaler**: A scaling method that is less affected by outliers. It uses the median and the interquartile range (IQR).

**Which Scaling Algorithm Should I Use?**

- **Standard Practice**: Usually StandardScaler and MinMaxScaler are tried first. Whichever algorithm gives better results is preferred.

- **Outlier Status**: If there are outliers in the data set, RobustScaler may be preferred. However, it is best to continue with the algorithm that gives the best result by trying all scaling algorithms.

In summary, scaling allows machine learning algorithms to work more effectively and accurately by bringing different features into a similar range. Which scaling method to use may vary depending on the characteristics of the dataset and the type of model. Therefore, it is important to experiment with different scaling methods to determine which one provides the best performance.

**Scaling Process**

1. **fit() Method**:
   - scaler.fit(X_train): This step means that the selected scaling algorithm (e.g. StandardScaler, MinMaxScaler, RobustScaler, etc.) analyzes the distribution of features in the training dataset and calculates the necessary parameters for scaling (e.g. mean, standard deviation).
   - This process enables the generation of scaling formulas and is done only on the training set.

2. **transform() Method**:
   - scaler.transform(): This step scales both the training set and the test set, using the parameters previously calculated with the fit method.
   - Scaling is the process of transforming the data with the specified formula.

**Applying Scaling to Training and Test Set**

- **Apply fit() only on the training set**: Scaling parameters should only be calculated from the training set. This is because when testing the model on real world data, the test data should not have been "seen" by the model before.

- **Test Set Transformation**: The test set is transformed using the scaling parameters obtained from the training set. This ensures that the training and test set are scaled in the same way, thus allowing an accurate assessment of the model's performance.

**Summary**

The scaling process is important for better performance and fair evaluation of the model. Calculating scaling parameters only from the training set and applying these parameters to both the training and test set increases the generalizability and accuracy of the model. Incorrect scaling application may cause the model to give misleading results.
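
The fit-on-train, transform-both pattern described above looks roughly like this. The data is synthetic, and the split ratio and `random_state` are assumed values chosen for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data with two features on very different scales
rng = np.random.default_rng(1)
X = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(100, 2))
y = rng.normal(size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
scaler.fit(X_train)                        # learn mean and std from the training set only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # reuse the training parameters

print("training means after scaling:", X_train_scaled.mean(axis=0))
print("test means after scaling:", X_test_scaled.mean(axis=0))
```

The scaled test set will generally not have a mean of exactly 0, and that is expected: it was transformed with parameters learned from the training set.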

# <font color='green'> <b>Ridge Regression</b><font color='black'>

**Why and When to Use Ridge and Lasso**

Ridge and Lasso regressions are regularized versions of linear regression. This regularization makes the model more robust to overfitting and can sometimes improve the overall performance of the model. 

1. **Preventing Overfitting**:
   - **Ridge (L2 Regularization)**: Adds the sum of the squares of the coefficients as a penalty, which shrinks the coefficients of the model. This prevents overfitting by reducing the complexity of the model.
   - **Lasso (L1 Regularization)**: Adds the sum of the absolute values of the coefficients as a penalty, which pushes the coefficients of the model towards zero. This performs feature selection by bringing some coefficients to exactly zero, reducing the complexity of the model.

2. **Feature Selection**:
   - Lasso regression can remove ineffective features from the model by reducing some coefficients to exactly zero. This is also called feature selection and is very useful in high dimensional data sets.

3. **Multicollinearity**:
   - If features in a data set are highly correlated (multicollinearity), this can destabilize the coefficients of a linear regression. Ridge and Lasso can stabilize the coefficients in such cases.

4. **Model Interpretability**:
   - Lasso can improve model interpretability by including only the most important features in the model. Ridge also makes the effect of features in the model more understandable by minimizing the coefficients.

In conclusion, Ridge and Lasso regressions can help us overcome some of the limitations of linear regression and build more generalizable, more stable models. Which method to use depends on the data set, the problem definition and especially how flexible you want the model to be. To find a balance between these two Regularization techniques, there are also approaches such as **Elastic Net**.

**L2 Regularization (Ridge Regularization):**

- L2 regularization aims to keep the model weights low, thus limiting the power of the model. 

- It shrinks the coefficients towards zero, but does not set any coefficient exactly to zero. It is used when we want to keep all the features.

- The cost function is the sum of the squared residuals plus a penalty equal to lambda (the regularization parameter, expressed as alpha in the model) x the sum of the squares of the coefficients.
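
The shrinking effect can be seen by comparing plain linear regression with `Ridge` on the same data. This is a sketch on synthetic data; `alpha=10.0` is an arbitrary value chosen to make the effect visible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data: the third feature has no real effect on the target
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([4.0, -3.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # alpha plays the role of lambda

ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

The Ridge coefficients are smaller overall, but none of them is exactly zero, so every feature stays in the model.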

## <font color='blue'> <b>CV with alpha : 1</b><font color='black'>

## <font color='blue'> <b>Choosing best alpha value with GridSearchCV for Ridge</b><font color='black'>

**Purpose of GridSearchCV**

- **Hyperparameter Optimization**: GridSearchCV is used to find the best hyperparameter combinations to maximize the performance of the machine learning model.

- **Comprehensive Search**: When there are multiple hyperparameters, GridSearchCV tries every combination of the values listed in the grid and finds the combination that gives the best result.

**Hyperparameter vs. Parameter**

- **Hyperparameter**: Parameters that are set before the model is trained and control the learning process and structure of the model. For example, the tree depth in a tree-based model or the regularization value in a linear model.

- **Parameter**: Values that the model learns during the training process and extracts from the data set. For example, the coefficients and intercept in a linear regression model.

**How GridSearchCV Works**

1. **Parameter Grid Definition**: The user creates a parameter grid: the hyperparameters to tune and, for each one, the list of values to be tested.

2. **Search and Evaluation**: GridSearchCV trains the model by trying each combination in this parameter grid and evaluates the performance of the model using cross validation for each combination.

3. **Finding the Best Combination**: Among all combinations, the hyperparameter combination that maximizes the performance of the model is selected.

4. **Training the Final Model**: The final model is trained with the best selected hyperparameters. Therefore, as a result of GridSearchCV, the model is trained with these best hyperparameters.

**Summary**

GridSearchCV is a hyperparameter optimization tool used to maximize the performance of a machine learning model. It systematically tries every combination of hyperparameter values in the grid to determine the combination that works best and trains the final model with that combination. This process helps the model make more accurate predictions.
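
The four steps above can be sketched for Ridge as follows. The data is synthetic, and the alpha grid, the scoring metric and the number of folds are assumed values for illustration, not necessarily the ones used in this notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data, for illustration only
rng = np.random.default_rng(7)
X = rng.normal(size=(150, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=150)

# 1. Parameter grid: hyperparameter name -> values to try (assumed values)
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 2. Search and evaluation with 5-fold cross validation
grid = GridSearchCV(
    estimator=Ridge(),
    param_grid=param_grid,
    scoring="neg_root_mean_squared_error",
    cv=5,
)
grid.fit(X, y)

# 3. Best combination (the score is negated, so flip the sign)
best_alpha = grid.best_params_["alpha"]
best_rmse = -grid.best_score_

# 4. Final model, refitted on all of the data passed to fit()
final_model = grid.best_estimator_

print("best alpha:", best_alpha, "CV RMSE:", best_rmse)
```

Because `refit=True` is the default, `grid.best_estimator_` is already trained with the best hyperparameters and can be used for prediction directly.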

# <font color='green'> <b>Lasso Regression</b><font color='black'>

**L1 Regularization (Lasso Regularization):** L1 regularization can drive some of the model weights to exactly zero.

- This method can also be used for feature selection. 

- It shrinks the coefficients towards zero and sets some coefficients exactly to zero. In this way it reduces unimportant features and makes the model simpler.

- The cost function is the sum of the squared residuals plus a penalty equal to lambda (the regularization parameter, expressed as alpha in the model) x the sum of the absolute values of the coefficients.
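
The feature-selection effect can be illustrated with a sketch on synthetic data in which only the first two features actually influence the target. `alpha=0.5` is an arbitrary value chosen to make the effect clear.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: the last two features are pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([4.0, -3.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)   # alpha plays the role of lambda

n_zero = int(np.sum(lasso.coef_ == 0))
selected = np.flatnonzero(lasso.coef_)  # indices of the features that were kept

print("Lasso coefficients:", lasso.coef_)
print("features kept:", selected)
```

The coefficients of the two noise features are set to exactly zero, while the two informative features are kept with somewhat reduced coefficients.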

## <font color='blue'> <b>Choosing best alpha value with GridSearchCV for Lasso</b><font color='black'>

# <font color='green'> <b>Final Model and Prediction</b><font color='black'>

# <font color='green'> <b>Feature importances with Ridge</b><font color='black'>

**Yellowbrick Library and FeatureImportances**

1. **FeatureImportances Visualizer**:
   - Yellowbrick is a Python library developed to facilitate the interpretation and visualization of machine learning models.
   - The FeatureImportances visualizer is used to visualize the importance of a model's features. This is useful for understanding which features contribute the most to model predictions.

2. **Determining Model and Features**:
   - In this example, we use the Ridge regression model and the alpha=0.01 hyperparameter that works best for this model.
   - When visualizing, the column names (df.columns) of the dataset used to train the model are passed as the labels parameter.

3. **Visualization and Interpretation**:
   - After the model is trained on the training data set, visualization is done with the viz.show() command.
   - The Ridge model shows the importance of features using their weights (coefficients). However, the Ridge model does not completely reduce the features to zero, so it is used to understand the relative importance of the features on the model, rather than making a precise "feature selection".

**RadViz Visualization**

- **Visualizer Size Adjustment**: Yellowbrick visualizers accept a size argument, as in RadViz(size=(720, 3000)), which sets the size of the figure. The numbers 720 and 3000 (width and height in pixels) can be changed to resize the visualizer.

**Summary**

Yellowbrick's FeatureImportances visualizer is used to show which features the model gives more weight to, especially in regression models such as Ridge. This is important to understand which features are more decisive in the model's prediction process. However, with the Ridge model, it is more appropriate to assess the relative importance of features rather than making a definitive feature selection. The size of RadViz and similar visualizers can be adjusted with the size argument.
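
For readers without Yellowbrick installed, the same idea can be approximated by ranking the Ridge coefficients by absolute value. This sketch is a plain pandas stand-in, not Yellowbrick's own implementation; the data is synthetic and the feature names are hypothetical. The features are already on the same scale here, which is what makes the coefficients comparable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Synthetic, already-standardized features with hypothetical names
rng = np.random.default_rng(3)
feature_names = ["f_strong", "f_weak", "f_medium"]
X = rng.normal(size=(200, 3))
y = X @ np.array([5.0, 0.2, -2.0]) + rng.normal(scale=0.3, size=200)

model = Ridge(alpha=0.01).fit(X, y)

# Rank features by the absolute size of their coefficients
importance = pd.Series(model.coef_, index=feature_names)
order = importance.abs().sort_values(ascending=False).index
ranked = importance.reindex(order)

print(ranked)
```

The sign of each coefficient is kept in the output so that the direction of the effect stays visible alongside its relative importance.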

# <font color='green'> <b>Feature importances with Lasso</b><font color='black'>

![a.jpg](attachment:a.jpg)