<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0X57EN/SN_web_lightmode.png?1679073235843" width="300" alt="cognitiveclass.ai logo">
</center>

# Investigation of BTC/BUSD cryptocurrency using ADOSC, NATR, TRANGE indicators, and other cryptocurrencies.


# Lab 5 (Model Evaluation and Refinement)

Estimated time needed: **30** minutes

## Objectives

After completing this lab you will be able to:

*   Evaluate and refine prediction models

<h3>Table of Contents</h3>

<div class="alert alert-block alert-info" style="margin-top: 20px">
<ol>
    <li><u>Training and Testing</u></li>
    <li><u>Cross-Validation</u></li>
    <li><u>Overfitting, Underfitting and Model Selection</u></li>
    <li><u>Ridge Regression</u></li>
    <li><u>Grid Search</u></li>
</ol>

</div>

<hr>

#### ***Dataset description***

The dataset used in this lab contains time-series data on various attributes of Bitcoin (BTC) and other cryptocurrencies, aggregated at 1-minute intervals. The dataset index represents the 1-minute time period for which the data is reported. The dataset also contains binned average prices of BTC and the other cryptocurrencies.

<hr>

**Attributes:**

* ***General:***
    * `open` - the opening price of a **BTC** during a specific time period.
    * `high` - the highest price of a **BTC** during a specific time period.
    * `low` - the lowest price of a **BTC** during a specific time period.
    * `close` - the closing price of a **BTC** during a specific time period.
    * `rec_count` - the number of records or data points in the dataset for a given time period.
    * `volume` - the total amount of trading activity (buying and selling) for a **BTC** during a specific time period.
    * `avg_price` - the average price of a **BTC** during a specific time period.


* ***Indicators***
    * `ADOSC` - an indicator used in technical analysis to measure the momentum of buying and selling pressure for ***Bitcoin***.
    * `NATR` - an indicator used in technical analysis to measure the volatility of ***Bitcoin***.
    * `TRANGE` - an indicator used in technical analysis to measure the range of prices (from high to low) for ***Bitcoin*** during a specific time period.


* ***Other cryptocurrencies:***
    * `ape_avg_price` - the average price of ***APE*** during a specific time period.
    * `bnb_avg_price` - the average price of ***BNB*** during a specific time period.
    * `doge_avg_price` - the average price of ***DOGE coin*** during a specific time period.
    * `eth_avg_price` - the average price of ***Ethereum*** during a specific time period.
    * `xrp_avg_price` - the average price of ***XRP*** during a specific time period.
    * `matic_avg_price` - the average price of ***MATIC*** during a specific time period.
* ***Categorical:***
    * `category` - binned ***average price for BTC(`avg_price`)*** to bins: `high`, `medium-high`, `medium`, `medium-low`, `low`.
    * `ape_category` - binned `ape_avg_price` to bins: `high`, `medium-high`, `medium`, `medium-low`, `low`.
    * `bnb_category` - binned `bnb_avg_price` to bins: `high`, `medium-high`, `medium`, `medium-low`, `low`.
    * `doge_category` - binned `doge_avg_price` to bins: `high`, `medium-high`, `medium`, `medium-low`, `low`.
    * `eth_category` - binned `eth_avg_price` to bins: `high`, `medium-high`, `medium`, `medium-low`, `low`.
    * `xrp_category` - binned `xrp_avg_price` to bins: `high`, `medium-high`, `medium`, `medium-low`, `low`.
    * `matic_category` - binned `matic_avg_price` to bins: `high`, `medium-high`, `medium`, `medium-low`, `low`.
    
<hr>

*The indicators `ADOSC`, `NATR`, and `TRANGE` are used in technical analysis to provide insights into the momentum, volatility, and price ranges of financial instruments or assets. The other attributes represent the average prices of different cryptocurrencies during a specific time period.*

<hr>

In the previous lab we trained ML models and computed some metrics for them on test data. In this lab, we will examine these metrics in more depth.

#### Setup


In [ ]:
!mamba install pandas -y
!mamba install numpy -y
!mamba install matplotlib -y
!mamba install scipy -y
!mamba install seaborn -y 
!mamba install statsmodels -y
!mamba install scikit-learn -y 
!mamba install tqdm -y

Import libraries

In [ ]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from tqdm import tqdm
from ipywidgets import interact, interactive, fixed, interact_manual
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

%matplotlib inline

Initialize the path to our dataset (the one we saved in the previous lab).

In [ ]:
path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0X57EN/BTCBUSD_1min_categories.csv'

Read dataset from path.

In [ ]:
df = pd.read_csv(path)

# converting index to datetime
df.index = pd.to_datetime(df.index)
df.head()

Now, let's retrieve the columns and shape of our DataFrame:

In [ ]:
print(df.columns, df.shape)

##### **Functions for Plotting**

In [ ]:
def DistributionPlot(RedFunction, BlueFunction, RedName, BlueName, Title, figsize=(12, 10)):
    width, height = figsize
    plt.figure(figsize=(width, height))
    
    ax1 = sns.kdeplot(RedFunction, color="r", label=RedName)
    ax2 = sns.kdeplot(BlueFunction, color="b", label=BlueName, ax=ax1)

    plt.title(Title)
    plt.xlabel('Price (in dollars)')
    plt.ylabel('Proportion')
    plt.legend()

    plt.show()
    plt.close()

## Training and Testing

<p>Just like in previous lab let's split our data into train and test samples:</p>


In [ ]:
# y is a target value column - the value we want to predict
y = df['avg_price']
# deleting target column from X dataset
X = df.drop('avg_price', axis=1)

# setting test_size to 0.2 so our test set will be 20% of the data
# using shuffle=False to split the data into 2 consecutive windows (the first 80% is the train set, the last 20% is the test set)
# in other words, we predict the 20% of prices that happen after the training period
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

Let's look at the shape of training and test data

In [ ]:
print("Number of training samples:",X_train.shape[0])
print("Number of test samples :", X_test.shape[0])

The `test_size` parameter sets the proportion of data that is split into the testing set. In the above, the testing set is 20% of the total dataset.
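To see what `shuffle=False` does, here is a minimal sketch on a toy array (the names `timeline`, `earlier`, and `later` are illustrative and not part of the lab). With shuffling disabled, the split is a simple cut: every training element comes before every test element.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy "timeline": ten consecutive time steps 0..9
timeline = np.arange(10)

# With shuffle=False the last 20% of the rows become the test set
earlier, later = train_test_split(timeline, test_size=0.2, shuffle=False)

print("train:", earlier)
print("test :", later)
```

For time-series data this ordering matters, because the model is then always evaluated on observations that come after the ones it was trained on.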


Just like in the previous lab, let's train a linear regression model:

In [ ]:
lr = LinearRegression()
lr.fit(X_train[['doge_avg_price']], y_train)

Let's calculate $R^2$ on the train and test data (in the previous lab we calculated only test metrics, but it is good practice to calculate both in order to detect underfitting and overfitting; you will learn more about these terms later in this lab):

In [ ]:
print(f"Test r2 score: {lr.score(X_test[['doge_avg_price']], y_test):.4f}")
print(f"Train r2 score: {lr.score(X_train[['doge_avg_price']], y_train):.4f}")

We can see that the test and train $R^2$ scores are relatively high (for a linear regression model) and almost equal.
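As a reminder of what the `score` method of a regressor reports, $R^2$ can be computed by hand as $1 - SS_{res}/SS_{tot}$. The sketch below uses made-up numbers (the helper `r_squared` and the toy arrays are assumptions for illustration only) and also shows how $R^2$ becomes negative when predictions are worse than always predicting the mean.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true_demo = [1.0, 2.0, 3.0, 4.0]

# Predictions close to the targets give R^2 close to 1
r2_good = r_squared(y_true_demo, [1.1, 1.9, 3.2, 3.8])

# Predictions worse than the mean of y give a negative R^2
r2_bad = r_squared(y_true_demo, [4.0, 3.0, 2.0, 1.0])

print(r2_good, r2_bad)
```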

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
  <b style="font-size: 2em; font-weight: bold;"> Question  #1: </b>

  <b>Use the function "train_test_split" to split up the dataset such that 10% of the data samples will be utilized for testing. Set the parameter `shuffle` equal to `False`. The output of the function should be the following:  `X_train1` , `X_test1`, `y_train1` and  `y_test1`.</b>
    
</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 

<details><summary>Click here for the solution</summary>

```python
X_train1, X_test1, y_train1, y_test1 = train_test_split(X, y, shuffle=False, test_size=0.1) 
print("Number of training samples:",X_train1.shape[0])
print("Number of test samples :", X_test1.shape[0])
```

</details>


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
  <b style="font-size: 2em; font-weight: bold;"> Question  #2: </b>

  <b>Using the data you obtained in Question #1, train a linear regression model (use <code>doge_avg_price</code> as the predictor) and call it <code>lr1</code>.</b>
    
</div>


In [ ]:
# Write your code below and press Shift+Enter to execute

<details><summary>Click here for the solution</summary>

```python

lr1=LinearRegression()
lr1.fit(X_train1[['doge_avg_price']], y_train1)
```

</details>

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
  <b style="font-size: 2em; font-weight: bold;"> Question  #3: </b>

  <b>Find the $R^2$ of <code>lr1</code> on the train and test data from the 10% test split, then draw conclusions.</b>
    
</div>



In [ ]:
# Write your code below and press Shift+Enter to execute

<details><summary>Click here for the solution</summary>

```python
print(f"Test r2 score: {lr1.score(X_test1[['doge_avg_price']], y_test1)}")
print(f"Train r2 score: {lr1.score(X_train1[['doge_avg_price']], y_train1)}")
```
We can see that the test score is negative (the model performs very poorly on the test data), while the training data has an $R^2$ of 0.75. This is a clear sign of overfitting: the model performs well on the train data but does not generalize to the test data.
</details>


## Cross-Validation


When working with time series data, like BTC and other cryptocurrencies, **it's essential to evaluate the performance of your model using cross-validation** in addition to evaluating it on a single test dataset.

**Cross-validation is important because it helps to estimate how well your model can generalize to new, unseen data.** 

**Time series data** is **sequential**, and **future observations are often dependent on past observations**, so **the model needs to be able to capture this temporal relationship to predict future values accurately.**

In case of time series data, two common techniques for cross-validation are **forward chaining cross-validation** and **rolling window cross-validation**. **Forward chaining cross-validation involves training the model on earlier data and testing it on subsequent data**, while **rolling window cross-validation involves training the model on a fixed window of data and testing it on a subsequent window**.
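The difference between the two schemes is easiest to see on index ranges. The following is a minimal sketch in plain Python; the function names and parameters (`initial_train`, `window`, `test_size`) are hypothetical helpers written for this illustration, not scikit-learn APIs.

```python
def forward_chaining_splits(n_samples, initial_train, test_size):
    """Yield (train_idx, test_idx) pairs with an expanding training window."""
    start = initial_train
    while start + test_size <= n_samples:
        # training always starts at 0 and grows with every fold
        yield list(range(0, start)), list(range(start, start + test_size))
        start += test_size

def rolling_window_splits(n_samples, window, test_size):
    """Yield (train_idx, test_idx) pairs with a fixed-size training window."""
    start = window
    while start + test_size <= n_samples:
        # training covers only the `window` most recent points before the test block
        yield list(range(start - window, start)), list(range(start, start + test_size))
        start += test_size

fc = list(forward_chaining_splits(n_samples=6, initial_train=2, test_size=2))
rw = list(rolling_window_splits(n_samples=6, window=2, test_size=2))

print("forward chaining:", fc)
print("rolling window  :", rw)
```

In both schemes the test indices always come after the training indices; the only difference is whether old observations are kept or dropped as the window moves forward.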

We will use forward chaining cross-validation with a fixed-size training window of the most recent 60 minutes (the reason for this choice is explained below):

In [ ]:
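# mean_squared_error is not imported in the setup cell, so import it here
from sklearn.metrics import mean_squared_error

# Set the size of the training window used for each iteration
# (each fold is validated on the single data point that follows the window)
validation_size = 60 # we use 1 hour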

# Convert DataFrame to NumPy array
X_array = X[['doge_avg_price']].to_numpy()
y_array = y.to_numpy()

# Create an empty list to store the mean squared error for each fold
mse_scores = []
r2_scores = []

# Instantiate the model
model = LinearRegression()

# Loop over each data point in the sequence
for i in range(validation_size, X.shape[0]):
    # Split the data into training and validation sets
    X_train_i = X_array[i-validation_size:i, :]
    y_train_i = y_array[i-validation_size:i]
    X_valid_i = X_array[i:i+1, :]
    y_valid_i = y_array[i:i+1]
    
    # Train the model on the training data
    model.fit(X_train_i, y_train_i)
    
    # Use the model to predict the validation data
    y_pred_i = model.predict(X_valid_i)
    
    # Calculate the mean squared error for this fold
    mse = mean_squared_error(y_valid_i, y_pred_i)

    # Add the mean squared error to the list of mse scores
    mse_scores.append(mse)

# Calculate the overall mean squared error across all folds
mean_mse = np.mean(mse_scores)

# Print the mean squared error
print(F"Overall Mean squared error: {mean_mse:.4f}")

The **forward chaining cross-validation method is a way to simulate how well our model will perform in real-world situations where we receive new data over time** and **want to retrain the model on that new data as it arrives**.

In this approach, **we train the model on a fixed-size window of historical data and evaluate it on the next data point in the sequence, which represents the new data that just arrived**. By doing this for every data point in the sequence, we simulate the process of training and testing our model on new data as it arrives over time.

***In other words:***

The **goal of this approach is to evaluate how well your model can generalize to new data over time**, which is important in real-world applications where your data is constantly changing.

**Why did we not use `sklearn.model_selection.cross_val_score`?**

Because `sklearn.model_selection.cross_val_score` does not perform forward chaining cross-validation by default. With an integer `cv` it uses k-fold cross-validation for regressors, which divides the data into `k` roughly equal folds (without shuffling by default). Each fold is used as a test set once while the remaining folds, including those that come later in time, are used as the training set, so the model can be trained on data from the future of its test fold. The process is repeated `k` times and one score is returned per fold.
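If you do want to use `cross_val_score` on ordered data, one option is to pass a `TimeSeriesSplit` object as `cv`, which only produces folds where the training indices precede the test indices. The sketch below uses synthetic data (`X_demo` and `y_demo` are made up for illustration) and is an alternative to the manual loop above, not a reproduction of it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic ordered data: a noisy linear trend over 120 time steps
rng = np.random.default_rng(0)
X_demo = np.arange(120, dtype=float).reshape(-1, 1)
y_demo = 0.5 * X_demo.ravel() + rng.normal(0.0, 1.0, size=120)

tscv = TimeSeriesSplit(n_splits=5)

# Each fold trains on an expanding prefix and tests on the block that follows it
folds = list(tscv.split(X_demo))
for train_idx, test_idx in folds:
    print(f"train 0..{train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")

# scikit-learn reports the negated MSE so that "higher is better"; flip the sign back
scores = cross_val_score(LinearRegression(), X_demo, y_demo,
                         cv=tscv, scoring="neg_mean_squared_error")
mse_per_fold = -scores
print(mse_per_fold)
```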

Let's plot Mean squared error that we calculated over each fold:

In [ ]:
def plot_mse_scores(mse_scores):
    """
    Plots a line graph of the mean squared error (MSE) scores.
    
    Args:
        mse_scores (list): A list of mean squared error scores.
    """
    # Create a line plot of the mean squared error scores
    plt.plot(mse_scores)
    
    # Set the x-axis label
    plt.xlabel('Folds')
    
    # Set the y-axis label
    plt.ylabel('Mean Squared Error')
    
    # Set the title of the plot
    plt.title('Mean Squared Error Scores')
    
    # Show the plot
    plt.show()

In [ ]:
plot_mse_scores(mse_scores)

From the plot we can see that our model would make mistakes frequently.

## Overfitting, Underfitting and Model Selection

In the previous lab we mostly calculated test metrics, but it is good practice to calculate train metrics as well to get better insights.

***Why?***

**By comparing test metrics to train metrics, you can determine whether the model is overfitting or underfitting. If the test metrics are significantly worse than the train metrics, it's a sign that the model may be overfitting to the training data.** In other words, the comparison helps to identify potential issues and to make changes that improve the model's overall performance and its generalization to new data.
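The comparison can be sketched on a toy curve using only NumPy. Everything here (the sine data, the interleaved split, the chosen degrees) is an assumption made for illustration; the interleaved split in particular is only suitable for this toy curve, not for time series.

```python
import numpy as np

# Toy data: a noisy sine curve
rng = np.random.default_rng(42)
x_all = np.linspace(0.0, 1.0, 40)
y_all = np.sin(2 * np.pi * x_all) + rng.normal(0.0, 0.2, size=x_all.size)

# Interleaved split: even positions for training, odd positions for testing
x_tr, y_tr = x_all[::2], y_all[::2]
x_te, y_te = x_all[1::2], y_all[1::2]

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

report = {}
for degree in (1, 3, 9):
    coefs = np.polyfit(x_tr, y_tr, degree)  # least-squares polynomial fit
    report[degree] = (mse(y_tr, np.polyval(coefs, x_tr)),
                      mse(y_te, np.polyval(coefs, x_te)))
    print(f"degree={degree}: train MSE={report[degree][0]:.4f}, "
          f"test MSE={report[degree][1]:.4f}")
```

The training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. The test error has no such guarantee, and a large gap between the two columns is the signal to look for.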

Let's create a **multiple linear regression object and train the model just like in the previous lab** (using the same features).


In [ ]:
mlr = LinearRegression()
mlr.fit(X_train[['doge_avg_price', 'eth_avg_price', 'xrp_avg_price']], y_train)

Prediction using training data:


In [ ]:
yhat_train = mlr.predict(X_train[['doge_avg_price', 'eth_avg_price', 'xrp_avg_price']])
yhat_train[0:5]

Prediction using test data:


In [ ]:
yhat_test = mlr.predict(X_test[['doge_avg_price', 'eth_avg_price', 'xrp_avg_price']])
yhat_test[0:5]

Let's examine the distribution of the predicted values of the training data.


In [ ]:
Title = 'Distribution  Plot of  Predicted Value Using Training Data vs Training Data Distribution'
DistributionPlot(y_train, yhat_train, "Actual Values (Train)", "Predicted Values (Train)", Title)

**Figure 1:** Plot of predicted values using the training data compared to the actual values of the training data.


So far, the model seems to be doing well in learning from the training dataset. But what happens when the model encounters new data from the testing dataset? When the model generates new values from the test data, we see the distribution of the predicted values is much different from the actual target values.


In [ ]:
Title='Distribution  Plot of  Predicted Value Using Test Data vs Data Distribution of Test Data'
DistributionPlot(y_test,yhat_test,"Actual Values (Test)","Predicted Values (Test)",Title)

**Figure 2:** Plot of predicted value using the test data compared to the actual values of the test data.


<p>Comparing Figure 1 and Figure 2, it is evident that the predicted distribution in Figure 1 (training data) fits the actual data much better than the one in Figure 2 (test data). Let's see if polynomial regression also exhibits a drop in prediction accuracy when analysing the test dataset.</p>


***Overfitting***

<p>Overfitting occurs when the model fits the noise, but not the underlying process. Therefore, when testing your model using the test set, your model does not perform as well since it is modelling noise, not the underlying process that generated the relationship. Let's create a degree 5 polynomial model.</p>


Let's use 75 percent of the data for training and the rest for testing:


In [ ]:
Xp_train, Xp_test, yp_train, yp_test = train_test_split(X, y, test_size=0.25, shuffle=False)

We will perform a degree 5 polynomial transformation on the feature <b>'doge_avg_price'</b>.
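To make the transformation concrete, here is a small sketch with a lower degree and two made-up input values (the array `x_demo` is illustrative). For a single feature $x$, `PolynomialFeatures(degree=3)` produces the columns $1, x, x^2, x^3$.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two samples of a single feature
x_demo = np.array([[2.0], [3.0]])

# Columns of the result: bias (1), x, x^2, x^3
expanded = PolynomialFeatures(degree=3).fit_transform(x_demo)
print(expanded)  # rows: [1, 2, 4, 8] and [1, 3, 9, 27]
```

A linear regression fitted on these columns is still linear in its coefficients, which is why `LinearRegression` can be reused for polynomial regression.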


In [ ]:
pr = PolynomialFeatures(degree=5)
x_train_pr = pr.fit_transform(Xp_train[['doge_avg_price']])
# reuse the transformer fitted on the training data instead of refitting it on the test data
x_test_pr = pr.transform(Xp_test[['doge_avg_price']])
pr

Now, let's create a Linear Regression model "poly" and train it.


In [ ]:
poly = LinearRegression()
poly.fit(x_train_pr, yp_train)

We can see the output of our model using the method "predict." We assign the values to "yhat".


In [ ]:
yhat = poly.predict(x_test_pr)
yhat[0:5]

Let's take the first five predicted values and compare them to the actual targets.


In [ ]:
print("Predicted values:", yhat[0:5])
print("True values:", yp_test[0:5].values)

$R^2$ of the training data:


In [ ]:
poly.score(x_train_pr, yp_train)

$R^2$ of the test data:


In [ ]:
poly.score(x_test_pr, yp_test)

We see the $R^2$ for the training data is 0.48 while the $R^2$ on the test data was 0.72.


Let's see how the $R^2$ changes on the test data for different order polynomials and then plot the results:


In [ ]:
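Rsqu_test = []

order = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for n in order:
    pr = PolynomialFeatures(degree=n)

    x_train_pr = pr.fit_transform(Xp_train[['doge_avg_price']])
    # transform (not fit_transform) the test data with the transformer fitted on the training data
    x_test_pr = pr.transform(Xp_test[['doge_avg_price']])

    # use a fresh model so the `lr` fitted earlier is not overwritten
    poly_n = LinearRegression()
    poly_n.fit(x_train_pr, yp_train)

    Rsqu_test.append(poly_n.score(x_test_pr, yp_test))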

plt.plot(order, Rsqu_test)
plt.xlabel('order')
plt.ylabel('R^2')
plt.title('R^2 Using Test Data')
plt.text(3, 0.75, 'Maximum R^2 ')    

We see the $R^2$ gradually increases until an order 5 polynomial is used. Then, the $R^2$ decreases.


Let's try to visualize the polynomial regression of `degree=2`

In [ ]:
# Generate some sample data
x_ = np.linspace(0, 10, 100)
y_ = np.sin(x_) + np.random.normal(0, 0.1, size=len(x_))

# Split data into training and test sets
x_train_sample, x_test_sample, y_train_sample, y_test_sample = train_test_split(x_, y_, test_size=0.2)

# Create polynomial features for training and test sets
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(x_train_sample.reshape(-1, 1))
X_test_poly = poly.transform(x_test_sample.reshape(-1, 1))

# Fit linear regression model to training data
lin_reg = LinearRegression()
lin_reg.fit(X_train_poly, y_train_sample)

# Predict on test data and calculate R^2 score
y_pred_sample = lin_reg.predict(X_test_poly)
r2_score = lin_reg.score(X_test_poly, y_test_sample)

# Plot the test data and polynomial fit
plt.scatter(x_test_sample, y_test_sample)
plt.scatter(x_test_sample, y_pred_sample, color='red')
plt.title(f"Polynomial Regression (degree=2)\nTest R^2 score: {r2_score:.2f}")
plt.show()


The following function will be used later and will help you with understanding of polynomial regression.

In [ ]:
def f(order, test_data):
    # test_data is the fraction of the samples held out for testing
    x_train, x_test, y_train, y_test = train_test_split(x_, y_, test_size=test_data)
    # Create polynomial features for training and test sets
    poly = PolynomialFeatures(degree=order)
    X_train_poly = poly.fit_transform(x_train.reshape(-1, 1))
    X_test_poly = poly.transform(x_test.reshape(-1, 1))

    # Fit linear regression model to training data
    lin_reg = LinearRegression()
    lin_reg.fit(X_train_poly, y_train)

    # Predict on test data and calculate R^2 score
    y_pred = lin_reg.predict(X_test_poly)
    r2_score = lin_reg.score(X_test_poly, y_test)

    # Plot the test data and polynomial fit
    plt.scatter(x_test, y_test)
    plt.scatter(x_test, y_pred, color='red')
    plt.title(f"Polynomial Regression (degree={order})\nTest R^2 score: {r2_score:.2f}")
    plt.show()


The following interface allows you to experiment with different polynomial orders and different amounts of data.


In [ ]:
interact(f, order=(0, 6, 1), test_data=(0.05, 0.95, 0.05))

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
  <b style="font-size: 2em; font-weight: bold;"> Question  #4a: </b>

  <b>We can perform polynomial transformations with more than one feature. Create a "PolynomialFeatures" object "pr1" of degree two.</b>
    
</div>



In [ ]:
# Write your code below and press Shift+Enter to execute

<details><summary>Click here for the solution</summary>

```python
pr1=PolynomialFeatures(degree=2)

```

</details>


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
  <b style="font-size: 2em; font-weight: bold;"> Question  #4b: </b>

  <b>Transform the training and testing samples for the features 'doge_avg_price', 'eth_avg_price' and 'xrp_avg_price'. Hint: use the method `fit_transform`.</b>
    
</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 

<details><summary>Click here for the solution</summary>

```python
x_train_pr1 = pr1.fit_transform(Xp_train[['doge_avg_price', 'eth_avg_price', 'xrp_avg_price']])
x_test_pr1 = pr1.transform(Xp_test[['doge_avg_price', 'eth_avg_price', 'xrp_avg_price']])


```

</details>




<div class="alert alert-danger alertdanger" style="margin-top: 20px">
  <b style="font-size: 2em; font-weight: bold;"> Question  #4c: </b>

  <b>How many dimensions does the new feature have? Hint: use the attribute "shape".</b>
    
</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 

<details><summary>Click here for the solution</summary>

```python
x_train_pr1.shape 


```

</details>
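The number of columns can also be predicted without running the transformer. For $n$ input features and degree $d$, the number of polynomial terms (including the constant column) is the binomial coefficient $\binom{n+d}{d}$. A minimal sketch, where the helper name `n_polynomial_features` is hypothetical:

```python
from math import comb

def n_polynomial_features(n_features, degree, include_bias=True):
    """Number of monomials of total degree <= `degree` in `n_features` variables."""
    total = comb(n_features + degree, degree)
    return total if include_bias else total - 1

# Three input features, degree 2: 1 constant + 3 linear + 6 quadratic terms
print(n_polynomial_features(3, 2))

# One input feature, degree 5: 1, x, x^2, ..., x^5
print(n_polynomial_features(1, 5))
```

This count grows quickly with both the number of features and the degree, which is one reason high-degree polynomial models overfit easily.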


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
  <b style="font-size: 2em; font-weight: bold;"> Question  #4d:</b>

  <b>Create a linear regression model "poly1". Train the object using the method "fit" using the polynomial features.</b>
    
</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 

<details><summary>Click here for the solution</summary>

```python
poly1=LinearRegression().fit(x_train_pr1,yp_train)


```

</details>


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
  <b style="font-size: 2em; font-weight: bold;"> Question  #4e:</b>

  <b>Use the method  "predict" to predict an output on the polynomial features, then use the function "DistributionPlot" to display the distribution of the predicted test output vs. the actual test data.</b>
    
</div>


In [ ]:
# Write your code below and press Shift+Enter to execute

<details><summary>Click here for the solution</summary>

```python
yhat_test1=poly1.predict(x_test_pr1)

Title='Distribution  Plot of  Predicted Value Using Test Data vs Data Distribution of Test Data'

DistributionPlot(yp_test, yhat_test1, "Actual Values (Test)", "Predicted Values (Test)", Title)

```

</details>


## Ridge Regression


Ridge regression is a type of regularized linear regression method that can help to prevent overfitting in a model. It is often used when there are a large number of variables in the dataset and multicollinearity (correlation among predictor variables) is present.

In ridge regression, a penalty term is added to the least squares objective function, which shrinks the estimated coefficients towards zero. The amount of shrinkage is controlled by a hyperparameter, called the regularization parameter or lambda (λ), which determines the strength of the penalty. A higher value of λ will lead to more shrinkage and smaller coefficient estimates.

The ridge regression coefficient estimates are obtained by minimizing the following objective function:

$$\min_{\beta_0,\beta} \sum_{i=1}^{n} (y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij})^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

where:

* $y_i$ is the observed response for the $i$-th observation.
* $x_{ij}$ is the $j$-th predictor variable for the $i$-th observation.
* $\beta_0$ and $\beta_j$ are the intercept and coefficient estimates, respectively.
* $p$ is the number of predictor variables.
* $n$ is the number of observations.

The first term in the objective function is the ordinary least squares (OLS) term, and the second term is the penalty term. The goal of ridge regression is to find the values of $\beta_0$ and $\beta_j$ that minimize the sum of these two terms.

Ridge regression can be particularly useful when dealing with multicollinearity, which can make the OLS coefficient estimates unstable and difficult to interpret. By adding a penalty term to the objective function, ridge regression can reduce the variance of the coefficient estimates, making them more stable and easier to interpret.
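The shrinkage effect can be sketched with the closed-form ridge solution $\hat\beta = (X^\top X + \lambda I)^{-1} X^\top y$, computed on centered data so that the intercept is not penalized (matching the objective above). This is a NumPy illustration on synthetic, nearly collinear data; the helper `ridge_closed_form` and all data are assumptions, and it is not how scikit-learn's `Ridge` is implemented internally.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Ridge estimate with an unpenalized intercept, via centering."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    p = X.shape[1]
    # Solve (Xc^T Xc + lam * I) beta = Xc^T yc
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    beta0 = y_mean - X_mean @ beta
    return beta0, beta

# Two nearly collinear predictors
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)
X_demo = np.column_stack([x1, x2])
y_demo = 3.0 * x1 + rng.normal(scale=0.1, size=200)

norms = []
for lam in [0.0, 1.0, 100.0]:
    beta0, beta = ridge_closed_form(X_demo, y_demo, lam)
    norms.append(float(np.linalg.norm(beta)))
    print(f"lambda={lam}: beta={np.round(beta, 3)}, ||beta||={norms[-1]:.3f}")
```

As $\lambda$ grows, the norm of the coefficient vector can only shrink, which is the stabilizing effect described above.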





Let's perform a degree two polynomial transformation on our data.


In [ ]:
pr = PolynomialFeatures(degree=2)
x_train_pr = pr.fit_transform(Xp_train[['doge_avg_price', 'eth_avg_price', 'xrp_avg_price']])
# reuse the transformer fitted on the training data for the test data
x_test_pr = pr.transform(Xp_test[['doge_avg_price', 'eth_avg_price', 'xrp_avg_price']])

Let's create a Ridge regression object, setting the regularization parameter (alpha) to 1:


In [ ]:
RigeModel=Ridge(alpha=1)

Like regular regression, you can fit the model using the method <b>fit</b>.


In [ ]:
RigeModel.fit(x_train_pr, yp_train)

Similarly, you can obtain a prediction:


In [ ]:
yhat = RigeModel.predict(x_test_pr)

Let's compare the first five predicted samples to our test set:


In [ ]:
print('Predicted:', yhat[0:5])
# the predictions were made on Xp_test, so compare against yp_test
print('Test set :', yp_test[0:5].values)

We want to select the value of alpha that minimizes the test error. To do so, we can use a loop. We will also create a progress bar to see how many iterations we have completed.


In [ ]:
Rsqu_test = []
Rsqu_train = []
dummy1 = []
Alpha = 10 * np.array(range(0,1000))
pbar = tqdm(Alpha)

for alpha in pbar:
    RigeModel = Ridge(alpha=alpha) 
    RigeModel.fit(x_train_pr, yp_train)
    test_score, train_score = RigeModel.score(x_test_pr, yp_test), RigeModel.score(x_train_pr, yp_train)

    pbar.set_postfix({"Test Score": test_score, "Train Score": train_score})

    Rsqu_test.append(test_score)
    Rsqu_train.append(train_score)

We can plot out the value of $R^2$ for different alphas:


In [ ]:
width = 12
height = 10
plt.figure(figsize=(width, height))

plt.plot(Alpha,Rsqu_test, label='Validation data  ')
plt.plot(Alpha,Rsqu_train, 'r', label='Training Data ')
plt.xlabel('alpha')
plt.ylabel('R^2')
plt.legend()

**Figure 4**: The blue line represents the $R^2$ of the validation data, and the red line represents the $R^2$ of the training data. The x-axis represents the different values of Alpha.


Here the model is fit on the training data only and then scored on both the training data and the held-out test data (labelled as validation data in the plot).

The **red line in Figure 4** represents the $R^2$ of the training data. As alpha increases the $R^2$ for train remains the same.

The **blue line** represents the $R^2$ on the validation data. As the value for alpha increases, the $R^2$ increases.


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
  <b style="font-size: 2em; font-weight: bold;"> Question  #5:</b>

  <b>Perform Ridge regression. Calculate the $R^2$ using the polynomial features, use the training data to train the model and use the test data to test the model. The parameter alpha should be set to 10.</b>
    
</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 

<details><summary>Click here for the solution</summary>

```python
RigeModel = Ridge(alpha=10) 
RigeModel.fit(x_train_pr, yp_train)
RigeModel.score(x_test_pr, yp_test)

```

</details>


## Grid Search


The term alpha is a hyperparameter. Sklearn has the class <b>GridSearchCV</b> to make the process of finding the best hyperparameter simpler.


**When to use GridSearchCV?**

You can use GridSearchCV when you want to:
* Select the optimal hyperparameters for your machine learning model.
* Automate the process of hyperparameter tuning.
* Search over a large space of possible hyperparameters.
* Obtain a more robust and accurate model.

For more info on GridSearchCV follow:
   * https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
   * https://towardsdatascience.com/gridsearchcv-for-beginners-db48a90114ee

Let's create a list with a dictionary of parameter values(we will need it for GridSearchCV):


In [ ]:
parameters1 = [{'alpha': [0.001, 0.1, 1, 10, 100, 1000, 10000, 100000]}]
parameters1

Now, let's create a Ridge regression object:


In [ ]:
RR=Ridge()
RR

Also let's create a ridge grid search object:


In [ ]:
Grid1 = GridSearchCV(RR, parameters1,cv=4)


Fit the model:


In [ ]:
Grid1.fit(X[['doge_avg_price', 'eth_avg_price', 'xrp_avg_price']],y)

The object finds the best parameter values on the validation data. We can obtain the estimator with the best parameters and assign it to the variable BestRR as follows:


In [ ]:
BestRR=Grid1.best_estimator_
BestRR

We can see that the best parameter is `alpha=0.001`
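Because this lab works with time-series data, one variation worth trying is to pass a `TimeSeriesSplit` object as `cv` and to fit the search on the training portion only, so that the final score is computed on rows the search never saw. The sketch below uses synthetic data; the arrays, the 80/20 cut, and the shorter alpha list are assumptions for illustration, and the best alpha it finds says nothing about the BTC data.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic ordered data with three predictors
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(0.0, 0.3, size=300)

# Chronological 80/20 cut: the search only ever sees the first 80%
split_at = int(len(X_demo) * 0.8)
X_tr, X_te = X_demo[:split_at], X_demo[split_at:]
y_tr, y_te = y_demo[:split_at], y_demo[split_at:]

alphas = [0.001, 0.1, 1, 10, 100]
grid = GridSearchCV(Ridge(), {"alpha": alphas}, cv=TimeSeriesSplit(n_splits=4))
grid.fit(X_tr, y_tr)

print("best alpha   :", grid.best_params_["alpha"])
print("held-out R^2 :", grid.best_estimator_.score(X_te, y_te))
```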

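Let's score our model on the test data (note that the grid search above was fit on the full dataset `X`, which includes these rows, so this score is optimistic):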


In [ ]:
BestRR.score(X_test[['doge_avg_price', 'eth_avg_price', 'xrp_avg_price']], y_test)

Let's use this model to predict values and then visualize the results

In [ ]:
rr_predicted = BestRR.predict(X_test[['doge_avg_price', 'eth_avg_price', 'xrp_avg_price']])
rr_predicted

In [ ]:
data = {'Predicted': rr_predicted, 'Actual': y_test}
# use a separate name so the original dataset in `df` is not overwritten
results_df = pd.DataFrame(data)

# Use seaborn to create a scatter plot with a regression line
sns.lmplot(x='Predicted', y='Actual', data=results_df, scatter_kws={'color': 'blue'}, line_kws={'color': 'red'})

# Display the plot
plt.show()

As you can see, using `GridSearchCV` we were able to find the best alpha among the candidate values and improve our model.

# **Thank you for completing Lab 5!**

## Authors

<a href="https://author.skills.network/instructors/nazar_kohut">Nazar Kohut</a>

<a href="https://author.skills.network/instructors/yaroslav_vyklyuk_2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Yaroslav Vyklyuk, DrSc, PhD</a>

<a href="https://author.skills.network/instructors/mariya_fleychuk?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Mariya Fleychuk, DrSc, PhD</a>


## Change Log

| Date (YYYY-MM-DD) | Version | Changed By   | Change Description                                         |
| ----------------- | ------- | -------------| ---------------------------------------------------------- |
|     2023-03-25    |   1.0   | Nazar Kohut  | Lab created                                                |

<hr>

## <h3 align="center"> © IBM Corporation 2023. All rights reserved. </h3>