<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX05JMEN/SN_web_lightmode.png?1679513184440" width="300" alt="cognitiveclass.ai logo">
</center>

# Investigation of BTC/BUSD cryptocurrency using ADOSC, NATR, TRANGE indicators, and other cryptocurrencies.

## Lab 6. Self Evaluation

Estimated time needed: **30** minutes

## Objectives

After completing this lab you will be able to:

*   Be confident about your data analysis skills


<h3>Table of Contents</h3>

<div class="alert alert-block alert-info" style="margin-top: 20px">
<ol>
    <li><u>Setup</u></li>
    <li><u>Question to solve</u></li>
</ol>
</div>

<hr>

#### ***Dataset description***

The dataset used in this lab contains time-series data on various attributes of Bitcoin (BTC) and other cryptocurrencies, aggregated at 1-minute intervals. The dataset index represents the time period (1 minute) for which the data is reported. The dataset also contains binned average prices of the other cryptocurrencies.

<hr>

**Attributes:**

* ***General:***
    * `open` - the opening price of a **BTC** during a specific time period.
    * `high` - the highest price of a **BTC** during a specific time period.
    * `low` - the lowest price of a **BTC** during a specific time period.
    * `close` - the closing price of a **BTC** during a specific time period.
    * `rec_count` - the number of records or data points in the dataset for a given time period.
    * `volume` - the total amount of trading activity (buying and selling) for a **BTC** during a specific time period.
    * `avg_price` - the average price of a **BTC** during a specific time period.


* ***Indicators***
    * `ADOSC` - an indicator used in technical analysis to measure the momentum of buying and selling pressure for ***Bitcoin***.
    * `NATR` - an indicator used in technical analysis to measure the volatility of ***Bitcoin***.
    * `TRANGE` - an indicator used in technical analysis to measure the range of prices (from high to low) for ***Bitcoin*** during a specific time period.


* ***Other cryptocurrencies:***
    * `ape_avg_price` - the average price of ***APE*** during a specific time period.
    * `bnb_avg_price` - the average price of ***BNB*** during a specific time period.
    * `doge_avg_price` - the average price of ***DOGE coin*** during a specific time period.
    * `eth_avg_price` - the average price of ***Ethereum*** during a specific time period.
    * `xrp_avg_price` - the average price of ***XRP*** during a specific time period.
    * `matic_avg_price` - the average price of ***MATIC*** during a specific time period.
* ***Categorical:***
    * `category` - binned ***average price for BTC(`avg_price`)*** to bins: `high`, `medium-high`, `medium`, `medium-low`, `low`.
    * `ape_category` - binned `ape_avg_price` to bins: `high`, `medium-high`, `medium`, `medium-low`, `low`.
    * `bnb_category` - binned `bnb_avg_price` to bins: `high`, `medium-high`, `medium`, `medium-low`, `low`.
    * `doge_category` - binned `doge_avg_price` to bins: `high`, `medium-high`, `medium`, `medium-low`, `low`.
    * `eth_category` - binned `eth_avg_price` to bins: `high`, `medium-high`, `medium`, `medium-low`, `low`.
    * `xrp_category` - binned `xrp_avg_price` to bins: `high`, `medium-high`, `medium`, `medium-low`, `low`.
    * `matic_category` - binned `matic_avg_price` to bins: `high`, `medium-high`, `medium`, `medium-low`, `low`.
    
<hr>

*The indicators `ADOSC`, `NATR`, and `TRANGE` are used in technical analysis to provide insights into the momentum, volatility, and price ranges of financial instruments or assets. The other attributes represent the average prices of different cryptocurrencies during a specific time period.*

<hr>

## Setup

You will need the following libraries:


In [ ]:
!mamba install pandas -y
!mamba install numpy -y
!mamba install matplotlib -y
!mamba install scipy -y
!mamba install seaborn -y 
!mamba install statsmodels -y
!mamba install scikit-learn -y 
!mamba install tqdm -y
!mamba install -c conda-forge ta-lib -y

In [ ]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

import talib
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler,PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
# add your own imports, if needed

Initializing path to data

In [ ]:
path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX05JMEN/BTCBUSD_1min_categories.csv'

Load the csv:


In [ ]:
df = pd.read_csv(path)

We use the method <code>head()</code> to display the first 5 rows of the dataframe:


In [ ]:
df.head()

In [ ]:
df.shape

In [ ]:
df.columns

In [ ]:
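# keep only the columns used in this lab; .copy() avoids a SettingWithCopyWarning
# when new indicator columns are added to the DataFrame later
df = df[['ts', 'open', 'high', 'low', 'close', 'rec_count', 'volume',
       'avg_price', 'NATR']].copy()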

In [ ]:
df.columns

## Question to solve

**Question 1**:  Calculate Technical indicators. 

This question consists of several subquestions, and every indicator is described here. Also, **you can use any library to calculate these indicators and write your own methods to perform such computations**. 

### TRIX(Triple Exponential Average)

The **Triple Exponential Average (TRIX) is a momentum oscillator that is calculated by smoothing the price data three times**. It is **used to identify overbought and oversold conditions**, as well as to **generate buy and sell signals**.

The TRIX calculation is as follows:

* $EMA1=EMA(close,n)$

* $EMA2=EMA(EMA1,n)$

* $EMA3=EMA(EMA2,n)$

$$TRIX = \frac{EMA3 - EMA3_{prev}}{EMA3_{prev}}$$

where:

$close$ is the closing price of the asset

$n$ is the number of periods used for the moving average

$EMA$ is the exponential moving average

$EMA1$ is the first smoothed value

$EMA2$ is the second smoothed value

$EMA3$ is the third smoothed value

$EMA3_{prev}$ is the previous value of the third smoothed value

$TRIX$ is the TRIX value at a given point in time

The TRIX indicator is typically plotted as a line chart that oscillates around the zero line. **When the TRIX line crosses above the zero line, it is considered a bullish signal, and when it crosses below the zero line, it is considered a bearish signal**. Traders also look for divergence between the TRIX and price action as an indication of a potential trend reversal.
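
The formula above can be sketched directly with pandas. This is a minimal illustration on hypothetical toy prices, not the lab's reference solution: the helper name `trix_sketch` and the choice of `ewm(span=n, adjust=False)` for the EMA are assumptions. Library implementations such as `talib.TRIX` may seed the EMAs differently, leave a warm-up stretch of NaN values, and scale the result, so the numbers need not match exactly.

```python
import numpy as np
import pandas as pd


def trix_sketch(close: pd.Series, n: int = 10) -> pd.Series:
    """Relative one-period change of a triple-smoothed EMA (a fraction, not a percentage)."""
    ema1 = close.ewm(span=n, adjust=False).mean()  # EMA1 = EMA(close, n)
    ema2 = ema1.ewm(span=n, adjust=False).mean()   # EMA2 = EMA(EMA1, n)
    ema3 = ema2.ewm(span=n, adjust=False).mean()   # EMA3 = EMA(EMA2, n)
    ema3_prev = ema3.shift(1)                      # previous value of EMA3
    return (ema3 - ema3_prev) / ema3_prev


# hypothetical toy prices, strictly rising
toy_close = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0, 105.0])
toy_trix = trix_sketch(toy_close, n=3)
print(toy_trix)
```

Because the toy prices only rise, every defined value is positive, which matches the "above the zero line is bullish" reading described above. The first value is NaN because there is no previous `EMA3` to compare against.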

***Question 1.1:*** Calculate TRIX (use `timeperiod=10`) and check how many NaN values you have in your `pd.Series` or `np.array`.

In [ ]:
# your code goes here
trix = talib.TRIX(df['close'], timeperiod=10)
trix[-20:]

Check how many NaN values there are:

In [ ]:
# your code goes here
# count() returns the number of non-NaN values, so count the missing ones explicitly
trix.isna().sum()
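
A small sketch of the difference between counting valid and missing entries in a `pd.Series`, using a hypothetical toy series: `count()` returns the number of non-NaN values, while `isna().sum()` returns the number of NaN values.

```python
import numpy as np
import pandas as pd

# hypothetical series with three missing values
toy_series = pd.Series([np.nan, np.nan, 1.5, 2.0, np.nan])

n_valid = toy_series.count()         # non-NaN entries
n_missing = toy_series.isna().sum()  # NaN entries

print(f"valid: {n_valid}, missing: {n_missing}, total: {len(toy_series)}")
```

The two numbers always add up to the length of the series, so either one can be derived from the other.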

### Fibonacci retracement

The formula for Fibonacci retracement is:
$$R = C + (H - C) \times P$$

where:

* $R$ is the retracement level
* $C$ is the closing price of the asset
* $H$ is the highest price of the asset over the selected time period
* $P$ is the Fibonacci ratio, usually one of the following: 0.236, 0.382, 0.500, 0.618, or 0.786.
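
As a worked example of the formula, assume a hypothetical bar with a closing price of 100 and a high of 110. The values are made up for illustration and do not come from the dataset.

```python
import numpy as np

# hypothetical single-bar values
close_price = 100.0
high_price = 110.0
ratios = np.array([0.236, 0.382, 0.5, 0.618, 0.786])

# R = C + (H - C) * P, evaluated for every ratio at once
levels = close_price + (high_price - close_price) * ratios

for ratio, level in zip(ratios, levels):
    print(f"P={ratio:.3f} -> R={level:.2f}")
```

Each level lies between the close and the high, and a larger ratio moves the level closer to the high.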

***Question 1.2:*** Create a function that calculates Fibonacci retracement levels and display them:

In [ ]:
# your code goes here
p = np.array([0.236, 0.382, 0.5, 0.618, 0.786])  # Fibonacci ratios

def calculate_fib_retracement(df: pd.DataFrame, p: np.ndarray = p) -> np.ndarray:
    """Return an array of shape (len(df), len(p)) with R = C + (H - C) * P for every row."""
    c = df['close'].to_numpy()[:, None]  # closing prices as a column vector
    h = df['high'].to_numpy()[:, None]   # highs as a column vector
    # broadcasting (n, 1) against (len(p),) yields one row of levels per bar
    return c + (h - c) * p

fibonacci_levels = calculate_fib_retracement(df, p)
print(len(fibonacci_levels))
print(fibonacci_levels[:10])

### MOM(momentum indicator):

**The momentum indicator, often abbreviated as MOM, measures the change in price of a security over a specific time period. It is calculated by subtracting the closing price X periods ago from the current closing price.** The **formula for the momentum indicator** is:

$$MOM=C_t - C_{t-X}$$

Where:

* $C_t$ represents the closing price of the security in the current period.
* $C_{t-X}$ represents the closing price of the security X periods ago.

The **momentum indicator is typically used to identify trends and potential reversal points.** If the **momentum indicator is positive, it suggests that the security is gaining upward momentum**, while a **negative momentum reading suggests downward momentum**. Traders often use the momentum indicator in conjunction with other technical analysis tools to confirm or negate potential trade signals.
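
The momentum formula amounts to a lagged difference, which can be sketched with pandas as follows. The helper name `momentum_sketch` and the toy prices are hypothetical; this is an illustration of the formula rather than the lab's reference solution.

```python
import pandas as pd


def momentum_sketch(close: pd.Series, x: int = 5) -> pd.Series:
    """Current close minus the close x periods earlier."""
    return close - close.shift(x)  # equivalent to close.diff(x)


# hypothetical toy prices
toy_prices = pd.Series([10.0, 11.0, 13.0, 12.0, 15.0, 18.0, 17.0])
toy_mom = momentum_sketch(toy_prices, x=2)
print(toy_mom)
```

The first `x` values are NaN because there is no price `x` periods earlier to subtract, which is why the number of NaN values depends on the chosen period.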

***Question 1.3:*** Calculate MOM (momentum indicator) using `timeperiod=5`, display the values, and find out how many NaN values you have:

In [ ]:
# your code goes here
mom = talib.MOM(df['close'], timeperiod=5)
mom

Check how many NaN values there are:

In [ ]:
# your code goes here
# count() returns the number of non-NaN values, so count the missing ones explicitly
mom.isna().sum()

### New indicators inside our DataFrame

***Question 1.4:*** Add the new indicators to the main `pd.DataFrame` and drop all rows that contain NaN values (do not forget to check that you have done everything correctly):

Adding technical indicators to our pd.DataFrame:

In [ ]:
# your code goes here
df["trix"] = pd.Series(trix) 
for i, v in enumerate(p):
    df[f"fib_retracement_{v}"] = fibonacci_levels[:, i]
df["mom"] = pd.Series(mom)
df.head()

Deleting rows with empty values:

In [ ]:
# your code goes here
df = df.dropna()
df.count()

**Question 2**: Display the data types of each column using the attribute `dtypes`.


In [ ]:
df.dtypes

**Question 3** Create correlation matrix and heatmap.


In [ ]:
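# restrict the correlation to numeric columns; a text column (for example a string
# timestamp such as 'ts') would raise an error in recent pandas versions
corr = df.corr(numeric_only=True)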

plt.figure(figsize=(50, 50))
sns.heatmap(corr, annot=True, cmap='coolwarm')

plt.show()

**Question 4:** Draw `regplot` between TRIX and average price field.


In [ ]:
width = 12
height = 10
plt.figure(figsize=(width, height))
sns.regplot(x='trix', y='avg_price', data=df, line_kws={"color": "k"})
plt.ylim(df['avg_price'].min(),df['avg_price'].max())

**Question 5:** Split data into train and test datasets. Do not forget to use `shuffle=False`.


In [ ]:
# y is a target value column - the value we want to predict
y = df['avg_price']
# deleting target column from X dataset
X = df.drop('avg_price', axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, shuffle=False)
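
To see what `shuffle=False` does, here is a minimal sketch on hypothetical toy arrays whose values simply encode their position in time. The `toy_` names are made up so that nothing from the lab is overwritten.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# hypothetical time-ordered data: the value equals the time step
toy_X = np.arange(20).reshape(-1, 1)
toy_y = np.arange(20)

toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.10, shuffle=False
)

print("train ends at step:", toy_y_train.max())
print("test steps:", toy_y_test)
```

Without shuffling, the test set is the last 10% of the rows, so the model is evaluated only on observations that come after everything it was trained on. For time series this avoids leaking future information into training.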

**Question 6:** Create Linear Regression model.

In [ ]:
lr = LinearRegression()

**Question 7:** Fit the model on the training data, using `trix` to predict `avg_price`.


In [ ]:
lr.fit(X_train[['trix']], y_train)

**Note:** Please use `test_size = 0.10` for the following questions.


**Question 8:** Calculate test and train metrics: $R^2$, **MSE, RMSE, MAE** for linear regression model you created previously.


In [ ]:
def get_metrics(y_real, y_pred) -> tuple:
    """
    :returns: tuple of metrics (MSE, RMSE, MAE)
    """
    mse = mean_squared_error(y_real, y_pred)
    rmse = np.sqrt(mse)  # RMSE is the square root of MSE
    mae = mean_absolute_error(y_real, y_pred)
    return mse, rmse, mae

print(f"Train R^2: {lr.score(X_train[['trix']], y_train):.4f}; Test R^2: {lr.score(X_test[['trix']], y_test):.4f}")
lr_y_pred_train = lr.predict(X_train[['trix']])
lr_y_pred_test = lr.predict(X_test[['trix']])

lr_train_metrics_ = get_metrics(y_train, lr_y_pred_train)
lr_test_metrics_ = get_metrics(y_test, lr_y_pred_test)
print(f"Train mse: {lr_train_metrics_[0]:.4f}; rmse: {lr_train_metrics_[1]:.4f}; mae: {lr_train_metrics_[2]:.4f}")
print(f"Test mse: {lr_test_metrics_[0]:.4f}; rmse: {lr_test_metrics_[1]:.4f}; mae: {lr_test_metrics_[2]:.4f}")
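
To make the four metrics from Question 8 concrete, the sketch below writes them out with plain NumPy on hypothetical toy values. The `toy_` names and the numbers are made up for illustration; in the lab itself the scikit-learn functions do the same computations.

```python
import numpy as np

# hypothetical true values and predictions
toy_true = np.array([3.0, 5.0, 7.0, 9.0])
toy_pred = np.array([2.0, 5.0, 8.0, 11.0])

errors = toy_true - toy_pred                 # [1, 0, -1, -2]

toy_mse = np.mean(errors ** 2)               # mean squared error
toy_rmse = np.sqrt(toy_mse)                  # root mean squared error
toy_mae = np.mean(np.abs(errors))            # mean absolute error

ss_res = np.sum(errors ** 2)                 # residual sum of squares
ss_tot = np.sum((toy_true - toy_true.mean()) ** 2)  # total sum of squares
toy_r2 = 1 - ss_res / ss_tot                 # coefficient of determination

print(f"MSE={toy_mse:.4f}, RMSE={toy_rmse:.4f}, MAE={toy_mae:.4f}, R^2={toy_r2:.4f}")
```

MSE and MAE differ whenever the errors are not all 0 or 1 in magnitude: here the single error of size 2 contributes 4 to the squared sum but only 2 to the absolute sum.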

**Question 9**: Create and fit a Ridge regression object on the training data (using `trix` and `NATR` to predict `avg_price`), setting the regularization parameter to 0.1, then calculate train and test metrics.


In [ ]:
RR = Ridge(alpha=0.1)  # regularization parameter 0.1, as the question asks
RR.fit(X_train[['trix', 'NATR']], y_train)
print(f"Train r^2: {RR.score(X_train[['trix', 'NATR']], y_train):.4f}, Test r^2: {RR.score(X_test[['trix', 'NATR']], y_test):.4f}")

**Question 10**: Find a better regularization parameter (or show that the one used above is the best) using `GridSearchCV`, and calculate $R^2$.

Finding best estimator and fitting train data into it:

In [ ]:
# your code goes here
# candidate regularization strengths (duplicate entry removed)
parameters1 = [{'alpha': [0.001, 0.1, 1, 10, 100, 1000, 10000, 100000]}]
Grid1 = GridSearchCV(RR, parameters1, cv=4)

Grid1.fit(X_train[['trix', 'NATR']], y_train)
BestRR=Grid1.best_estimator_
print(BestRR)
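
Since the data is a time series, one possible variation is to pass a `TimeSeriesSplit` object as `cv`, so that every validation fold comes after its training fold. This is not what the cell above does (it uses plain 4-fold cross-validation); the sketch below only illustrates the alternative on hypothetical synthetic data, and the `toy_` names, the coefficients, and the alpha grid are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# hypothetical synthetic data: two features, a linear target, small noise
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(200, 2))
toy_target = (3.0 * toy_features[:, 0]
              - 2.0 * toy_features[:, 1]
              + rng.normal(scale=0.1, size=200))

toy_grid = {"alpha": [0.001, 0.01, 0.1, 1, 10, 100]}

# each of the 4 splits trains on an initial segment and validates on the segment after it
toy_search = GridSearchCV(Ridge(), toy_grid, cv=TimeSeriesSplit(n_splits=4))
toy_search.fit(toy_features, toy_target)

print("best alpha:", toy_search.best_params_["alpha"])
print(f"mean validation R^2: {toy_search.best_score_:.4f}")
```

`best_score_` is the mean validation $R^2$ across the splits for the selected `alpha`, which is a fairer estimate of out-of-sample performance than the training $R^2$.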

Calculating $R^2$ for the model with best estimator:

In [ ]:
# your code goes here
print(f"Train R^2: {BestRR.score(X_train[['trix', 'NATR']], y_train):.4f}, Test R^2: {BestRR.score(X_test[['trix', 'NATR']], y_test):.4f}")

Making predictions on test data and visualizing the results:

In [ ]:
rr_predicted = BestRR.predict(X_test[['trix', 'NATR']])

# keep the results in a separate DataFrame so the main `df` is not overwritten
data = {'Predicted': rr_predicted, 'Actual': y_test}
results_df = pd.DataFrame(data)

# Use seaborn to create a scatter plot with a regression line
sns.lmplot(x='Predicted', y='Actual', data=results_df, scatter_kws={'color': 'blue'}, line_kws={'color': 'red'})

# Display the plot
plt.show()

# **Thank you for completing Lab 6!**

## Authors

<a href="https://author.skills.network/instructors/nazar_kohut">Nazar Kohut</a>

<a href="https://author.skills.network/instructors/yaroslav_vyklyuk_2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Yaroslav Vyklyuk, DrSc, PhD</a>

<a href="https://author.skills.network/instructors/mariya_fleychuk?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Mariya Fleychuk, DrSc, PhD</a>


## Change Log

| Date (YYYY-MM-DD) | Version | Changed By   | Change Description                                         |
| ----------------- | ------- | -------------| ---------------------------------------------------------- |
|     2023-01-04    |   1.0   | Nazar Kohut  | Lab created                                                |

<hr>

## <h3 align="center"> © IBM Corporation 2023. All rights reserved. </h3>