<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="500" alt="cognitiveclass.ai logo">
</center>

# **Investigation of MATIC/BUSD exchange rate dynamics, calculation and analysis of individual technical financial indicators of the cryptocurrency market (ATR, OBV, RSI, AD)**

## **Lab 6. Final assignment**

## **Tasks**
* Formulate a problem related to cryptocurrency trading that can be solved using an available dataset;
* Use the tools described in the previous labs;
* Submit the main conclusions from solving the problem that are relevant to a cryptocurrency trader.


Estimated time needed: **2** hours

## **Objectives**

After completing this lab you will be able to:

*   Be confident about your data analysis skills


## **Table of Contents**

<div class="alert alert-block alert-info" style="margin-top: 20px">
<ol>
    <li>Question 1 (Reading the dataset)</li>
    <li>Question 2 (The dataset display)</li>
    <li>Question 3 (Calculating the technical indicators)</li>
    <ul>
        <li>Bollinger Bands (BB)</li>
        <li>Moving Average (MA)</li>
        <li>Ultimate Oscillator (UO)</li>
    </ul>
    <li>Question 4 (Dropping NaN's and resampling the dataset)</li>
    <li>Question 5 (Generating NaN's and restoring the missing values)</li>
    <li>Question 6 (Calculating correlation)</li>
    <ul>
        <li>Pearson Correlation</li>
    </ul>
    <li>Question 7 (Building Multiple Linear Regression model. Calculating MSE, R-squared of the model)</li>
    <ul>
        <li>Mean Squared Error (MSE)</li>
        <li>R-squared</li>
    </ul>
    <li>Question 8 (Splitting the dataset and building ridge models with different alpha parameters. Calculating MSE, R-squared of the models)</li>
    <li>Question 9 (Performing Grid-Search on the ridge model. Calculating R-squared)</li>
    <li>Question 10 (Calculating cross-validation score)</li>
    <li>Conclusion</li>
    <li>Sources</li>
</ol>

</div>

<hr>


## **Dataset Description**

### **Files**
* #### **MATICBUSD_trades_1m_preprocessed.csv** - the file contains the price history of the **MATIC/BUSD** pair and the ATR, OBV, RSI, AD indicators for the period from 11/11/2022 to 12/29/2022, aggregated in 1-minute intervals. **MATIC/BUSD** is the exchange rate of the **MATIC** cryptocurrency to the **BUSD** cryptocurrency

### **Columns**

* #### `Ts` - the timestamp of the record
* #### `Open` -  the price of the asset at the beginning of the trading period
* #### `High` -  the highest price of the asset during the trading period
* #### `Low` - the lowest price of the asset during the trading period
* #### `Close` - the price of the asset at the end of the trading period
* #### `Volume` - the total number of shares or contracts of a particular asset that are traded during a given period
* #### `Rec_count` -  the number of individual trades or transactions that have been executed during a given time period
* #### `Avg_price` - the average price at which a particular asset has been bought or sold during a given period
* #### `ATR` - average true range indicator
* #### `OBV` - on-balance volume indicator
* #### `RSI` - relative strength index indicator
* #### `AD` - accumulation / distribution indicator


In [ ]:
# install specific version of libraries used in lab
# ! conda install -q -y pandas
# ! conda install -q -y numpy
! conda install -q -y -c anaconda scikit-learn
! conda install -q -y -c conda-forge ta-lib

In [ ]:
import pandas as pd
import numpy as np
import talib
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV

pd.set_option("display.precision", 4)
pd.options.display.float_format = "{:.4f}".format

In [ ]:
path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX03MTEN/MATICBUSD_trades_1m.csv"

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
    
# **Question #1:**

**Read the dataset, passing `path` as the `filepath_or_buffer` parameter, parse the "ts" column as datetime and set it as the index. Assign the dataset to the variable `df`**
    
**Interpretation of the results: The dataset is read and contains 66861 rows and 7 columns. The "Ts" column is set as the index**
    
</div>


In [ ]:
# Write your solution here


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
    
# **Question #2:**
    
**Show the first 5 rows of the dataset**
    
**Interpretation of the results: The first 5 rows of the dataset are shown. The "Ts" column contains values from 2022-11-11 14:38:00 to 2022-11-11 14:42:00**

</div>


In [ ]:
# Write your solution here


<div class="alert alert-danger alertdanger" style="margin-top: 20px">

# **Question #3:**
    
**Calculate technical indicators, namely Bollinger Bands (BB), Moving Average (MA) and Ultimate Oscillator (UO), using the `talib` library and append them to the dataset. Show the tail of the dataset**
    
**Hint: calculate BB and MA based on the "close" column. Use `timeperiod=29` for MA and BB, and `timeperiod1=7`, `timeperiod2=14`, `timeperiod3=28` for UO**
    
**Interpretation of the results: The columns "MA", "BBANDS_up", "BBANDS_mid", "BBANDS_low", "ULTOSC" are added to the dataset. 5 rows of the following columns from the tail of the dataset are shown: "MA", "BBANDS_up", "BBANDS_mid", "BBANDS_low", "ULTOSC"**
    
</div>


## **Bollinger Bands (BB)**

**Bollinger Bands** are a technical analysis tool defined by a set of trendlines. The upper and lower bands are plotted two standard deviations above and below a **simple moving average** (**SMA**) of a security's price; both the averaging period and the number of standard deviations can be adjusted to user preferences.

**Bollinger Bands** were developed by technical trader John Bollinger and are designed to give investors a higher probability of identifying when an asset is oversold or overbought.


<center><img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX03MTEN/dotdash_INV-final-Bollinger-Band-Definition-June-2021-01-518977e3031d405497003f1747a3c250.webp" alt="bollinger-bands"></center>
<center>An example of Bollinger Bands</center>


**Bollinger Bands Formula**:

<h3 style="margin-top: 0px; margin-bottom: 0px">$$BOLU = MA(TP,n) + m * \sigma[TP,n]$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$BOLM = MA(TP,n)$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$BOLD = MA(TP,n) - m * \sigma[TP,n]$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$\text{where:}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$BOLU = \text{Upper Bollinger Band}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$BOLM = \text{Middle Bollinger Band}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$BOLD = \text{Lower Bollinger Band}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$MA = \text{Moving Average}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$TP (\text{typical price}) = \frac{(High \; + \; Low \; + \; Close)}{3}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$n = \text{Number of days in smoothing period (typically 20)}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$m = \text{Number of standard deviations (typically 2)}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$\sigma[TP,n] = \text{Standard Deviation over last} \; n \; \text{periods of} \; TP$$</h3>
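
The formulas above can be sketched directly in NumPy. This is a minimal illustration on made-up prices, assuming `n = 5`, `m = 2` and a population standard deviation; the helper name `bollinger_bands` is hypothetical and is not meant to reproduce the internals of `talib.BBANDS`. Note that the lab itself calculates the bands from the "close" column rather than from the typical price.

```python
import numpy as np

def bollinger_bands(tp, n=5, m=2.0):
    """Return (upper, middle, lower) bands; the first n-1 values are NaN."""
    tp = np.asarray(tp, dtype=float)
    upper = np.full(tp.shape, np.nan)
    middle = np.full(tp.shape, np.nan)
    lower = np.full(tp.shape, np.nan)
    for t in range(n - 1, len(tp)):
        window = tp[t - n + 1 : t + 1]   # the last n values up to and including t
        ma = window.mean()               # MA(TP, n)
        sigma = window.std()             # population std (ddof=0), an assumption
        middle[t] = ma
        upper[t] = ma + m * sigma
        lower[t] = ma - m * sigma
    return upper, middle, lower

# Hypothetical high/low/close values, used only for illustration
high = np.array([1.02, 1.04, 1.03, 1.05, 1.06, 1.08, 1.07])
low = np.array([0.98, 1.00, 1.00, 1.01, 1.03, 1.04, 1.03])
close = np.array([1.00, 1.03, 1.01, 1.04, 1.05, 1.06, 1.05])
typical_price = (high + low + close) / 3

bolu, bolm, bold = bollinger_bands(typical_price, n=5, m=2.0)
print(np.round(bolu, 4))
print(np.round(bolm, 4))
print(np.round(bold, 4))
```

The upper and lower bands are symmetric around the middle band by construction, and they widen when the prices inside the window become more volatile.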


Description of the function used to calculate Bollinger Bands:


In [ ]:
?talib.BBANDS

## **Moving Average (MA)**

In finance, a **moving average** (**MA**) is a stock indicator commonly used in technical analysis. The reason for calculating the moving average of a stock is to help smooth out the price data by creating a constantly updated average price.

By calculating the **moving average**, the impacts of random, short-term fluctuations on the price of a stock over a specified time frame are mitigated. **Simple moving averages** (**SMAs**) use a simple arithmetic average of prices over some timespan, while **exponential moving averages** (**EMAs**) place greater weight on more recent prices than older ones over the time period. 


<center><img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX03MTEN/dotdash_INV-final-Simple-Moving-Average-SMA-May-2021-01-98751e52a2d844a795d8d11434852d7c.webp" alt="simple-moving-average"></center>

<center>An example of a Simple Moving Average</center>


**Simple Moving Average Formula**:

<h3 style="margin-top: 0px; margin-bottom: 0px">$$SMA = \frac{1}{n} \sum_{i=1}^{n} A_{i}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$\text{where:}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$SMA = \text{Simple Moving Average}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$A_{i} = \text{Price of the asset in period} \; i$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$n = \text{Number of time periods}$$</h3>
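
As a rough sketch of the SMA formula, the average over every full window of `n` prices can be computed with a uniform convolution kernel. The function name `simple_moving_average` and the prices are made up for illustration; in the lab the same quantity is obtained with `talib.MA`.

```python
import numpy as np

def simple_moving_average(prices, n):
    """SMA over a sliding window of n values; the first n-1 outputs are NaN."""
    prices = np.asarray(prices, dtype=float)
    sma = np.full(prices.shape, np.nan)
    # A uniform kernel in "valid" mode yields the mean of every full window
    sma[n - 1 :] = np.convolve(prices, np.ones(n) / n, mode="valid")
    return sma

prices = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical prices
sma = simple_moving_average(prices, n=3)
print(sma)
```

The first `n - 1` positions stay `NaN` because no full window is available there yet, which is also why rows with `NaN` appear at the head of the dataset after the indicators are appended.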


Description of the function used to calculate the Moving Average:


In [ ]:
?talib.MA

## **Ultimate Oscillator (UO)**

The **Ultimate Oscillator** is a technical indicator that was developed by Larry Williams in 1976 to measure the price momentum of an asset across multiple timeframes. By using the weighted average of three different timeframes, the indicator has less volatility and fewer trade signals than oscillators that rely on a single timeframe. Buy and sell signals are generated following divergences. The **Ultimate Oscillator** generates fewer divergence signals than other oscillators due to its multi-timeframe construction.


<center><img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX03MTEN/UltimateOscillator-5c93b21dc9e77c0001faafcf.webp" alt="ultimate-oscillator"></center>

<center>An example of the Ultimate Oscillator</center>


**Ultimate Oscillator Formula**:

<h3 style="margin-top: 0px; margin-bottom: 0px">$$UO = [\frac{(A_{7} \; * \;4) \; + \; (A_{14} \; * \; 2) \; + \; A_{28}}{4 \; + \; 2 \; + \; 1}] * 100$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$\text{where:}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$UO = \text{Ultimate Oscillator}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$A_{n} = \frac{\sum_{p=1}^{n} BP_{p}}{\sum_{p=1}^{n} TR_{p}} \; \text{(average over the last} \; n \; \text{periods)}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$\text{Buying Pressure} \; (BP) = Close - Min(Low; PC)$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$PC = \text{Prior Close}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$\text{True Range} (TR) = Max(High; Prior \; Close) - Min(Low; Prior \; Close)$$</h3>
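
The formula can be traced step by step in NumPy. The sketch below is only an illustration under stated assumptions: the function name `ultimate_oscillator` is hypothetical, the prices are randomly generated, and the weights 4, 2, 1 are applied to the periods in the order shortest to longest. It is not meant to reproduce the internals of `talib.ULTOSC`.

```python
import numpy as np

def ultimate_oscillator(high, low, close, periods=(7, 14, 28)):
    """UO per the formula above; `periods` must be ordered short to long."""
    high, low, close = (np.asarray(x, dtype=float) for x in (high, low, close))
    prior_close = close[:-1]
    # BP and TR exist from the second bar onward (the first bar has no prior close)
    true_low = np.minimum(low[1:], prior_close)
    bp = close[1:] - true_low
    tr = np.maximum(high[1:], prior_close) - true_low

    weights = (4.0, 2.0, 1.0)
    uo = np.full(close.shape, np.nan)
    for t in range(max(periods), len(close)):
        total = 0.0
        for w, n in zip(weights, periods):
            # bp[t - 1] belongs to bar t, so the last n bars are bp[t - n : t]
            total += w * bp[t - n : t].sum() / tr[t - n : t].sum()
        uo[t] = 100.0 * total / sum(weights)
    return uo

# Randomly generated bars, used only for illustration
rng = np.random.default_rng(0)
close = 1.0 + np.cumsum(rng.normal(0.0, 0.01, 40))
high = close + rng.uniform(0.001, 0.01, 40)
low = close - rng.uniform(0.001, 0.01, 40)

uo = ultimate_oscillator(high, low, close)
print(np.round(uo[-3:], 2))
```

Because buying pressure never exceeds the true range, every average $A_n$ lies between 0 and 1, so the oscillator is bounded between 0 and 100.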


Description of the function used to calculate the Ultimate Oscillator:


In [ ]:
?talib.ULTOSC

In [ ]:
# Write your solution here


<div class="alert alert-danger alertdanger" style="margin-top: 20px">

# **Question #4:**
    
**Drop rows that contain `NaN`, resample the data to 15 minutes and then drop `NaN` again (because some data is missing for more than 15 minutes). Show the results**
    
**Interpretation of the results: The dataset must contain 4492 rows and 12 columns and be shown. The "Ts" column contains values from 2022-11-11 15:00:00 to 2022-11-11 16:00:00**
</div>


In [ ]:
# Write your solution here


<div class="alert alert-danger alertdanger" style="margin-top: 20px">

# **Question #5:**
    
**Add `NaN` values to the "open" column using the `spoil_df` function and restore the data using `pchip` interpolation. Show the results**
    
**Interpretation of the results: First, `NaN` values are added to the "open" column using the `spoil_df` function, then the column is restored using interpolation, so the "open" column no longer contains `NaN`. The first 5 rows of the dataset are shown**
</div>


In [ ]:
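def spoil_df(df: pd.DataFrame, cols: list = None, p: float = 0.1) -> pd.DataFrame:
    """
    Return a copy of `df` in which each element of the columns `cols`
    is set to NaN with probability `p`
    
    parameters
    ----------
    df: pd.DataFrame
        Dataframe to perform calculations on
    cols: list
        List of columns to set NaN's in (defaults to ["open"])
    p: float
        Probability of replacing an element with NaN
    """
    if cols is None:
        cols = ["open"]  # avoid a mutable default argument
    rng = np.random.default_rng(seed=42)
    new_df = df.copy()
    
    for col in cols:
        # Draw one uniform number per row; rows below the threshold become NaN
        mask = rng.random(len(df)) < p
        new_df.loc[mask, col] = np.nan  # np.NaN was removed in NumPy 2.0
        
    return new_df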

In [ ]:
# Write your solution here


<div class="alert alert-danger alertdanger" style="margin-top: 20px">

# **Question #6:**
    
**Calculate correlation between "avg_price" and technical indicators**
    
**Interpretation of the results: A correlation matrix is shown with "avg_price", "MA", "ULTOSC", "BBANDS_up", "BBANDS_mid", "BBANDS_low" as rows and columns. Each cell contains the correlation between the corresponding row and column**
</div>


## **Pearson Correlation**

The **Pearson Correlation** measures the linear dependence between two variables $X$ and $Y$. The coefficient indicates how closely the data points follow a line of best fit: the farther the points scatter from that line, the closer the coefficient is to 0. Its sign shows the direction of the relationship: positive when the variables increase together, negative when one decreases as the other increases.

The resulting coefficient is a value between -1 and 1 inclusive, where:

- **1**: Perfect positive linear correlation.
- **0**: No linear correlation. The variables may still be related in a nonlinear way.
- **-1**: Perfect negative linear correlation.


The population correlation coefficient $ \rho_{X,Y}$ between two random variables $X$ and $Y$ with expected values $\mu _{X}$ and $\mu _{Y}$ and standard deviations $\sigma _{X}$ and $\sigma_Y$ is defined as:

<center><h3>$\rho_{X,Y} = \operatorname{corr}(X, Y) = \frac{\operatorname{cov}(X,Y)}{\sigma _{X} \sigma_Y} = \frac{\operatorname{E}[(X \; - \; \mu_{X})(Y \; - \; \mu_{Y})]}{\sigma _{X} \sigma_Y}, \quad \text{if} \; \sigma_{X} \sigma_Y > 0 $</h3></center>
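
The definition can be checked numerically on a few hand-made vectors. This is a minimal sketch; the helper name `pearson_corr` and the data are made up for illustration, and in the lab the correlation matrix is obtained from the dataframe itself.

```python
import numpy as np

def pearson_corr(x, y):
    """Sample version of cov(X, Y) / (sigma_X * sigma_Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    covariance = ((x - x.mean()) * (y - y.mean())).mean()
    return covariance / (x.std() * y.std())

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = 2.0 * x + 1.0        # increases linearly with x
y_down = -0.5 * x + 3.0     # decreases linearly with x
y_sym = (x - 3.0) ** 2      # depends on x, but not linearly

print(pearson_corr(x, y_up))
print(pearson_corr(x, y_down))
print(pearson_corr(x, y_sym))
```

The third case gives a coefficient of zero even though `y_sym` is fully determined by `x`, which is why a value near 0 only rules out a linear relationship.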


In [ ]:
# Write your solution here


We can see that **"BBANDS_mid"** is equal to **"MA"**, so we will use only **"MA"** for model training.


<div class="alert alert-danger alertdanger" style="margin-top: 20px">

# **Question #7:**
    
**Build MLR (Multiple Linear Regression) model using technical indicators as inputs and calculate $MSE$ and $R^2$. Show the results**
    
**Hint: use the `mean_squared_error` function and the model's `score` method**
    
**Interpretation of the results: The MLR model is built. $MSE$ is 0.0000042, $R^2$ is 0.9986049**
</div>


## **Mean Squared Error (MSE)**

The **Mean Squared Error** is the average of the squared errors, where an error is the difference between the actual value ($y$) and the estimated value ($\widehat{y}$).

The formula for $MSE$:


<h3 style="margin-top: 0px; margin-bottom: 0px">$$MSE = \frac{1}{n}{\sum_{i=1}^{n} (y_{i} - \widehat{y_{i}})^2}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$\text{where:}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$y_{i} = \text{actual value}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$\widehat{y_{i}} = \text{predicted value}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$n = \text{number of the samples}$$</h3>
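
A small worked example may help to connect the formula with the number returned by `mean_squared_error`. The helper `mse` and the values below are made up for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 2.0, 7.0]   # hypothetical actual values
y_pred = [2.5, 5.0, 4.0, 8.0]   # hypothetical predictions

# Errors: 0.5, 0.0, -2.0, -1.0 -> squares: 0.25, 0.0, 4.0, 1.0 -> mean: 5.25 / 4
print(mse(y_true, y_pred))  # 1.3125
```

Since the errors are squared, the single large miss (2.0) contributes far more to the result than the small one (0.5), and the result is expressed in squared units of the target.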


## **R-squared**

**R-squared**, also known as the coefficient of determination, indicates how close the data are to the fitted regression line.

The value of R-squared is the proportion of the variation of the response variable ($y$) that is explained by a linear model.

**R-squared** values typically range from 0 to 1 and are commonly stated as percentages from 0% to 100%; on data the model was not fitted to, the value can be negative when the model predicts worse than the mean of $y$. An R-squared of 100% means that all movements of a security (or another dependent variable) are completely explained by movements in the index (or the independent variable(s) you are interested in).

In finance, an **R-squared** above 0.7 would generally be seen as showing a high level of correlation, whereas a measure below 0.4 would show a low correlation.

The formula for $R^2$:


<h3 style="margin-top: 0px; margin-bottom: 0px">$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_{i}-\widehat{y_{i}})^2}{\sum_{i=1}^{n}(y_{i}-\overline{y})^2}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$\text{where:}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$y_{i} = \text{actual value}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$\widehat{y_{i}} = \text{predicted value}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$\overline{y} = \text{mean value}$$</h3>
<h3 style="margin-top: 0px; margin-bottom: 0px">$$n = \text{number of the samples}$$</h3>
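
The formula can be illustrated with three sets of predictions for the same made-up targets: a reasonable one, one that always predicts the mean, and a poor one. The helper name `r_squared` and all values are hypothetical.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([3.0, 5.0, 2.0, 7.0])           # hypothetical actual values
good_pred = np.array([2.5, 5.0, 4.0, 8.0])        # close to the actual values
mean_pred = np.full(y_true.shape, y_true.mean())  # always predicts the mean
bad_pred = np.array([7.0, 2.0, 5.0, 3.0])         # worse than predicting the mean

print(r_squared(y_true, good_pred))
print(r_squared(y_true, mean_pred))
print(r_squared(y_true, bad_pred))
```

Predicting the mean gives exactly 0, and predictions that are worse than the mean give a negative value, so a score below 0 on test data is a sign that the model does not generalize.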


In [ ]:
# Write your solution here


<div class="alert alert-danger alertdanger" style="margin-top: 20px">

# **Question #8:**
    
**Split the dataset into train (80%) and test (20%) parts with the parameter `shuffle` set to `False`. Build 3 Ridge models with the parameter alpha set to 0.01, 0.1 and 1, using technical indicators as inputs. Create a dict of the models (each key should be equal to the alpha of its model) and calculate $MSE$ and $R^2$ on the test data. Show the results**

**Hint: use `train_test_split` function**
    
**Interpretation of the results: Three Ridge models are built. The best $MSE$ is 0.0000020, $R^2$ is 0.9777788**
</div>


In [ ]:
# Write your solution here


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
    
# **Question #9:**

**Create a Ridge model and perform Grid-Search with the parameter alpha set to 0.01, 0.1 and 1 and `cv=5`. Train the model on the train data obtained in Question #8. Calculate $R^2$ of the best model on the test data. Show the results**
    
**Interpretation of the results: Grid-Search is performed on the created Ridge model. $R^2$ of the best model is 0.9777788**
    
</div>


In [ ]:
# Write your solution here


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
    
# **Question #10:**
    
**Calculate the cross-validation score ($R^2$) with `cross_val_score`, using the previous models as the estimators and setting the parameter `cv=5`. Average the results and show them**
    
**Interpretation of the results: The best cross-validation score obtained is 0.9910495**
</div>


In [ ]:
# Write your solution here


# **11. Conclusion**

In this laboratory work, I repeated and consolidated the knowledge gained in the previous labs:

* Lab 1: dataset operations and calculation of technical indicators, namely Bollinger Bands (BB), Moving Average (MA) and Ultimate Oscillator (UO);
* Lab 2: adding `NaN` values, restoring a column by interpolation, resampling;
* Lab 3: calculation of correlation;
* Lab 4: building MLR models, calculation of $MSE$ and $R^2$;
* Lab 5: dataset splitting, calculation of the cross-validation score, building the Ridge model, use of Grid-Search.


# **12. Sources**:

- [https://www.investopedia.com/terms/b/bollingerbands.asp](https://www.investopedia.com/terms/b/bollingerbands.asp?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX03MTEN2608-2023-01-01)
- [https://www.investopedia.com/terms/m/movingaverage.asp](https://www.investopedia.com/terms/m/movingaverage.asp?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX03MTEN2608-2023-01-01)
- [https://www.investopedia.com/terms/u/ultimateoscillator.asp](https://www.investopedia.com/terms/u/ultimateoscillator.asp?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX03MTEN2608-2023-01-01)
- [https://www.investopedia.com/thmb/nTfemzLtwmgAFJ5k-qXRxUFn95Y=/750x0/filters:no_upscale():max_bytes(150000):strip_icc():format(webp)/dotdash_INV-final-Bollinger-Band-Definition-June-2021-01-518977e3031d405497003f1747a3c250.jpg](https://www.investopedia.com/thmb/nTfemzLtwmgAFJ5k-qXRxUFn95Y=/750x0/filters:no_upscale():max_bytes(150000):strip_icc():format(webp)/dotdash_INV-final-Bollinger-Band-Definition-June-2021-01-518977e3031d405497003f1747a3c250.jpg)
- [https://www.investopedia.com/thmb/EFuAw39GHZVgFoYCq8DDEpRsFXo=/750x0/filters:no_upscale():max_bytes(150000):strip_icc():format(webp)/dotdash_INV-final-Simple-Moving-Average-SMA-May-2021-01-98751e52a2d844a795d8d11434852d7c.jpg](https://www.investopedia.com/thmb/EFuAw39GHZVgFoYCq8DDEpRsFXo=/750x0/filters:no_upscale():max_bytes(150000):strip_icc():format(webp)/dotdash_INV-final-Simple-Moving-Average-SMA-May-2021-01-98751e52a2d844a795d8d11434852d7c.jpg)
- [https://www.investopedia.com/thmb/a8gkNBvoLXitglopdImyRUczDw0=/750x0/filters:no_upscale():max_bytes(150000):strip_icc():format(webp)/UltimateOscillator-5c93b21dc9e77c0001faafcf.png](https://www.investopedia.com/thmb/a8gkNBvoLXitglopdImyRUczDw0=/750x0/filters:no_upscale():max_bytes(150000):strip_icc():format(webp)/UltimateOscillator-5c93b21dc9e77c0001faafcf.png)


# **Thank you for completing this lab!**

## Author

<a href="https://author.skills.network/instructors/borys_melnychuk?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX03MTEN2608-2023-01-01" >Borys Melnychuk</a>

<a href="https://author.skills.network/instructors/yaroslav_vyklyuk_2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Yaroslav Vyklyuk, DrSc, PhD</a>

<a href="https://author.skills.network/instructors/mariya_fleychuk?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Mariya Fleychuk, DrSc, PhD</a>



## Change Log

| Date (YYYY-MM-DD) | Version | Changed By      | Change Description                                         |
| ----------------- | ------- | ----------------| ---------------------------------------------------------- |
|     2023-04-01    |   1.0   | Borys Melnychuk | Creation of the lab                                        |

<hr>

## <h3 align="center"> © IBM Corporation 2023. All rights reserved. </h3>
