<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="500" alt="cognitiveclass.ai logo">
</center>

# **Investigation of cryptocurrency exchange rate dynamics (using the MATIC/BUSD pair as an example): calculation and analysis of technical financial indicators that characterize the cryptocurrency market (ATR, OBV, RSI, AD)**

## **Lab 1. Dataset creation**

Estimated time needed: **30** minutes

## **The tasks**

* Download and process the statistical time series of the MATIC/BUSD cryptocurrency pair, which describes the dynamics of the cryptocurrency market;
* Load the statistical data into a Pandas dataframe;
* Calculate and analyze technical financial indicators of the cryptocurrency market (ATR, OBV, RSI, AD).

## **Objectives**

#### After completing this lab you will be able to:

* acquire data in various ways;
* obtain insights from data with Pandas library;
* calculate technical financial indicators. 


## **Table of Contents**

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li>Data Acquisition</li>
            <ul>
                <li>Read data</li>
                <li>Renaming of columns</li>
            </ul>
        <li>Calculation of technical financial indicators</li>
             <ul>
                <li>Average True Range (ATR)</li>
                <li>On-Balance Volume (OBV)</li>
                <li>Relative Strength Index (RSI)</li>
                <li>Chaikin A/D Line (AD)</li>
            </ul>
        <li>Basic Insight of Dataset</li>
         <ul>
            <li>Data types</li>
            <li>Describe</li>
            <li>Info</li>
            <li>Save Dataset</li>
            <li>Read / save other data formats</li>
        </ul>
        <li>Sources</li>
    </ol>
</div>
<hr>


## **Dataset Description**

### **Files**
* #### **MATICBUSD_trades_1m.csv** - the file contains exchange rates of **MATIC/BUSD** for the period from 11/11/2022 to 12/29/2022 with an aggregation time of 1 minute. **MATIC/BUSD** - the exchange rate of **MATIC** cryptocurrency to **BUSD** cryptocurrency

### **Columns**

* #### `ts` - the timestamp of the record
* #### `open` -  the price of the asset at the beginning of the trading period
* #### `high` -  the highest price of the asset during the trading period
* #### `low` - the lowest price of the asset during the trading period.
* #### `close` - the price of the asset at the end of the trading period
* #### `volume` - the total amount of the asset traded during the trading period
* #### `rec_count` -  the number of individual trades or transactions that have been executed during a given time period
* #### `avg_price` - the average price at which a particular asset has been bought or sold during a given period


There are various formats for a dataset: .csv, .json, .xlsx  etc. The dataset can be stored in different places, on your local machine or sometimes online.<br>

In this section, you will learn how to load a dataset into our Jupyter Notebook.<br> 
      
In our case, the MATIC/BUSD dataset is hosted online in CSV (comma-separated values) format. Let's use this dataset as an example to practice reading data.

<ul>
    <li>Data source: <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX030WEN/MATICBUSD_trades_1m.csv" target="_blank">https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX030WEN/MATICBUSD_trades_1m.csv</a></li>
    <li>Data type: csv</li>
</ul>

The Pandas library is a useful tool that lets us read various datasets into a dataframe. Our Jupyter notebook platforms have <b>Pandas</b> built in, so it only needs to be imported, not installed.


Run the following cell to install the other libraries required by this lab:


In [ ]:
# install the libraries used in this lab (uncomment a line if the library is missing)
# ! conda install pandas -y
# ! conda install numpy -y
! conda install -c conda-forge ta-lib -y
# ! conda install -c conda-forge matplotlib -y

In [ ]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import talib

# **1. Data Acquisition**

## **Read Data**

We use the `pandas.read_csv()` function to read the CSV file. Inside the parentheses, we put the file path in quotation marks so that Pandas reads the file at that address into a dataframe. The file path can be either a URL or a local file path.<br>

You can assign the resulting dataframe to any variable name you choose.
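As a minimal sketch of how `pandas.read_csv()` behaves, the example below reads a tiny CSV from an in-memory string instead of a URL. The three rows are invented for illustration and only mimic the column layout described above; they are not taken from the MATIC/BUSD file.

```python
import io
import pandas as pd

# Hypothetical sample that follows the column list from the dataset description.
# All values are made up for illustration.
csv_text = """ts,open,high,low,close,volume,rec_count,avg_price
2022-11-11 14:38:00,1.10,1.12,1.09,1.11,1500.0,12,1.105
2022-11-11 14:39:00,1.11,1.13,1.10,1.12,900.0,7,1.115
2022-11-11 14:40:00,1.12,1.12,1.08,1.09,2100.0,15,1.100
"""

# read_csv accepts a local path, a URL, or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (number of rows, number of columns)
print(sample.head(2))  # first two rows
```

Passing a URL string or a local file name in place of the `io.StringIO` object works the same way.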


In [ ]:
path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX030WEN/MATICBUSD_trades_1m.csv"

This dataset was hosted <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX030WEN/MATICBUSD_trades_1m.csv">HERE</a>


In [ ]:
# Read the file and assign it to variable "df"
df = pd.read_csv(path, index_col=0)

After reading the dataset, we can use the `dataframe.head(n)` method to check the top n rows of the dataframe, where n is an integer. Conversely, `dataframe.tail(n)` shows the bottom n rows of the dataframe.


In [ ]:
# show the first 5 rows using dataframe.head() method
print("The first 5 rows of the dataframe")
df.head(5)

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1><strong>Question #1:</strong></h1>
    
<p><strong>Check the bottom 10 rows of data frame <code>df</code></strong></p>

</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 


<details><summary>Click here for the solution</summary>

```python
print("The last 10 rows of the dataframe\n")
df.tail(10)
```

</details>


## **Renaming of columns**

The column names are stored in `df.columns`.

Let's capitalize the first letter of each column name.


In [ ]:
# create headers list
columns = list(df.columns)
columns = [col.capitalize() for col in columns]
print("Columns", columns, sep="\n")

We replace headers and recheck our dataframe:


In [ ]:
df.columns = columns
df.head(10)

We also need to convert the "**Ts**" column to the datetime type and set it as the index.


In [ ]:
df["Ts"] = pd.to_datetime(df["Ts"])
df = df.set_index("Ts")
df
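
One reason to use a datetime index is that rows can then be selected by time. The sketch below repeats the renaming and conversion steps on a small made-up frame (the name `demo` and its values are illustrative assumptions) and then slices it by timestamp.

```python
import pandas as pd

# Made-up frame with a text timestamp column, for illustration only
demo = pd.DataFrame({
    "ts": ["2022-11-11 14:38:00", "2022-11-11 14:39:00", "2022-11-11 14:40:00"],
    "close": [1.11, 1.12, 1.09],
})

# Same steps as above: capitalize names, parse timestamps, set the index
demo.columns = [col.capitalize() for col in demo.columns]
demo["Ts"] = pd.to_datetime(demo["Ts"])
demo = demo.set_index("Ts")

# With a DatetimeIndex, a label slice selects every row from 14:39 onwards
later = demo.loc["2022-11-11 14:39":]
print(later)
```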



# **2. Calculation of technical financial indicators**


Now that we have successfully read the raw dataset and modified its columns, let's set the display precision.


In [ ]:
pd.set_option("display.precision", 4)
pd.options.display.float_format = '{:.4f}'.format

Let's define a function that plots the price of the cryptocurrency together with an indicator.


In [ ]:
def plot_indicator(price: str, indicator: str) -> None:
    """
    Plot a price column and an indicator column of the global `df` on one chart.
    
    Parameters
    ----------
    price: str
        Name of the price column in `df` (right y-axis)
    indicator: str
        Name of the indicator column in `df` (left y-axis)
    """
    fig, ax1 = plt.subplots()

    color = "tab:red"
    ax1.set_xlabel("Time")
    ax1.set_ylabel(indicator, color=color)
    f1 = ax1.plot(df.index, df[indicator], color=color)
    ax1.tick_params(axis="y", labelcolor=color)
    plt.xticks(rotation=45)

    ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis

    color = "tab:blue"
    ax2.set_ylabel(price, color=color)  # we already handled the x-label with ax1
    f2 = ax2.plot(df.index, df[price], color=color)
    ax2.tick_params(axis="y", labelcolor=color)
    
    figs = f1 + f2
    labels = [indicator, price]
    plt.legend(figs, labels)
    fig.tight_layout()  # otherwise the right y-label is slightly clipped
    plt.show()

We want to calculate several technical indicators. Background information on them is available at <a href="https://www.investopedia.com/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX030WEN2297-2023-01-01"><b>Investopedia</b></a>.


## **Average True Range (ATR)**


The average true range (ATR) is a technical analysis indicator introduced by market technician J. Welles Wilder Jr. in his book New Concepts in Technical Trading Systems that measures market volatility by decomposing the entire range of an asset price for that period.


### The Average True Range (ATR) Formula 


When a previous ATR value is available, the ATR is calculated as:


<center>
    <h1>$ATR = \frac{\text{Previous ATR} \times (n-1)\ +\ TR}{n}$</h1>
</center>

$\text{where:}$ <br>
$n = \text{Number of periods}$ <br>
$TR = \text{True range}$


If no previous ATR has been calculated, use the simple average of the first $n$ true ranges:


<center>
    <h1>$ATR = \frac{1}{n}\sum_{i=1}^{n} TR_{i}$</h1>
</center>

$\text{where:}$ <br>
$TR_{i} = \text{True range of the } i\text{-th period (the first period's TR, then the second, and so on)}$ <br>
$n = \text{Number of periods}$
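
The two formulas above can be sketched in plain NumPy and Pandas. This is an illustrative sketch, not the TA-Lib implementation. It assumes the usual definition of the true range, $TR = \max(H - L,\ |H - C_{prev}|,\ |L - C_{prev}|)$, which is not spelled out above, and the helper names `true_range` and `atr_sketch` as well as the toy prices are made up for this example. Results may differ from a library's output near the start of the series, because implementations differ in how they seed the recursion.

```python
import numpy as np
import pandas as pd

def true_range(high: pd.Series, low: pd.Series, close: pd.Series) -> pd.Series:
    """Assumed definition: max(H - L, |H - prev close|, |L - prev close|)."""
    prev_close = close.shift(1)
    candidates = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()],
        axis=1,
    )
    # max() skips NaN, so the first bar (no previous close) falls back to H - L
    return candidates.max(axis=1)

def atr_sketch(high: pd.Series, low: pd.Series, close: pd.Series, n: int) -> pd.Series:
    tr = true_range(high, low, close).to_numpy()
    atr = np.full(len(tr), np.nan)
    if len(tr) < n:
        return pd.Series(atr, index=close.index)
    # No previous ATR yet: simple average of the first n true ranges
    atr[n - 1] = tr[:n].mean()
    # Afterwards: (previous ATR * (n - 1) + current TR) / n
    for i in range(n, len(tr)):
        atr[i] = (atr[i - 1] * (n - 1) + tr[i]) / n
    return pd.Series(atr, index=close.index)

# Toy prices, made up for illustration
high = pd.Series([10.0, 11.0, 12.0, 11.0])
low = pd.Series([9.0, 10.0, 10.0, 9.0])
close = pd.Series([9.5, 10.5, 11.0, 10.0])

atr = atr_sketch(high, low, close, n=2)
print(atr)
```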


Let's calculate and plot ATR


In [ ]:
df["ATR"] = talib.ATR(df["High"], df["Low"], df["Close"], 15)
plot_indicator("Avg_price", "ATR")

## **On-Balance Volume (OBV)**


On-balance volume (OBV) is a technical trading momentum indicator that uses volume flow to predict changes in stock price. Joseph Granville first developed the OBV metric in the 1963 book Granville's New Key to Stock Market Profits.


### The Formula for OBV is 


<center>
    <h1>$OBV = OBV_{prev} \; + \; \begin{cases} volume,& \text{if } close > close_{prev}\\\\
    0,              & \text{if } close = close_{prev}\\
    -volume,              & \text{if } close < close_{prev}
\end{cases}$</h1>
</center>

$\text{where:}$ <br>
$OBV = \text{Current on-balance volume level}$ <br>
$OBV_{prev} = \text{Previous on-balance volume level}$ <br>
$volume = \text{Latest trading volume amount}$
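
The case distinction above maps directly onto the sign of the change in the closing price. The following sketch uses only NumPy and Pandas; the function name `obv_sketch` and the toy data are illustrative assumptions. The first bar has no previous close, so this sketch starts the line at 0. Other implementations may start from a different value, which shifts the whole line by a constant.

```python
import numpy as np
import pandas as pd

def obv_sketch(close: pd.Series, volume: pd.Series) -> pd.Series:
    # +1 if the close rose, -1 if it fell, 0 if unchanged (or no previous close)
    direction = np.sign(close.diff()).fillna(0.0)
    # Running total of the signed volume
    return (direction * volume).cumsum()

# Toy data, made up for illustration
close = pd.Series([1.00, 1.10, 1.10, 1.05, 1.20])
volume = pd.Series([100.0, 200.0, 150.0, 300.0, 250.0])

obv = obv_sketch(close, volume)
print(obv.tolist())
```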


Let's calculate and plot OBV


In [ ]:
df["OBV"] = talib.OBV(df["Close"], df["Volume"])
plot_indicator("Avg_price", "OBV")

## **Relative Strength Index (RSI)**


The relative strength index (RSI) is a momentum indicator used in technical analysis. RSI measures the speed and magnitude of a security's recent price changes to evaluate overvalued or undervalued conditions in the price of that security. 


### The RSI uses a two-part calculation that starts with the following formula: 


<center>
    <h1>$RSI_{\text{step one}} = 100 - \left[ \frac{100}{1 + \frac{\text{Average gain}}{\text{Average loss}}} \right]$</h1>
</center>


The average gain or loss used in this calculation is the average percentage gain or loss during a look-back period. The formula uses a positive value for the average loss. Periods with price losses are counted as zero in the calculations of average gain. Periods with price increases are counted as zero in the calculations of average loss.


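The standard number of periods used to calculate the initial RSI value is 14. The step-two formula below is written for that case (hence the factor 13), while the code cell in this section uses 15 periods.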


Once there are 14 periods of data available, the second calculation can be done. Its purpose is to smooth the results so that the RSI only nears 100 or zero in a strongly trending market. 


<center>
    <h1>$RSI_{\text{step two}} = 100 - \left[ \frac{100}{1 + \frac{\text{(Previous Average Gain } \times \text{ 13) + Current Gain}}{\text{(Previous Average Loss } \times \text{ 13) + Current Loss}}} \right]$</h1>
</center>
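
The two steps can be sketched as follows. This is an illustration under stated assumptions, not the TA-Lib implementation: it uses absolute price changes rather than percentage changes, divides the smoothed averages by $n$ (which leaves the gain/loss ratio unchanged), and returns 100 when the average loss is zero. The name `rsi_sketch` and the toy prices are made up for the example.

```python
import numpy as np
import pandas as pd

def rsi_sketch(close: pd.Series, n: int = 14) -> pd.Series:
    delta = close.diff().to_numpy()[1:]        # delta[k] = close[k+1] - close[k]
    gains = np.where(delta > 0, delta, 0.0)    # losses count as zero gain
    losses = np.where(delta < 0, -delta, 0.0)  # losses kept as positive values
    rsi = np.full(len(close), np.nan)
    if len(delta) < n:
        return pd.Series(rsi, index=close.index)
    # Step one: plain averages over the first n changes
    avg_gain = gains[:n].mean()
    avg_loss = losses[:n].mean()
    for i in range(n, len(delta) + 1):         # i is a position in `close`
        if i > n:
            # Step two: (previous average * (n - 1) + current value) / n
            avg_gain = (avg_gain * (n - 1) + gains[i - 1]) / n
            avg_loss = (avg_loss * (n - 1) + losses[i - 1]) / n
        if avg_loss == 0:
            rsi[i] = 100.0                     # assumption: no losses means RSI = 100
        else:
            rsi[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return pd.Series(rsi, index=close.index)

# Toy prices and a short look-back period, made up for illustration
prices = pd.Series([10.0, 11.0, 12.0, 11.0, 12.0, 13.0])
rsi = rsi_sketch(prices, n=2)
print(rsi.tolist())
```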


Let's calculate and plot RSI


In [ ]:
df["RSI"] = talib.RSI(df["Close"], 15)
plot_indicator("Avg_price", "RSI")

## **Chaikin A/D Line (AD)**


The accumulation/distribution line was created by Marc Chaikin to determine the flow of money into or out of a security. It should not be confused with the advance/decline line. While their initials might be the same, these are entirely different indicators, as are their users. The advance/decline line provides insight into market movements and the accumulation/distribution line is of use to traders seeking to measure buy/sell pressure on a security or confirm the strength of a trend.   


### The close location value (CLV) can be calculated as follows: 


<center>
    <h1>$CLV = \frac{(C\ -\ L)\ -\ (H\ -\ C)}{H\ -\ L}$</h1>
</center>

$\text{where:}$ <br>
$C = \text{closing price}$ <br>
$H = \text{high of the price range}$ <br>
$L = \text{low of the price range}$


The CLV is then multiplied by the corresponding period's volume, and the running total of these products forms the A/D line. The on-balance volume (OBV) described earlier in this lab is a precursor of this approach.
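
A minimal sketch of this calculation in Pandas is shown here. The function name `ad_line_sketch` and the toy data are illustrative assumptions. The formula is undefined for a bar whose high equals its low, so the sketch sets the CLV to 0 in that case, which is also an assumption; libraries may handle such bars differently.

```python
import pandas as pd

def ad_line_sketch(high: pd.Series, low: pd.Series,
                   close: pd.Series, volume: pd.Series) -> pd.Series:
    price_range = high - low
    clv = ((close - low) - (high - close)) / price_range
    # Flat bar (high == low): the ratio is undefined, so treat CLV as 0 (assumption)
    clv = clv.where(price_range != 0, 0.0)
    # Running total of CLV * volume
    return (clv * volume).cumsum()

# Toy data, made up for illustration; the last bar is flat on purpose
high = pd.Series([10.0, 12.0, 11.0])
low = pd.Series([8.0, 10.0, 11.0])
close = pd.Series([10.0, 10.5, 11.0])
volume = pd.Series([100.0, 200.0, 50.0])

ad = ad_line_sketch(high, low, close, volume)
print(ad.tolist())
```

A close at the high of the bar gives a CLV of 1 (the full volume is added), and a close at the low gives -1 (the full volume is subtracted).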


Let's calculate and plot AD


In [ ]:
df["AD"] = talib.AD(df["High"], df["Low"], df["Close"], df["Volume"])
plot_indicator("Avg_price", "AD")

In [ ]:
df

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1><strong>Question #2:</strong></h1>
<p><strong>Plot "Avg_price" and "Volume" using <code>plot_indicator</code> function</strong></p>
</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 


<details><summary>Click here for the solution</summary>

```python
plot_indicator("Avg_price", "Volume")
```

</details>


# **3. Basic Insight of Dataset**

After reading data into Pandas dataframe, it is time for us to explore the dataset.<br>

There are several ways to obtain essential insights of the data to help us better understand our dataset.


## **Data Types**

Data has a variety of types.

The main types stored in Pandas dataframes are `object`, `float`, `int`, `bool` and `datetime64`. In order to better learn about each attribute, it is always good for us to know the data type of each column. In Pandas:


In [ ]:
df.dtypes

A series with the data type of each column is returned.


As shown above, the data type of **"High"** and **"Close"** is `float64`, and the data type of **"Rec_count"** is `int64`.


In [ ]:
# check the data type of data frame "df" by .dtypes
print(df.dtypes)

## **Describe**
If we would like to get a statistical summary of each column, e.g. *count, column mean value, column standard deviation*, etc., we use the `dataframe.describe()` method.


This method provides various summary statistics, excluding `NaN` (Not a Number) values.
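
The exclusion of `NaN` values matters here because indicator columns usually start with a few `NaN` values while the look-back window fills up. The sketch below shows the effect on a small made-up frame; the name `demo` and its values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Made-up frame: the "indicator" column starts with two missing values
demo = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0],
    "indicator": [np.nan, np.nan, 10.0, 20.0],
})

summary = demo.describe()

# "count" is the number of non-NaN values, and "mean" ignores the NaN rows
print(summary.loc[["count", "mean"]])
```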


In [ ]:
df.describe()

This shows the statistical summary of all numeric-typed (int, float) columns.

For example, the attribute **"Open"** has a count of 66861, a mean value of 0.8646, a standard deviation of 0.0551, a minimum value of 0.7606, a 25th percentile of 0.8083, a 50th percentile of 0.8697, a 75th percentile of 0.9105, and a maximum value of 1.0661. <br>


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1><strong>Question #3:</strong></h1>

<p>
<strong>You can select the columns of a dataframe by indicating the name of each column. For example, you can select the three columns as follows:</strong>
</p>
<p>
    <strong><code>dataframe[["column 1", "column 2", "column 3"]]</code></strong>
</p>
<p>
<strong>Where "<strong>column</strong>" is the name of the column, you can apply the method  <code>.describe()</code> to get the statistics of those columns as follows:</strong>
</p>
<p>
    <strong><code>dataframe[["column 1", "column 2", "column 3"]].describe()</code></strong>
</p>

<strong>Apply the method <code>.describe()</code> to the columns "Low" and "High"</strong>

</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 

<details><summary>Click here for the solution</summary>

```python
df[["Low", "High"]].describe()
```

</details>


## **Info**
Another method you can use to check your dataset is `dataframe.info()`.


It provides a concise summary of your DataFrame.

This method prints information about a DataFrame including the index dtype and columns, non-null values and memory usage.


In [ ]:
# look at the info of "df"
df.info()

## **Save Dataset**

Correspondingly, Pandas enables us to save the dataset to CSV with the `dataframe.to_csv()` method. Put the file path and name, in quotation marks, inside the parentheses.


In [ ]:
df.to_csv("MATICBUSD_trades_1m_preprocessed.csv")

We can also read and save other file formats with functions similar to `pd.read_csv()` and `df.to_csv()`. They are listed in the following table:


## **Read/Save Other Data Formats**

| Data Format  |        Read       |            Save |
| ------------ | :---------------: | --------------: |
| csv          |  `pd.read_csv()`  |   `df.to_csv()` |
| json         |  `pd.read_json()` |  `df.to_json()` |
| excel        | `pd.read_excel()` | `df.to_excel()` |
| hdf          |  `pd.read_hdf()`  |   `df.to_hdf()` |
| sql          |  `pd.read_sql()`  |   `df.to_sql()` |
| ...          |        ...        |             ... |
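
As a small illustration of the read/save pairs in the table, the sketch below writes a made-up frame to CSV and JSON text in memory and reads both back. The frame `frame` and its values are assumptions for the example. The Excel, HDF and SQL pairs are left out because they usually need extra packages or a database connection.

```python
import io
import pandas as pd

# Made-up frame, for illustration only
frame = pd.DataFrame({"close": [1.10, 1.12], "volume": [1500.5, 900.25]})

# Without a path argument, the to_* methods return the text instead of writing a file
csv_text = frame.to_csv(index=False)
json_text = frame.to_json(orient="records")

# The read_* functions accept file-like objects as well as paths
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```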


# **4. Sources:**

<ul>
    <li><a href="https://www.investopedia.com/terms/a/atr.asp?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX030WEN2297-2023-01-01">https://www.investopedia.com/terms/a/atr.asp</a></li>
    <li><a href="https://www.investopedia.com/terms/o/onbalancevolume.asp?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX030WEN2297-2023-01-01">https://www.investopedia.com/terms/o/onbalancevolume.asp</a></li>
    <li><a href="https://www.investopedia.com/terms/r/rsi.asp?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX030WEN2297-2023-01-01">https://www.investopedia.com/terms/r/rsi.asp</a></li>
    <li><a href="https://www.investopedia.com/terms/c/chaikinoscillator.asp?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX030WEN2297-2023-01-01">https://www.investopedia.com/terms/c/chaikinoscillator.asp</a></li>
</ul>


# Excellent! You have just completed the Introduction Notebook!


# **Thank you for completing this lab!**

## Author

<a href="https://author.skills.network/instructors/borys_melnychuk?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX030WEN2297-2023-01-01" >Borys Melnychuk</a>

<a href="https://author.skills.network/instructors/yaroslav_vyklyuk_2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Yaroslav Vyklyuk, DrSc, PhD</a>

<a href="https://author.skills.network/instructors/mariya_fleychuk?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Mariya Fleychuk, DrSc, PhD</a>



## Change Log

| Date (YYYY-MM-DD) | Version | Changed By      | Change Description                                         |
| ----------------- | ------- | ----------------| ---------------------------------------------------------- |
|     2023-02-25    |   1.0   | Borys Melnychuk | Creation of the lab                                        |

<hr>

## <h3 align="center"> © IBM Corporation 2023. All rights reserved. </h3>
