<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="400" alt="cognitiveclass.ai logo"  />
</center>

# Investigation of cryptocurrency exchange rate dynamics (Matic/USD): calculation and analysis of technical financial indicators characterizing the cryptocurrency market (ATR, OBV, RSI, AD)

Estimated time needed: **30** minutes

The tasks:
* Download and process the statistical time series of the cryptocurrency pair Matic/USD, which describes the dynamics of the cryptocurrency market;
* Load the statistical data with the Pandas library;
* Calculate and analyze technical financial indicators for cryptocurrency analysis (ATR, OBV, ADV, RSI, AD).


## Objectives

After completing this lab you will be able to:

*  Acquire data in various ways;
*  Obtain insights from data with Pandas library;
*  Calculate technical financial indicators.

### **Columns**

* #### `Ts` - the timestamp of the record
* #### `Open` -  the price of the asset at the beginning of the trading period
* #### `High` -  the highest price of the asset during the trading period
* #### `Low` - the lowest price of the asset during the trading period.
* #### `Close` - the price of the asset at the end of the trading period
* #### `Volume` - the total number of shares or contracts of a particular asset that are traded during a given period
* #### `Rec_count` -  the number of individual trades or transactions that have been executed during a given time period
* #### `Avg_price` - the average price at which a particular asset has been bought or sold during a given period
* #### `ATR` - average true range indicator
* #### `OBV` - on-balance volume indicator
* #### `RSI` - relative strength index indicator
* #### `AD` - accumulation / distribution indicator
* #### `BTC_price` - the average price from the BTC/BUSD dataset
* #### `BNB_price` - the average price from the BNB/BUSD dataset

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">
<ol>
    <li><a href="https://#data_acquisition">Data Acquisition</a>
    <li><a href="https://#basic_insight">Basic Insight of Dataset</a></li>
    <li><a href="https://#indicators">Calculating technical financial (cryptocurrency) indicators</a> <ul> <li>ATR <li>OBV <li>RSI <li>AD </ul>
</ol>

</div>
<hr>


# Data Acquisition
<p>
There are various formats for a dataset: .csv, .json, .xlsx  etc. The dataset can be stored in different places, on your local machine or sometimes online.<br>

In this section, you will learn how to load a dataset into our Jupyter Notebook.<br>

In our case, the Cryptocurrency Dataset is an online source, and it is in a CSV (comma separated value) format. Let's use this dataset as an example to practice data reading.

<ul>
    <li>Data source: <a href="https://1824251045.rsc.cdn77.org/web/algohouse/data/BTCBUSD_trades_1m.csv" target="_blank">https://1824251045.rsc.cdn77.org/web/algohouse/data/BTCBUSD_trades_1m.csv</a>
    <a href="https://1824251045.rsc.cdn77.org/web/algohouse/data/MATICBUSD_trades_1m.csv" target="_blank">https://1824251045.rsc.cdn77.org/web/algohouse/data/MATICBUSD_trades_1m.csv</a>
    <a href="https://1824251045.rsc.cdn77.org/web/algohouse/data/BNBBUSD_trades_1m.csv" target="_blank">https://1824251045.rsc.cdn77.org/web/algohouse/data/BNBBUSD_trades_1m.csv</a></li>
    <li>Data type: csv</li>
</ul>
The Pandas Library is a useful tool that enables us to read various datasets into a dataframe; our Jupyter notebook platforms have a built-in <b>Pandas Library</b> so that all we need to do is import Pandas without installing.
</p>


In [ ]:
# install the libraries used in this lab (uncomment the lines below if needed)
#! mamba install pandas -y
#! mamba install numpy -y

In [ ]:
# import pandas library
import pandas as pd
import numpy as np
pd.set_option("display.precision", 2)
pd.options.display.float_format = '{:.2f}'.format

## Read Data
<p>
We use the <code>pandas.read_csv()</code> function to read the CSV file. Inside the parentheses, we pass the file path as a quoted string so that pandas reads the file at that address into a dataframe. The file path can be either a URL or a local file path.<br>

If a file does not include a header row, we can add the argument <code>header = None</code> inside the <code>read_csv()</code> call so that pandas does not treat the first row as a header. The files in this lab do contain a header row (for example, the <code>avg_price</code> column), so the code below keeps the default and passes <code>index_col = 0</code> to use the first column as the row index.<br>

You can also assign the dataset to any variable you create.

</p>
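As a minimal sketch of how the `header` argument changes the result, the snippet below reads a small in-memory CSV twice. The sample text and the variable names are made up for illustration and are not part of the lab's datasets.

```python
import io
import pandas as pd

# Hypothetical two-row sample with a header line (not the lab's data)
csv_text = (
    "ts,open,close\n"
    "2023-01-01 00:00:00,0.76,0.77\n"
    "2023-01-01 00:01:00,0.77,0.78\n"
)

# Default behaviour: the first line becomes the column names
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: columns are numbered 0, 1, 2 and the first line is kept as data
without_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.columns.tolist())
print(without_header.columns.tolist())
```

With `header=None` the header line ends up as an ordinary data row, which is why the argument should only be used for files that really have no header.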


This dataset is hosted on IBM Cloud Object Storage. Click <a href="https://cocl.us/DA101EN_object_storage?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDA0101ENSkillsNetwork20235326-2021-01-01">HERE</a> for free storage.


In this lab, we will be using three different datasets to create one that we will use in future labs. We will load three datasets and add the 'avg_price' columns from two of them to our main one.


In [ ]:
path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0XHOEN/MATICBUSD_trades_1m.csv"
df = pd.read_csv(path,low_memory=False, index_col=0)

# Download additional datasets with "price" columns
btc_url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0XHOEN/BTCBUSD_trades_1m.csv'
btc_df = pd.read_csv(btc_url,low_memory=False, index_col=0)


bnb_url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0XHOEN/BNBBUSD_trades_1m%20(4).csv'
bnb_df = pd.read_csv(bnb_url,low_memory=False, index_col=0)



After reading the dataset, we can use the <code>dataframe.head(n)</code> method to check the top n rows of the dataframe, where n is an integer. Contrary to <code>dataframe.head(n)</code>, <code>dataframe.tail(n)</code> will show you the bottom n rows of the dataframe.


In [ ]:
# show the first 5 rows using dataframe.head() method
print("The first 5 rows of the dataframe") 
df.head(5)

Both additional datasets also contain a column named "avg_price". We rename it in each of them with <code>dataframe.rename()</code> so that the names do not repeat in the main dataset.

In [ ]:
btc_df = btc_df.rename(columns = {"avg_price" : "BTC_price"})
btc_df.head()

In [ ]:
bnb_df = bnb_df.rename(columns = {"avg_price" : "BNB_price"})
bnb_df.head()

Now we add the renamed price columns from the BTC and BNB datasets to our main one with <code>pandas.concat()</code>.

In [ ]:
df = pd.concat([df, btc_df["BTC_price"],bnb_df["BNB_price"]], axis =1)
df.head()

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<b style="font-size: 2em; font-weight: bold;"> Question #1: </b><br>
<b>Check the bottom 10 rows of data frame "df".</b>
</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 
print("The last 10 rows of the dataframe\n")
df.tail(10)

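Here we can see many NaN values. They appear because the datasets have different numbers of rows: <code>pd.concat()</code> aligns rows by index, and any row that is missing from one dataset is filled with NaN.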
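The effect can be reproduced on a small scale. The sketch below uses two made-up frames of different lengths (the values are hypothetical, not taken from the lab's files) to show how column-wise concatenation fills the unmatched rows.

```python
import pandas as pd

# Hypothetical frames of different lengths (not the lab's data)
main = pd.DataFrame({"avg_price": [0.76, 0.77]}, index=[0, 1])
other = pd.DataFrame({"BTC_price": [16500.0, 16510.0, 16520.0]}, index=[0, 1, 2])

# axis=1 places the columns side by side and aligns rows on the index;
# index label 2 exists only in "other", so "avg_price" is NaN there
combined = pd.concat([main, other["BTC_price"]], axis=1)
print(combined)
```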

<details><summary>Click here for the solution</summary>

```python
print("The last 10 rows of the dataframe\n")
df.tail(10)
```

</details>


### Add Headers
<p>
Take a look at our dataset. The column names come straight from the CSV files (for example <code>avg_price</code>), together with the two price columns we added.
</p>

<p>
To give the columns consistent, capitalized names, we replace the headers manually.
</p>
<p>
First, we create a list "headers" that includes all column names in order.
Then, we use <code>dataframe.columns = headers</code> to replace the headers with the list we created.
</p>


In [ ]:
# create headers list
headers = ["Ts","Open","High","Low","Close","Volume","Rec_count","Avg_price","BTC_price","BNB_price"]
print("headers\n", headers)

We replace headers and recheck our dataframe:


In [ ]:
df.columns = headers
df.head(10)

We can remove rows with missing values using <code>dropna()</code>.


For example, we can drop the rows in which the column "Open" is missing as follows:


In [ ]:
df=df.dropna(subset=["Open"], axis=0)
df.head(5)
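A minimal sketch of what `subset` does, using a made-up three-row frame (the values are hypothetical): only rows with a missing "Open" are removed, and the surviving rows keep their original index labels.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in two different columns
toy = pd.DataFrame({
    "Open": [0.76, np.nan, 0.78],
    "BTC_price": [16500.0, 16510.0, np.nan],
})

# Only rows where "Open" is missing are removed; NaN in other columns is kept
cleaned = toy.dropna(subset=["Open"], axis=0)

# The index now has a gap (0, 2); reset it if consecutive labels are needed
cleaned_reset = cleaned.reset_index(drop=True)

print(cleaned.index.tolist())
print(cleaned_reset.index.tolist())
```

The gap in the index matters for any later code that looks rows up by label rather than by position.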

Now, we have successfully read the raw dataset and added the correct headers into the dataframe.


 <div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question #2: </h1>
<b>Find the names of the columns of the dataframe.</b>
</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 
print(df.columns)

<details><summary>Click here for the solution</summary>

```python
print(df.columns)
```

</details>


## Save Dataset
<p>
Correspondingly, Pandas enables us to save the dataset to CSV. The <code>dataframe.to_csv()</code> method takes the file path and name as a quoted string inside the parentheses.
</p>
<p>
For example, to save the dataframe <b>df</b> as <b>bnb.csv</b> on your local machine, you can call <code>df.to_csv("bnb.csv", index = False)</code>, where <code>index = False</code> means the row names will not be written.
</p>
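A minimal sketch of this save step on a toy dataframe. The frame contents and the temporary folder are assumptions made so that the example does not write into your working directory; in the lab you would call `to_csv()` on `df` directly.

```python
import os
import tempfile
import pandas as pd

# Hypothetical frame standing in for the lab's dataframe
toy = pd.DataFrame({"Open": [0.76, 0.77], "Close": [0.77, 0.78]})

# Write into a temporary folder
out_path = os.path.join(tempfile.mkdtemp(), "bnb.csv")
toy.to_csv(out_path, index=False)  # index=False: the row labels are not written

# Read the file back to confirm that only the two data columns were saved
restored = pd.read_csv(out_path)
print(restored.columns.tolist())
```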


We can also read and save other file formats. We can use similar functions like **`pd.read_csv()`** and **`df.to_csv()`** for other data formats. The functions are listed in the following table:


## Read/Save Other Data Formats

| Data Format  |        Read       |            Save |
| ------------ | :---------------: | --------------: |
| csv          |  `pd.read_csv()`  |   `df.to_csv()` |
| json         |  `pd.read_json()` |  `df.to_json()` |
| excel        | `pd.read_excel()` | `df.to_excel()` |
| hdf          |  `pd.read_hdf()`  |   `df.to_hdf()` |
| sql          |  `pd.read_sql()`  |   `df.to_sql()` |
| ...          |        ...        |             ... |


<h1 id="basic_insight">Basic Insight of Dataset</h1>
<p>
After reading data into Pandas dataframe, it is time for us to explore the dataset.<br>

There are several ways to obtain essential insights of the data to help us better understand our dataset.

</p>


<h2>Data Types</h2>
<p>
Data has a variety of types.<br>

The main types stored in Pandas dataframes are <b>object</b>, <b>float</b>, <b>int</b>, <b>bool</b> and <b>datetime64</b>. In order to better learn about each attribute, it is always good for us to know the data type of each column. In Pandas:

</p>


In [ ]:
# check the data type of each column of data frame "df" with .dtypes
df.dtypes
# If the result is not displayed, wrap the expression in the print() function


A series with the data type of each column is returned.


<p>
As shown above, the data type of "Open" and "High" is <code>float64</code>, "Ts" is <code>object</code>, etc.
</p>
<p>
These data types can be changed; we will learn how to accomplish this in a later module.
</p>


<h2>Describe</h2>
If we would like to get a statistical summary of each column (e.g. count, column mean value, column standard deviation), we use the <code>describe()</code> method.


This method provides various summary statistics, excluding <code>NaN</code> (Not a Number) values.


In [ ]:
df.describe()

<p>
This shows the statistical summary of all numeric-typed (int, float) columns.<br>

For example, the attribute "id" has 67212 counts, the mean value of this column is 33605.5, the standard deviation is 19402.6, the minimum value is 0, 25th percentile is 16802, 50th percentile is 33605, 75th percentile is 50408, and the maximum value is 67211. <br>

However, what if we would also like to check all the columns including those that are of type object? <br><br>

You can add the argument <code>include = "all"</code> inside the parentheses. Let's try it again.

</p>


In [ ]:
# describe all the columns in "df" 
df.describe(include = "all")

<p>
Now it provides the statistical summary of all the columns, including object-typed attributes.<br>

For the object-typed columns, we can now see how many unique values there are, which value is the most frequent (top), and how often it occurs (freq).<br>

Some values in the table above show as "NaN". This is because those statistics do not apply to that column type; for example, an object-typed column has no mean.<br>

</p>
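A minimal sketch of the mixed summary on a toy frame (hypothetical values, one object column and one numeric column), showing which statistics are filled in for each column type.

```python
import pandas as pd

# Hypothetical frame: "Ts" is object-typed, "Volume" is numeric
toy = pd.DataFrame({"Ts": ["a", "b", "b"], "Volume": [10.0, 20.0, 30.0]})

summary = toy.describe(include="all")
print(summary)

# unique/top/freq are filled only for the object column,
# mean/std/min/max and the percentiles only for the numeric one
print(summary.loc["unique", "Ts"], summary.loc["mean", "Volume"])
```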


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question #3: </h1>

<p>
You can select the columns of a dataframe by indicating the name of each column. For example, you can select three columns as follows:
</p>
<p>
    <code>dataframe[['column 1', 'column 2', 'column 3']]</code>
</p>
<p>
Where "column" is the name of the column. You can apply the method ".describe()" to get the statistics of those columns as follows:
</p>
<p>
    <code>dataframe[['column 1', 'column 2', 'column 3']].describe()</code>
</p>

Apply the method ".describe()" to the columns 'Ts' and 'Volume'.

</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 
df[['Ts', 'Volume']].describe()

<details><summary>Click here for the solution</summary>

```python
df[['Ts', 'Volume']].describe()
```

</details>


<h2>Info</h2>
Another method you can use to check your dataset is <code>dataframe.info()</code>.


It provides a concise summary of your DataFrame.

This method prints information about a DataFrame including the index dtype and columns, non-null values and memory usage.


In [ ]:
# look at the info of "df"
df.info()

<h2>Calculating indicators: ATR, OBV, ADV, RSI, AD</h2>


<h3>ATR: Average true range</h3>
<h4>What Is the Average True Range (ATR)?</h4>
The average true range (ATR) is a technical analysis indicator introduced by market technician J. Welles Wilder Jr. in his book New Concepts in Technical Trading Systems. It measures market volatility by decomposing the entire range of an asset price for a given period.

1. Calculate $True\ Range\ (TR)$:
$$
TR = \textrm{max}[ \ ( \ High\ - Low\ ),\ | \ High\ - Close_p \ |,\ |\ Low\ - Close_p \ | \ ],
$$
<center>where $Close_p$ is the closing price of the previous period</center>

2. Calculate $ATR$:
$$
ATR = \frac{1}{n} \sum \limits _{i=1} ^{n} TR_i,
$$
<center>where $n$ is the number of periods in the averaging window</center>


In [ ]:

def atr(df, n):
    # Create a new DataFrame to hold the True Range (TR) values
    tr = pd.DataFrame()

    # Calculate the three components of the True Range for each period
    tr['h-l'] = df['High'] - df['Low']  # High - Low
    tr['h-pc'] = abs(df['High'] - df['Close'].shift())  # absolute value (High - Previous Close)
    tr['l-pc'] = abs(df['Low'] - df['Close'].shift())  # absolute value (Low - Previous Close)
    # Take the maximum of the three components for each period to get the True Range (TR)
    tr['max'] = tr.max(axis=1)
    # Calculate the rolling mean of the True Range (TR) over n periods to get the Average True Range (ATR)
    atr = tr['max'].rolling(n).mean()
    # Return the Average True Range (ATR) as a pandas Series
    return atr
df["ATR"] = atr(df, 14)
df.tail()
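To see why the true range looks at the previous close, the sketch below works through three made-up bars in which the last bar opens with a gap. The values and the 2-period window are assumptions chosen to keep the arithmetic easy to follow.

```python
import pandas as pd

# Hypothetical bars; the third bar gaps up from the previous close of 10.5
toy = pd.DataFrame({
    "High":  [10.0, 11.0, 13.0],
    "Low":   [9.0, 9.5, 12.0],
    "Close": [9.5, 10.5, 12.5],
})

prev_close = toy["Close"].shift()

# True range: the largest of the three distances for each bar
# (the first bar has no previous close, so only High - Low is available)
tr = pd.concat(
    [
        toy["High"] - toy["Low"],
        (toy["High"] - prev_close).abs(),
        (toy["Low"] - prev_close).abs(),
    ],
    axis=1,
).max(axis=1)

# Simple rolling mean of the true range over 2 periods
atr_2 = tr.rolling(2).mean()

print(tr.tolist())
print(atr_2.tolist())
```

For the third bar, High - Low is only 1.0, but the distance from the previous close to the new high is 2.5, so the true range captures the gap that the plain range would miss.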

<h3>OBV: On-balance volume</h3>
<h4>What Is On-Balance Volume (OBV)?</h4>
On-balance volume (OBV) is a technical trading momentum indicator that uses volume flow to predict changes in stock price. Joseph Granville first developed the OBV metric in the 1963 book Granville's New Key to Stock Market Profits.


$$
OBV = OBV_p + \left\{
    \begin{array}{ll}
        Volume & \mbox{if } \ Close > Close_p \\
        0 & \mbox{if } \ Close = Close_p \\
        -Volume & \mbox{if } \ Close < Close_p \\
    \end{array}
\right.
$$
<center>where the subscript $p$ denotes the previous period</center>



In [ ]:
# Define a function to calculate OBV values
def calculate_obv(data):
    # Work on plain arrays so that rows are accessed by position, not by index
    # label (the index can have gaps after dropna())
    close = data['Close'].to_numpy()
    volume = data['Volume'].to_numpy()
    obv = [0]  # The first OBV value is always 0
    for i in range(1, len(data)):
        # If the current closing price is higher than the previous closing price,
        # add the current volume to the previous OBV value
        if close[i] > close[i-1]:
            obv.append(obv[-1] + volume[i])
        # If the current closing price is lower than the previous closing price,
        # subtract the current volume from the previous OBV value
        elif close[i] < close[i-1]:
            obv.append(obv[-1] - volume[i])
        # If the current closing price is equal to the previous closing price,
        # use the previous OBV value
        else:
            obv.append(obv[-1])
    return obv
# Calculate the OBV values and add a new column to the DataFrame
df['OBV'] = calculate_obv(df)
df['OBV'].head()
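The same rule can also be written without an explicit loop. The sketch below is one possible vectorized form, checked on a made-up four-row frame; the function name `obv_vectorized` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

def obv_vectorized(close, volume):
    # +1 when the close rises, -1 when it falls, 0 when it is unchanged;
    # the first row has no previous close, so its direction is set to 0
    direction = np.sign(close.diff()).fillna(0)
    # Running total of the signed volume
    return (direction * volume).cumsum()

# Hypothetical sample: up, flat, down
toy = pd.DataFrame({
    "Close": [1.0, 1.2, 1.2, 1.1],
    "Volume": [100.0, 50.0, 30.0, 20.0],
})
toy_obv = obv_vectorized(toy["Close"], toy["Volume"])
print(toy_obv.tolist())
```

On long minute-level series a vectorized form like this is usually much faster than a Python loop, and it does not depend on the index labels.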

<h3>ADV: Average Daily Trading Volume</h3>
Average daily trading volume (ADTV) is the average number of shares traded within a day in a given stock. Daily volume is how many shares are traded each day, and it can be averaged over a number of days to find the average daily volume. Average daily trading volume is an important metric because high or low trading volume attracts different types of traders and investors. Many traders and investors prefer a higher average daily trading volume, because with high volume it is easier to get into and out of positions. Low-volume assets have fewer buyers and sellers, and therefore it may be harder to enter or exit at a desired price.




In [ ]:
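# Mean of the "Volume" column. Each row is one trading period, so if the rows
# are 1-minute bars (as the file name "trades_1m" suggests), this is the mean
# volume per bar rather than per calendar day.
adv = df["Volume"].mean()
print(f"Average Daily Volume is : {adv}")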
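If the rows are intraday bars, a per-day average needs the volume to be summed per calendar day first. The sketch below shows one way to do that on made-up data; it assumes the timestamps can be parsed with `pd.to_datetime()`, which has not been verified for the lab's `Ts` column.

```python
import pandas as pd

# Hypothetical 1-minute bars spanning two days (not the lab's data)
ts = pd.to_datetime([
    "2023-01-01 00:00", "2023-01-01 00:01",
    "2023-01-02 00:00", "2023-01-02 00:01",
])
toy = pd.DataFrame({"Volume": [100.0, 300.0, 200.0, 600.0]}, index=ts)

# Total volume per calendar day, then the average of those daily totals
daily_volume = toy["Volume"].resample("D").sum()
avg_daily_volume = daily_volume.mean()

# For comparison: the plain mean over all bars
mean_per_bar = toy["Volume"].mean()

print(avg_daily_volume, mean_per_bar)
```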

<h3>RSI: Relative Strength Index </h3>
<h4>What Is the Relative Strength Index (RSI)?</h4>
The relative strength index (RSI) is a momentum indicator used in technical analysis. RSI measures the speed and magnitude of a security's recent price changes to evaluate overvalued or undervalued conditions in the price of that security.


$$
RSI_{step \ one} = 100 - [ \frac{100}{1 + \frac{Avg \ gain}{Avg \ loss}} ]
$$
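A quick numeric check of the formula, as a sketch: the helper name `rsi_from_averages` is hypothetical, and the inputs are made-up average gains and losses rather than values from the dataset.

```python
def rsi_from_averages(avg_gain, avg_loss):
    # Relative strength: ratio of the average gain to the average loss
    rs = avg_gain / avg_loss
    # Map the ratio onto the 0..100 scale
    return 100.0 - 100.0 / (1.0 + rs)

balanced = rsi_from_averages(1.0, 1.0)       # gains equal losses
mostly_gains = rsi_from_averages(3.0, 1.0)   # gains three times the losses

print(balanced, mostly_gains)
```

Equal average gains and losses give an RSI of 50, and the value moves toward 100 as gains dominate. This plain-Python helper does not handle an average loss of exactly zero.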


In [ ]:

def rsi(df, n=14):
    # Get the closing price data from the DataFrame
    close = df['Close']
    
    # Calculate the price differences between each day
    delta = close.diff()

    # Define the up and down days
    up, down = delta.copy(), delta.copy()
    up[up < 0] = 0  # If the price difference is negative, set it to 0 (up days)
    down[down > 0] = 0  # If the price difference is positive, set it to 0 (down days)

    # Calculate the exponential moving averages of the up and down moves
    # (com = n - 1 gives the smoothing factor alpha = 1/n used in Wilder's RSI)
    roll_up = up.ewm(com=n - 1, min_periods=n).mean()
    roll_down = down.abs().ewm(com=n - 1, min_periods=n).mean()

    # Calculate the Relative Strength Index (RSI)
    rs = roll_up / roll_down  # Calculate the relative strength
    rsi = 100.0 - (100.0 / (1.0 + rs))  # Calculate the RSI using the relative strength

    return rsi  # Return the RSI values as a pandas Series

df['RSI'] = rsi(df)
df.tail(10)


<h3>AD : Accumulation/Distribution</h3>
The accumulation/distribution line was created by Marc Chaikin to determine the flow of money into or out of a security.
It should not be confused with the advance/decline line. While their initials might be the same, these are entirely different indicators, as are their users. The advance/decline line provides insight into market movements, and the accumulation/distribution line is of use to traders seeking to measure buy/sell pressure on a security or confirm the strength of a trend.

<h4>Calculating MFM (Money Flow Multiplier)</h4>
$$
MFM =  \frac{(Close - Low) - (High - Close)}{High - Low}
$$
<h4>Now use MFM to calculate AD</h4>
$$
AD = AD_p + MFM \times Volume
$$
<center>where $AD_p$ is the previous value of AD</center>

In [ ]:
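# Calculating MFM (a bar with High == Low gives 0/0, so its multiplier is set to 0)
price_range = df['High'] - df['Low']
mfm = ((df['Close'] - df['Low']) - (df['High'] - df['Close'])) / price_range
mfm = mfm.where(price_range != 0, 0)
# Calculating AD as the running total of MFM * Volume
df['AD'] = (mfm * df['Volume']).cumsum()
df.head(50)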
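A minimal sketch of the accumulation/distribution line as a running total, checked on three made-up bars. Treating a flat bar (High equal to Low) as a multiplier of 0 is an assumption of this sketch, and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical bars: close at the high, close near the low, and a flat bar
toy = pd.DataFrame({
    "High":   [10.0, 12.0, 11.0],
    "Low":    [8.0, 10.0, 11.0],
    "Close":  [10.0, 10.5, 11.0],
    "Volume": [100.0, 200.0, 50.0],
})

price_range = toy["High"] - toy["Low"]
mfm = ((toy["Close"] - toy["Low"]) - (toy["High"] - toy["Close"])) / price_range
# Flat bar: 0/0 would give NaN, so use 0 instead (assumption of this sketch)
mfm = mfm.where(price_range != 0, 0.0)

# Each bar adds MFM * Volume to the previous value of the line
ad = (mfm * toy["Volume"]).cumsum()
print(ad.tolist())
```

The first bar closes at its high (multiplier 1) and adds its full volume, the second closes in the lower part of its range (multiplier -0.5) and removes 100, and the flat bar leaves the line unchanged.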

Let's save the resulting dataset to a file:

In [ ]:
df.to_csv("Lab1DataSet.csv", index=False)

<h5>More information about these indicators is available on <a href = "https://www.investopedia.com/">Investopedia</a>.</h5>



  

<h1>Excellent! You have just completed the  Introduction Notebook!</h1>


# **Thank you for completing this lab!**

## Author

<a href="https://author.skills.network/instructors/ostap_liashenyk" target="_blank" >Ostap Liashenyk</a>

<a href="https://author.skills.network/instructors/yaroslav_vyklyuk_2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Yaroslav Vyklyuk, DrSc, PhD</a>

<a href="https://author.skills.network/instructors/mariya_fleychuk?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Mariya Fleychuk, DrSc, PhD</a>




## Change Log

| Date (YYYY-MM-DD) | Version | Changed By      | Change Description                                         |
| ----------------- | ------- | ----------------| ---------------------------------------------------------- |
|     2023-04-01    |   1.0   | Ostap Liashenyk | Creation of the lab                                        |

<hr>

## <h3 align="center"> © IBM Corporation 2023. All rights reserved. </h3>