<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX09DGEN/SN_web_lightmode.png?1676849283261" width="300" alt="cognitiveclass.ai logo">
</center>

# *Investigation of BTC/BUSD cryptocurrency using ADOSC, NATR, TRANGE indicators, and other cryptocurrencies.*


## Lab 1. Dataset creation

Estimated time needed: **30** minutes

<div class="alert alert-danger alertdanger">
To be completed by Mariya.
</div>


## Objectives

After completing this lab you will be able to:

*   Acquire data in various ways
*   Obtain insights from data with Pandas library
*   Resample data
*   Calculate Indicators for cryptocurrency market analysis 


<h3>Table of Contents</h3>

<div class="alert alert-block alert-info" style="margin-top: 20px">
<ol>
    <li><u>Data Acquisition</u></li>
        <ul>
            <li><u>Read Data</u></li>
            <li><u>Resample Data</u></li>
        </ul>
    <li><u>Financial Indicators</u></li>
        <ul>
            <li><u>ADOSC</u></li>
            <li><u>NATR</u></li>
            <li><u>TRANGE</u></li>
            <li><u>Save Dataset</u></li>
        </ul>
    <li><u>Basic Insight of Dataset</u></li>
        <ul>
            <li><u>Data Types</u></li>
            <li><u>Describe</u></li>
            <li><u>Info</u></li>
        </ul>
</ol>

</div>
<hr>



***Dataset description***

*Our dataset contains data on individual trades of BTC, rather than aggregated data such as daily prices or volume.*

**Attributes:**
* bs: The buy/sell indicator, which indicates whether a trade was initiated by a buyer or a seller. This may be useful for understanding market sentiment and trends.
* price: The price of BTC at the time of the trade.
* volume: The total number of BTC that were exchanged during a single trade or transaction.



***Note:*** the other datasets used in this lab have similar attributes.

## Data Acquisition

<p>
There are various formats for a dataset: .csv, .json, .xlsx  etc. The dataset can be stored in different places, on your local machine or sometimes online.<br>

In this section, you will learn how to load a dataset into our Jupyter Notebook.<br>

In our case, the Finance Dataset is an online source, and it is in a CSV (comma separated value) format. Let's use this dataset as an example to practice data reading.

<ul>
    <li>Data source: <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX09DGEN/BTCBUSD_trades.csv" target="_blank">https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX09DGEN/BTCBUSD_trades.csv</a></li>
    <li>Data type: csv</li>
</ul>
The Pandas Library is a useful tool that enables us to read various datasets into a dataframe; our Jupyter notebook platforms have a built-in <b>Pandas Library</b> so that all we need to do is import Pandas without installing.
</p>


In [ ]:
# import sys
# !conda install --yes --prefix {sys.prefix} pandas
# !conda install --yes --prefix {sys.prefix} numpy
# !conda install --yes --prefix {sys.prefix} matplotlib
# !conda install --yes --prefix {sys.prefix} scipy
# !conda install --yes --prefix {sys.prefix} seaborn

Let's install <code>talib</code>, which has various methods used to calculate indicators.

In [ ]:
!pip install talib-binary

In [ ]:
!pip install pandas-ta

Execution of the code below may take some time.

Now, let's import the libraries that we are going to use.


In [ ]:
# import pandas library
import pandas as pd
import numpy as np
# indicators calculation libraries
import talib

### Read Data
<p>
We use the <code>pandas.read_csv()</code> function to read the CSV file. In the brackets, we put the file path in quotation marks so that pandas reads the file at that address into a dataframe. The file path can be either a URL or a local file address.<br>

Our dataset already has a timestamp column, <code>ts</code>, so we pass the <code>index_col="ts"</code> parameter to <code>read_csv()</code> to use that column as the index.

You can also assign the dataset to any variable you create.

</p>
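
As a minimal sketch of the call described above, the snippet below reads a tiny in-memory CSV instead of the real URL. The rows and the names `sample_csv` and `sample_df` are made up for illustration; only the column names `ts`, `bs`, `price` and `volume` follow the dataset description.

```python
import io
import pandas as pd

# Hypothetical sample: three made-up trades in the same layout as the lab dataset
sample_csv = io.StringIO(
    "ts,bs,price,volume\n"
    "2023-01-01 00:00:01,b,16500.0,0.10\n"
    "2023-01-01 00:00:02,s,16501.5,0.25\n"
    "2023-01-01 00:01:10,b,16499.0,0.05\n"
)

# index_col="ts" makes the timestamp column the index of the dataframe
sample_df = pd.read_csv(sample_csv, index_col="ts")
# the index is read as text, so convert it to datetime
sample_df.index = pd.to_datetime(sample_df.index)

print(sample_df.index.name)     # ts
print(list(sample_df.columns))  # ['bs', 'price', 'volume']
```

The same two steps (read with `index_col`, then convert the index) are what the lab applies to the real file.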


In [ ]:
path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX09DGEN/BTCBUSD_trades.csv"

In [ ]:
# Read the online file from the URL provided above and assign it to the variable "df"
# columns_to_use = ['ts', 'bs', 'price', 'volume']
df = pd.read_csv(path, index_col="ts")
# renaming and dropping the redundant column
df.rename({"Unnamed: 0":"col_to_drop"}, axis="columns", inplace=True)
df.drop(["col_to_drop"], axis=1, inplace=True)
# casting the index type to datetime
df.index = pd.to_datetime(df.index)

After reading the dataset, we can use the <code>dataframe.head(n)</code> method to check the top n rows of the dataframe, where n is an integer. Contrary to <code>dataframe.head(n)</code>, <code>dataframe.tail(n)</code> will show you the bottom n rows of the dataframe.


In [ ]:
# show the first 5 rows using dataframe.head() method
print("The first 5 rows of the dataframe")
df.head(5)

Let's capitalize the names of the columns.


In [ ]:
current_columns = df.columns

capitalized_columns = [v.capitalize() for v in current_columns]
capitalized_columns

And replace the old names in the dataframe.


In [ ]:
df.columns = capitalized_columns
df

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
  <b style="font-size: 2em; font-weight: bold;">Question #1:</b>

  <b>Check the bottom 10 rows of data frame `df`.</b>
    
</div>

In [ ]:
# Write your code below and press Shift+Enter to execute 


<details><summary>Click here for the solution</summary>

```python
print("The last 10 rows of the dataframe\n")
df.tail(10)
```

</details>

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
  <b style="font-size: 2em; font-weight: bold;">Question #2:</b>

  <b>Retrieve the names of the columns in a dataframe.</b>
    
</div>

In [ ]:
# Write your code below and press Shift+Enter to execute 


<details><summary>Click here for the solution</summary>

```python
print(df.columns)
```

</details>


## Financial Indicators
Calculating the ADOSC, NATR and TRANGE indicators.

These indicators must be calculated on aggregated data, so we first need the 'open', 'high', 'low' and 'close' (OHLC) values of each interval. Aggregation can be performed with the <code>pandas</code> <code>resample</code> method, and the OHLC values can then be retrieved with <code>ohlc()</code>.
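
To make the aggregation step concrete, here is a small sketch on made-up trades. The prices, volumes and the names `trades` and `bars` are hypothetical; the calls are the standard pandas `resample`, `ohlc` and `sum` methods.

```python
import pandas as pd

# Hypothetical trades: four made-up prices and volumes, two per minute
trades = pd.DataFrame(
    {"Price": [100.0, 102.0, 101.0, 99.0], "Volume": [1.0, 2.0, 0.5, 1.5]},
    index=pd.to_datetime([
        "2023-01-01 00:00:05", "2023-01-01 00:00:40",
        "2023-01-01 00:01:10", "2023-01-01 00:01:50",
    ]),
)

# group trades into 1-minute bars: first, max, min and last price of each bar
bars = trades["Price"].resample("1min").ohlc()
# total traded volume of each bar
bars["volume"] = trades["Volume"].resample("1min").sum()
print(bars)
```

Each row of `bars` now summarizes all trades that fell into one minute, which is the shape of data the indicators expect.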

In [ ]:
resampled_df = df['Price'].resample('1Min').ohlc()
resampled_df.head()

Now, let's calculate other values.

In [ ]:
resampled_df['rec_count'] = df['Volume'].resample('1Min').count()
resampled_df['volume'] = df['Volume'].resample('1Min').sum()
resampled_df['avg_price'] = df['Price'].resample('1Min').mean()
resampled_df.head()

Now that we know how to aggregate data in 1-minute intervals, let's implement a method for different time intervals.
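
The rule behind re-aggregating bars into longer intervals can be sketched on a toy example: the new open is the first open, the new high is the maximum high, the new low is the minimum low, the new close is the last close, and volumes add up. The values and the names `one_min` and `two_min` below are made up for illustration.

```python
import pandas as pd

# Hypothetical 1-minute bars
one_min = pd.DataFrame(
    {
        "open":   [100.0, 101.0, 103.0, 102.0],
        "high":   [101.0, 104.0, 103.5, 102.5],
        "low":    [99.0, 100.5, 101.0, 98.0],
        "close":  [101.0, 103.0, 102.0, 99.0],
        "volume": [1.0, 2.0, 3.0, 4.0],
    },
    index=pd.date_range("2023-01-01 00:00", periods=4, freq="min"),
)

# combine every two 1-minute bars into one 2-minute bar
two_min = one_min.resample("2min").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
)
print(two_min)
```

Note that a per-bar average price cannot be re-aggregated this way without weighting, which is one reason to leave it out of such a mapping.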

In [ ]:
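def resample_dataframe(df, period='10Min'):
    """ Aggregate OHLC bars into longer intervals of length ```period```.
    """
    res = df.copy()
    res = res.resample(period).agg({
        'open': 'first',     # first open of the interval
        'high': 'max',       # highest high of the interval
        'low': 'min',        # lowest low of the interval
        'close': 'last',     # last close of the interval
        'rec_count': 'sum',  # total number of trades
        'volume': 'sum'      # total traded volume
    })
    return res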

resampled_15_min_df = resample_dataframe(resampled_df, period='15Min')
resampled_15_min_df.head()

We are all set up; the only thing left is to calculate the indicators.

### ADOSC - Chaikin A/D Oscillator

<b>The Chaikin Oscillator(ADOSC)</b> is the difference between the 3-day and 10-day EMAs of the Accumulation Distribution Line. Like other momentum indicators, this indicator is designed to anticipate directional changes in the Accumulation Distribution Line by measuring the momentum behind the movements. 

Below you can see the <b>formulas for Chaikin Oscillator</b>:

$1.\ \text{Money Flow Multiplier} = \frac{(\text{Close} - \text{Low}) - (\text{High} - \text{Close})}{\text{High} - \text{Low}}$

$2.\ \text{Money Flow Volume} = \text{Money Flow Multiplier} \times \text{Volume for the Period}$

$3.\ \text{ADL} = \text{Previous ADL} + \text{Current Period's Money Flow Volume}$

$4.\ \text{Chaikin Oscillator} = (\text{3-day EMA of ADL}) - (\text{10-day EMA of ADL})$
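
The four formulas above can be followed step by step in plain pandas. This is only an illustrative sketch on made-up bars, using the pandas `ewm` method for the EMAs; it is not how `talib` computes the indicator internally, and its values may differ from `talib.ADOSC`, especially in the first rows.

```python
import pandas as pd

# Hypothetical OHLCV bars, made up for illustration
bars = pd.DataFrame({
    "high":   [10.0, 11.0, 12.0, 11.0],
    "low":    [8.0, 9.0, 10.0, 9.0],
    "close":  [9.0, 11.0, 10.0, 10.5],
    "volume": [100.0, 200.0, 150.0, 80.0],
})

# 1. Money Flow Multiplier
mfm = ((bars["close"] - bars["low"]) - (bars["high"] - bars["close"])) / (
    bars["high"] - bars["low"]
)
# 2. Money Flow Volume
mfv = mfm * bars["volume"]
# 3. Accumulation Distribution Line: running total of Money Flow Volume
adl = mfv.cumsum()
# 4. Chaikin Oscillator: fast EMA of the ADL minus slow EMA of the ADL
chaikin = adl.ewm(span=3, adjust=False).mean() - adl.ewm(span=10, adjust=False).mean()

print(mfm.tolist())  # [0.0, 1.0, -1.0, 0.5]
print(adl.tolist())  # [0.0, 200.0, 50.0, 90.0]
```

A bar with `high == low` makes the multiplier undefined (division by zero), so real data may need that case handled separately.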

For more information follow: 
<ul>

<li>
<a href='https://school.stockcharts.com/doku.php?id=technical_indicators:chaikin_oscillator#:~:text=The%20Chaikin%20Oscillator%20is%20the,the%20momentum%20behind%20the%20movements'>https://school.stockcharts.com/doku.php?id=technical_indicators:chaikin_oscillator#:~:text=The%20Chaikin%20Oscillator%20is%20the,the%20momentum%20behind%20the%20movements</a>
</li>

<li>
<a href='https://www.investopedia.com/articles/active-trading/031914/understanding-chaikin-oscillator.asp'>https://www.investopedia.com/articles/active-trading/031914/understanding-chaikin-oscillator.asp</a>
</li>

<li>
<a href='https://www.investopedia.com/terms/c/chaikinoscillator.asp'>https://www.investopedia.com/terms/c/chaikinoscillator.asp</a>
</li>

</ul>

In [ ]:
def ADOSC(df, N1=3, N2=10) -> pd.DataFrame:
    res = talib.ADOSC(df['high'], df['low'],
                      df['close'], df['volume'], N1, N2)
    return pd.DataFrame({'ADOSC': res}, index=df.index) 

adosc_df = ADOSC(resampled_df)
adosc_df.head(25)

The new dataframe has the same index as our current one, so we can assign it as a new column.

In [ ]:
# creating new column called 'ADOSC' in our current dataframe
resampled_df['ADOSC'] = adosc_df['ADOSC']

In [ ]:
resampled_df.head(25)

### ATR Normalized (NATR)

<b>ATR Normalized</b> is a technical analysis tool for measuring volatility. Unlike many popular indicators, it does not identify the direction of price movement; it only measures the level of volatility, in particular volatility caused by price gaps or slow chart updates.

<b>ATR</b> Normalized is a normalized version of the ATR indicator, which is calculated according to the formula:

$$100*\frac{ATR(t)}{Close(t)}$$
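
A short worked example of the formula, with made-up numbers: dividing by the close expresses ATR as a percentage of price, so assets trading at very different price levels become comparable. The function name `normalized_atr` and all values are hypothetical.

```python
def normalized_atr(atr: float, close: float) -> float:
    """Express ATR as a percentage of the closing price."""
    return 100 * atr / close

# Hypothetical values: an ATR of 150 BUSD while BTC closes at 30000 BUSD
btc_natr = normalized_atr(150.0, 30_000.0)
# Hypothetical cheaper asset: an ATR of 0.5 at a close of 100
other_natr = normalized_atr(0.5, 100.0)

print(btc_natr, other_natr)  # 0.5 0.5
```

Although the raw ATR values differ by a factor of 300, both assets show the same relative volatility of 0.5%.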

For more information follow: 
<ul>
<li>
<a href='https://support.atas.net/en/knowledge-bases/2/articles/43436-atr-normalized#:~:text=ATR%20Normalized%20is%20an%20instrument,the%20direction%20of%20price%20movement.'>https://support.atas.net/en/knowledge-bases/2/articles/43436-atr-normalized#:~:text=ATR%20Normalized%20is%20an%20instrument,the%20direction%20of%20price%20movement.</a>
</li>
</ul>

In [ ]:
# NATR implementation from the pandas_ta library
from pandas_ta import natr

def TALIB_NATR(df, timeperiod=14) -> pd.DataFrame:
    """ Function for ATR Normalized (NATR) indicator using ```talib``` library.
    """
    res = talib.NATR(df['high'], df['low'], df['close'], timeperiod=timeperiod)
    return pd.DataFrame({'NATR': res}, index=df.index)

def PANDAS_TA_NATR(df, timeperiod=14) -> pd.DataFrame:
    """ Function for ATR Normalized (NATR) indicator using ```pandas_ta``` library.
    """
    # pandas_ta names the window parameter `length`
    res = natr(df['high'], df['low'], df['close'], length=timeperiod)
    return pd.DataFrame({'NATR': res}, index=df.index)

Let's call <code>TALIB_NATR</code> and <code>PANDAS_TA_NATR</code> methods and review the results they give us.

In [ ]:
talib_natr_df = TALIB_NATR(resampled_df)
talib_natr_df.head(25)

In [ ]:
pandas_ta_natr_df = PANDAS_TA_NATR(resampled_df)
pandas_ta_natr_df.head(25)

The results of the two methods can differ. Both pandas_ta and talib are widely used libraries for calculating technical indicators, but they may use different methods and algorithms for the same indicator, so it is not uncommon to get different values from different libraries.
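
To illustrate how the choice of algorithm alone changes the result, the sketch below averages the same made-up True Range values in two common ways. This is not a claim about what either library does internally; the series, the period and the variable names are assumptions for the example.

```python
import pandas as pd

# Hypothetical True Range values, made up for illustration
true_range = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])
period = 3

# variant A: simple moving average of the True Range
atr_sma = true_range.rolling(period).mean()
# variant B: exponential smoothing with alpha = 1 / period (Wilder-style)
atr_smoothed = true_range.ewm(alpha=1 / period, adjust=False).mean()

print(atr_sma.iloc[-1])       # 8.0
print(atr_smoothed.iloc[-1])  # a different value for the same input
```

Both variants are reasonable definitions of an "average true range", yet they give different numbers for the same bars, so any NATR built on top of them differs as well.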

Let's define a custom function to calculate the NATR indicator.

In [ ]:
def NATR(df: pd.DataFrame, period: int = 14) -> pd.DataFrame:
    """ Custom function for ATR Normalized (NATR) indicator.
    """
    high, low, close = df['high'], df['low'], df['close']
    prev_close = close.shift()

    # candidates for the True Range of each bar
    high_low = high - low
    high_close = np.abs(high - prev_close)
    low_close = np.abs(low - prev_close)

    # True Range: the largest of the three candidates
    ranges = pd.concat([high_low, high_close, low_close], axis=1)
    true_range = ranges.max(axis=1)

    # simple moving average of the True Range over `period` bars,
    # used below as the estimate of the previous ATR
    atr_prev = true_range.rolling(period).sum() / period

    # smoothed ATR: blend the estimate with the current True Range
    atr = (atr_prev*(period - 1) + true_range) / period

    # normalize ATR by the closing price, as a percentage
    natr = (100 * atr) / close
    return pd.DataFrame({'NATR': natr})

We will use NATR calculated by our custom function.

In [ ]:
natr_df = NATR(resampled_df)
natr_df.head(25)

In [ ]:
resampled_df['NATR'] = natr_df['NATR']

In [ ]:
resampled_df.head(25)

### True Range (TRANGE)

<b>True Range</b> is a technical indicator. It is calculated as the maximum among the following values:

$$max(High(t) - Low(t), |High(t) - Close(t-1)|,|Low(t) - Close(t-1)|)$$
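
The formula can be checked by hand on a few made-up bars. The sketch below uses plain pandas; the values and names are hypothetical.

```python
import pandas as pd

# Hypothetical bars, made up for illustration
bars = pd.DataFrame({
    "high":  [10.0, 12.0, 11.0],
    "low":   [8.0, 11.0, 7.0],
    "close": [9.0, 11.5, 8.0],
})

prev_close = bars["close"].shift()  # Close(t-1); undefined for the first bar
candidates = pd.concat(
    [
        bars["high"] - bars["low"],
        (bars["high"] - prev_close).abs(),
        (bars["low"] - prev_close).abs(),
    ],
    axis=1,
)
# row-wise maximum; missing values are skipped
true_range = candidates.max(axis=1)
print(true_range.tolist())  # [2.0, 3.0, 4.5]
```

In the second bar the gap up from the previous close (12 - 9 = 3) exceeds the bar's own high-low range (1), which is exactly the case True Range is designed to capture. The first bar has no previous close, so this sketch falls back to high minus low there; a library may treat that first value differently.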

For more information follow:
<ul>
 <li>
 <a href='https://www.linnsoft.com/techind/true-range-tr'>https://www.linnsoft.com/techind/true-range-tr</a>
</li>

<li>
  <a href='https://support.atas.net/en/knowledge-bases/2/articles/45183-true-range'>https://support.atas.net/en/knowledge-bases/2/articles/45183-true-range</a>
</li>
</ul>


In [ ]:
def TRANGE(df) -> pd.DataFrame:
    res = talib.TRANGE(df['high'], df['low'], df['close'])
    return pd.DataFrame({'TRANGE': res}, index=df.index) 

trange_df = TRANGE(resampled_df)
trange_df.head(25)

In [ ]:
resampled_df['TRANGE'] = trange_df['TRANGE']

In [ ]:
resampled_df.head(25)

Later, we will use other cryptocurrencies to determine if and how they affect our cryptocurrency. Let's add new columns to our dataframe.

In [ ]:
# creating path variables to easily retrieve data
ape_path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX09DGEN/APEBUSD_trades_1m.csv'
bnb_path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX09DGEN/BNBBUSD_trades_1m.csv'
doge_path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX09DGEN/DOGEBUSD_trades_1m.csv'
eth_path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX09DGEN/ETHBUSD_trades_1m.csv'
xrp_path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX09DGEN/XRPBUSD_trades_1m.csv'
matic_path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX09DGEN/MATICBUSD_trades_1m.csv'

In [ ]:
paths = [('ape', ape_path), ('bnb', bnb_path), ('doge', doge_path), ('eth', eth_path), ('xrp', xrp_path), ('matic', matic_path)]

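Our other datasets might cover different 'ts' values, so we join them on 'ts' using the <code>pandas</code> <code>merge</code> method, which performs an inner join by default and keeps only the timestamps present in both dataframes.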
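
A minimal sketch of such a join on made-up data: the two frames below share only two of their timestamps, and only those rows survive the merge. The frame names and values are hypothetical.

```python
import pandas as pd

# Hypothetical BTC bars covering three minutes
btc = pd.DataFrame(
    {"close": [100.0, 101.0, 102.0]},
    index=pd.DatetimeIndex(
        ["2023-01-01 00:00", "2023-01-01 00:01", "2023-01-01 00:02"], name="ts"
    ),
)
# Hypothetical ETH average prices covering only the last two minutes
eth = pd.DataFrame(
    {"eth_avg_price": [10.0, 11.0]},
    index=pd.DatetimeIndex(["2023-01-01 00:01", "2023-01-01 00:02"], name="ts"),
)

# how="inner" is the default; it is spelled out here for clarity
merged = btc.merge(eth, on="ts", how="inner")
print(len(merged))  # 2
```

The 00:00 row is dropped because it has no counterpart in `eth`; with `how="left"` it would be kept and filled with `NaN` instead.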

In [ ]:
merged_df = resampled_df.copy()
for name, path in paths:
    # read the online file for the current currency from its URL
    # we only need the 'ts' and 'avg_price' fields, so we specify them in usecols
    current_currency_df = pd.read_csv(path, index_col='ts', usecols=['ts', 'avg_price'])
    current_currency_df.index = pd.to_datetime(current_currency_df.index)
    # renaming 'avg_price'
    current_currency_df.rename({'avg_price':f'{name}_avg_price'}, axis='columns', inplace=True)
    merged_df = merged_df.merge(current_currency_df, on='ts')
    
merged_df.head(15)

In [ ]:
resampled_df = merged_df.copy()

### Save Dataset
<p>
Correspondingly, Pandas enables us to save the dataset to CSV. Call the <code>dataframe.to_csv()</code> method and pass the file path and name, in quotation marks, inside the brackets.
</p>
<p>
For example, to save the dataframe <b>df</b> as <b>file_name.csv</b> on your local machine, you can use <code>df.to_csv("file_name.csv", index=False)</code>, where <code>index=False</code> means the row labels will not be written. Below we pass <code>index=True</code> instead, because the index holds the timestamps and we want to keep them.
</p>
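
The effect of the `index` argument can be seen without touching the disk, because `to_csv()` returns the CSV text when no path is given. The frame below is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical frame with a timestamp index named "ts"
frame = pd.DataFrame(
    {"close": [100.0, 101.0]},
    index=pd.DatetimeIndex(["2023-01-01 00:00", "2023-01-01 00:01"], name="ts"),
)

# without a path, to_csv() returns the CSV content as a string
with_index = frame.to_csv(index=True)
without_index = frame.to_csv(index=False)

print(with_index.splitlines()[0])     # ts,close
print(without_index.splitlines()[0])  # close

# reading the text back restores the timestamps as the index
restored = pd.read_csv(io.StringIO(with_index), index_col="ts", parse_dates=True)
```

With `index=False` the timestamps would be lost, which is why a time-indexed dataset is normally saved with the index.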


In [ ]:
resampled_df.to_csv("BTCBUSD_resampled_1min.csv", index=True)

We can also read and save other file formats. We can use similar functions like **`pd.read_csv()`** and **`df.to_csv()`** for other data formats. The functions are listed in the following table:


<h2>Read/Save Other Data Formats</h2>

| Data Format  |        Read       |            Save |
| ------------ | :---------------: | --------------: |
| csv          |  `pd.read_csv()`  |   `df.to_csv()` |
| json         |  `pd.read_json()` |  `df.to_json()` |
| excel        | `pd.read_excel()` | `df.to_excel()` |
| hdf          |  `pd.read_hdf()`  |   `df.to_hdf()` |
| sql          |  `pd.read_sql()`  |   `df.to_sql()` |
| ...          |        ...        |             ... |


## Basic Insight of Dataset
<p>
After reading data into Pandas dataframe, it is time for us to explore the dataset.<br>

There are several ways to obtain essential insights of the data to help us better understand our dataset.

</p>


### Data Types
<p>
Data has a variety of types.<br>

The main types stored in Pandas dataframes are <b>object</b>, <b>float</b>, <b>int</b>, <b>bool</b> and <b>datetime64</b>. In order to better learn about each attribute, it is always good for us to know the data type of each column. In Pandas:

</p>


In [ ]:
resampled_df.dtypes

A series with the data type of each column is returned.


In [ ]:
# check the data types of dataframe "resampled_df" with .dtypes
print(resampled_df.dtypes)

### Describe
If we would like to get a statistical summary of each column e.g. count, column mean value, column standard deviation, etc., we use the describe method:


<code>dataframe.describe()</code>


This method will provide various summary statistics, excluding <code>NaN</code> (Not a Number) values.


In [ ]:
resampled_df.describe()

<p>
This shows the statistical summary of all numeric-typed (int, float) columns.<br>

For example, the attribute 'rec_count' has a count of 65303, the mean value of this column is 9.28, the standard deviation is 9.53, the minimum value is 1, the 25th percentile is 4, the 50th percentile is 6, the 75th percentile is 12, and the maximum value is 114. <br>

However, what if we would also like to check all the columns including those that are of type object, datetime or any other non numeric-type? <br><br>

You can add an argument <code>include = "all"</code> inside the bracket. Let's try it again.

</p>


In [ ]:
# describe all the columns in "resampled_df"
resampled_df.describe(include = "all")

<p>
Now it provides the statistical summary of all the columns, including attributes that are not numeric.<br>

We can now see how many unique values there are, which value is the most frequent, and how often it occurs in the object-typed columns.<br>

Some values in the table above show as "NaN". This is because those numbers are not available regarding a particular column type.<br>

</p>
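
The difference between the two calls is easiest to see on a frame that mixes a text column with a numeric one. The small frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical frame: one text column and one numeric column
mixed = pd.DataFrame({
    "bs": ["b", "s", "b", "b"],
    "price": [100.0, 102.0, 101.0, 99.0],
})

numeric_summary = mixed.describe()            # numeric columns only
full_summary = mixed.describe(include="all")  # every column

print(list(numeric_summary.columns))   # ['price']
print(full_summary.loc["top", "bs"])   # b
print(full_summary.loc["freq", "bs"])  # 3
print(full_summary.loc["mean", "bs"])  # nan
```

The mean of a text column is not defined, so that cell is shown as `NaN`; likewise `unique`, `top` and `freq` are `NaN` for the numeric column.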


Note: you can change the display precision using <code>pd.set_option("display.precision", 2)</code> to show only two digits after the decimal point.


In [ ]:
precision = 2
pd.set_option("display.precision", precision)

Now let's run the <code>describe()</code> method one more time to make sure the precision has changed.


In [ ]:
resampled_df.describe()

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question #3: </h1>

<p>
You can select the columns of a dataframe by indicating the name of each column. For example, you can select the three columns as follows:
</p>
<p>
    <code>dataframe[['column 1', 'column 2', 'column 3']]</code>
</p>
<p>
Where "column" is the name of the column, you can apply the method  ".describe()" to get the statistics of those columns as follows:
</p>
<p>
    <code>dataframe[['column 1', 'column 2', 'column 3']].describe()</code>
</p>

Apply the ".describe()" method to the columns 'low' and 'high'.

</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 


<details><summary>Click here for the solution</summary>

```python
resampled_df[['low', 'high']].describe()
```

</details>


### Info
Another method you can use to check your dataset is:


<code>dataframe.info()</code>


It provides a concise summary of your DataFrame.

This method prints information about a DataFrame including the index dtype and columns, non-null values and memory usage.
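
As a small sketch, `info()` can also write its report into a text buffer through the `buf` parameter, which makes the output easy to inspect. The frame below is made up and contains one missing value on purpose.

```python
import io
import pandas as pd

# Hypothetical frame with one missing value in "close"
frame = pd.DataFrame({"close": [100.0, None, 102.0], "rec_count": [3, 5, 2]})

buffer = io.StringIO()
frame.info(buf=buffer)  # write the summary into the buffer instead of printing it
summary = buffer.getvalue()
print(summary)
```

The report lists each column with its non-null count and dtype, so the missing value shows up as 2 non-null entries for `close` against 3 rows in total.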


In [ ]:
# look at the info of "resampled_df"
resampled_df.info()

## Excellent! You have just completed the  Introduction Notebook!


# **Thank you for completing Lab 1!**

## Authors

<a href="https://author.skills.network/instructors/nazar_kohut">Nazar Kohut</a>

<a href="https://author.skills.network/instructors/yaroslav_vyklyuk_2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Yaroslav Vyklyuk, DrSc, PhD</a>

<a href="https://author.skills.network/instructors/mariya_fleychuk?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Mariya Fleychuk, DrSc, PhD</a>


## Change Log

| Date (YYYY-MM-DD) | Version | Changed By   | Change Description                                         |
| ----------------- | ------- | -------------| ---------------------------------------------------------- |
|     2023-02-25    |   1.0   | Nazar Kohut  | Lab created                                                |

<hr>

## <h3 align="center"> © IBM Corporation 2023. All rights reserved. </h3>