<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="cognitiveclass.ai logo"  />
</center>

# Financial services: Lab 1. Investigation of cryptocurrency exchange rate dynamics (on the example of the cryptocurrency pair MATIC/BUSD); calculation and analysis of technical financial indicators characterizing the cryptocurrency market (on the example of ADOSC, NATR, TRANGE)

Estimated time needed: **30** minutes

The tasks:
* Download and process statistical time series of the cryptocurrency pair MATIC/BUSD describing the dynamics of the cryptocurrency market;
* Load the statistical data into a DataFrame using the Pandas library;
* Calculate and analyze technical financial indicators of the cryptocurrency market (ADOSC, TRANGE, ATR, NATR).

## Objectives

After completing this lab you will be able to:

*   Acquire data in various ways
*   Obtain insights from data with Pandas library


## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">
<ol>
    <li>Data Acquisition</li> 
    <li>Use some indicators</li>    
    <li>Basic Insight of Dataset</li>
</ol>

    
</div>
<hr>


# Data Acquisition
<p>
There are various formats for a dataset: .csv, .json, .xlsx, etc. A dataset can be stored in different places: on your local machine or online.<br>

In this section, you will learn how to load a dataset into our Jupyter Notebook.<br>

In our case, the Trading Dataset is an online source, and it is in a CSV (comma separated value) format. Let's use this dataset as an example to practice data reading.

<ul>
    <li>Data source: <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0UXWEN/labs/MATICBUSD_trades.csv" target="_blank">https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0UXWEN/labs/MATICBUSD_trades.csv</a></li>
    <li>Data type: csv</li>
</ul>

The Pandas library is a useful tool that enables us to read various datasets into a dataframe. Our Jupyter notebook platform comes with the <b>Pandas library</b> preinstalled, so all we need to do is import it.
</p>


In [ ]:
# install the libraries used in this lab (uncomment the first two lines if Pandas or NumPy is missing)
# ! conda install pandas  -y
# ! conda install numpy -y
! conda install -c conda-forge ta-lib -y

In [ ]:
# import the libraries used in this lab
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import talib as tb
# display floats with three decimal places (the default is 6)
pd.set_option("display.precision", 3)

## Read Data
<p>
We use the <code>pandas.read_csv()</code> function to read the CSV file. Inside the parentheses, we pass the file path as a quoted string so that Pandas reads the file at that address into a dataframe. The file path can be either a URL or a local file address.<br>

You can also assign the dataset to any variable you create.

</p>


In [ ]:
path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0UXWEN/labs/MATICBUSD_trades.csv"



This dataset is hosted on IBM Cloud Object Storage. Click <a href="https://cocl.us/DA101EN_object_storage?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDA0101ENSkillsNetwork20235326-2021-01-01">HERE</a> for free storage.


In [ ]:
# Read the online file from the URL provided above and assign it to the variable "wdf"
wdf = pd.read_csv(path, index_col=0)
wdf

After reading the dataset, we can use the <code>dataframe.head(n)</code> method to check the top n rows of the dataframe, where n is an integer. In contrast, <code>dataframe.tail(n)</code> shows the bottom n rows of the dataframe.


In [ ]:
# create an empty dataframe for the 1-minute bars
df = pd.DataFrame()
# convert the trade timestamps to datetime and use them as the index
wdf['ts'] = pd.to_datetime(wdf['ts'])
wdf.set_index('ts', inplace=True)

In [ ]:
# aggregate the raw trades into 1-minute OHLCV bars plus the mean trade price per minute
df['open'] = wdf['price'].resample('1min').first()
df['high'] = wdf['price'].resample('1min').max()
df['low'] = wdf['price'].resample('1min').min()
df['close'] = wdf['price'].resample('1min').last()
df['volume'] = wdf['volume'].resample('1min').sum()
df['avg_price'] = wdf['price'].resample('1min').mean()
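To see what the 1-minute resampling above does, here is a minimal sketch on a few made-up trades (the timestamps, prices and volumes are invented for illustration, not taken from the MATIC/BUSD file). It uses pandas' `Resampler.ohlc()`, which returns the first, maximum, minimum and last price of each bin in one call, as an alternative to the four separate `first()`/`max()`/`min()`/`last()` calls.

```python
import pandas as pd

# Hypothetical trades: three in the first minute, one in the second minute
trades = pd.DataFrame(
    {
        "price": [0.90, 0.92, 0.89, 0.91],
        "volume": [100.0, 50.0, 25.0, 10.0],
    },
    index=pd.to_datetime([
        "2023-01-01 00:00:05",
        "2023-01-01 00:00:20",
        "2023-01-01 00:00:50",
        "2023-01-01 00:01:10",
    ]),
)

# ohlc() gives the first, max, min and last price of each 1-minute bin
bars = trades["price"].resample("1min").ohlc()
# total traded volume per minute
bars["volume"] = trades["volume"].resample("1min").sum()
print(bars)
```

The first bar opens at 0.90, reaches 0.92, dips to 0.89 and closes at 0.89 with a volume of 175; the second bar contains a single trade.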

In [ ]:
# show the first 5 rows using dataframe.head() method
df.index = pd.to_datetime(df.index)
print("The first 5 rows of the dataframe") 
df.head(5)

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question #1: </h1>
<b>Check the bottom 10 rows of data frame "df".</b>
</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 


<details><summary>Click here for the solution</summary>

```python
print("The last 10 rows of the dataframe\n")
df.tail(10)
```

</details>


### Add Headers
<p>
Take a look at our dataset. The columns currently have the lowercase names assigned during resampling ("open", "high", and so on).
</p>

<p>
To better describe our data, we replace them with capitalized headers.
</p>

<p>
First, we create a list "headers" that includes all column names in order.
Then, we use <code>dataframe.columns = headers</code> to replace the headers with the list we created.
</p>


In [ ]:
# create headers list
headers = ["Open", "High", "Low", "Close", "Volume", "Avg_price"]
print("headers\n", headers)

We replace headers and recheck our dataframe:


In [ ]:
df.columns = headers
df.head(10)

In [ ]:
df.dtypes

We can drop rows with a missing value in the column "Open" (for example, minutes without any trades) by uncommenting the following line:


In [ ]:
# df = df.dropna(subset=["Open"], axis=0)
df.head(20)
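Resampling creates one row for every minute between the first and last trade, including minutes with no trades at all. In this minimal sketch with two made-up trades, the empty middle minute gets a NaN "Open" (while `sum()` gives it a volume of 0), which is the kind of row that `dropna(subset=["Open"])` removes.

```python
import pandas as pd

# Hypothetical trades with a one-minute gap between them
trades_gap = pd.DataFrame(
    {"price": [1.00, 1.02], "volume": [5.0, 7.0]},
    index=pd.to_datetime(["2023-01-01 00:00:30", "2023-01-01 00:02:15"]),
)

bars_gap = pd.DataFrame({
    "Open": trades_gap["price"].resample("1min").first(),
    "Volume": trades_gap["volume"].resample("1min").sum(),
})
print(bars_gap)  # three rows: 00:00, 00:01 (Open is NaN, Volume is 0.0) and 00:02

# drop the minutes that had no trades
clean = bars_gap.dropna(subset=["Open"], axis=0)
print(clean)     # only the 00:00 and 00:02 rows remain
```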

Now, we have successfully read the raw dataset and added the correct headers into the dataframe.


# Use some indicators

## ADOSC


The Accumulation/Distribution Oscillator is also known as the Chaikin Oscillator after its inventor.

Like other momentum indicators, this indicator is designed to anticipate directional changes in the Accumulation Distribution Line by measuring the momentum behind the movements. A momentum change is the first step to a trend change. Anticipating trend changes in the Accumulation Distribution Line can help chartists anticipate trend changes in the underlying security. The Chaikin Oscillator generates signals with crosses above/below the zero line or with bullish/bearish divergences.

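ADOSC is calculated as follows:


$$N_t = \frac{(Close_t - Low_t) - (High_t - Close_t)}{High_t - Low_t}$$

$$M_t = N_t \cdot Volume_t$$

$$ADL_t = ADL_{t-1} + M_t$$

$$CO_t = EMA_3(ADL)_t - EMA_{10}(ADL)_t$$

where:
* $N$ is the money flow multiplier;
* $M$ is the money flow volume;
* $ADL$ is the accumulation/distribution line;
* $CO$ is the Chaikin oscillator;
* $EMA_n$ is the exponential moving average over $n$ periods (3 and 10 are the defaults of `tb.ADOSC`; with 1-minute bars these periods are minutes, not days).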
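The formulas above can be sketched directly in pandas. This is a minimal illustration, not TA-Lib's implementation: the helper name `chaikin_oscillator` is hypothetical, the EMAs use `ewm(span=n, adjust=False)`, and bars with `High == Low` (or missing prices) are assumed to contribute zero money flow. Because TA-Lib may initialize its EMAs differently, the values are not guaranteed to match `tb.ADOSC` exactly, especially for the first bars.

```python
import numpy as np
import pandas as pd

def chaikin_oscillator(high, low, close, volume, fast=3, slow=10):
    """Chaikin A/D oscillator built step by step from the formulas above."""
    # money flow multiplier N; a zero High-Low range becomes NaN instead of dividing by zero
    multiplier = ((close - low) - (high - close)) / (high - low).replace(0, np.nan)
    # assumption: bars with zero range or missing prices contribute no money flow
    multiplier = multiplier.fillna(0.0)
    money_flow_volume = multiplier * volume      # M
    adl = money_flow_volume.cumsum()             # ADL_t = ADL_{t-1} + M_t
    fast_ema = adl.ewm(span=fast, adjust=False).mean()
    slow_ema = adl.ewm(span=slow, adjust=False).mean()
    return fast_ema - slow_ema                   # CO

# hypothetical bars: close at the high, close at the high, close at the low, zero range
high = pd.Series([10.0, 11.0, 12.0, 12.0])
low = pd.Series([8.0, 9.0, 10.0, 12.0])
close = pd.Series([10.0, 11.0, 10.0, 12.0])
volume = pd.Series([100.0, 100.0, 100.0, 100.0])
co_sketch = chaikin_oscillator(high, low, close, volume)
print(co_sketch)
```

On the lab data, `chaikin_oscillator(df.High, df.Low, df.Close, df.Volume)` can be plotted next to `co` to compare the two approaches.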



In [ ]:
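# Chaikin A/D oscillator (TA-Lib defaults: fastperiod=3, slowperiod=10)
co = tb.ADOSC(df.High, df.Low, df.Close, df.Volume)
# add the indicator to df as a new column aligned on the timestamp index
df = df.merge(co.rename('ADOSC'), left_index=True, right_index=True)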

In [ ]:
from matplotlib import pyplot as plt, dates

fig, ax = plt.subplots()
ax.plot(co, label = 'ADOSC')

ax.xaxis.set_major_formatter(dates.DateFormatter('%y-%m-%d'))
fig.autofmt_xdate(rotation=45)
plt.legend(loc = 'upper right')
ax.set_xlabel('DateTime')
ax.set_ylabel('ADOSC')

plt.show()

## TRANGE

TRANGE (True Range) is an indicator that measures the range of a period plus any gap from the closing price of the preceding period (for daily bars, the preceding day).

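True Range is calculated as the greatest of:

<ul>
<li>the High for the current period less the Low for the current period;</li>
<li>the absolute value of the High for the current period less the Close for the previous period;</li>
<li>the absolute value of the Low for the current period less the Close for the previous period.</li>
</ul>

$$TR_t = \max\left(High_t - Low_t,\ |High_t - Close_{t-1}|,\ |Low_t - Close_{t-1}|\right)$$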
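The three candidate ranges can be computed with `shift(1)` and a row-wise maximum. In this minimal sketch (the helper name `true_range_sketch` and the sample bars are hypothetical), the first bar has no previous close, so `max(axis=1)`, which skips NaN, falls back to `High - Low`; TA-Lib's `tb.TRANGE` instead leaves the first value as NaN.

```python
import pandas as pd

def true_range_sketch(high, low, close):
    """True range of each bar: the largest of the three candidate ranges."""
    prev_close = close.shift(1)
    ranges = pd.concat(
        [
            high - low,                   # range within the current bar
            (high - prev_close).abs(),    # gap between the high and the previous close
            (low - prev_close).abs(),     # gap between the low and the previous close
        ],
        axis=1,
    )
    # row-wise maximum; NaN entries (first bar) are skipped
    return ranges.max(axis=1)

high = pd.Series([10.0, 15.0, 11.0])
low = pd.Series([8.0, 13.0, 7.0])
close = pd.Series([9.0, 14.0, 10.0])
tr = true_range_sketch(high, low, close)
print(tr.tolist())  # [2.0, 6.0, 7.0]
```

The second bar shows why the gap terms matter: its own High - Low is only 2.0, but it opened well above the previous close of 9.0, so its true range is 6.0.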

In [ ]:
true_range = tb.TRANGE(df.High, df.Low, df.Close)
df = df.merge(true_range.rename('TRANGE'), left_index=True, right_index=True)

In [ ]:
fig, ax = plt.subplots()
ax.plot(true_range, label = 'True range')

ax.xaxis.set_major_formatter(dates.DateFormatter('%y-%m-%d'))
fig.autofmt_xdate(rotation=45)
plt.legend(loc = 'upper right')
ax.set_xlabel('DateTime')
ax.set_ylabel('TRANGE')
plt.show()

## NATR

Normalized ATR (NATR) is a technical analysis tool for measuring the level of volatility. Unlike many popular indicators, it is not used to identify the direction of price movement; it measures only volatility, including volatility caused by price gaps or slow chart updates. NATR is a normalized version of the ATR (Average True Range) indicator: it expresses ATR as a percentage of the closing price, which makes volatility comparable across different price levels. In this lab, ATR is computed as a 10-period simple moving average of the true range.

$$NATR_t = 100 \cdot \frac{ATR_t}{Close_t}$$
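A minimal sketch of NATR that follows this lab's definition of ATR (a simple moving average of the true range). The helper name `natr_sketch` and the sample bars are hypothetical. TA-Lib also provides `tb.ATR` and `tb.NATR` (default `timeperiod=14`), which use Wilder's smoothing instead of a simple average, so their values will generally differ from this version.

```python
import pandas as pd

def natr_sketch(high, low, close, period=10):
    """Normalized ATR in percent, with ATR as a simple moving average of the true range."""
    prev_close = close.shift(1)
    tr = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1
    ).max(axis=1)
    atr = tr.rolling(period).mean()   # NaN until `period` bars are available
    return 100 * atr / close

# 12 identical bars: every true range is 2.0 and every close is 100.0
high = pd.Series([101.0] * 12)
low = pd.Series([99.0] * 12)
close = pd.Series([100.0] * 12)
result = natr_sketch(high, low, close)
print(result)
```

The first 9 values are NaN because the 10-period window is not yet full; from the 10th bar on, NATR = 100 * 2.0 / 100.0 = 2.0 (%).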

In [ ]:
# calculate ATR as a 10-period simple moving average of the true range
atr = true_range.rolling(10).mean()
# calculate normalized ATR as a percentage of the closing price
natr = atr * 100 / df['Close']
df = df.merge(natr.rename('NATR'), left_index=True, right_index=True)

In [ ]:
fig, ax = plt.subplots()
ax.plot(natr, label = 'ATR normalized')

ax.xaxis.set_major_formatter(dates.DateFormatter('%y-%m-%d'))
fig.autofmt_xdate(rotation=45)
plt.legend(loc = 'upper right')
ax.set_xlabel('DateTime')
ax.set_ylabel('NATR')
plt.show()

 <div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question #2: </h1>
<b>Plot ATR against DateTime.</b>
</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 


<details><summary>Click here for the solution</summary>

```python
fig, ax = plt.subplots()
ax.plot(atr, label = 'ATR')

ax.xaxis.set_major_formatter(dates.DateFormatter('%y-%m-%d'))
fig.autofmt_xdate(rotation=45)
plt.legend(loc = 'upper right')
ax.set_xlabel('DateTime')
ax.set_ylabel('ATR')
plt.show()
```

</details>


## Save Dataset
<p>
Pandas also enables us to save a dataset to CSV. With the <code>dataframe.to_csv()</code> method, you pass the file path and name as a quoted string inside the parentheses.
</p>
<p>
For example, to save the dataframe <b>df</b> as <b>MATICBUSD_trades_1m.csv</b> on your local machine, you can use the syntax below. By default, the index (here, the timestamps) is written as the first column; passing <code>index = False</code> would omit the row labels.
</p>
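Because the index of df holds the timestamps, it is worth checking that they survive a round trip to CSV. In this minimal sketch, an in-memory `io.StringIO` buffer stands in for a file such as MATICBUSD_trades_1m.csv, and the two-row dataframe is made up. Reading the data back with `index_col` and `parse_dates=True` restores a `DatetimeIndex`.

```python
import io
import pandas as pd

# Hypothetical 1-minute closes indexed by timestamp
bars_csv = pd.DataFrame(
    {"Close": [0.90, 0.91]},
    index=pd.DatetimeIndex(["2023-01-01 00:00", "2023-01-01 00:01"], name="ts"),
)

buffer = io.StringIO()   # stands in for a file path on disk
bars_csv.to_csv(buffer)  # index=True by default, so "ts" is written as the first column

buffer.seek(0)           # rewind before reading
restored = pd.read_csv(buffer, index_col="ts", parse_dates=True)
print(restored.index)
```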




In [ ]:
df.tail()

In [ ]:
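# display df with the timestamps as a regular column (the result is not assigned, so df is unchanged)
df.reset_index()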

In [ ]:
df.to_csv("MATICBUSD_trades_1m.csv")

We can also read and save other file formats. We can use similar functions like **`pd.read_csv()`** and **`df.to_csv()`** for other data formats. The functions are listed in the following table:


## Read/Save Other Data Formats

| Data Format  |        Read       |            Save |
| ------------ | :---------------: | --------------: |
| csv          |  `pd.read_csv()`  |   `df.to_csv()` |
| json         |  `pd.read_json()` |  `df.to_json()` |
| excel        | `pd.read_excel()` | `df.to_excel()` |
| hdf          |  `pd.read_hdf()`  |   `df.to_hdf()` |
| sql          |  `pd.read_sql()`  |   `df.to_sql()` |
| ...          |        ...        |             ... |
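The other readers and writers in the table follow the same pattern. As a minimal sketch with a made-up dataframe, here is a JSON round trip through an in-memory buffer; `orient="records"` (one JSON object per row) is just one of the layouts `to_json()` supports.

```python
import io
import pandas as pd

prices = pd.DataFrame({"Open": [0.90, 0.91], "Close": [0.91, 0.92]})

# write one JSON object per row
text = prices.to_json(orient="records")

# newer pandas versions expect a path or file-like object rather than a literal JSON string
restored_json = pd.read_json(io.StringIO(text), orient="records")
print(restored_json)
```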


# Basic Insight of Dataset
<p>
After reading data into Pandas dataframe, it is time for us to explore the dataset.<br>

There are several ways to obtain essential insights of the data to help us better understand our dataset.

</p>


## Data Types
<p>
Data has a variety of types.<br>

The main types stored in Pandas dataframes are <b>object</b>, <b>float</b>, <b>int</b>, <b>bool</b> and <b>datetime64</b>. In order to better learn about each attribute, it is always good for us to know the data type of each column. In Pandas:

</p>


In [ ]:
df.dtypes

<p>
As shown above, the price columns such as "Open" and "Close", as well as the indicator columns, are <code>float64</code>. The timestamps are stored in the index, so they do not appear in this list.
</p>
<p>
These data types can be changed; we will learn how to accomplish this in a later module.
</p>


## Describe
If we would like to get a statistical summary of each column (e.g. count, mean, standard deviation), we use the <code>dataframe.describe()</code> method.


This method will provide various summary statistics, excluding <code>NaN</code> (Not a Number) values.
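A quick way to see that `describe()` skips missing values is a tiny made-up column with one NaN: the count reflects only the non-missing entries, and the mean is taken over those entries alone.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({"Open": [0.8, np.nan, 1.0]})
summary = sample.describe()
print(summary)
# count is 2.0 (the NaN is excluded) and mean is 0.9, the average of 0.8 and 1.0
```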


In [ ]:
df.describe()

<p>
This shows the statistical summary of all numeric-typed (int, float) columns.<br>

For example, the attribute "Open" has 66861 counts, the mean value of this column is 0.865, the standard deviation is 0.055, the minimum value is 0.761, 25th percentile is 0.808, 50th percentile is 0.870, 75th percentile is 0.910, and the maximum value is 1.066. <br>

However, what if we would also like to check all the columns, including those of type object? <br><br>

You can add the argument <code>include = "all"</code> inside the parentheses. Let's try it again.

</p>


In [ ]:
# describe all the columns in "df" 
df.describe(include = "all")

<p>
Now it provides the statistical summary of all the columns. For object-typed columns, this summary also shows how many unique values there are, which value is the top (most frequent) one, and the frequency of the top value.<br>

Some values in such a table show as "NaN". This is because those statistics are not defined for a particular column type.<br>

</p>


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question #3: </h1>

<p>
You can select columns of a dataframe by indicating the name of each column. For example, you can select three columns as follows:
</p>
<p>
    <code>dataframe[['column 1', 'column 2', 'column 3']]</code>
</p>
<p>
where "column 1", "column 2" and "column 3" are column names. You can then apply the ".describe()" method to get the statistics of those columns:
</p>
<p>
    <code>dataframe[['column 1', 'column 2', 'column 3']].describe()</code>
</p>

Apply the ".describe()" method to the columns 'Open' and 'Close'.

</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 


<details><summary>Click here for the solution</summary>

```python
df[['Open', 'Close']].describe()
```

</details>


## Info 
Another method you can use to check your dataset is <code>dataframe.info()</code>.


It provides a concise summary of your DataFrame.

This method prints information about a DataFrame including the index dtype and columns, non-null values and memory usage.


In [ ]:
# look at the info of "df"
df.info()

# Excellent! You have just completed the Introduction Notebook!


# **Thank you for completing Lab 1!**

## Authors

<a href="https://author.skills.network/instructors/oleh_lozovyi?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Oleh Lozovyi</a>

<a href="https://author.skills.network/instructors/yaroslav_vyklyuk_2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Yaroslav Vyklyuk, DrSc, PhD</a>

<a href="https://author.skills.network/instructors/mariya_fleychuk?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Mariya Fleychuk, DrSc, PhD</a>

<a href="https://www.linkedin.com/in/joseph-s-50398b136/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDA0101ENSkillsNetwork20235326-2021-01-01">Joseph Santarcangelo</a>


<a href="https://www.linkedin.com/in/fiorellawever/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDA0101ENSkillsNetwork20235326-2021-01-01">Fiorella Wever</a>

<a href="https://www.linkedin.com/in/yi-leng-yao-84451275/" target="_blank" >Yi Yao</a>


## Change Log

| Date (YYYY-MM-DD) | Version | Changed By   | Change Description                                         |
| ----------------- | ------- | -------------| ---------------------------------------------------------- |
|     2023-03-01    |   1.0   | Oleh Lozovyi | Lab created                                                |

<hr>

<h3 align="center"> © IBM Corporation 2023. All rights reserved. </h3>



