<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="400" alt="cognitiveclass.ai logo">
</center>

<!-- # Financial services: Lab 1. Cryptocurrency Dataset Creation  (on the example of BTC/BUSD), calculation and analysis of technical financial indicators, characterizing the cryptocurrency market (on the example of ADOSC, NATR, TRANGE) -->

# **Investigating the relationships between the BTC/BUSD exchange rate and the ADOSC, NATR, and TRANGE indicators**
    
## Lab 1. Dataset Creation
    
Estimated time needed: **30** minutes


### The tasks:
*   Download and process the statistical time series of the cryptocurrency pair BTC/BUSD, which describes the dynamics of the cryptocurrency market;
*   Load the statistical data with the Pandas library;
*   Calculate and analyze technical financial indicators for the cryptocurrency market (using ADOSC, NATR, and TRANGE as examples).


### Objectives

After completing this lab you will be able to:

*  Acquire data in various ways
*  Obtain insights from data with Pandas library 
*  Resample data
*  Calculate Indicators for cryptocurrency market analysis 
   


<h3>Table of Contents</h3>

<div class="alert alert-block alert-info" style="margin-top: 20px">
<ol>
    <li>Data Acquisition</li>
        <ul>
            <li>Read Data</li>
            <li>Resample Data</li>
        </ul>
    <li>Financial Indicators</li>
        <ul>
            <li>ADOSC</li>
            <li>NATR</li>
            <li>TRANGE</li>
        </ul>
    <li>Basic Insight of Dataset</li>
        <ul>
            <li>Data Types</li>
            <li>Describe</li>
            <li>Info</li>
            <li>Save Dataset</li>
        </ul>
</ol>

</div>
<hr>


## 1. Data Acquisition
<p>
There are various formats for a dataset: <code>.csv</code>, <code>.json</code>, <code>.xlsx</code>  etc. The dataset can be stored in different places, on your local machine or sometimes online.<br>

In this section, you will learn how to load a dataset into our Jupyter Notebook.<br>

In our case, the dataset is an online source, and it is in a <em><strong>CSV (comma separated value) format</strong></em>. Let's use this dataset as an example to practice data reading.

<ul>
    <li>Data source: <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0QGDEN/BTCBUSD_trades.csv" target="_blank">https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0QGDEN/BTCBUSD_trades.csv</a></li>
    <li>Data type: csv</li>
</ul>

The Pandas library is a useful tool that enables us to read various datasets into a dataframe. Our Jupyter notebook platform has the <b>Pandas library</b> built in, so all we need to do is import Pandas; no installation is required.
</p>


If you run the lab locally using Anaconda, you can load the correct library and versions by uncommenting the following:


In [ ]:
# install the libraries used in this lab
#! mamba install pandas -y
#! mamba install numpy -y

In [ ]:
# import pandas library
import pandas as pd
import numpy as np

### Read Data
<p>
We use the <code>pandas.read_csv()</code> function to read the CSV file. Inside the parentheses, we put the file path in quotation marks so that pandas reads the file at that address into a dataframe. The file path can be either a URL or a local file path.<br>

You can also assign the dataset to any variable you create.

</p>


In [ ]:
path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0QGDEN/BTCBUSD_trades.csv"

This dataset is hosted on IBM Cloud Object Storage. Click <a href="https://cocl.us/DA101EN_object_storage?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDA0101ENSkillsNetwork20235326-2021-01-01">HERE</a> for free storage.


In [ ]:
# Read the online file from the URL provided above, and assign it to the variable "df"
df = pd.read_csv(path)

In finance, you sometimes need to work with different numbers of decimal places. For ease of reading, let's set the display precision to 3 so that three decimal places are shown (instead of the default 6).


In [ ]:
pd.set_option("display.precision", 3)

After reading the dataset, we can use the <code>dataframe.head(n)</code> method to check the top <em><strong>n</strong></em> rows of the dataframe, where n is an integer. Contrary to <code>dataframe.head(n)</code>, <code>dataframe.tail(n)</code> will show you the bottom <em><strong>n</strong></em> rows of the dataframe.


In [ ]:
# show the first 5 rows using dataframe.head() method
print("The first 5 rows of the dataframe")
df.head(5)

As there is no text (i.e. column title) in the first cell of the CSV file, the resulting dataframe's first column is given the name <code>Unnamed: 0</code>. We should fix this issue.

By passing the <code>index_col=0</code> argument to the <code>read_csv()</code> function, we tell pandas that the first column in the CSV file is the index of the dataframe. As a result, the undesired column <code>Unnamed: 0</code> disappears.
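
The effect of <code>index_col=0</code> can be seen on a tiny in-memory file. The sketch below is only an illustration: the CSV text and the names `csv_text`, `raw`, and `fixed` are hypothetical and are not part of the lab dataset.

```python
import io
import pandas as pd

# Hypothetical two-row CSV whose first header cell is empty,
# mimicking a file that was saved together with its row index.
csv_text = ",ts,price\n0,2022-11-01 00:00:01,20490.5\n1,2022-11-01 00:00:02,20491.0\n"

raw = pd.read_csv(io.StringIO(csv_text))                 # no index_col
fixed = pd.read_csv(io.StringIO(csv_text), index_col=0)  # first column becomes the index

print(raw.columns.tolist())    # includes the auto-generated 'Unnamed: 0'
print(fixed.columns.tolist())  # only the real columns remain
```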


In [ ]:
df = pd.read_csv(path, index_col=0)
df.head(5)

A dataset is much easier to use for time series analysis when it is indexed by date. A date index is also the right choice for time series visualization.

We need to set the <strong>'ts'</strong> column, which holds the date and time, as the index column. We use the <code>df.set_index('ts', inplace=True)</code> method to set the dataframe index from an existing column. The <code>inplace=True</code> parameter means that the dataframe itself is modified rather than a modified copy being returned.

Last but not least, the index of our dataframe should be converted to a datetime index type. We accomplish this using the <code>pd.to_datetime()</code> function. To obtain the current index of the dataframe, use the <code>df.index</code> attribute.
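
As a minimal sketch of these two steps on made-up data (the frame `toy` and its values are hypothetical), the index type can be checked before and after the conversion:

```python
import pandas as pd

# Hypothetical toy frame; the 'ts' column holds timestamps stored as strings.
toy = pd.DataFrame({
    "ts": ["2022-11-01 00:00:01", "2022-11-01 00:00:02"],
    "price": [20490.5, 20491.0],
})

toy = toy.set_index("ts")               # returns a new frame, so inplace is not needed
print(type(toy.index).__name__)         # still a generic index of strings
toy.index = pd.to_datetime(toy.index)   # convert the labels to timestamps
print(type(toy.index).__name__)         # now a DatetimeIndex
```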


In [ ]:
df.set_index('ts', inplace=True)
df.index = pd.to_datetime(df.index)
df.head()

### Resample Data

Since the data in our dataset is not aggregated, we need to convert it to aggregated data for further analysis. The <em>resampling technique</em> will provide a helpful hand in this.

### Resampling

<h4>What is resampling?</h4>

For time series analysis, <strong>resampling</strong> is an essential technique that gives you the freedom to select the required level of data resolution. For example, you can upsample data (add more data points) by converting 5-minute data into 1-minute data or, conversely, downsample it by converting 1-minute data into 5-minute data.
<br>
<p>
    The basic syntax for resampling is the <code>dataframe.resample('desired resolution')</code> method. It is combined with an aggregation function, such as a sum, mean, or maximum, that defines how the values within each interval are merged.
</p>
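
The sketch below illustrates both directions on a hypothetical one-value-per-minute series (the names `s`, `down`, and `up` are made up for this example). Forward filling is just one possible way to populate the new rows when upsampling.

```python
import pandas as pd

# Hypothetical series: one value per minute for ten minutes.
idx = pd.date_range("2022-11-01 00:00", periods=10, freq="1min")
s = pd.Series(range(10), index=idx)

down = s.resample("5min").sum()       # downsample: 10 rows -> 2 rows
up = down.resample("1min").ffill()    # upsample: repeat the last known value

print(down)
print(up)
```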


In our case, the dataset contains non-aggregated trade data, so it lacks the **OHLCV parameters** needed for a given period.

<h4>What is OHLCV?</h4>

**OHLCV** is an aggregated form of market data standing for **Open, High, Low, Close and Volume**. OHLCV data includes 5 data points: the Open and Close represent the first and the last price level during a specified interval. High and Low represent the highest and lowest reached price during that interval. Volume is the total amount traded during that period.
Read more about this topic <a href="https://www.kaiko.com/collections/ohlcv?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">here</a>.


We need to transform the non-aggregated data into data aggregated over 1-minute intervals.

Considering the semantics of our dataset, for the **Open** parameter we take <em>the first price</em> within an interval, while for **Close** we take <em>the last price</em>. 
For **High** we take the <em>maximum price</em> within an interval, and for **Low** the <em>minimum price</em>. 
The **Volume** column stores the <em>sum of all volumes</em> within an interval. 
The **Price** parameter represents the <em>mean price</em> within an interval.  
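
To see these aggregation rules on something small enough to check by hand, here is a sketch on five made-up trades. It uses the resampler's built-in `ohlc()` helper, which is a different route from the column-copying approach used in this lab; all values and the names `trades` and `bars` are hypothetical.

```python
import pandas as pd

# Hypothetical trades: timestamps, prices and sizes are made up.
trades = pd.DataFrame(
    {"price": [100.0, 102.0, 99.0, 101.0, 103.0],
     "volume": [0.5, 0.2, 0.3, 1.0, 0.4]},
    index=pd.to_datetime([
        "2022-11-01 00:00:05", "2022-11-01 00:00:20", "2022-11-01 00:00:50",
        "2022-11-01 00:01:10", "2022-11-01 00:01:40",
    ]),
)

bars = trades["price"].resample("1min").ohlc()            # open/high/low/close
bars["volume"] = trades["volume"].resample("1min").sum()  # total traded amount
bars["price"] = trades["price"].resample("1min").mean()   # mean trade price
print(bars)
```

The first three trades fall into the 00:00 bar and the last two into the 00:01 bar, so each output row can be verified against the rules listed above.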


Let's resample the data to 1-minute intervals.


In [ ]:
# adding new columns
df['count'] = df['volume']
for column in ['open', 'high', 'close', 'low']:
    df[column] = df['price']

# resampling to 1-minute interval
df = df.loc[:, 'bs':'low'].resample('1min').agg({
    'bs': 'first',
    'price': 'mean',
    'volume': 'sum',
    'count': 'count',
    'open': 'first',
    'high': 'max',
    'close': 'last',
    'low': 'min',
})

df.head()

Notice that the index now advances in 1-minute steps and that every other column holds a value aggregated over the corresponding interval.

Great! As a result, we moved to aggregated data.


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
  <b style="font-size: 2em; font-weight: 600;">Question #1:</b>

  <b>Check the bottom 10 rows of data frame "df".</b> 
</div>


In [ ]:
# Write your code below and press Shift+Enter to execute 
print("The last 10 rows of the dataframe\n")
df.tail(10)

<details><summary>Click here for the solution</summary>

```python
print("The last 10 rows of the dataframe\n")
df.tail(10)
```

</details>


## 2. Financial Indicators


Cryptocurrencies are traded every day of the week, around-the-clock. This generates a tremendous volume of data, which makes it difficult to know what to watch out for and how to separate the signal from the noise. Together with candlestick charts, indicators give traders tools to streamline data and spot patterns for better trading decisions. Read more <a href="https://www.bcbgroup.com/best-indicators-for-crypto-trading-analysis/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">here</a>.
<h3>What are indicators?</h3>

<strong>Indicators</strong> are statistics used to measure current conditions as well as to forecast financial or economic trends.

In the context of technical analysis, an indicator is a mathematical calculation based on a security's price or volume. The result is used to predict future prices. Read more <a href="https://www.investopedia.com/terms/i/indicator.asp?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">here</a>.


### ADOSC - Chaikin A/D Oscillator


<i>$The\ Chaikin\ accumulation/distribution\ (AD)\ line$</i> is a volume-based indicator that measures the cumulative flow of money into and out of an asset. The indicator assumes that the degree of buying or selling pressure can be determined by the location of the close relative to the high and low for the period. Read more <a href="https://www.investopedia.com/terms/c/chaikinoscillator.asp?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">here</a>.


$The\ AD\ line$ is a running total of each period's <b>$ money\ flow\ volume\  (MFV)$ </b>. It is calculated as follows: 

1. Compute the <b>$ money\ flow\ multiplier\  MFM$ </b> as the relationship of the close to the high-low range: <br>

$$
MFM = \frac{(Close \ - Low) \ - \ (High \ - Close)}{High \ - \ Low}
$$

2. Multiply the $MFM$ by the period's volume  $Volume$ to come up with the $MFV$: 

$$
MFV =  MFM \times Volume
$$

3. Obtain the $AD\ line$:

$$
AD = AD_p + MFV,
$$
<center>where $p$ — previous</center>
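
The three steps above can be traced on a few made-up bars. This is only a sketch for checking the arithmetic; the frame `bars` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical 1-minute bars used only to trace the arithmetic.
bars = pd.DataFrame({
    "high":   [10.0, 12.0, 11.0],
    "low":    [8.0, 10.0, 9.0],
    "close":  [9.5, 10.5, 10.0],
    "volume": [100.0, 200.0, 50.0],
})

# Step 1: money flow multiplier (lies between -1 and 1 when high > low)
mfm = ((bars["close"] - bars["low"]) - (bars["high"] - bars["close"])) / (bars["high"] - bars["low"])
# Step 2: money flow volume
mfv = mfm * bars["volume"]
# Step 3: the AD line is the running total of MFV
ad = mfv.cumsum()

print(pd.DataFrame({"mfm": mfm, "mfv": mfv, "ad": ad}))
```

A close in the upper half of the range gives a positive multiplier (first bar), a close in the lower half gives a negative one (second bar), and a close exactly at the midpoint contributes nothing (third bar).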


<b>$The\ Chaikin\ A/D\ oscillator\ (ADOSC)$</b> <i>(read more at <a href="https://www.investopedia.com/articles/active-trading/031914/understanding-chaikin-oscillator.asp?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">ADOSC</a></i>) is the 
<a href="https://www.investopedia.com/terms/m/macd.asp?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">MACD indicator</a> that's applied to <b>$the\ Chaikin\ AD\ line$</b>. The Chaikin oscillator intends to predict changes in the AD line.
It is computed as the difference between the 3-period and 10-period <a href="https://www.investopedia.com/terms/e/ema.asp?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">EMA</a> of the AD line.

$$
ADOSC\ =\ EMA_{3}(AD)\ -\ EMA_{10}(AD)
$$
$$
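
For readers unfamiliar with the EMA itself, the sketch below compares a hand-written recursion with pandas' `ewm(..., adjust=False)` on made-up values. The series `ad` is hypothetical, and the recursion shown is the standard one pandas documents for `adjust=False`, with the smoothing factor derived from `span`.

```python
import pandas as pd

# Hypothetical AD-line values.
ad = pd.Series([50.0, -50.0, -50.0, 10.0, 30.0])
span = 3
alpha = 2 / (span + 1)   # smoothing factor that corresponds to `span`

# Manual recursion: EMA_t = alpha * x_t + (1 - alpha) * EMA_{t-1}
manual = [ad.iloc[0]]
for x in ad.iloc[1:]:
    manual.append(alpha * x + (1 - alpha) * manual[-1])

builtin = ad.ewm(span=span, adjust=False).mean()
print(pd.DataFrame({"manual": manual, "ewm": builtin}))
```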


Let's declare a function that calculates the Chaikin Oscillator and create a separate column in our dataframe to store the result.


In [ ]:
def ema(s: pd.Series, period: int) -> pd.Series:
    """Return the exponential moving average (EMA).
    """
    return s.ewm(span=period, min_periods=period, adjust=False).mean()


def adosc(df: pd.DataFrame) -> pd.Series:
    """Return the Chaikin Oscillator.
    """
    # calculate money flow multiplier:
    mfm = ((df['close'] - df['low']) - (df['high'] - df['close'])) / (df['high'] - df['low'])
    
    # calculate money flow volume:
    mfv = mfm * df['volume']

    # refine money flow volume:
    mfv = np.where((df['close'] == df['high']) & (df['close'] == df['low']) | (df['high'] == df['low']), 0, mfv)
    mfv = pd.Series(mfv, index=df.index)

    # calculate A/D line:
    ad = mfv.cumsum()

    # Calculate Chaikin Oscillator:
    chaikin = ema(ad, 3) - ema(ad, 10)
    return chaikin

In [ ]:
df['adosc_indicator'] = adosc(df)
df[['adosc_indicator']].head(15)

You may have noticed that the first 9 rows contain `NaN` values. This happens because the Chaikin oscillator is the difference between two EMAs of the AD line, and the `ema` function sets `min_periods` equal to the EMA period. The slower EMA has a period of 10, so it needs 10 observations before it returns a value, and the first 9 entries cannot be calculated.

Secondly, we may also observe values less than zero in the obtained data. The Chaikin Oscillator turns positive when the faster 3-minute EMA moves above the slower 10-minute EMA. Conversely, the indicator turns negative when the 3-minute EMA moves below the 10-minute EMA. Read more <a href="https://school.stockcharts.com/doku.php?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01&id=technical_indicators%3Achaikin_oscillator">here</a>.
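
The leading `NaN` values noted above come from the `min_periods` argument rather than from the EMA recursion itself. A small sketch on made-up numbers (the series `s` is hypothetical) shows the effect with a period of 3:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical values

# With min_periods=3 the first two results are masked as NaN,
# even though the recursion could start at the first value.
ema3 = s.ewm(span=3, min_periods=3, adjust=False).mean()
print(ema3)
print(ema3.isna().sum())   # number of masked leading entries
```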


### ATR Normalized (NATR)


<i>$ATR\ Normalized\ (NATR)$</i> is an instrument used in technical analysis to measure the level of volatility. In contrast to many other popular indicators, it is not used to identify the direction of price movement. It only measures volatility, including the volatility caused by price gaps or slow refreshing of the chart. 

<a href="https://support.atas.net/en/knowledge-bases/2/articles/43436-atr-normalized?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">ATR Normalized</a> is a normalized version of the <a href="https://www.investopedia.com/terms/a/atr.asp?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">ATR indicator</a> and is calculated according to the following formula:
<br>
<br>
$$ 
NATR =  \frac{100 \times ATR}{Close}
$$
To calculate ATR we need to perform the following steps:

1. Calculate $True\ Range\ (TR)$:
$$
TR = \textrm{max}[(High\ - \ Low),\ |High\ - \ Close_p|,\ |Low\ - \ Close_p|],
$$
<center>where $p$ denotes the previous period</center>

2. Calculate $ATR_{p}$:
$$
ATR_p = \frac{1}{n} \sum \limits _{i=1} ^{n} TR_i,
$$ 
<center>where $n$ is the number of periods employed and $p$ denotes the previous period</center>
    
3. Calculate $ATR$:
$$
ATR = \frac{ATR_p(n \ - \ 1)\ + \ TR}{n}
$$
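
One common reading of these steps is a recursion: the first ATR is the plain mean of the first $n$ True Range values, and every later ATR is updated from the previous one. The sketch below follows that reading on made-up numbers. The helper `wilder_atr` is hypothetical, and its output is not necessarily identical to that of the vectorized function used in this lab.

```python
import numpy as np
import pandas as pd

def wilder_atr(true_range: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: seed with the mean of the first n TR values,
    then apply ATR = (ATR_p * (n - 1) + TR) / n to each later row."""
    atr = pd.Series(np.nan, index=true_range.index, dtype="float64")
    if len(true_range) < n:
        return atr                                        # not enough data
    atr.iloc[n - 1] = true_range.iloc[:n].mean()          # initial ATR
    for i in range(n, len(true_range)):                   # recursive update
        atr.iloc[i] = (atr.iloc[i - 1] * (n - 1) + true_range.iloc[i]) / n
    return atr

tr = pd.Series([2.0, 4.0, 6.0, 8.0, 2.0])   # made-up True Range values
atr = wilder_atr(tr, n=3)
print(atr)
```

With $n = 3$, the first defined value is the mean of 2, 4, and 6, and each later value moves one third of the way toward the newest True Range.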


Let's declare a function that calculates the NATR indicator and create a separate column in our dataframe to store the result.


In [ ]:
def natr(df: pd.DataFrame, period: int = 15) -> pd.Series:
    """Return the ATR Normalized (NATR) indicator.
    """
    # calculate values
    high, low, close = df['high'], df['low'], df['close']
    
    high_low = high - low
    high_close = np.abs(high - close.shift())
    low_close = np.abs(low - close.shift())
    
    # calculate True Range
    ranges = pd.concat([high_low, high_close, low_close], axis=1)
    true_range = np.max(ranges, axis=1)
    
    # calculate previous ATR as the simple moving average of True Range (the window ends at the current row)
    atr_prev = true_range.rolling(period).sum() / period
    
    # calculate current ATR
    atr = (atr_prev*(period - 1) + true_range) / period
    
    # normalize ATR 
    natr = (100 * atr) / df['close']
    return natr

In [ ]:
df['natr_indicator'] = natr(df)
df[['natr_indicator']].head(20)

Once more, we observe `NaN` values, this time in the first 14 rows. Since the ATR is a moving average of the true ranges over a specific period <i>(in our case, 15 one-minute intervals)</i>, the first 14 entries lack the data needed for the calculation and are filled with `NaN` values.


### True Range (TRANGE)


<i>$True\ Range\ (TRANGE)$</i> is a technical indicator that measures the range of a period plus any gap from the closing price of the preceding period. It is usually described for daily bars; in this lab each period is a 1-minute interval.

True Range is calculated as the greatest of:

<ul>
<li><em>High for the period</em> less <em>the Low for the period</em></li>
<li>the absolute value of <em>the High for the period</em> less <em>the Close for the previous period</em></li>
<li>the absolute value of <em>the Low for the period</em> less <em>the Close for the previous period</em></li>
</ul>

The formula that finds the maximum among the specified values is:
<br>
<br>
$$
TR = \textrm{max}[(High_p\ - \ Low_p),\ |High_p\ - Close_{p-1}|,\ |Low_p\ - \ Close_{p-1}|],
$$
<center>where $p$ denotes the current period</center>
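
A single-bar sketch makes the role of the previous close visible. The helper `true_range` and the prices below are hypothetical and serve only to illustrate the formula.

```python
def true_range(high: float, low: float, prev_close: float) -> float:
    """Hypothetical scalar helper mirroring the TR formula for one period."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))

# Made-up bar that opens with a gap above the previous close:
# the distance from the previous close to the high is the largest term.
gap_case = true_range(high=105.0, low=100.0, prev_close=98.0)

# Same bar, but the previous close lies inside the range:
# the plain high-low range is the largest term.
no_gap_case = true_range(high=105.0, low=100.0, prev_close=102.0)

print(gap_case, no_gap_case)
```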


Let's declare a function that calculates the TRANGE indicator and create a separate column in our dataframe to store the result.


In [ ]:
def trange(df: pd.DataFrame) -> pd.Series:
    """Return the True Range (TRANGE) indicator.
    """
    # calculate values
    high, low, close = df['high'], df['low'], df['close']
    
    high_low = high - low
    high_close = np.abs(high - close.shift())
    low_close = np.abs(low - close.shift())
    
    # calculate maximum of obtained values
    ranges = pd.concat([high_low, high_close, low_close], axis=1)
    true_range = np.amax(ranges, axis=1)
    # the first period has no previous close, so its True Range is undefined
    true_range.iloc[0] = np.nan
    return true_range

In [ ]:
df['trange_indicator'] = trange(df)
df[['trange_indicator']].head(15)

Notice that the TRANGE calculation also uses the close of the previous period. As a result, the first row contains a `NaN` value. 


### Add Headers
<p>
If a CSV file has no header row and we pass the argument <code>header=None</code> to the <code>read_csv()</code> function, Pandas automatically labels the columns with integers starting from 0.
</p>
<p>
To better describe our data, it is best practice to introduce a header. 
Information about the dataset we are using is available <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0QGDEN/BTCBUSD_trades.csv" target="_blank">HERE</a>.
</p>
<p>
Thus, if headers are absent, we have to add them manually. Let's learn how to deal with it.
</p>
<p>
First, we create a list "headers" that includes all column names in order.
Then, we use <code>dataframe.columns = headers</code> to replace the headers with the list we created.
</p>
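
Assigning a list replaces the headers by position, so the list must match the column order exactly. As a sketch of an alternative, `DataFrame.rename` accepts an explicit old-to-new mapping. The frame `toy` below is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({"price": [1.0, 2.0], "volume": [0.1, 0.2]})  # hypothetical frame

# Option 1: positional replacement; the list must match the column order exactly.
by_list = toy.copy()
by_list.columns = ["Price", "Volume"]

# Option 2: explicit mapping; order does not matter and unlisted columns are kept.
by_map = toy.rename(columns={"volume": "Volume", "price": "Price"})

print(by_list.columns.tolist(), by_map.columns.tolist())
```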


In [ ]:
# create headers list
headers = ["BS", "Price", "Volume", "Count", "Open", "High", "Close", "Low", 'ADOSC', 'NATR', 'TRANGE']
print("headers\n", headers)

We replace headers and recheck our dataframe:


In [ ]:
df.columns = headers
df.head(10)

It is also possible to change the name of the index column. Assign a list of new names to the <code>df.index.names</code> attribute.


In [ ]:
df.index.names = ['Time']
df.head(10)

Excellent! Our dataframe has transformed in a positive way. Now the headers are clear and concise.


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<b style="font-size: 2em; font-weight: 600;">Question #2:</b>    

<b>Delete the "BS" column of the dataframe. Use `df.drop('column_name', axis=1, inplace=True)` method, where `'column_name'` stands for name of the column to be removed.</b>

</div>


<strong><em>Note:</em></strong> The <code>axis=1</code> parameter means dropping columns, whereas <code>axis=0</code> is used for rows. The <code>inplace=True</code> parameter means that the operation modifies the current dataframe instead of returning a new one.
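
The difference between the two axes, and the effect of leaving out <code>inplace=True</code>, can be sketched on a hypothetical two-row frame (`toy` and its labels are made up):

```python
import pandas as pd

# Hypothetical frame with explicit row labels.
toy = pd.DataFrame({"BS": ["buy", "sell"], "Price": [1.0, 2.0]}, index=["r0", "r1"])

no_col = toy.drop("BS", axis=1)    # axis=1: remove a column
no_row = toy.drop("r0", axis=0)    # axis=0: remove a row by its index label

print(no_col.columns.tolist(), no_row.index.tolist())
print(toy.shape)                   # the original is untouched without inplace=True
```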


In [ ]:
# Write your code below and press Shift+Enter to execute 
df.drop('BS', axis=1, inplace=True)
df.head()

<details><summary>Click here for the solution</summary>

```python
df.drop('BS', axis=1, inplace=True)
df.head()
```

</details>


## 3. Basic Insight of Dataset
<p>
After reading data into Pandas dataframe, it is time for us to explore the dataset.<br>
There are several ways to obtain essential insights of the data to help us better understand our dataset.
</p>


### Data Types
<p>
Data has a variety of types.<br>

The main types stored in Pandas dataframes are <strong>object</strong>, <strong>float</strong>, <strong>int</strong>, <strong>bool</strong> and <strong>datetime</strong>. To learn more about each attribute, it helps to know the data type of each column. In Pandas, the <code>.dtypes</code> attribute returns a series with the data type of each column of the dataframe.
</p>
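
As a small sketch on made-up data (the frame `toy` is hypothetical), <code>.dtypes</code> can be combined with `select_dtypes` to pick out only the numeric columns:

```python
import pandas as pd

# Hypothetical frame with a float, an integer and a text column.
toy = pd.DataFrame({
    "Price": [20490.5, 20491.0],
    "Count": [3, 5],
    "Side": ["buy", "sell"],
})

print(toy.dtypes)                                   # one dtype per column
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
print(numeric_cols)                                 # names of the numeric columns only
```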


In [ ]:
# check the data type of data frame "df" by .dtypes
print(df.dtypes)

<p>
As shown above, the data type of "Open", "High", "Low", "Close", "Price", and "Volume" is <code>float64</code>, and "Count" is <code>int64</code>, etc.
</p>
<p>
These data types can be changed. We will learn how to accomplish this in a later module.
</p>


### Describe

If we would like to get a statistical summary of each column, e.g. <em>count</em>, <em>column mean value</em>, <em>column standard deviation</em>, etc., we use the <code>dataframe.describe()</code> method.


This method provides various summary statistics, excluding <code>NaN</code> (Not a Number) values.


In [ ]:
# get a statistical summary of each column of dataframe using .describe()
df.describe()

<p>
This shows the statistical summary of all numeric-typed (int, float) columns.<br>

For example, the attribute "Count" has 18056 counts, the mean value of this column is 27.188, the standard deviation is 16.389, the minimum value is 0, 25th percentile is 15, 50th percentile is 25, 75th percentile is 38, and the maximum value is 100.<br>

However, what if we would also like to check all the columns including those that are of type *object*? <br><br>
You can add an argument <code>include="all"</code> inside the bracket. Let's try it again.
</p>


In [ ]:
# describe all the columns in "df"
df.describe(include="all")

<p>
Now it provides the statistical summary of all the columns, including object-typed attributes. As our dataframe does not have any object-typed columns, the results are the same.<br>

However, if we had some, we could see how many unique values there are, which value is the most frequent (the top value), and how often the top value occurs in the object-typed columns. In addition, some entries in the summary table can be shown as `NaN`. This is because a given statistic is not defined for that particular column type.<br>

</p>
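
Since the lab dataframe has no object-typed columns, here is a sketch on a hypothetical frame that has one (`toy` and its values are made up), so that the extra rows produced by <code>include="all"</code> become visible:

```python
import pandas as pd

# Hypothetical frame with one text column and one numeric column.
toy = pd.DataFrame({
    "Side": ["buy", "sell", "buy", "buy"],
    "Price": [1.0, 2.0, 3.0, 4.0],
})

summary = toy.describe(include="all")
print(summary)
```

The `unique`, `top`, and `freq` rows are filled only for the text column, while `mean`, `std`, and the percentiles are filled only for the numeric one; the remaining cells are `NaN`.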


<div class="alert alert-danger alertdanger" style="margin-top: 20px">

<b style="font-size: 2em; font-weight: 600;">Question #3:</b> 

<p>
You can select the columns of a dataframe by indicating the name of each column. For example, you can select three columns as follows:
</p>
<p>
    <code>dataframe[['column 1', 'column 2', 'column 3']]</code>
</p>
<p>
    Here, each <strong>'column'</strong> is the name of a column. You can apply the <code>.describe()</code> method to get the statistics of those columns as follows:
</p>
<p>
    <code>dataframe[['column 1', 'column 2', 'column 3']].describe()</code>
</p>

Apply the <code>.describe()</code> method to the columns <strong>'Count'</strong> and <strong>'Price'</strong>.

</div>


In [ ]:
# Write your code below and press Shift+Enter to execute
df[['Count', 'Price']].describe()

<details><summary>Click here for the solution</summary>

```python
df[['Count', 'Price']].describe()
```

</details>


### Info

Another method you can use to check your dataset is <code>dataframe.info()</code>.


It provides a concise summary of your DataFrame.

This method prints information about a DataFrame including the index dtype and columns, non-null values and memory usage.
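
By default the report goes straight to the output cell. If the text is needed as a string, for example for logging, `info` accepts a `buf` argument. The sketch below uses a hypothetical frame `toy` with one missing value:

```python
import io
import pandas as pd

# Hypothetical frame; the second price is missing.
toy = pd.DataFrame({"Price": [1.0, None, 3.0], "Count": [1, 2, 3]})

buffer = io.StringIO()
toy.info(buf=buffer)          # write the report to a buffer instead of stdout
report = buffer.getvalue()
print(report)
```

The non-null count for `Price` is lower than the number of rows, which is a quick way to spot missing values.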


In [ ]:
# look at the info of "df"
df.info()

### Save Dataset
<p>
Pandas also enables us to save a dataset to CSV. When calling the <code>dataframe.to_csv()</code> method, put the file path and name in quotation marks inside the parentheses.
</p>
<p>
For example, to save the dataframe <strong>df</strong> as <strong>BTCBUSD_trades_1m.csv</strong> on your local machine, you can use the syntax below, where <code>index=True</code> means that the row labels (the index) are written as well.
</p>
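
A quick way to confirm that a saved file can be read back with its datetime index intact is a round trip. The sketch below writes to an in-memory buffer instead of a file so that it leaves nothing on disk; the frame `toy` is hypothetical.

```python
import io
import pandas as pd

# Hypothetical frame indexed by time, similar in shape to the lab dataframe.
idx = pd.date_range("2022-11-01 00:00", periods=3, freq="1min", name="Time")
toy = pd.DataFrame({"Price": [1.5, 2.25, 3.0]}, index=idx)

buffer = io.StringIO()
toy.to_csv(buffer, index=True)     # the index is written as the first column
buffer.seek(0)                     # rewind before reading

# index_col=0 restores the index; parse_dates=True converts it back to timestamps
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)
print(restored)
```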


In [ ]:
df.to_csv("BTCBUSD_trades_1m.csv", index=True)

We can also read and save other file formats. We can use similar functions like `pd.read_csv()` and `df.to_csv()` for other data formats. The functions are listed in the following table:


## Read/Save Other Data Formats

| Data Format  |        Read       |            Save |
| ------------ | :---------------: | --------------: |
| csv          |  `pd.read_csv()`  |   `df.to_csv()` |
| json         |  `pd.read_json()` |  `df.to_json()` |
| excel        | `pd.read_excel()` | `df.to_excel()` |
| hdf          |  `pd.read_hdf()`  |   `df.to_hdf()` |
| sql          |  `pd.read_sql()`  |   `df.to_sql()` |
| ...          |        ...        |             ... |


## Excellent! You have just completed the notebook!


### Thank you for completing this lab!

## Authors

<a href="https://author.skills.network/instructors/yaryna_beida?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Yaryna Beida</a>

<a href="https://author.skills.network/instructors/yaroslav_vyklyuk_2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Yaroslav Vyklyuk, DrSc, PhD</a>

<a href="https://author.skills.network/instructors/mariya_fleychuk?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkGuidedProjectsIBMSkillsNetworkGPXX0QGDEN2306-2023-01-01">Prof. Mariya Fleychuk, DrSc, PhD</a>


## Change Log

| Date (YYYY-MM-DD) | Version | Changed By   | Change Description                                         |
| ----------------- | ------- | -------------| ---------------------------------------------------------- |
|     2023-02-25    |   1.0   | Yaryna Beida | Lab created                                                |

<hr>

<h3 align="center"> © IBM Corporation 2023. All rights reserved. </h3>
