# 10. Milestone Project 3: Time series forecasting in TensorFlow (BitPredict 💰📈)

The goal of this notebook is to get you familiar with working with time series data.

We're going to be building a series of models in an attempt to predict the price of Bitcoin.

Welcome to Milestone Project 3, BitPredict 💰📈!

> 🔑 Note: ⚠️ This is not financial advice. As you'll see, time series forecasting for stock market prices is actually quite terrible.

## What is a time series problem?

Time series problems deal with data over time.

Examples include the number of staff members in a company over 10 years, sales of computers for the past 5 years, and electricity usage for the past 50 years.

The timeline can be short (seconds/minutes) or long (years/decades). The problems you might investigate can usually be broken down into two categories.

![image0](https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/10-example-time-series-problems.png)

|Problem Type|Examples|Output|
|--|--|--|
|**Classification**|Anomaly detection, time series identification<br>(where did this time series come from?)|Discrete (a label)|
|**Forecasting**|Predicting stock market prices,<br>forecasting future demand for a product,<br>and stocking inventory requirements|Continuous (a number)|

In both cases above, a supervised learning approach is often used, meaning you'd have some example data and a label associated with that data.

For example, in forecasting the price of Bitcoin, your data could be the historical price of Bitcoin for the past month and the label could be today's price (the label can't be tomorrow's price because that's what we'd want to predict).
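To make the data/label idea concrete, here's a minimal sketch of slicing a price history into (window, label) pairs. The prices are made up, and `make_windows`, `window_size` and `horizon` are hypothetical names chosen for this illustration. A window of 7 values is used instead of a month to keep the output short.

```python
# Hypothetical prices, made up for illustration (not real Bitcoin data)
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 110.0, 108.0, 111.0, 115.0, 114.0]

def make_windows(series, window_size=7, horizon=1):
    """Split a sequence into (window, label) pairs for supervised learning."""
    windows, labels = [], []
    # Stop early enough that every window still has `horizon` values after it
    for start in range(len(series) - window_size - horizon + 1):
        end = start + window_size
        windows.append(series[start:end])         # the "data": past values
        labels.append(series[end:end + horizon])  # the "label": value(s) that follow
    return windows, labels

windows, labels = make_windows(prices, window_size=7, horizon=1)
for window, label in zip(windows, labels):
    print(f"Window: {window} -> Label: {label}")
```

Each window plays the role of "the past" and each label plays the role of "today's price", so a model can learn the mapping from one to the other.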

Can you guess what kind of problem BitPredict 💰📈 is?

## What we're going to cover

Are you ready?

We've got a lot to go through.

- Get time series data (the historical price of Bitcoin)
    - Load in time series data using pandas/Python's CSV module
- Format data for a time series problem
    - Creating training and test sets (the wrong way)
    - Creating training and test sets (the right way)
    - Visualizing time series data
    - Turning time series data into a supervised learning problem (windowing)
    - Preparing univariate and multivariate (more than one variable) data
- Evaluating a time series forecasting model
- Setting up a series of deep learning modeling experiments
    - Dense (fully connected) networks
    - Sequence models (LSTM and 1D CNN)
    - Ensembling (combining multiple models)
    - Multivariate models
    - Replicating the N-BEATS algorithm using TensorFlow layer subclassing
- Creating a modeling checkpoint to save the best-performing model during training
- Making predictions (forecasts) with a time series model
- Creating prediction intervals for time series model forecasts
- Discussing two different types of uncertainty in machine learning (data uncertainty and model uncertainty)
- Demonstrating why forecasting in an open system is BS (the turkey problem)

## Check for GPU

For our deep learning models to run as fast as possible, we'll need access to a GPU.

In Google Colab, you can set this up by going to Runtime -> Change runtime type -> Hardware accelerator -> GPU.

After selecting GPU, you may have to restart the runtime.

In [None]:
# Check for GPU
!nvidia-smi -L
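The `!` prefix above is notebook shell syntax. If you're running this as a plain Python script instead, a rough equivalent (a sketch, with `list_gpus` being a hypothetical helper name) could look like this:

```python
import shutil
import subprocess

def list_gpus():
    """Return the lines printed by `nvidia-smi -L`, or an empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []  # NVIDIA driver tools aren't installed on this machine
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []  # the tool exists but couldn't talk to a GPU
    return [line for line in result.stdout.splitlines() if line.strip()]

gpus = list_gpus()
print(gpus if gpus else "No GPU found, models will train on the CPU (slower).")
```

Once TensorFlow is imported, `tf.config.list_physical_devices("GPU")` gives a similar answer from inside the framework.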

## Get data

To build a time series forecasting model, the first thing we're going to need is data.

And since we're trying to predict the price of Bitcoin, we'll need Bitcoin data.

Specifically, we're going to get the prices of Bitcoin from 01 October 2013 to 18 May 2021.

Why these dates?

Because 01 October 2013 is when our data source ([Coindesk](https://www.coindesk.com/price/bitcoin)) started recording the price of Bitcoin and 18 May 2021 is when this notebook was created.

If you're going through this notebook at a later date, you'll be able to use what you learn to predict later dates of Bitcoin. You'll just have to adjust the data source.

> 📖 Resource: To get the Bitcoin historical data, I went to the [Coindesk page for Bitcoin prices](https://www.coindesk.com/price/bitcoin), clicked on "all" and then clicked on "Export data" and selected "CSV".

You can find the data we're going to use on [GitHub](https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/extras/BTC_USD_2013-10-01_2021-05-18-CoinDesk.csv).

In [None]:
# Download Bitcoin historical data from GitHub
# Note: you'll need to select "Raw" to download the data in the correct format
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/BTC_USD_2013-10-01_2021-05-18-CoinDesk.csv

### Importing time series data with pandas

Now we've got some data to work with, let's import it using pandas so we can visualize it.

Because our data is in **CSV (comma separated values)** format (a very common data format for time series), we'll use the pandas [`read_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) function.

And because our data has a date component, we'll tell pandas to parse the dates using the `parse_dates` parameter, passing it the name of our date column ("Date").
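To see what "parsing the dates" amounts to, here's a rough sketch of doing the same thing by hand with Python's built-in `csv` module. The rows below are made up for illustration (the real file has more columns and real prices), and the `YYYY-MM-DD` date format is an assumption about the file.

```python
import csv
import io
from datetime import datetime

# Made-up rows for illustration only (not real Bitcoin prices)
sample_csv = io.StringIO(
    "Date,Closing Price (USD)\n"
    "2013-10-01,100.0\n"
    "2013-10-02,101.5\n"
    "2013-10-03,99.25\n"
)

timesteps, prices = [], []
for row in csv.DictReader(sample_csv):
    # Turn the date string into a datetime object (assumed format: YYYY-MM-DD)
    timesteps.append(datetime.strptime(row["Date"], "%Y-%m-%d"))
    # Turn the price string into a float
    prices.append(float(row["Closing Price (USD)"]))

print(timesteps[0], prices[0])
```

pandas handles both conversions for us, which is why we'll lean on it from here.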

In [None]:
# Import with pandas
import pandas as pd

# Parse dates and set date column to index
df = pd.read_csv("BTC_USD_2013-10-01_2021-05-18-CoinDesk.csv",
                 parse_dates=["Date"],  # tell pandas the "Date" column holds datetimes
                 index_col="Date")      # use the "Date" column as the index

df.head()

Looking good! Let's get some more info.

In [None]:
df.info()

Because we told pandas to parse the date column and set it as the index, it's not in the list of columns.

You can also see there aren't many samples.

In [None]:
# How many samples do we have?
len(df)

We've collected the historical price of Bitcoin for the past ~8 years but there are only 2787 total samples.
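As a quick sanity check, the sample count lines up with the date range. This sketch assumes exactly one sample per calendar day with no gaps:

```python
from datetime import date

start = date(2013, 10, 1)  # first day of data
end = date(2021, 5, 18)    # last day of data

# Assumption: one sample per calendar day, no missing days
expected_samples = (end - start).days + 1  # +1 to count both endpoints
print(expected_samples)        # 2787
print(expected_samples / 365)  # roughly 7.6 years of daily data
```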

This is something you'll run into with time series data problems. Often, the number of samples isn't as large as other kinds of data.

For example, collecting one sample at different time frames results in:

|1 sample per timeframe|Number of samples per year|
|--|--|
|Second|31,536,000|
|Hour|8,760|
|Day|365|
|Week|52|
|Month|12|
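The numbers in the table come from simple arithmetic. Here's a small sketch that reproduces them, assuming a 365-day year (leap years ignored) and whole weeks only:

```python
DAYS_PER_YEAR = 365  # assumption: ignore leap years, as the table does

samples_per_year = {
    "Second": DAYS_PER_YEAR * 24 * 60 * 60,  # days * hours * minutes * seconds
    "Hour": DAYS_PER_YEAR * 24,
    "Day": DAYS_PER_YEAR,
    "Week": DAYS_PER_YEAR // 7,  # whole weeks only
    "Month": 12,
}

for timeframe, n_samples in samples_per_year.items():
    print(f"{timeframe:<6} {n_samples:>12,}")
```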

> 🔑 Note: The frequency at which a time series value is collected is often referred to as **seasonality**. This is usually measured as the number of samples per year. For example, collecting the price of Bitcoin once per day would result in a time series with a seasonality of 365. Time series data collected with different seasonality values often exhibit seasonal patterns (e.g. electricity demand being higher in summer months for air conditioning than in winter months). For more on different time series patterns, see [Forecasting: Principles and Practice Chapter 2.3](https://otexts.com/fpp3/tspatterns.html).

![image1](https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/10-types-of-time-series-patterns.png)
Example of different kinds of patterns you'll see in time series data. Notice the bottom right time series (Google stock price changes) has little to no patterns, making it difficult to predict. See [Forecasting: Principles and Practice Chapter 2.3](https://otexts.com/fpp3/tspatterns.html) for full graphic.

Deep learning algorithms usually flourish with lots of data, in the range of thousands to millions of samples.

In our case, we've got the daily prices of Bitcoin, a max of 365 samples per year.

But that doesn't mean we can't try them with our data.

To simplify, let's remove some of the columns from our data so we're only left with a date index and the closing price.

In [None]:
# Only want the closing price for each day
bitcoin_prices = pd.DataFrame(df["Closing Price (USD)"]).rename(columns={"Closing Price (USD)": "Price"})
bitcoin_prices.head()

Much better!

But that's only five days' worth of Bitcoin prices. Let's plot everything we've got.