# Stock-Forecasting

In this exercise, you will use RNNs to predict the stock market evolution.

A stock price can be seen as a sequence of values (each day being a time step), so you can predict the next day's closing value from the past values.

## Data exploration

First, load the dataset `all_stocks_5yr.csv`, which contains all the stock market values over 5 years for many companies. Feel free to explore it.

In [None]:
# TODO: Load the dataset and explore it
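
As a starting point, here is a minimal loading sketch with pandas. It assumes the CSV has the columns `date`, `open`, `high`, `low`, `close`, `volume` and `Name`; the `date` column name and the helper `load_stocks` are assumptions made for illustration, and the tiny in-memory CSV only contains made-up placeholder values so that the snippet runs without the real file.

```python
import io

import pandas as pd


def load_stocks(source):
    """Read the stock CSV and parse the (assumed) `date` column as datetimes."""
    return pd.read_csv(source, parse_dates=["date"])


# Made-up placeholder rows, NOT real prices: they only mimic the assumed layout.
# In the notebook you would call load_stocks("all_stocks_5yr.csv") instead.
sample_csv = io.StringIO(
    "date,open,high,low,close,volume,Name\n"
    "2013-02-08,10.0,11.0,9.0,10.5,1000,AAL\n"
    "2013-02-11,10.5,12.0,10.0,11.5,1200,AAL\n"
    "2013-02-08,50.0,52.0,49.0,51.0,3000,AAPL\n"
)
df = load_stocks(sample_csv)

# Number of rows per ticker, largest first: handy to pick a stock with enough data.
rows_per_ticker = df["Name"].value_counts()

print(df.dtypes)
print(rows_per_ticker)
```

On the real file, `df.head()`, `df.describe()` and `df.isna().sum()` are other quick ways to explore the data.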


The `Name` column holds the ticker symbol of each stock: for example, `'AAL'` stands for American Airlines and `'AAPL'` for Apple. You can look up a company by searching for its ticker symbol online.

Select a company for which you have enough information (i.e. a lot of datapoints, at least 1000), and plot the `close` value of this stock as a function of time. Let's say this represents the stock market evolution.

In [None]:
# TODO: Plot the stock market evolution of a given name


> Optional: for those who want a more accurate representation of the stock market, a very common visualization is the candlestick chart. You can plot one with the mplfinance library: https://github.com/matplotlib/mplfinance

In [None]:
# Optional: plot the candlesticks


## Data preparation

We will now try to predict the `close` value of a given day, based on all the features (`open`, `high`, `low`, `close`, `volume`) of the past 30 days. This window of 30 days is often called the **lookback**.

Before going further, you might want to consider rescaling your data, for example with a standard scaler.

In [None]:
# TODO: normalize the data
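
One possible way to standardize the features is sketched below with plain NumPy. It computes the same per-column mean and standard deviation that a standard scaler uses. The helper names, the random stand-in data and the choice to fit the statistics on the first 80 rows only are assumptions for illustration; fitting on the earliest part of the series is a common way to avoid leaking information from the later (test) period.

```python
import numpy as np


def fit_standardizer(values):
    """Return the per-column mean and standard deviation of a 2D array."""
    mean = values.mean(axis=0)
    std = values.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return mean, std


def standardize(values, mean, std):
    """Center each column on 0 and scale it to unit standard deviation."""
    return (values - mean) / std


# Random stand-in for the 5 features (open, high, low, close, volume) over 100 days.
rng = np.random.default_rng(0)
features = rng.normal(loc=100.0, scale=20.0, size=(100, 5))

n_fit = 80  # hypothetical: number of earliest rows used to compute the statistics
mean, std = fit_standardizer(features[:n_fit])
scaled = standardize(features, mean, std)

print(scaled[:n_fit].mean(axis=0).round(6))
print(scaled[:n_fit].std(axis=0).round(6))
```

Keep `mean` and `std` around: you need them to convert predictions back to real prices.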


We now need to create the (X, y) dataset.

Let's go through an example to understand what to do:

First, suppose the stock (`Name`) you selected has 100 rows.

Since you need the past 30 days to predict a value, you are not able to perform any prediction in the first 30 days.

Each sample of `X` should then contain a table of 30 days by 5 features (`open`, `high`, `low`, `close`, `volume`), so the final `X` array will have the shape `(70, 30, 5)`:
- 70 is the number of samples, and is 100 days minus 30 for the lookback
- 30 for the past 30 days, this is the lookback
- 5 for the features `open`, `high`, `low`, `close`, `volume`

The `y` values should be the `close` values of days 31 to the last one (indeed, `y` cannot contain the first 30 days, since we need 30 days of `X` to predict anything). So the final `y` array will have the shape `(70, 1)` (or equivalently `(70,)`).

In [None]:
# TODO: compute X and y
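
The windowing described above can be sketched as follows. The helper `make_windows` and the synthetic 100-day array are assumptions for illustration; here column 3 plays the role of `close`.

```python
import numpy as np


def make_windows(features, target, lookback=30):
    """Build (X, y): each X[i] holds `lookback` consecutive days of features,
    and y[i] is the target value of the day right after that window."""
    features = np.asarray(features)
    target = np.asarray(target)
    n_samples = len(features) - lookback
    X = np.stack([features[i:i + lookback] for i in range(n_samples)])
    y = target[lookback:]
    return X, y


# Synthetic example: 100 days, 5 features; column 3 stands in for `close`.
features = np.arange(500, dtype=float).reshape(100, 5)
close = features[:, 3]

X, y = make_windows(features, close, lookback=30)
print(X.shape, y.shape)  # (70, 30, 5) (70,)
```

With this convention, `X[0]` covers days 1 to 30 and `y[0]` is the `close` of day 31.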


You already know the next step: split the data.
> Be careful: the chronological order of the sequence must be preserved, so do not shuffle the samples before splitting!

In [None]:
# TODO: Split the data
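
A minimal sketch of a chronological split is shown below: the earliest samples go to the training set and the most recent ones to the test set, with no shuffling. The helper name and the 80/20 ratio are assumptions for illustration.

```python
import numpy as np


def chronological_split(X, y, test_fraction=0.2):
    """Split without shuffling: the last `test_fraction` of samples is the test set."""
    n_test = int(round(len(X) * test_fraction))
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]


# Placeholder arrays with the shapes from the example above.
X = np.zeros((70, 30, 5))
y = np.arange(70, dtype=float)

X_train, X_test, y_train, y_test = chronological_split(X, y, test_fraction=0.2)
print(X_train.shape, X_test.shape)  # (56, 30, 5) (14, 30, 5)
```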


## Model training and evaluation

Now that the data is ready, build an RNN model (for example, begin with 2 recurrent layers of 16 units each), compile it and train it.

In [None]:
# TODO: Build and train your RNN model
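
One possible architecture is sketched below with Keras, assuming TensorFlow is installed. The use of `SimpleRNN` layers, the Adam optimizer, the mean squared error loss and the number of epochs are illustrative choices, not requirements of the exercise, and the random arrays only stand in for your real `X_train` and `y_train`.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

lookback, n_features = 30, 5

# Two recurrent layers of 16 units, then one output unit for the next close value.
model = keras.Sequential([
    keras.Input(shape=(lookback, n_features)),
    layers.SimpleRNN(16, return_sequences=True),  # pass the full sequence to the next layer
    layers.SimpleRNN(16),                         # keep only the last hidden state
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Random stand-in data with the expected shapes; replace with your own arrays.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(56, lookback, n_features)).astype("float32")
y_train = rng.normal(size=(56,)).astype("float32")

history = model.fit(X_train, y_train, epochs=2, batch_size=16, verbose=0)
predictions = model.predict(X_train, verbose=0)
print(predictions.shape)
```

The first recurrent layer needs `return_sequences=True` so that the second one receives one vector per time step. You could swap `SimpleRNN` for `LSTM` or `GRU` and compare the results.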


Plot the results: display on the same plot `y_train`, `y_test` and the prediction of `y_test`.

In [None]:
# TODO: Plot the results
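
A possible way to draw the three curves on one figure is sketched below with matplotlib. The helper `plot_forecast` and the sine-wave stand-in data are assumptions for illustration; the key point is that the test curves start on the x axis where the training curve ends.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_forecast(y_train, y_test, y_pred):
    """Plot y_train, then y_test and its prediction right after it on the time axis."""
    y_train, y_test, y_pred = (np.ravel(a) for a in (y_train, y_test, y_pred))
    train_idx = np.arange(len(y_train))
    test_idx = np.arange(len(y_train), len(y_train) + len(y_test))

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(train_idx, y_train, label="y_train")
    ax.plot(test_idx, y_test, label="y_test")
    ax.plot(test_idx, y_pred, linestyle="--", label="prediction on y_test")
    ax.set_xlabel("time step")
    ax.set_ylabel("close")
    ax.legend()
    return ax


# Stand-in curves; in the notebook, y_pred would come from model.predict(X_test).
y_train = np.sin(np.linspace(0.0, 6.0, 56))
y_test = np.sin(np.linspace(6.0, 7.5, 14))
y_pred = y_test + 0.05
ax = plot_forecast(y_train, y_test, y_pred)
```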


You can try to improve your model by adding information about other stocks as features. As you know, most of the information that drives a stock price lies outside the price history of that stock itself.

## Backtesting

In real life, traders backtest a trading strategy in order to check if it works.

The principle is the following:
- You define a strategy: for example, buy when the prediction for the next day goes up, and sell when the prediction for the next day goes down
- You test this strategy on the test dataset with real data, starting with a given amount of money
- You compare your return to the market return over the same period (the relative return)
- If the relative return is greater than 1, and you have not lost money at the end, your strategy may be worth trying in real life

Feel free to implement a backtesting of your model with a given strategy.
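
As one possible interpretation of the strategy above, the sketch below holds the stock over a day only when the predicted next close is higher than today's real close, and stays in cash otherwise. The helper `backtest`, the definition of the relative return as final strategy value divided by final buy-and-hold value, and the absence of transaction fees are all assumptions for illustration. Prices must be in their original units (undo any scaling first), since the computation relies on price ratios.

```python
import numpy as np


def backtest(actual, predicted, initial_cash=1000.0):
    """Long-or-cash backtest.

    actual[t]    : real close of day t (original units, strictly positive)
    predicted[t] : model prediction for the close of day t
    Returns (final strategy value, final buy-and-hold value, relative return).
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    # Real growth factor of the price from day t to day t+1.
    growth = actual[1:] / actual[:-1]
    # Be invested from day t to t+1 only if the model predicts a rise.
    invested = predicted[1:] > actual[:-1]
    # When not invested, the money stays in cash (growth factor of 1).
    strategy_growth = np.where(invested, growth, 1.0)

    final_strategy = initial_cash * strategy_growth.prod()
    final_market = initial_cash * growth.prod()
    return final_strategy, final_market, final_strategy / final_market


# Toy prices with a "perfect" prediction, only to check the mechanics.
actual = np.array([100.0, 110.0, 99.0, 108.9])
predicted = actual.copy()
final_strategy, final_market, relative_return = backtest(actual, predicted)
print(final_strategy, final_market, relative_return)
```

In the notebook, `actual` would be the real `close` values of the test period and `predicted` the model output for the same days.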