# 💸 &nbsp; **DeepPricing**

#### *Building and training AI to predict stock prices from options data*

<br>

<div style="display: flex;">
    <img src="./github/call_plot.png" style="width: auto; height: 250px;"> &nbsp;
    <img src="./github/nn.png" style="width: auto; height: 250px;"> &nbsp;
    <img src="./github/put_plot.png" style="width: auto; height: 250px;"> &nbsp;
</div>

<br>

## 🤖 Table of contents

1. &nbsp; 🛠️ &nbsp; Prerequisites

2. &nbsp; 📈 &nbsp; Generating stock data

3. &nbsp; 🏦 &nbsp; Generating option data

4. &nbsp; 🧠 &nbsp; The Neural Network Model

5. &nbsp; 🏅 &nbsp; Discussion of Results

<br>

## About this notebook



### 🚀 **Goal: Predicting the price of a generic stock**

📈 We train a neural network on data derived from synthetically generated stock prices.

📊 All financial data is created using widely adopted mathematical models, namely **Geometric Brownian Motion** and the **Black-Scholes Model**.

👉 How do these models work? Check out the math [here](https://github.com/wolfno/DeepPricing/tree/main/src).

<br>

## 🛠️ &nbsp; **Prerequisites**

You can always just enjoy the show **without installing anything.**

If you would like to run the notebook yourself, ensure you have met the following requirements:

* You have installed Python 3.10 or higher.
* You are using Anaconda for Python package management.
* Check out the README for further information.

The raw script **main.py** is available [in my GitHub repository](https://github.com/wolfno/DeepPricing/tree/main/main.py).

<br>

### Used packages

####  Standard Library and Third-party

In [11]:
import numpy as np
import pandas as pd

# Optional: hide TensorFlow's INFO-level log messages.
# The variable must be set before TensorFlow is imported.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'

import tensorflow as tf

#### Local modules

In [12]:
from src.asset_classes import Stock, CallOption, PutOption
from src.math_models import stock_path, option_path

<br><br>

## 📈 &nbsp; **Generating stock data**

Let us simulate a stock path by using a **Geometric Brownian Motion**.

In [13]:
df_stock = stock_path(random_state=55)
t, X = df_stock["time"], df_stock["price"]
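For readers who want to see what such a simulation could look like, here is a minimal sketch of a Geometric Brownian Motion path in NumPy. It is not the repository's `stock_path` implementation: the function name `simulate_gbm_path` and all parameter values (start price, drift, volatility, horizon, number of steps) are assumptions chosen for illustration only.

```python
import numpy as np


def simulate_gbm_path(s0=10.0, mu=0.05, sigma=0.2, horizon=1.0,
                      n_steps=1000, random_state=None):
    """Simulate one GBM path on an equidistant time grid.

    All defaults are illustrative assumptions, not values from the notebook.
    """
    rng = np.random.default_rng(random_state)
    dt = horizon / n_steps
    time_grid = np.linspace(0.0, horizon, n_steps + 1)

    # Log-returns of a GBM are independent normal increments.
    z = rng.standard_normal(n_steps)
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z

    # Cumulate the increments, starting at log-return 0 for time 0.
    log_path = np.concatenate(([0.0], np.cumsum(log_increments)))
    return time_grid, s0 * np.exp(log_path)


time_grid, price = simulate_gbm_path(random_state=55)
print(len(time_grid), price[0])
```

Working with log-increments keeps every simulated price strictly positive, which is one reason this model is a common choice for stock prices.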

We have successfully created a synthetic stock path! Let's store it in a suitable object.

In [14]:
example_stock = Stock(t, X)

The previously created custom **Stock** class from the local module *asset_classes* stores the relevant data in an object.

Like every asset, a stock can be plotted and exported.

In [15]:
example_stock.plot(plot_title="Stock Price", plot_save_in_file=True)

Note: The *plot_save_in_file* parameter controls whether the plot is displayed or saved as a PNG file in the */data/* folder.

<img src="./data/Stock_Price_plot.png" width=500>

<br><br>

## 🏦 &nbsp; **Generating option data**

We are simulating the case where we have several options at hand. Let us create some more financial instruments!

First, define the candidate values for each option parameter:

🔨 &nbsp; K is the strike price.

🎯 &nbsp; T is the maturity in years.

📈 &nbsp; Sigma is the implied volatility of the underlying asset.

📊 &nbsp; The option type specifies whether a call or a put should be simulated.

In [16]:
K_values = [8, 10, 12]
T_values = [0.5, 0.75]
sigma_values = [0.2, 0.3, 0.5]
option_types = ["call", "put"]

In [17]:
param_space = [(K, T, sigma, option_type)
                  for K in K_values
                  for T in T_values
                  for sigma in sigma_values
                  for option_type in option_types]
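As a quick sanity check, the same parameter grid can be built with `itertools.product`, which yields the combinations in the same order as the nested comprehension above. The sketch below only reuses the value lists from the notebook and counts the resulting combinations.

```python
from itertools import product

K_values = [8, 10, 12]
T_values = [0.5, 0.75]
sigma_values = [0.2, 0.3, 0.5]
option_types = ["call", "put"]

# Cartesian product: the last list varies fastest, as in the comprehension.
param_space = list(product(K_values, T_values, sigma_values, option_types))

print(len(param_space))   # 3 * 2 * 3 * 2 = 36
print(param_space[0])     # (8, 0.5, 0.2, 'call')
print(param_space[1])     # (8, 0.5, 0.2, 'put')
```

So the notebook ends up with 36 options: 18 calls and 18 puts.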

🔢️ &nbsp; To keep track of what's happening, we store each option in a dictionary and count how many we create.

In [18]:
option_dict = {}
option_count = 0

📊 &nbsp; Now for simulating the option paths:

In [19]:
for K, T, sigma, option_type in param_space:
    option_count += 1
    
    # Creating the option paths
    df_option = option_path(example_stock.time_grid,
                            example_stock.price_grid,
                            K, T, sigma,
                            option_type=option_type)

    t, X = df_option["time"], df_option["price"]

    if option_type == "call":
        # Save the result in a CallOption instance
        option_dict[option_count] = CallOption(t, X, T, K, sigma)

        # Uncomment to save result in a CSV file
        # option_dict[option_count].export(file_name=f"Call Option {option_count:03}")

        # Uncomment to save plots in PNG files
        # option_dict[option_count].plot(plot_title=f"Call Option {option_count:03}",
        #                                plot_save_in_file=True)

    elif option_type == "put":
        # Save the result in a PutOption instance
        option_dict[option_count] = PutOption(t, X, T, K, sigma)

        # Uncomment to save result in a CSV file
        # option_dict[option_count].export(file_name=f"Put Option {option_count:03}")

        # Uncomment to save plots in PNG files
        # option_dict[option_count].plot(plot_title=f"Put Option {option_count:03}",
        #                                plot_save_in_file=True)

Instances of the custom classes **CallOption** and **PutOption** behave like other assets, with a few extra parameters.

Note: As before, the *plot_save_in_file* parameter controls whether the plot is saved as a PNG file.
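To give an idea of how a single option price can be derived from a stock price, here is a minimal sketch of the closed-form Black-Scholes formula for European calls and puts. It is not the repository's `option_path` function: the name `black_scholes_price`, the helper `norm_cdf`, and in particular the risk-free rate `r` (set to 0 by default here) are assumptions, since the notebook does not show which interest rate is used.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def black_scholes_price(S, K, tau, sigma, r=0.0, option_type="call"):
    """Black-Scholes price of a European option.

    S: current stock price, K: strike, tau: time to maturity in years (> 0),
    sigma: volatility, r: risk-free rate (assumed, not from the notebook).
    """
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    discounted_strike = K * exp(-r * tau)

    if option_type == "call":
        return S * norm_cdf(d1) - discounted_strike * norm_cdf(d2)
    if option_type == "put":
        return discounted_strike * norm_cdf(-d2) - S * norm_cdf(-d1)
    raise ValueError(f"Unknown option type: {option_type}")


call = black_scholes_price(10.0, 10.0, 0.5, 0.2, option_type="call")
put = black_scholes_price(10.0, 10.0, 0.5, 0.2, option_type="put")
print(call, put)
```

With `r = 0` and a strike equal to the stock price, put-call parity implies that the call and the put have the same price, which makes for a handy consistency check.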

<br>

### 💡 **Quick recap**

* At this point, we have created **one stock path**.

* Furthermore, we have created a number of call and put options that are *based on this stock data*. 

* All that is left to do is to build a model to infer the stock prices from the option prices!

<br><br>

## 🧠 &nbsp; **The Neural Network Model**

Let us summarize the data in a single DataFrame so that we can train our model later on.

In [20]:
# Using the time grid of the underlying stock
df_model = pd.DataFrame({"time": example_stock.time_grid})

# Adding price data for each individual option
for i in range(1, option_count + 1):
    option_prices = option_dict[i].price_grid
    df_model[f"option_{i:03}"] = option_prices

# Adding the stock price as the target variable
df_model["stock"] = example_stock.price_grid

Building a fully connected neural network with several hidden layers:

In [21]:
deep_model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(option_count,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1)
])
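To get a feeling for the size of this network, the number of trainable parameters can be counted by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below assumes `option_count = 36`, which follows from the parameter grid above (3 strikes, 2 maturities, 3 volatilities, 2 option types); the helper `dense_params` is introduced here for illustration only.

```python
option_count = 36  # 3 strikes * 2 maturities * 3 volatilities * 2 option types

# Input width, five hidden layers with 32 units each, one output unit.
layer_sizes = [option_count, 32, 32, 32, 32, 32, 1]


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [1184, 1056, 1056, 1056, 1056, 33]
print(sum(per_layer))  # 5441
```

With roughly 5,400 parameters, the model is small by deep learning standards.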

Splitting into training and test data:

In [22]:
X_train, y_train = df_model.iloc[:800, 1:option_count+1], df_model.iloc[:800, -1]
X_test, y_test = df_model.iloc[800:, 1:option_count+1], df_model.iloc[800:, -1]
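The split above uses a hard-coded row index of 800. If the length of the time grid ever changes, a fraction-based split may be more robust. The following sketch shows the idea on a toy DataFrame with the same column layout as `df_model`; the 1000 rows, the random values, and the names `df_toy` and `train_fraction` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_model: 1000 rows is an assumption, not a notebook value.
n_rows, n_options = 1000, 36
rng = np.random.default_rng(0)

columns = {"time": np.linspace(0.0, 1.0, n_rows)}
for i in range(1, n_options + 1):
    columns[f"option_{i:03}"] = rng.random(n_rows)
columns["stock"] = rng.random(n_rows)
df_toy = pd.DataFrame(columns)

# Chronological split: the first 80 % of the rows train, the rest test.
train_fraction = 0.8
split = int(len(df_toy) * train_fraction)

# Select features by name instead of by column position.
feature_cols = [c for c in df_toy.columns if c.startswith("option_")]

X_train, y_train = df_toy.iloc[:split][feature_cols], df_toy.iloc[:split]["stock"]
X_test, y_test = df_toy.iloc[split:][feature_cols], df_toy.iloc[split:]["stock"]

print(X_train.shape, X_test.shape)  # (800, 36) (200, 36)
```

Keeping the rows in chronological order means the test set consists of the latest part of the path, which the model has never seen during training.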

During training, Keras holds out the last 20 % of the training set for validation and fits the model on the remaining 80 %.<br>
At the end, we evaluate the model on the test set.

In [23]:
deep_model.compile(optimizer='adam', loss='mse')
history = deep_model.fit(X_train, y_train,
                         epochs=50, batch_size=32,
                         validation_split=0.2)


Epoch 1/50
...
Epoch 50/50
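A note on `validation_split`: Keras takes the validation samples from the end of the data passed to `fit`, before any shuffling. The sketch below mimics that slicing with plain NumPy indices to show which rows end up where. It is an illustration of the behaviour, not Keras' internal code, and the variable names are made up for this example.

```python
import numpy as np

n_train_rows = 800        # rows passed to fit() in the notebook
validation_split = 0.2

# The last 20 % of the rows are held out for validation.
n_val = int(n_train_rows * validation_split)
split_at = n_train_rows - n_val

indices = np.arange(n_train_rows)
fit_idx, val_idx = indices[:split_at], indices[split_at:]

print(len(fit_idx), len(val_idx))  # 640 160
print(val_idx[0], val_idx[-1])     # 640 799
```

Since the rows are ordered by time, the validation set covers the part of the path that comes right before the test set.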


Measuring the error on the test data:

In [24]:
loss = deep_model.evaluate(X_test, y_test)
print(f'\nRoot Mean Squared Error on test data: {np.sqrt(loss)}')


Root Mean Squared Error on test data: 0.06501702158568627


🏆 &nbsp; The RMSE on the test set is about 0.065 EUR.
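Keep in mind that the model was compiled with `loss='mse'`, so the losses reported by Keras are mean squared errors, while the number above is its square root. To compare the test error with the training and validation losses, both should be on the same scale. A small sketch of the conversion, using the value printed above:

```python
test_rmse = 0.06501702158568627   # value printed by the notebook

# Squaring the RMSE gives the MSE that evaluate() returned.
test_mse = test_rmse ** 2

print(f"Test RMSE: {test_rmse:.4f}")  # Test RMSE: 0.0650
print(f"Test MSE:  {test_mse:.4f}")   # Test MSE:  0.0042
```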

<br><br>

## 🏅 &nbsp; **Discussion of Results**

The losses reach very low values by the end of training.

The final 🔑 &nbsp; **training loss** is below 0.0003 and the 🔑 &nbsp; **validation loss** is 0.0010, both measured as mean squared error. Values this low can be a sign of overfitting, but the test set results show that the model generalizes well: the 🔑 &nbsp; **test set error** is only 0.065, measured as root mean squared error (an MSE of roughly 0.0042).

👉 &nbsp; Key observation: The training loss and validation loss converge nicely, suggesting good generalization. The test set error supports this assumption.

Overall, this is a strong result, and the model seems ready for deployment!