<img src="http://hilpisch.com/tpq_logo.png" alt="The Python Quants" width="35%" align="right" border="0"><br>

# Python for Finance Key Skills

&copy; Dr. Yves J. Hilpisch | The Python Quants GmbH

http://tpq.io | [training@tpq.io](mailto:training@tpq.io) | [@dyjh](http://twitter.com/dyjh)

## Supervised Learning

### Imports

In [None]:
!git clone https://github.com/tpq-classes/pff_key_skills.git
import sys
sys.path.append('pff_key_skills')


In [None]:
import numpy as np
import pandas as pd
from pylab import plt, mpl

In [None]:
plt.style.use('seaborn-v0_8')
%config InlineBackend.figure_format = 'svg'

### Supervised Learning

About supervised learning (from https://perplexity.ai):

> Supervised learning is a type of machine learning where the algorithm learns from labeled training data to make predictions or decisions. It involves input variables (X) and an output variable (Y), and the algorithm learns the mapping function from the input to the output. The main goal of supervised learning is to learn a mapping from input to output and make accurate predictions on new, unseen data. Some example algorithms used in supervised learning include:
>
> 1. Linear Regression: Used for predicting continuous output values.
> 2. Logistic Regression: Used for binary classification problems.
> 3. Decision Trees: Used for both classification and regression problems.
> 4. Random Forest: An ensemble learning method used for classification and regression.
> 5. Support Vector Machines (SVM): Used for classification and regression tasks.
>
> Typical goals of supervised learning include predicting future outcomes, classifying data into categories, and making decisions based on input data. Examples of supervised learning applications include spam detection, sentiment analysis, and customer churn prediction.

#### Simple Examples

In [None]:
int(True)

In [None]:
int(False)

In [None]:
f = np.array(((0, 0), (0, 1), (1, 0), (1, 1)))
f  # features (input variables)

In [None]:
f[:, 0]

In [None]:
f[:, 1]

In [None]:
f[:, 0] & f[:, 1]  # element-wise AND operator

In [None]:
f[:, 0] | f[:, 1]  # element-wise OR operator

In [None]:
f[:, 0] ^ f[:, 1]  # element-wise XOR operator

In [None]:
l = f[:, 0] ^ f[:, 1]  # labels (output variables)
l

In [None]:
from sklearn.naive_bayes import GaussianNB  # supervised classification algorithm
from sklearn.tree import DecisionTreeClassifier  # supervised classification algorithm

In [None]:
# model = GaussianNB()  # 1. step: model instantiation

In [None]:
model = DecisionTreeClassifier()  # 1. step: model instantiation

In [None]:
model.fit(f, l)  # 2. step: model fitting

In [None]:
model.predict(f)  # 3. step: prediction
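
The commented-out `GaussianNB` line invites a comparison between the two classifiers. The following is a minimal sketch, assuming scikit-learn is installed; the names `features`, `labels`, and `accuracy` are illustrative and not part of the notebook. On XOR data, neither feature on its own says anything about the label, so a model that treats the features independently (such as Gaussian naive Bayes) should struggle, while a decision tree that splits on both features can reproduce the labels.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Same XOR data as above: two binary features, label = f0 XOR f1.
features = np.array(((0, 0), (0, 1), (1, 0), (1, 1)))
labels = features[:, 0] ^ features[:, 1]

accuracy = {}
for model in (GaussianNB(), DecisionTreeClassifier(random_state=0)):
    model.fit(features, labels)            # model fitting
    predictions = model.predict(features)  # in-sample prediction
    # Share of correctly predicted labels (in-sample accuracy).
    accuracy[type(model).__name__] = float((predictions == labels).mean())

print(accuracy)
```

The accuracy here is measured on the training data itself, so it only shows whether a model can represent the XOR mapping, not how it would perform on unseen data.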

#### The Data

In [None]:
from bsm73 import bsm_call_value

In [None]:
S0 = 100.
K = 95.
T = 1.5
r = 0.04
sigma = 0.15

Function for option valuation, mapping the five input parameters to a single number: $f(S_0, K, T, r, \sigma) =$ `option value`

In [None]:
bsm_call_value(S0, K, T, r, sigma)
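
The source of `bsm73` is not shown in this notebook. Assuming it implements the standard Black-Scholes-Merton (1973) formula for a European call option without dividends (an assumption suggested by the module name), the valuation could be sketched with the standard library alone. The function names `norm_cdf` and `bsm_call_sketch` are hypothetical stand-ins, not the course's implementation.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bsm_call_sketch(S0, K, T, r, sigma):
    """European call value in the Black-Scholes-Merton model.

    Hypothetical stand-in for bsm_call_value (assumes no dividends).
    """
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)


# Same parameter values as in the cell above.
call_value = bsm_call_sketch(100., 95., 1.5, 0.04, 0.15)
print(round(call_value, 4))
```

A quick sanity check is that the value lies between the discounted intrinsic value `S0 - K * exp(-r * T)` and the spot price `S0`.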

In [None]:
n = 5

In [None]:
S0_ = np.linspace(80, 120, n)
S0_

In [None]:
K_ = np.linspace(80, 120, n)
K_

In [None]:
T_ = np.linspace(0.5, 1.5, n)
T_

In [None]:
r_ = np.linspace(0.0, 0.1, n)
r_

In [None]:
sigma_ = np.linspace(0.1, 0.3, n)
sigma_

In [None]:
list(zip(S0_, T_))

In [None]:
from itertools import product

In [None]:
list(product(S0_, T_))[:7]

In [None]:
n ** 5  # grid size: n values for each of the 5 parameters
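
The difference between `zip` and `product` determines the size of the data set. A small sketch with illustrative lists (`a`, `b`, and the other names are not from the notebook): `zip` pairs elements position by position, while `product` builds every combination, so a full grid over five parameters with five values each has `5 ** 5` rows.

```python
from itertools import product

a = [80, 100, 120]
b = [0.5, 1.0, 1.5]

pairs = list(zip(a, b))     # element-wise pairing: len(a) tuples
grid = list(product(a, b))  # Cartesian product: len(a) * len(b) tuples

n_params, n_values = 5, 5
grid_size = n_values ** n_params  # rows in a full grid over five parameters

print(len(pairs), len(grid), grid_size)
```

Note that the grid size is `n_values ** n_params` in general; it coincides with `n ** n` here only because the number of values per parameter equals the number of parameters.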

In [None]:
data = pd.DataFrame()

In [None]:
%%time
# collect all rows first, then build the DataFrame once
# (avoids the repeated copying of pd.concat inside a loop)
rows = [(S0, K, T, r, sigma, bsm_call_value(S0, K, T, r, sigma))
        for S0, K, T, r, sigma in product(S0_, K_, T_, r_, sigma_)]
data = pd.DataFrame(rows, columns=['S0', 'K', 'T', 'r', 'sigma', 'value'])

In [None]:
data.tail()

In [None]:
f_cols = ['S0', 'K', 'T', 'r', 'sigma']

#### Original Data

In [None]:
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

In [None]:
# MLPRegressor?

In [None]:
model = MLPRegressor(hidden_layer_sizes=[128, 128, 128],
                     max_iter=1000)  # 1. step

In [None]:
%time model.fit(data[f_cols], data['value'])  # 2. step

In [None]:
model.predict(data[f_cols])

In [None]:
data['est'] = model.predict(data[f_cols])

In [None]:
data.tail()

In [None]:
mean_squared_error(data['value'], data['est'])
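
For reference, the mean squared error is the average of the squared differences between the true values and the estimates. A minimal NumPy sketch follows; the helper name `mse` and the numbers are illustrative only and not taken from the notebook's data.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Illustrative numbers only.
y_true = [10.0, 12.0, 15.0]
y_pred = [11.0, 12.0, 13.0]

error = mse(y_true, y_pred)
print(error, error ** 0.5)  # MSE and its square root (RMSE)
```

The MSE is expressed in squared units of the option value; its square root (RMSE) is in the same units as the option value and is often easier to interpret.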

In [None]:
data[['value', 'est']].plot();

In [None]:
plt.plot(range(len(data)), data.sort_values('value')[['value', 'est']], alpha=0.7);

#### Normalized Data

In [None]:
data_ = (data - data.mean()) / data.std()  # column-wise z-score normalization
data_.head()
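
The normalization above is a column-wise z-score: each column is shifted to mean 0 and scaled to standard deviation 1. A small sketch on an illustrative frame (`demo` is a hypothetical name; the columns only mirror two of the notebook's features) makes this visible. pandas uses the sample standard deviation (`ddof=1`) by default.

```python
import numpy as np
import pandas as pd

# Illustrative values only; column names mirror the notebook.
demo = pd.DataFrame({'S0': np.linspace(80, 120, 5),
                     'sigma': np.linspace(0.1, 0.3, 5)})

# z-score: subtract the column mean, divide by the column standard deviation.
demo_ = (demo - demo.mean()) / demo.std()

print(demo_.mean().round(12))  # approximately 0 for each column
print(demo_.std())             # 1 for each column (ddof=1, pandas default)
```

Features that originally live on very different scales (an index level around 100 versus a volatility around 0.2) end up on a comparable scale, which tends to help gradient-based training of neural networks.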

In [None]:
%time model.fit(data_[f_cols], data['value'])

In [None]:
data['est_norm'] = model.predict(data_[f_cols])

In [None]:
data.tail()

In [None]:
mean_squared_error(data['value'], data['est_norm'])

In [None]:
data[['value', 'est_norm']].plot(alpha=0.75);

In [None]:
data[['value', 'est_norm']].iloc[-100:].plot(alpha=0.75);

In [None]:
plt.plot(range(len(data)), data.sort_values('value')[['value', 'est_norm']], alpha=0.75);

In [None]:
plt.plot(range(100), data.sort_values('value').iloc[-100:][['value', 'est_norm']], alpha=0.75);
