<img src="http://certificate.tpq.io/tpq_logo.png" alt="The Python Quants" width="35%" align="right" border="0"><br>

# AI in Finance

**Workshop at Texas State University (October 2023)**

**_Simple Financial Examples_**

Dr. Yves J. Hilpisch | The Python Quants GmbH | http://tpq.io

## Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8')
%config InlineBackend.figure_format = 'svg'

## Stock Clustering

Data from [EODHistoricalData](https://eodhistoricaldata.com/r/?ref=X8R79ISB).

### The Data

In [None]:
f = pd.read_csv('../sources/eod_fundamentals.csv', index_col=0)

In [None]:
f

In [None]:
# transpose so that the fundamentals become columns, then select two of them as floats
data = f.T[['QuarterlyRevenueGrowthYOY', 'ReturnOnEquityTTM']].astype(float)

In [None]:
data

In [None]:
data.columns = ['Growth', 'ROE']

In [None]:
data.info()

### Raw Data

In [None]:
from sklearn.cluster import KMeans

In [None]:
model = KMeans(n_clusters=3, n_init=2)

In [None]:
model.fit(data)

In [None]:
c = model.predict(data)
c

In [None]:
data.plot.scatter(x='Growth', y='ROE', c=c, cmap='brg');

### Normalized Data

In [None]:
# z-score normalization: zero mean and unit standard deviation per column
data_ = (data - data.mean()) / data.std()

In [None]:
data_

In [None]:
model = KMeans(n_clusters=5, init='random', n_init='auto', algorithm='lloyd')

In [None]:
model.fit(data_)

In [None]:
c = model.predict(data_)
c

In [None]:
data_.plot.scatter(x='Growth', y='ROE', c=c, cmap='brg');
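The following is a minimal sketch of why normalization can matter for k-means, which relies on Euclidean distances. It uses synthetic stand-in data, not the EOD fundamentals; the names `demo`, `demo_`, and `spread`, the seed, and the distribution parameters are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(100)  # arbitrary seed for reproducibility

# synthetic stand-in data (NOT the EOD fundamentals):
# two features on very different scales
demo = pd.DataFrame({
    'Growth': rng.normal(0.1, 0.2, size=60),
    'ROE': rng.normal(0.3, 5.0, size=60),
})

# z-score normalization, same formula as used for data_ above
demo_ = (demo - demo.mean()) / demo.std()

for name, df in [('raw', demo), ('normalized', demo_)]:
    model = KMeans(n_clusters=3, n_init=10, random_state=100).fit(df)
    centers = pd.DataFrame(model.cluster_centers_, columns=df.columns)
    # range of the cluster centers per feature, measured in units of
    # that feature's standard deviation
    spread = (centers.max() - centers.min()) / df.std()
    print(name, spread.round(2).to_dict())
```

On raw data, the cluster centers would be expected to separate mainly along the feature with the larger scale, while after normalization both features can contribute to the distances on a comparable footing.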

### Adding a Third Feature

In [None]:
cols = ['QuarterlyRevenueGrowthYOY', 'ReturnOnEquityTTM', 'DividendYield']

In [None]:
data = f.T[cols].astype(float)

In [None]:
data

In [None]:
data.columns = ['Growth', 'ROE', 'DY']

In [None]:
model = KMeans(n_clusters=4, n_init=2)

In [None]:
model.fit(data)

In [None]:
c = model.predict(data)
c

In [None]:
# registers the '3d' projection (only required for older Matplotlib versions)
from mpl_toolkits import mplot3d

In [None]:
fig = plt.figure(figsize=(10, 10))
ax = plt.axes(projection='3d')
ax.scatter3D(data['Growth'], data['ROE'], data['DY'],
             c=c, s=100, cmap='brg')
ax.set_xlabel('Growth')
ax.set_ylabel('ROE')
ax.set_zlabel('DY')
ax.view_init(elev=15, azim=-30);

In [None]:
# z-score normalization of all three features
data_ = (data - data.mean()) / data.std()

In [None]:
data_

In [None]:
model = KMeans(n_clusters=3, init='random', n_init='auto', algorithm='lloyd')

In [None]:
model.fit(data_)

In [None]:
c = model.predict(data_)
c

In [None]:
fig = plt.figure(figsize=(10, 10))
ax = plt.axes(projection='3d')
ax.scatter3D(data_['Growth'], data_['ROE'], data_['DY'],
             c=c, s=100, cmap='brg')
ax.set_xlabel('Growth')
ax.set_ylabel('ROE')
ax.set_zlabel('DY')
ax.view_init(elev=15, azim=-30);

## Stock Price Prediction

In [None]:
path = '../sources/eod_prices.csv'

In [None]:
raw = pd.read_csv(path, index_col=0, parse_dates=True)

## Exercise

Using NumPy, generate a random walk with a fixed seed.
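One possible starting point is sketched below. It assumes a simple symmetric walk with increments of +1 or -1; the seed value `100`, the length of 1,000 steps, and the names `steps` and `walk` are arbitrary choices, not prescribed by the exercise.

```python
import numpy as np

rng = np.random.default_rng(100)  # fixed seed makes the walk reproducible

# increments of +1 or -1 with equal probability
steps = rng.choice([-1, 1], size=1000)

# the random walk is the cumulative sum of the increments
walk = steps.cumsum()
```

Re-running the cell with the same seed yields the same path, which keeps later accuracy figures comparable across runs.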

Try to predict the direction of the next movement for both the random walk and a real financial time series with

* OLS regression and
* ML supervised learning (e.g. MLP)

Implement the analysis both

* only in-sample and
* with train-test split

What can you say about the accuracy ratios for all cases?
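The sketch below shows one way to set up the four cases for the random walk. It is not a reference solution: the helper names (`lagged_features`, `ols_accuracy`, `mlp_accuracy`), the number of lags, the 70/30 chronological split, and the MLP settings are all assumptions made for illustration. The same helpers could be applied to a single price column of `raw`, where zero changes would be labeled as downward moves.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def lagged_features(series, lags=5):
    """Return the `lags` previous changes as features and the
    direction (+1/-1) of the next change as the label."""
    d = np.diff(series)  # one-step changes
    X = np.column_stack([d[i:len(d) - lags + i] for i in range(lags)])
    y = np.where(d[lags:] > 0, 1, -1)
    return X, y


def ols_accuracy(X_train, y_train, X_test, y_test):
    """Fit OLS (with intercept) on the training data and return the share
    of correctly predicted directions on the test data."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    beta = np.linalg.lstsq(A_train, y_train, rcond=None)[0]
    pred = np.where(A_test @ beta > 0, 1, -1)
    return (pred == y_test).mean()


def mlp_accuracy(X_train, y_train, X_test, y_test):
    """Fit a small MLP classifier and return its test accuracy."""
    model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                          random_state=100)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


rng = np.random.default_rng(100)  # fixed seed
walk = rng.choice([-1, 1], size=2000).cumsum()

X, y = lagged_features(walk, lags=5)
split = int(0.7 * len(X))  # chronological split, no shuffling

results = {
    'OLS in-sample': ols_accuracy(X, y, X, y),
    'OLS train-test': ols_accuracy(X[:split], y[:split],
                                   X[split:], y[split:]),
    'MLP in-sample': mlp_accuracy(X, y, X, y),
    'MLP train-test': mlp_accuracy(X[:split], y[:split],
                                   X[split:], y[split:]),
}

for name, acc in results.items():
    print(f'{name:16s} {acc:.3f}')
```

For a true random walk, the increments are independent, so any in-sample accuracy above 50% would reflect fitting to noise, and the train-test figures would be expected to hover around 50%.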

<img src='http://hilpisch.com/tpq_logo.png' width="35%" align="right">

<br><br><a href="http://tpq.io" target="_blank">http://tpq.io</a> | <a href="http://twitter.com/dyjh" target="_blank">@dyjh</a> | <a href="mailto:team@tpq.io">ai@tpq.io</a>