## PyTorch

PyTorch is similar in spirit to Keras: both are Python libraries for building deep learning models. Keras is a high-level API that runs on top of TensorFlow, whereas PyTorch is a separate framework with its own tensor library and no TensorFlow back end. Some of the documentation I have read, from Oracle and the like, suggests the PyTorch API is somehow nicer to work with than Keras. I could not tell you why, and I am not one to argue with those that know better.

I can also tell you that PyTorch runs its computations forwards, recording each operation as it goes, then plays them backwards to compute gradients. These gradients tell the optimiser how to adjust the model's weights during training.
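
To make the forwards-then-backwards idea concrete, here is a minimal sketch using PyTorch's autograd on a toy function. The function and the variable names are my own illustration, not part of the tutorial.

```python
import torch

# A single input value that autograd should track.
x = torch.tensor(3.0, requires_grad=True)

# Forward pass: PyTorch records each operation as it computes y = x^2 + 2x.
y = x ** 2 + 2 * x

# Backward pass: replay the recorded operations in reverse to get dy/dx.
y.backward()

# dy/dx = 2x + 2, which is 8 at x = 3.
print(x.grad)
```

In the training loop later on, `loss.backward()` does the same thing, only for every weight in the model at once.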

As a very junior data person with very little exposure to programming, and being new to Python, I have chosen to adapt a tutorial to this situation. The link is in the Bibliography.


In [141]:
import pandas as pd
import numpy as np
import numpy as numpy
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn import preprocessing

In [142]:
from keras.models import Sequential, load_model
from keras.layers import LSTM, Dense, Dropout
from sklearn.model_selection import train_test_split

Let's start by reading the CSV and filtering the data down to a single securities code. This approach adds error, because I intend to train the model on one stock and then use it to predict other stocks. Even so, for predicting stocks it is likely to be slightly more accurate than a random number generator. In my mind, that is the benchmark for purchasing stocks in this manner, and anything that is slightly more than 50% accurate in real life is good to go.

I must add, this is not how I would pick my stocks. I tend to purchase stocks in either:

1. Index funds with low fees, or

2. Companies that I know and use personally. They all have good management or governance, trade below their inherent value, and have some sort of moat. (Well, if possible. I have been purchasing petrochemical stocks with my petrol savings from driving a Prius, so that I can thank my ute-driving friends for giving me money as they drive down the road.)

In [143]:
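# Load the training prices.
df = pd.read_csv('C:/Users/crump/Documents/University/755 Data Analytics/jpx_tokyo_se_prediction/train_files/stock_prices.csv')

# Securities codes to keep; a single stock here.
secs = [1377]

# Filter to the chosen stock; .copy() avoids pandas' SettingWithCopyWarning.
df = df[df['SecuritiesCode'].isin(secs)].copy()
# Forward-fill missing closing prices with the last known close.
df['Close'] = df['Close'].ffill()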

df.head()

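|      | RowId         | Date       | SecuritiesCode | Open   | High   | Low    | Close  | Volume | AdjustmentFactor | ExpectedDividend | SupervisionFlag | Target    |
|------|---------------|------------|----------------|--------|--------|--------|--------|--------|------------------|------------------|-----------------|-----------|
| 4    | 20170104_1377 | 2017-01-04 | 1377           | 3270.0 | 3350.0 | 3270.0 | 3330.0 | 150800 | 1.0              |                  | False           | 0.003026  |
| 1869 | 20170105_1377 | 2017-01-05 | 1377           | 3340.0 | 3355.0 | 3295.0 | 3305.0 | 155700 | 1.0              |                  | False           | 0.004525  |
| 3734 | 20170106_1377 | 2017-01-06 | 1377           | 3320.0 | 3335.0 | 3260.0 | 3315.0 | 153300 | 1.0              |                  | False           | -0.033033 |
| 5599 | 20170110_1377 | 2017-01-10 | 1377           | 3325.0 | 3360.0 | 3310.0 | 3330.0 | 192500 | 1.0              |                  | False           | 0.046584  |
| 7464 | 20170111_1377 | 2017-01-11 | 1377           | 3260.0 | 3295.0 | 3180.0 | 3220.0 | 741200 | 1.0              |                  | False           | -0.010386 |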
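
For anyone who wants to see what the filter and the forward-fill in the cell above do without the real file, here is a small sketch on a made-up DataFrame. The `toy` and `one_stock` names and all of the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for stock_prices.csv; the values are made up for illustration.
toy = pd.DataFrame({
    "SecuritiesCode": [1301, 1377, 1377, 1377, 1301],
    "Close": [100.0, 3330.0, np.nan, 3315.0, 101.0],
})

# Keep only the rows for the chosen security; .copy() gives an independent frame.
one_stock = toy[toy["SecuritiesCode"].isin([1377])].copy()

# Forward-fill: a missing close takes the last known close before it.
one_stock["Close"] = one_stock["Close"].ffill()

print(one_stock)
```

The missing close in the middle row is replaced by 3330.0, the last value seen before it.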


Here we normalise the closing prices. For this method we scale them to the range -1 to 1, so the smallest close in the data becomes -1 and the largest becomes 1.

In [144]:
price = df.Close
scaler = MinMaxScaler(feature_range=(-1, 1))
price = scaler.fit_transform(price.values.reshape(-1,1))
price

array([[-0.12797167],
       [-0.15326252],
       [-0.14314618],
       ...,
       [-0.32524026],
       [-0.35053111],
       [-0.28983308]])
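
As a rough check on what `MinMaxScaler` is doing in the cell above, the sketch below applies the min-max formula by hand to three made-up prices and compares the result with scikit-learn. The names `closes`, `manual` and `toy_scaler` are my own.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up closing prices, shaped (n_samples, 1) as scikit-learn expects.
closes = np.array([[3200.0], [3300.0], [3400.0]])

# Manual version: map the minimum to -1 and the maximum to 1.
lo, hi = -1.0, 1.0
manual = (closes - closes.min()) / (closes.max() - closes.min()) * (hi - lo) + lo

# Library version, with the same feature_range as the cell above.
toy_scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = toy_scaler.fit_transform(closes)

# inverse_transform undoes the scaling, which is how prices are recovered later.
restored = toy_scaler.inverse_transform(scaled)

print(scaled.ravel())
```

Keeping the fitted scaler around matters, because the predictions come out in the scaled range and have to be converted back to prices before the error is measured.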

In [145]:
df1=df.Target
df1

4          0.003026
1869       0.004525
3734      -0.033033
5599       0.046584
7464      -0.010386
             ...   
2322536    0.003200
2324536   -0.007974
2326536    0.019293
2328536    0.009464
2330536    0.026562
Name: Target, Length: 1202, dtype: float64

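Here we are splitting the data into training and testing sets. Each sample is a window of `lookback` consecutive prices: the first `lookback - 1` values are the input and the last value is the target. The final 20% of the windows become the test set.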

In [146]:
def split_data(price, lookback):
    data_raw = np.array(price)  # convert to NumPy array
    data = []

    # create all possible sequences of length lookback
    for index in range(len(data_raw) - lookback):
        data.append(data_raw[index: index + lookback])

    data = np.array(data)
    # hold out the last 20% of the sequences for testing
    test_set_size = int(np.round(0.2 * data.shape[0]))
    train_set_size = data.shape[0] - test_set_size

    # inputs are all but the last value of each sequence; the target is the last value
    x_train = data[:train_set_size, :-1, :]
    y_train = data[:train_set_size, -1, :]

    x_test = data[train_set_size:, :-1, :]
    y_test = data[train_set_size:, -1, :]

    return [x_train, y_train, x_test, y_test]

lookback = 20  # choose sequence length
x_train, y_train, x_test, y_test = split_data(price, lookback)
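
The windowing is easier to follow on a tiny example. This sketch repeats the same slicing idea on ten fake values with a window of 4, so the shapes can be checked by eye. The names `series`, `window` and `windows` are illustrative only.

```python
import numpy as np

# Ten fake scaled prices, shaped (10, 1) like the output of the scaler.
series = np.arange(10, dtype=float).reshape(-1, 1)
window = 4  # plays the role of lookback

# Every run of `window` consecutive values becomes one sample.
windows = np.array([series[i:i + window] for i in range(len(series) - window)])

# The first window - 1 values are the input, the last value is the target.
x = windows[:, :-1, :]
y = windows[:, -1, :]

print(windows.shape)       # (6, 4, 1)
print(x[0].ravel(), y[0])  # [0. 1. 2.] [3.]
```

With `lookback = 20`, each training input therefore holds 19 prices and the model is asked to predict the 20th.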

Here we convert the NumPy arrays we created above into PyTorch tensors. Essentially, tensors are just another way of organising the data, in the format that the model is expecting. They behave much like NumPy arrays, but PyTorch can also track gradients through them.

In [147]:
import torch
import torch.nn as nn
x_train = torch.from_numpy(x_train).type(torch.Tensor)
x_test = torch.from_numpy(x_test).type(torch.Tensor)
y_train_lstm = torch.from_numpy(y_train).type(torch.Tensor)
y_test_lstm = torch.from_numpy(y_test).type(torch.Tensor)
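
One detail that is easy to miss in the cell above is the `.type(torch.Tensor)` call. As far as I can tell, its job is to turn NumPy's 64-bit floats into the 32-bit floats the model weights use. A small sketch on a made-up array:

```python
import numpy as np
import torch

# NumPy defaults to 64-bit floats.
arr = np.array([[0.1, 0.2], [0.3, 0.4]])

# from_numpy keeps the dtype, so this tensor is float64.
t64 = torch.from_numpy(arr)

# .type(torch.Tensor) converts to the default tensor type, float32,
# which matches the dtype of a freshly created nn.LSTM's weights.
t32 = t64.type(torch.Tensor)

print(t64.dtype, t32.dtype)
```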

This is setting the conditions. I also use this phrase in my work... usually for explaining why I am doing something odd. Here we are telling the computer how many layers the model has, how many times we send the data through the model (the epochs), and so on. You will note that there are not many epochs. This is to make computation easy for my little 8 GB computer. In the cloud, we would look at how increasing the number of epochs first improves accuracy and then leads to overfitting, and "find the elbow": the number of epochs that gives the most accurate model before overfitting sets in.

In [153]:
input_dim = 1
hidden_dim = 32
num_layers = 2
output_dim = 1
num_epochs = 40
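
One common way to automate the "find the elbow" idea is early stopping on a validation loss. This is not what the notebook does; it is only a sketch of the technique, and the `pick_epoch` helper, the `patience` parameter and the loss values are all hypothetical.

```python
def pick_epoch(val_losses, patience=3):
    """Return the index of the best epoch, stopping once the validation
    loss has failed to improve for `patience` epochs in a row."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Made-up validation losses: they fall, bottom out, then rise as overfitting sets in.
fake_val_losses = [0.050, 0.030, 0.020, 0.015, 0.016, 0.018, 0.021, 0.025]
print(pick_epoch(fake_val_losses))  # 3
```

To use something like this for real, part of the training data would need to be held back as a validation set, with its loss recorded after every epoch.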

Here we are defining the LSTM model class. The cell after it creates the model, the loss function and the optimiser, and the training loop follows after that.

In [154]:
class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers

        # batch_first=True means inputs are shaped (batch, sequence, features)
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        # map the final hidden state to a single predicted price
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # start every batch from zero hidden and cell states
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim)
        out, (hn, cn) = self.lstm(x, (h0, c0))
        # use only the output at the last time step
        out = self.fc(out[:, -1, :])
        return out
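
To see what shapes move through the model, here is a small sketch that pushes a random batch through an `nn.LSTM` and an `nn.Linear` built with the same sizes as the configuration cell. The batch size of 5 and the names `lstm_layer`, `fc` and `dummy` are arbitrary choices of mine.

```python
import torch
import torch.nn as nn

# Same sizes as the configuration cell; 19 time steps is lookback - 1.
batch, seq_len, input_dim, hidden_dim, num_layers = 5, 19, 1, 32, 2

lstm_layer = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
fc = nn.Linear(hidden_dim, 1)

# Random stand-in for x_train: (batch, time steps, features).
dummy = torch.randn(batch, seq_len, input_dim)

# Without an explicit (h0, c0), nn.LSTM starts from zero states.
out, (hn, cn) = lstm_layer(dummy)

# out holds the top layer's hidden state at every time step.
# out[:, -1, :] keeps only the last time step, which fc maps to one number.
pred = fc(out[:, -1, :])

print(out.shape, hn.shape, pred.shape)
```

So each window of 19 prices ends up as a single predicted value, which is what gets compared with the target in the loss.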

In [150]:
model = LSTM(input_dim=input_dim, hidden_dim=hidden_dim, output_dim=output_dim, num_layers=num_layers)
criterion = torch.nn.MSELoss(reduction='mean')
optimiser = torch.optim.Adam(model.parameters(), lr=0.01)

Now we are fitting the model to the data, otherwise known as training. It goes through the data `num_epochs` times (40, because that is what we set it to), after which we can use the model to look at new data and make a prediction.

In [155]:
import time

hist = np.zeros(num_epochs)
start_time = time.time()
lstm = []

for t in range(num_epochs):
    y_train_pred = model(x_train)

    loss = criterion(y_train_pred, y_train_lstm)
    print("Epoch ", t, "MSE: ", loss.item())
    hist[t] = loss.item()

    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    
training_time = time.time()-start_time
print("Training time: {}".format(training_time))

Epoch  0 MSE:  0.04907715320587158
Epoch  1 MSE:  0.035570088773965836
Epoch  2 MSE:  0.021377667784690857
Epoch  3 MSE:  0.02958555892109871
Epoch  4 MSE:  0.026135524734854698
Epoch  5 MSE:  0.02078162506222725
Epoch  6 MSE:  0.015142541378736496
Epoch  7 MSE:  0.016536669805645943
Epoch  8 MSE:  0.01943903975188732
Epoch  9 MSE:  0.018666766583919525
Epoch  10 MSE:  0.014879148453474045
Epoch  11 MSE:  0.012219926342368126
Epoch  12 MSE:  0.013683983124792576
Epoch  13 MSE:  0.015316862612962723
Epoch  14 MSE:  0.012629331089556217
Epoch  15 MSE:  0.010566500946879387
Epoch  16 MSE:  0.011148560792207718
Epoch  17 MSE:  0.011514091864228249
Epoch  18 MSE:  0.0105049517005682
Epoch  19 MSE:  0.009473678655922413
Epoch  20 MSE:  0.009326554834842682
Epoch  21 MSE:  0.00951460562646389
Epoch  22 MSE:  0.009148440323770046
Epoch  23 MSE:  0.00836796686053276
Epoch  24 MSE:  0.008179725147783756
Epoch  25 MSE:  0.008552486076951027
Epoch  26 MSE:  0.008244350552558899
Epoch  27 MSE:  0.0

Here we are looking at the accuracy of the model that we have built. The root mean squared error (RMSE) is a measure of how much error there is in our predictions, in the same units as the closing price. When I put the model through only 5 epochs it was very inaccurate indeed. The RMSE at 40 epochs is below, and as you can see there is still error in our predictions.

In [156]:
import math, time
from sklearn.metrics import mean_squared_error

# make predictions
y_test_pred = model(x_test)

# invert predictions
y_train_pred = scaler.inverse_transform(y_train_pred.detach().numpy())
y_train = scaler.inverse_transform(y_train_lstm.detach().numpy())
y_test_pred = scaler.inverse_transform(y_test_pred.detach().numpy())
y_test = scaler.inverse_transform(y_test_lstm.detach().numpy())

# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(y_train[:,0], y_train_pred[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(y_test[:,0], y_test_pred[:,0]))
print('Test Score: %.2f RMSE' % (testScore))
lstm.append(trainScore)
lstm.append(testScore)
lstm.append(training_time)

Train Score: 75.05 RMSE
Test Score: 58.21 RMSE
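
For reference, the RMSE calculation can be written out by hand. The sketch below uses four made-up prices, not the model's output, and checks the manual result against the scikit-learn route used in the cell above.

```python
import math
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted closing prices, for illustration only.
actual = np.array([3300.0, 3320.0, 3310.0, 3350.0])
predicted = np.array([3310.0, 3300.0, 3340.0, 3350.0])

# RMSE by hand: square the errors, average them, take the square root.
errors = predicted - actual
rmse_manual = math.sqrt(np.mean(errors ** 2))

# Same thing via scikit-learn, as in the cell above.
rmse_sklearn = math.sqrt(mean_squared_error(actual, predicted))

print(round(rmse_manual, 2))  # 18.71
```

Because the predictions were passed through `scaler.inverse_transform` first, the scores above are in price units rather than in the -1 to 1 scaled range.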


## Bibliography

https://stackoverflow.com/questions/63582590/why-do-we-call-detach-before-calling-numpy-on-a-pytorch-tensor?msclkid=efc3bd80ce6b11ec92e6d5b2ca0d6fc3

https://www.educba.com/pytorch-tensors/?msclkid=7c5bd64bce7911ec97766487de736050

https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

https://stackoverflow.com/questions/30197943/numpy-ndarray-object-has-no-attribute-remove