# Simple Stock Movement Classifier

**Disclaimer:** _Investing in the stock market involves risk and can lead to monetary loss. This material is purely for educational purposes and should not be taken as professional investment advice. Invest at your own discretion._

Based on [Build A Simple Stock Movement Classifier](https://www.youtube.com/watch?v=UsPIDNmiSDM) from the YouTube channel [Computer Science](https://www.youtube.com/channel/UCbmb5IoBtHZTpYZCDBOC1CA).

## Description
Use stock indicators with machine learning to try to predict the direction of a stock's price.

Import the libraries to be used:

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

## Load the dataset
Note: The data is fetched through the `yfinance` package. Using the public API (without authentication), you are limited to 2,000 requests per hour per IP (or up to a total of 48,000 requests a day).

In [2]:
import yfinance as yf

In [3]:
df = yf.download("BTC-USD", start="2019-06-04", end="2019-12-18")
# df = pd.read_csv('goog_stock.csv')

[*********************100%***********************]  1 of 1 completed


In [4]:
df

Date,Open,High,Low,Close,Adj Close,Volume
2019-06-03,8741.747070,8743.500000,8204.185547,8208.995117,8208.995117,22004511436
2019-06-04,8210.985352,8210.985352,7564.488770,7707.770996,7707.770996,24609731549
2019-06-05,7704.343262,7901.849121,7668.668457,7824.231445,7824.231445,21760923463
2019-06-06,7819.633301,7937.340820,7571.471191,7822.023438,7822.023438,19474611077
2019-06-07,7826.901367,8126.153320,7788.373535,8043.951172,8043.951172,19141423231
...,...,...,...,...,...,...
2019-12-13,7244.662109,7293.560547,7227.122559,7269.684570,7269.684570,17125736940
2019-12-14,7268.902832,7308.836426,7097.208984,7124.673828,7124.673828,17137029730
2019-12-15,7124.239746,7181.075684,6924.375977,7152.301758,7152.301758,16881129804
2019-12-16,7153.663086,7171.168945,6903.682617,6932.480469,6932.480469,20213265950


Save the data for further reference:

In [5]:
df.to_csv('goog_stock.csv')

## Define functions

Typical time periods for moving averages are 15, 20 and 30.

Calculate Simple Moving Average (SMA)

In [6]:
def SMA(data, period=30, column='Close'):
    return data[column].rolling(window=period).mean()

Calculate Exponential Moving Average (EMA)

In [7]:
def EMA(data, period=20, column='Close'):
    return data[column].ewm(span=period, adjust=False).mean()
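
With `adjust=False`, pandas computes the EMA recursively: the first value equals the first observation, and every later value is `alpha * x + (1 - alpha) * previous_ema`, where `alpha = 2 / (span + 1)`. The sketch below checks that recursion against `ewm` using the first five closing prices from the table above. The helper name `ema_manual` is made up for this illustration and is not part of the notebook.

```python
import numpy as np
import pandas as pd

def ema_manual(values, span):
    """Recursive EMA seeded with the first observation (illustrative helper)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for x in values[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return np.array(out)

# First five closing prices from the downloaded data shown earlier
closes = [8208.995117, 7707.770996, 7824.231445, 7822.023438, 8043.951172]

ema_pandas = pd.Series(closes).ewm(span=20, adjust=False).mean().to_numpy()
ema_loop = ema_manual(closes, span=20)

# True when the hand-written recursion agrees with pandas
print(np.allclose(ema_pandas, ema_loop))
```

Because the recursion is seeded with the first price, the earliest EMA values lean heavily on that single observation and become more representative as more data arrives.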

Create a function to calculate the Moving Average Convergence / Divergence (MACD)

In [8]:
def MACD(data, period_long=26, period_short=12, period_signal=9, column='Close'):
    # Calculate the Short Term EMA
    ShortEMA = EMA(data, period=period_short, column=column)
    # Calculate the Long Term EMA
    LongEMA = EMA(data, period=period_long, column=column)
    # Calculate and store the MACD into the data frame
    data['MACD'] = ShortEMA - LongEMA
    # Calculate the signal line and store it into the data frame
    data['signal_Line'] = EMA(data, period=period_signal, column='MACD')
    
    return data    
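
As a quick sanity check of the formula, the following sketch recomputes the MACD and its signal line directly on a `pandas.Series`, using only the first two closing prices from the downloaded data. It assumes the same spans as the defaults above (12, 26 and 9) and does not modify any data frame.

```python
import pandas as pd

# First two closing prices from the downloaded data shown earlier
close = pd.Series([8208.995117, 7707.770996])

short_ema = close.ewm(span=12, adjust=False).mean()  # fast EMA
long_ema = close.ewm(span=26, adjust=False).mean()   # slow EMA
macd = short_ema - long_ema                          # MACD line
signal = macd.ewm(span=9, adjust=False).mean()       # signal line

print(round(macd.iloc[1], 4), round(signal.iloc[1], 4))
```

On the first row both EMAs equal the first price, so the MACD starts at zero. Note that the `MACD` function above writes its results into the data frame it receives, so calling it changes `df` in place.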

Create a function to calculate Relative Strength Index (RSI)

In [9]:
def RSI(data, period=14, column='Close'):
    delta = data[column].diff(1)
    delta = delta.dropna()
    up = delta.copy()
    down = delta.copy()
    up[up < 0] = 0
    down[down > 0] = 0
    data['up'] = up
    data['down'] = down
    AVG_Gain = SMA(data, period, column='up')
    AVG_Loss = abs(SMA(data, period, column='down'))
    RS = AVG_Gain / AVG_Loss
    RSI = 100.0 - (100.0 / (1.0 + RS))
    
    data['RSI'] = RSI
    return data
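
The function above averages gains and losses with a simple moving average. Wilder's original RSI uses a smoothed average instead, which is equivalent to an exponential average with `alpha = 1 / period`. The following is a minimal sketch of that variant, shown for comparison only; the function name `rsi_wilder` and the sample prices are assumptions for this example, not part of the notebook.

```python
import pandas as pd

def rsi_wilder(close, period=14):
    """RSI with Wilder-style smoothing (illustrative alternative)."""
    delta = close.diff()
    gain = delta.clip(lower=0)      # positive moves, zero otherwise
    loss = (-delta).clip(lower=0)   # size of negative moves, zero otherwise
    avg_gain = gain.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100.0 - (100.0 / (1.0 + rs))

# Made-up prices, with a short period so the example stays small
sample = pd.Series([10.0, 11.0, 10.5, 11.5, 12.0, 11.0, 11.8, 12.5])
rsi = rsi_wilder(sample, period=3)
print(rsi.dropna())
```

Both variants keep the RSI between 0 and 100; they differ in how quickly old price moves fade from the average.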

## Add the Indicators to the Data Set

In [10]:
MACD(df)
df

Date,Open,High,Low,Close,Adj Close,Volume,MACD,signal_Line
2019-06-03,8741.747070,8743.500000,8204.185547,8208.995117,8208.995117,22004511436,0.000000,0.000000
2019-06-04,8210.985352,8210.985352,7564.488770,7707.770996,7707.770996,24609731549,-39.983691,-7.996738
2019-06-05,7704.343262,7901.849121,7668.668457,7824.231445,7824.231445,21760923463,-61.563997,-18.710190
2019-06-06,7819.633301,7937.340820,7571.471191,7822.023438,7822.023438,19474611077,-77.946198,-30.557391
2019-06-07,7826.901367,8126.153320,7788.373535,8043.951172,8043.951172,19141423231,-72.189343,-38.883782
...,...,...,...,...,...,...,...,...
2019-12-13,7244.662109,7293.560547,7227.122559,7269.684570,7269.684570,17125736940,-221.955029,-250.353826
2019-12-14,7268.902832,7308.836426,7097.208984,7124.673828,7124.673828,17137029730,-225.689207,-245.420902
2019-12-15,7124.239746,7181.075684,6924.375977,7152.301758,7152.301758,16881129804,-223.838951,-241.104512
2019-12-16,7153.663086,7171.168945,6903.682617,6932.480469,6932.480469,20213265950,-237.374051,-240.358420


In [11]:
RSI(df)
df

Date,Open,High,Low,Close,Adj Close,Volume,MACD,signal_Line,up,down,RSI
2019-06-03,8741.747070,8743.500000,8204.185547,8208.995117,8208.995117,22004511436,0.000000,0.000000,,,
2019-06-04,8210.985352,8210.985352,7564.488770,7707.770996,7707.770996,24609731549,-39.983691,-7.996738,0.000000,-501.224121,
2019-06-05,7704.343262,7901.849121,7668.668457,7824.231445,7824.231445,21760923463,-61.563997,-18.710190,116.460449,0.000000,
2019-06-06,7819.633301,7937.340820,7571.471191,7822.023438,7822.023438,19474611077,-77.946198,-30.557391,0.000000,-2.208008,
2019-06-07,7826.901367,8126.153320,7788.373535,8043.951172,8043.951172,19141423231,-72.189343,-38.883782,221.927734,0.000000,
...,...,...,...,...,...,...,...,...,...,...,...
2019-12-13,7244.662109,7293.560547,7227.122559,7269.684570,7269.684570,17125736940,-221.955029,-250.353826,26.550293,0.000000,29.865603
2019-12-14,7268.902832,7308.836426,7097.208984,7124.673828,7124.673828,17137029730,-225.689207,-245.420902,0.000000,-145.010742,31.051053
2019-12-15,7124.239746,7181.075684,6924.375977,7152.301758,7152.301758,16881129804,-223.838951,-241.104512,27.627930,0.000000,37.126312
2019-12-16,7153.663086,7171.168945,6903.682617,6932.480469,6932.480469,20213265950,-237.374051,-240.358420,0.000000,-219.821289,33.409660


In [12]:
df['SMA'] = SMA(df)
df

Date,Open,High,Low,Close,Adj Close,Volume,MACD,signal_Line,up,down,RSI,SMA
2019-06-03,8741.747070,8743.500000,8204.185547,8208.995117,8208.995117,22004511436,0.000000,0.000000,,,,
2019-06-04,8210.985352,8210.985352,7564.488770,7707.770996,7707.770996,24609731549,-39.983691,-7.996738,0.000000,-501.224121,,
2019-06-05,7704.343262,7901.849121,7668.668457,7824.231445,7824.231445,21760923463,-61.563997,-18.710190,116.460449,0.000000,,
2019-06-06,7819.633301,7937.340820,7571.471191,7822.023438,7822.023438,19474611077,-77.946198,-30.557391,0.000000,-2.208008,,
2019-06-07,7826.901367,8126.153320,7788.373535,8043.951172,8043.951172,19141423231,-72.189343,-38.883782,221.927734,0.000000,,
...,...,...,...,...,...,...,...,...,...,...,...,...
2019-12-13,7244.662109,7293.560547,7227.122559,7269.684570,7269.684570,17125736940,-221.955029,-250.353826,26.550293,0.000000,29.865603,7626.344189
2019-12-14,7268.902832,7308.836426,7097.208984,7124.673828,7124.673828,17137029730,-225.689207,-245.420902,0.000000,-145.010742,31.051053,7573.563493
2019-12-15,7124.239746,7181.075684,6924.375977,7152.301758,7152.301758,16881129804,-223.838951,-241.104512,27.627930,0.000000,37.126312,7528.907145
2019-12-16,7153.663086,7171.168945,6903.682617,6932.480469,6932.480469,20213265950,-237.374051,-240.358420,0.000000,-219.821289,33.409660,7474.964469


In [13]:
df['EMA'] = EMA(df)
df

Date,Open,High,Low,Close,Adj Close,Volume,MACD,signal_Line,up,down,RSI,SMA,EMA
2019-06-03,8741.747070,8743.500000,8204.185547,8208.995117,8208.995117,22004511436,0.000000,0.000000,,,,,8208.995117
2019-06-04,8210.985352,8210.985352,7564.488770,7707.770996,7707.770996,24609731549,-39.983691,-7.996738,0.000000,-501.224121,,,8161.259487
2019-06-05,7704.343262,7901.849121,7668.668457,7824.231445,7824.231445,21760923463,-61.563997,-18.710190,116.460449,0.000000,,,8129.161578
2019-06-06,7819.633301,7937.340820,7571.471191,7822.023438,7822.023438,19474611077,-77.946198,-30.557391,0.000000,-2.208008,,,8099.910326
2019-06-07,7826.901367,8126.153320,7788.373535,8043.951172,8043.951172,19141423231,-72.189343,-38.883782,221.927734,0.000000,,,8094.580883
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2019-12-13,7244.662109,7293.560547,7227.122559,7269.684570,7269.684570,17125736940,-221.955029,-250.353826,26.550293,0.000000,29.865603,7626.344189,7498.640464
2019-12-14,7268.902832,7308.836426,7097.208984,7124.673828,7124.673828,17137029730,-225.689207,-245.420902,0.000000,-145.010742,31.051053,7573.563493,7463.024593
2019-12-15,7124.239746,7181.075684,6924.375977,7152.301758,7152.301758,16881129804,-223.838951,-241.104512,27.627930,0.000000,37.126312,7528.907145,7433.431942
2019-12-16,7153.663086,7171.168945,6903.682617,6932.480469,6932.480469,20213265950,-237.374051,-240.358420,0.000000,-219.821289,33.409660,7474.964469,7385.722278


## Show the data

In [14]:
df

Date,Open,High,Low,Close,Adj Close,Volume,MACD,signal_Line,up,down,RSI,SMA,EMA
2019-06-03,8741.747070,8743.500000,8204.185547,8208.995117,8208.995117,22004511436,0.000000,0.000000,,,,,8208.995117
2019-06-04,8210.985352,8210.985352,7564.488770,7707.770996,7707.770996,24609731549,-39.983691,-7.996738,0.000000,-501.224121,,,8161.259487
2019-06-05,7704.343262,7901.849121,7668.668457,7824.231445,7824.231445,21760923463,-61.563997,-18.710190,116.460449,0.000000,,,8129.161578
2019-06-06,7819.633301,7937.340820,7571.471191,7822.023438,7822.023438,19474611077,-77.946198,-30.557391,0.000000,-2.208008,,,8099.910326
2019-06-07,7826.901367,8126.153320,7788.373535,8043.951172,8043.951172,19141423231,-72.189343,-38.883782,221.927734,0.000000,,,8094.580883
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2019-12-13,7244.662109,7293.560547,7227.122559,7269.684570,7269.684570,17125736940,-221.955029,-250.353826,26.550293,0.000000,29.865603,7626.344189,7498.640464
2019-12-14,7268.902832,7308.836426,7097.208984,7124.673828,7124.673828,17137029730,-225.689207,-245.420902,0.000000,-145.010742,31.051053,7573.563493,7463.024593
2019-12-15,7124.239746,7181.075684,6924.375977,7152.301758,7152.301758,16881129804,-223.838951,-241.104512,27.627930,0.000000,37.126312,7528.907145,7433.431942
2019-12-16,7153.663086,7171.168945,6903.682617,6932.480469,6932.480469,20213265950,-237.374051,-240.358420,0.000000,-219.821289,33.409660,7474.964469,7385.722278


## Create the Target column

In [15]:
df['Target'] = np.where(df['Close'].shift(-1) > df['Close'], 1, 0)
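
One detail worth noting: the last row has no next-day close, so `shift(-1)` yields `NaN` there, the comparison evaluates to `False`, and the row is labeled 0 even though its true direction is unknown. A minimal sketch of one way to handle this, on made-up prices, is to drop rows without a next close. The names `prices`, `next_close` and `labeled` are assumptions for this example.

```python
import numpy as np
import pandas as pd

# Made-up closing prices for illustration
prices = pd.DataFrame({"Close": [10.0, 11.0, 10.5, 12.0]})

next_close = prices["Close"].shift(-1)  # NaN on the last row
prices["Target"] = np.where(next_close > prices["Close"], 1, 0)

# Keep only rows whose next-day close is actually known
labeled = prices[next_close.notna()].copy()

print(prices["Target"].tolist())
print(labeled["Target"].tolist())
```

With a single unlabeled row the effect on the model is small, but dropping it avoids training or testing on a label that was never observed.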

Show the data

In [16]:
df

Date,Open,High,Low,Close,Adj Close,Volume,MACD,signal_Line,up,down,RSI,SMA,EMA,Target
2019-06-03,8741.747070,8743.500000,8204.185547,8208.995117,8208.995117,22004511436,0.000000,0.000000,,,,,8208.995117,0
2019-06-04,8210.985352,8210.985352,7564.488770,7707.770996,7707.770996,24609731549,-39.983691,-7.996738,0.000000,-501.224121,,,8161.259487,1
2019-06-05,7704.343262,7901.849121,7668.668457,7824.231445,7824.231445,21760923463,-61.563997,-18.710190,116.460449,0.000000,,,8129.161578,0
2019-06-06,7819.633301,7937.340820,7571.471191,7822.023438,7822.023438,19474611077,-77.946198,-30.557391,0.000000,-2.208008,,,8099.910326,1
2019-06-07,7826.901367,8126.153320,7788.373535,8043.951172,8043.951172,19141423231,-72.189343,-38.883782,221.927734,0.000000,,,8094.580883,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2019-12-13,7244.662109,7293.560547,7227.122559,7269.684570,7269.684570,17125736940,-221.955029,-250.353826,26.550293,0.000000,29.865603,7626.344189,7498.640464,0
2019-12-14,7268.902832,7308.836426,7097.208984,7124.673828,7124.673828,17137029730,-225.689207,-245.420902,0.000000,-145.010742,31.051053,7573.563493,7463.024593,1
2019-12-15,7124.239746,7181.075684,6924.375977,7152.301758,7152.301758,16881129804,-223.838951,-241.104512,27.627930,0.000000,37.126312,7528.907145,7433.431942,0
2019-12-16,7153.663086,7171.168945,6903.682617,6932.480469,6932.480469,20213265950,-237.374051,-240.358420,0.000000,-219.821289,33.409660,7474.964469,7385.722278,0


Remove the first 29 rows, which contain NaN values because the 30-period SMA needs 30 observations before it produces a value

In [17]:
df = df[29:]
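
The hard-coded 29 depends on the longest indicator window (30). A sketch of an alternative that does not need to be updated when the window changes is to let pandas drop the incomplete rows with `dropna`. The small frame below uses made-up values and a window of 3, so exactly two rows are removed.

```python
import numpy as np
import pandas as pd

# Made-up data: seven closing prices and a 3-period SMA
frame = pd.DataFrame({"Close": np.arange(1.0, 8.0)})
frame["SMA"] = frame["Close"].rolling(window=3).mean()

# Drop the rows where the indicator is not yet defined
clean = frame.dropna(subset=["SMA"]).copy()

print(len(frame), len(clean))
```

Calling `.copy()` gives an independent data frame, which avoids pandas warnings if columns are added to the trimmed data later.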

Show the data

In [18]:
df

Date,Open,High,Low,Close,Adj Close,Volume,MACD,signal_Line,up,down,RSI,SMA,EMA,Target
2019-07-02,10588.683594,10912.188477,9737.884766,10801.677734,10801.677734,31015895223,743.974915,792.209713,218.542969,0.000000,59.517901,9551.828109,10479.807470,1
2019-07-03,10818.156250,11968.078125,10818.156250,11961.269531,11961.269531,30796494294,770.926820,787.953134,1159.591797,0.000000,63.434720,9676.903923,10620.899095,0
2019-07-04,11972.718750,12006.075195,11166.569336,11215.437500,11215.437500,25920294033,723.760921,775.114691,0.000000,-745.832031,58.043104,9793.826139,10677.521800,0
2019-07-05,11203.102539,11395.661133,10874.964844,10978.459961,10978.459961,23838480210,659.655402,752.022833,0.000000,-236.977539,54.122207,9898.967090,10706.182578,1
2019-07-06,10982.543945,11620.964844,10982.543945,11208.550781,11208.550781,21092024306,620.267664,725.671799,230.090820,0.000000,52.589261,10011.851335,10754.027168,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2019-12-13,7244.662109,7293.560547,7227.122559,7269.684570,7269.684570,17125736940,-221.955029,-250.353826,26.550293,0.000000,29.865603,7626.344189,7498.640464,0
2019-12-14,7268.902832,7308.836426,7097.208984,7124.673828,7124.673828,17137029730,-225.689207,-245.420902,0.000000,-145.010742,31.051053,7573.563493,7463.024593,1
2019-12-15,7124.239746,7181.075684,6924.375977,7152.301758,7152.301758,16881129804,-223.838951,-241.104512,27.627930,0.000000,37.126312,7528.907145,7433.431942,0
2019-12-16,7153.663086,7171.168945,6903.682617,6932.480469,6932.480469,20213265950,-237.374051,-240.358420,0.000000,-219.821289,33.409660,7474.964469,7385.722278,0


## Prepare Data for the Model

Split the data into a feature (independent) data set (X) and a target (dependent) data set (Y)

In [19]:
keep_columns = ['Close', 'MACD', 'signal_Line', 'RSI', 'SMA', 'EMA']
X = df[keep_columns].values
Y = df['Target'].values

Split the data into train (80%) and test (20%) data sets

In [20]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)
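
By default `train_test_split` shuffles the rows, so with time-series data the test set contains days that fall between training days. A common alternative, sketched below on made-up arrays, is a chronological split with `shuffle=False`, which keeps the most recent rows for testing. The names `X_demo` and `y_demo` are assumptions for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up, time-ordered samples: row i stands for day i
X_demo = np.arange(10).reshape(-1, 1)
y_demo = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1])

# shuffle=False keeps the original order: first 80% train, last 20% test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)

print(X_tr.ravel(), X_te.ravel())
```

This mirrors how the model would be used in practice, where only past data is available when predicting the next day.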

## Create and train the model

Decision Tree Classifier Model

In [21]:
tree = DecisionTreeClassifier().fit(X_train, Y_train)

Check how well the model did on the training data set

In [22]:
tree.score(X_train, Y_train)

1.0

Check how well the model did on the test data set

In [23]:
tree.score(X_test, Y_test)

0.47058823529411764
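
A perfect training score next to a near-chance test score is what an unrestricted decision tree typically produces: it keeps splitting until every training sample is classified correctly, even when the labels carry no signal. The sketch below illustrates this on synthetic noise (not the notebook's data) and compares it with a depth-limited tree; `max_depth=3` is an arbitrary value chosen for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: random features and random labels, so there is nothing to learn
rng = np.random.default_rng(0)
X_noise = rng.normal(size=(120, 6))
y_noise = rng.integers(0, 2, size=120)

# Unrestricted tree: grows until the training data is memorized
full_tree = DecisionTreeClassifier(random_state=0).fit(X_noise, y_noise)
# Depth-limited tree: cannot memorize every sample
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_noise, y_noise)

print(full_tree.score(X_noise, y_noise), full_tree.get_depth())
print(shallow_tree.score(X_noise, y_noise), shallow_tree.get_depth())
```

Limiting the depth does not create signal where there is none, but it makes the training score a more honest indication of what the model has learned.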

Show the tree model's predictions

In [24]:
tree_predictions = tree.predict(X_test)

In [25]:
tree_predictions

array([1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0,
       0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0])

Show the actual values from the test data

In [26]:
Y_test

array([0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1,
       0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1])

## Get Model Metrics

In [27]:
from sklearn.metrics import classification_report

In [28]:
print(classification_report(Y_test, tree_predictions))

              precision    recall  f1-score   support

           0       0.50      0.50      0.50        18
           1       0.44      0.44      0.44        16

    accuracy                           0.47        34
   macro avg       0.47      0.47      0.47        34
weighted avg       0.47      0.47      0.47        34
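
To see where the errors fall, the predictions and actual values printed above can be tabulated into a confusion matrix and compared with a trivial baseline that always predicts the most frequent class. The sketch below copies the two arrays from the outputs above and uses only NumPy; the variable names are assumptions for this example.

```python
import numpy as np

# Actual values and predictions copied from the outputs above
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1,
                   0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0,
                   0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0])

# Rows are actual classes, columns are predicted classes
confusion = np.zeros((2, 2), dtype=int)
for actual, predicted in zip(y_true, y_pred):
    confusion[actual, predicted] += 1

accuracy = np.trace(confusion) / confusion.sum()
# Accuracy of always predicting the most frequent class in the test set
majority_baseline = np.bincount(y_true).max() / len(y_true)

print(confusion)
print(round(accuracy, 2), round(majority_baseline, 2))
```

The diagonal of the matrix holds the correct predictions. Always predicting class 0 would be right on 18 of the 34 test samples, which is more than the tree manages.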



**This is worse than random guessing: the tree scores 100% on the training data but only 47% on the test data, a sign of overfitting.**