# Stock Price Prediction Using Machine Learning

In this tutorial, you develop regression machine learning models in Python.
The data set is five years of daily stock prices for Apple Inc. (https://finance.yahoo.com/quote/AAPL/history?p=AAPL).
It is used to predict the next day's Apple stock price.

## In this notebook

 - Find the API docs for the installed versions of pandas and scikit-learn
 - Prepare time series data
 - Train regression models
 - Evaluate regression models
 - Save and load trained models
 
 Notebook adapted from: https://www.analyticsvidhya.com/blog/2018/10/predicting-stock-price-machine-learningnd-deep-learning-techniques-python/


In [None]:
!pip install pandas

In [None]:
import pandas as pd
import numpy as np

In [None]:
# Load the data from the CSV file into a pandas DataFrame

filePath = 'AAPL_5years.csv'
 
df=pd.read_csv(filePath)

In [None]:
#display first 10 records

df.head(n=10)

In [None]:
#display last 10 records

df.tail(n=10)

In [None]:
#Let's check the schema of this DataFrame, i.e. the data type of each column

df.info()

In [None]:
#Let's calculate statistics for the numeric attributes

df.describe()

The dataset contains several variables: Date, Open, High, Low, Close, Adj Close, and Volume.

- Open and Close are the prices at which the stock started and finished trading on a particular day.
- High and Low are the maximum and minimum prices of the share for the day.
- Adj Close is the closing price adjusted for corporate actions such as splits and dividends, and Volume is the number of shares traded in the day.

The profit or loss calculation is usually determined by the closing price of a stock for the day, hence we will consider the __closing price__ as the __target__ variable.

In [None]:
df.drop(['Open', 'High','Low','Adj Close','Volume'], axis=1,inplace=True)

df


In [None]:
!pip install plotly

import plotly.graph_objects as go


In [None]:
fig=go.Figure()

#convert the Date column to datetime so it plots as a time axis
df['Date'] = pd.to_datetime(df.Date,format='%Y-%m-%d')

fig.add_trace(go.Scatter(x=df['Date'], y=df['Close'],
                    mode='lines',
                    name='Close Price history'))

fig.update_layout(title="Close Price history",
                 xaxis_title="Date",yaxis_title="Close Price",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()

In [None]:
# Use the Sliding Window algorithm to reshape the data. 

# Window size of 10 days and a slide of 1 day (i.e. use the previous 10 values to predict the next value, then slide forward 1 day)

temps = pd.DataFrame(df['Close'].values)
window = 10
# shift(k) lags the series by k days; k=0 is the current day's value
flattenDF = pd.concat([temps.shift(k) for k in range(window, -1, -1)], axis=1)
flattenDF.columns = ['t-{0}'.format(k) for k in range(window, 0, -1)] + ['t']

flattenDF

In [None]:
# Check the last flattened record against the last 11 Close values. They should be the same.

df.tail(n=11)

In [None]:
#Drop records with NaNs

flattenDF.dropna(inplace=True)

flattenDF
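
To make the sliding-window reshape easier to follow, here is a minimal sketch on a toy series, assuming a window of 3 instead of 10. The helper name `make_windows` is hypothetical and is not part of the notebook above.

```python
import pandas as pd

def make_windows(values, window):
    """Return rows holding `window` lagged values plus the current value 't'."""
    series = pd.Series(values)
    # shift(k) moves each value k rows down, so row i holds the value from k steps earlier
    lagged = {f"t-{k}": series.shift(k) for k in range(window, 0, -1)}
    lagged["t"] = series
    # the first `window` rows lack a full history and contain NaN, so drop them
    return pd.DataFrame(lagged).dropna()

toy = make_windows([10, 11, 12, 13, 14, 15], window=3)
print(toy)
```

Six input values with a window of 3 leave three complete rows, just as 1259 Close values with a window of 10 leave 1249 rows.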

# Build a regression model to predict the value using the previous 10 days of data



In [None]:
!pip install scikit-learn
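
The API docs you consult should match the versions installed in your environment. A minimal sketch for printing them, assuming both packages are already installed:

```python
import pandas as pd
import sklearn

# print the installed versions so you can open the matching API documentation
print("pandas:", pd.__version__)
print("scikit-learn:", sklearn.__version__)
```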

In [None]:

#assign the feature columns t-10 to t-1 to a features DF
features_df = flattenDF.loc[:, flattenDF.columns != 't']

#Set target to the time t column
target = flattenDF['t']


#split the data set into train and validation sets. We are not using a random split function because we want to use older data to predict newer data
X_train = features_df[:999]
X_val = features_df[999:]
y_train = target[:999]
y_val = target[999:]


print ("Train dataset: {0}{1}".format(X_train.shape, y_train.shape))
print ("Validation dataset: {0}{1}".format(X_val.shape, y_val.shape))
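
The hard-coded cutoff of 999 rows gives roughly an 80/20 split of the 1249 usable rows. As a sketch of the same idea without a fixed number, the hypothetical helper `chronological_split` below derives the cutoff from a fraction. The toy data stands in for `features_df` and `target`.

```python
import numpy as np
import pandas as pd

def chronological_split(features, target, train_fraction=0.8):
    """Split time-ordered data so every training row precedes every validation row."""
    cutoff = int(len(features) * train_fraction)
    return (features.iloc[:cutoff], features.iloc[cutoff:],
            target.iloc[:cutoff], target.iloc[cutoff:])

# toy data: 10 time-ordered rows with a single lagged feature
demo_X = pd.DataFrame({"t-1": np.arange(10.0)})
demo_y = pd.Series(np.arange(1.0, 11.0))

X_tr, X_va, y_tr, y_va = chronological_split(demo_X, demo_y)
print(X_tr.shape, X_va.shape)
```

Because the rows are never shuffled, the validation set contains only dates later than the training set, which avoids leaking future prices into training.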

In [None]:
# Train a linear regression model

from sklearn.linear_model import LinearRegression

linearRegModel = LinearRegression()
linearRegModel.fit(X_train,y_train)

In [None]:
# make predictions on the validation data to evaluate the model
linearRegPreds = linearRegModel.predict(X_val)
linearRegPreds.shape

In [None]:
#Append the 250 predictions to 1009 zeros representing the training and NaN records to generate a series of 1259 records
# to add to the original data dataframe to be able to display and compare the predictions with actual values.

df["Predictions"] = np.append(np.zeros(1009),linearRegPreds)

df

In [None]:
#Display the actual VS Predicted values

fig=go.Figure()

#ensure the Date column is datetime (it was already converted above, so this is a no-op)
df['Date'] = pd.to_datetime(df.Date,format='%Y-%m-%d')

fig.add_trace(go.Scatter(x=df['Date'], y=df['Close'],
                    mode='lines',
                    name='Close Price history'))

fig.add_trace(go.Scatter(x=df['Date'], y=df['Predictions'],
                    mode='lines',
                    name='Predicted Close Price'))

fig.update_layout(title="Close Price",
                 xaxis_title="Date",yaxis_title="Close Price",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()

In [None]:
# Evaluate the model by calculating the root mean squared error (RMSE) of the predictions

rms=np.sqrt(np.mean(np.power((np.array(y_val)-np.array(linearRegPreds)),2)))
rms
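
An RMSE value is easier to interpret next to a baseline. The sketch below, using made-up toy numbers rather than the AAPL data, compares a model's RMSE with a naive "tomorrow equals today" baseline. The function name `rmse` and all values are illustrative assumptions.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((actual - predicted)^2))."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# toy values for illustration only
actual = [100.0, 102.0, 101.0, 105.0]
model_preds = [101.0, 101.0, 102.0, 104.0]
# naive baseline: predict that each day's close equals the previous day's close
previous_close = [99.0, 100.0, 102.0, 101.0]

print("model RMSE:", rmse(actual, model_preds))
print("baseline RMSE:", rmse(actual, previous_close))
```

In this notebook the same comparison could use `X_val['t-1']` as the baseline predictions against `y_val`. A model is only useful if it beats that baseline.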

In [None]:
#Check the other regression models you can build

from sklearn.kernel_ridge import KernelRidge # Kernel ridge regression to model non-linear data https://scikit-learn.org/stable/modules/generated/sklearn.kernel_ridge.KernelRidge.html#sklearn.kernel_ridge.KernelRidge

from sklearn import svm # Support vector regression (svm.SVR, svm.LinearSVR) https://scikit-learn.org/stable/modules/svm.html#regression

from sklearn.neighbors import KNeighborsRegressor # KNN regression https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbors-regression

from sklearn import tree # Decision tree regression (tree.DecisionTreeRegressor) https://scikit-learn.org/stable/modules/tree.html#regression
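
All scikit-learn regressors share the same `fit`/`predict` interface, so swapping models takes little code. The sketch below trains two of them on synthetic data standing in for the 10 lagged features. The data, hyperparameters, and variable names are assumptions for illustration, not tuned choices.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# synthetic stand-in: 200 rows of 10 features, target close to the last feature
X = rng.normal(size=(200, 10))
y = X[:, -1] + 0.1 * rng.normal(size=200)

# chronological-style split: first 160 rows for training, last 40 for validation
X_train_demo, X_val_demo = X[:160], X[160:]
y_train_demo, y_val_demo = y[:160], y[160:]

candidates = {
    "knn": KNeighborsRegressor(n_neighbors=5),
    "tree": DecisionTreeRegressor(max_depth=4, random_state=0),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train_demo, y_train_demo)
    preds = model.predict(X_val_demo)
    # validation RMSE for each candidate
    scores[name] = float(np.sqrt(np.mean((y_val_demo - preds) ** 2)))

print(scores)
```

Tree and nearest-neighbor regressors cannot predict values outside the range seen in training, which matters for a price series that trends upward.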

# Saving the trained model

In [None]:
#import the pickle library
import pickle

In [None]:
# save the model to disk; the with block closes the file once the dump completes
filename = 'stocksLinerRegModel.mdl'
with open(filename, 'wb') as model_file:
    pickle.dump(linearRegModel, model_file)
 

# Load the model later

In [None]:
# load the model from disk; only unpickle files from a source you trust
filename = 'stocksLinerRegModel.mdl'
with open(filename, 'rb') as model_file:
    loaded_model = pickle.load(model_file)
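
After loading, it is worth confirming that the restored model behaves like the original. The sketch below does a save/load round trip with a toy model in a temporary directory. The file name `demo_model.pkl` and the toy data are assumptions for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# toy model standing in for linearRegModel
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = X_demo.sum(axis=1)
model = LinearRegression().fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# the restored model should reproduce the original predictions
same = np.allclose(model.predict(X_demo), restored.predict(X_demo))
print(same)
```

Pickled scikit-learn models are not guaranteed to load correctly under a different scikit-learn version, so it helps to record the version used for training alongside the file.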

In [None]:
# Predict tomorrow's price based on the last 10 days of data

last_10_days_prices = df['Close'].tail(n=10)
tomorrows_stock_price = loaded_model.predict([last_10_days_prices])

tomorrows_stock_price

# What do we need to do differently to predict the stock price for the day after tomorrow, or the prices for the entire upcoming week?
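
One common approach, offered here as a sketch rather than as the only answer, is recursive forecasting: predict one day ahead, append that prediction to the window, drop the oldest value, and repeat. The helper `forecast_recursive` is hypothetical, and the toy model uses a window of 3 on a straight line instead of the 10-day AAPL window.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def forecast_recursive(model, recent_values, steps):
    """Predict `steps` future values, feeding each prediction back into the window."""
    window = [float(v) for v in recent_values]
    forecasts = []
    for _ in range(steps):
        next_value = float(model.predict(np.array([window]))[0])
        forecasts.append(next_value)
        # slide the window: drop the oldest value, append the newest prediction
        window = window[1:] + [next_value]
    return forecasts

# toy model: learn to continue the series 0, 1, 2, ... from the previous 3 values
series = np.arange(30, dtype=float)
X_demo = np.column_stack([series[0:27], series[1:28], series[2:29]])
y_demo = series[3:30]
demo_model = LinearRegression().fit(X_demo, y_demo)

week = forecast_recursive(demo_model, series[-3:], steps=5)
print(week)
```

With real prices, each step's error feeds into the next prediction, so forecasts usually degrade as the horizon grows. An alternative is to train a separate model per horizon, for example one whose target is the price at t+5.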