# Stock ARIMA

In [37]:
%matplotlib inline
import pandas as pd
df = pd.read_csv("stock_data.csv")
df = df[['DATE','Adj Close']]

Right now the `DATE` column is just a list of strings that look like dates. We'll convert these to timestamps and use them as the index, so that our forecasting analysis will be able to interpret these values:

In [38]:
df['DATE'] = pd.to_datetime(df['DATE'])
df = df.set_index(['DATE'], drop=True)
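
To illustrate why the conversion matters, here is a minimal sketch on a toy frame (the name `toy` is illustrative, and the values are copied from the first rows of this dataset). Once the index holds timestamps, pandas can read calendar attributes and slice by date range:

```python
import pandas as pd

# Toy frame with a few rows from this dataset; DATE starts out as plain strings
toy = pd.DataFrame({
    "DATE": ["2015-01-02", "2015-01-05", "2015-01-06", "2015-01-07", "2015-01-08"],
    "Adj Close": [101.53, 98.67, 98.68, 100.06, 103.91],
})

# Same two steps as the cell above: parse the strings, then index by date
toy["DATE"] = pd.to_datetime(toy["DATE"])
toy = toy.set_index("DATE")

# Calendar attributes are now available (Monday=0 ... Sunday=6)
print(toy.index.dayofweek)

# Date-range slicing; both endpoints are inclusive
first_days = toy.loc["2015-01-01":"2015-01-06"]
print(first_days)
```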

In [39]:
df.head(10)

DATE,Adj Close
2015-01-02,101.53
2015-01-05,98.67
2015-01-06,98.68
2015-01-07,100.06
2015-01-08,103.91
2015-01-09,104.02
2015-01-12,101.45
2015-01-13,102.35
2015-01-14,101.96
2015-01-15,99.2


In [40]:
data=df

Let's first make sure that the data doesn't have any missing data points. An empty result below means no row has a null `Adj Close` value:

In [41]:
data[pd.isnull(data['Adj Close'])]

DATE,Adj Close


In [42]:
data.head(10)

DATE,Adj Close
2015-01-02,101.53
2015-01-05,98.67
2015-01-06,98.68
2015-01-07,100.06
2015-01-08,103.91
2015-01-09,104.02
2015-01-12,101.45
2015-01-13,102.35
2015-01-14,101.96
2015-01-15,99.2


In [43]:
import plotly
# Replace the placeholder with your own Plotly API key; avoid publishing real credentials
plotly.tools.set_credentials_file(username='MartinKostov', api_key='YOUR_API_KEY')

In [44]:
from plotly.plotly import plot_mpl
from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(data, model='multiplicative', freq=1)
fig = result.plot()
plot_mpl(fig)

'https://plot.ly/~MartinKostov/34'

In [45]:
import plotly.plotly as ply
import cufflinks as cf
# Check the docs on setting up offline plotting

In [46]:
data.iplot(title="Stock Test", theme='pearl')





In [47]:
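# Note: the pyramid-arima package was later renamed pmdarima
# (with the newer package: from pmdarima import auto_arima)
from pyramid.arima import auto_arima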

The AIC (Akaike Information Criterion) measures how well a model fits the data while taking into account the overall complexity of the model. A model that fits the data very well while using lots of features will be assigned a larger AIC score than a model that uses fewer features to achieve the same goodness-of-fit. Therefore, we are interested in finding the model that yields the lowest AIC value.
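
The standard definition is AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The sketch below applies that formula to two hypothetical models with the same fit but different complexity. The log-likelihood values and the helper name `aic` are made up for illustration and do not come from this dataset:

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical models with an identical log-likelihood (same goodness-of-fit)
simple_model = aic(log_likelihood=-1725.0, n_params=2)
complex_model = aic(log_likelihood=-1725.0, n_params=6)

# The model with more parameters gets the larger (worse) score
print(simple_model, complex_model)
```

With equal fit, each extra parameter adds 2 to the score, so the simpler model is preferred.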

In [53]:
stepwise_model = auto_arima(data, start_p=1, start_q=1,
                           max_p=3, max_q=3, m=12,
                           start_P=0, seasonal=True,
                           d=1, D=1, trace=True,
                           error_action='ignore',
                           suppress_warnings=True,
                           stepwise=True)

In [49]:
stepwise_model.aic()

3451.9921800582356

## Train Test Split

In [50]:
data.head()

DATE,Adj Close
2015-01-02,101.53
2015-01-05,98.67
2015-01-06,98.68
2015-01-07,100.06
2015-01-08,103.91


In [51]:
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1091 entries, 2015-01-02 to 2019-05-03
Data columns (total 1 columns):
Adj Close    1091 non-null float64
dtypes: float64(1)
memory usage: 17.0 KB


We'll train on roughly three and a half years of data, from January 2015 through May 2018, then test our forecast on the remaining data (June 2018 through May 2019) and compare it to the real values.

In [52]:
train = data.loc['2015-01-01':'2018-05-31']

In [23]:
train.tail()

DATE,Adj Close
2018-05-24,186.04
2018-05-25,186.47
2018-05-29,185.8
2018-05-30,185.4
2018-05-31,184.78


In [24]:
test = data.loc['2018-06-01':]

In [25]:
test.head()

DATE,Adj Close
2018-06-01,188.11
2018-06-04,189.68
2018-06-05,191.14
2018-06-06,191.81
2018-06-07,191.29


In [26]:
test.tail()

DATE,Adj Close
2019-04-29,204.61
2019-04-30,200.67
2019-05-01,210.52
2019-05-02,209.15
2019-05-03,211.75


In [27]:
len(test)

232

In [28]:
stepwise_model.fit(train)

ARIMA(callback=None, disp=0, maxiter=50, method=None, order=(0, 1, 0),
   out_of_sample_size=0, scoring='mse', scoring_args={},
   seasonal_order=None, solver='lbfgs', start_params=None,

In [29]:
future_forecast = stepwise_model.predict(n_periods=232)

In [30]:
future_forecast

array([184.87702797, 184.97405594, 185.07108392, 185.16811189,
       185.26513986, 185.36216783, 185.4591958 , 185.55622378,
       185.65325175, 185.75027972, 185.84730769, 185.94433566,
       186.04136364, 186.13839161, 186.23541958, 186.33244755,
       186.42947552, 186.5265035 , 186.62353147, 186.72055944,
       186.81758741, 186.91461538, 187.01164336, 187.10867133,
       187.2056993 , 187.30272727, 187.39975524, 187.49678322,
       187.59381119, 187.69083916, 187.78786713, 187.8848951 ,
       187.98192308, 188.07895105, 188.17597902, 188.27300699,
       188.37003497, 188.46706294, 188.56409091, 188.66111888,
       188.75814685, 188.85517483, 188.9522028 , 189.04923077,
       189.14625874, 189.24328671, 189.34031469, 189.43734266,
       189.53437063, 189.6313986 , 189.72842657, 189.82545455,
       189.92248252, 190.01951049, 190.11653846, 190.21356643,
       190.31059441, 190.40762238, 190.50465035, 190.60167832,
       190.69870629, 190.79573427, 190.89276224, 190.98
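
The fitted order (0, 1, 0) printed above describes a random walk: the series is differenced once and there are no autoregressive or moving-average terms. With a constant (drift) term, every forecast step adds the same amount, which is why the values above rise in a straight line. The sketch below reproduces the first few forecasts by hand. It assumes the drift equals the average daily change over the training set, and that the training set has 859 rows (1091 total rows minus 232 test rows); the variable names are illustrative:

```python
# Values taken from the notebook output: first and last training prices
first_train = 101.53   # 2015-01-02
last_train = 184.78    # 2018-05-31
n_train = 1091 - 232   # total rows minus test rows (assumed training length)

# Average daily change across the training set (the assumed drift estimate)
drift = (last_train - first_train) / (n_train - 1)

# h-step-ahead forecast of a random walk with drift: last value + h * drift
manual_forecast = [last_train + h * drift for h in range(1, 4)]
print(manual_forecast)
```

These hand-computed values can be checked against the start of the `future_forecast` array above.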

In [31]:
future_forecast = pd.DataFrame(future_forecast,index = test.index,columns=['Prediction'])

In [32]:
future_forecast.head()

DATE,Prediction
2018-06-01,184.877028
2018-06-04,184.974056
2018-06-05,185.071084
2018-06-06,185.168112
2018-06-07,185.26514


In [33]:
test.head()

DATE,Adj Close
2018-06-01,188.11
2018-06-04,189.68
2018-06-05,191.14
2018-06-06,191.81
2018-06-07,191.29
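
The predictions and the held-out values shown above can also be compared numerically. As a minimal sketch, the mean absolute error (MAE) and root mean squared error (RMSE) are computed here over only the first five test days printed above. These metrics are a common choice but are not part of the original analysis, and the variable names are illustrative:

```python
import math

# First five test days, copied from the two outputs above
actual = [188.11, 189.68, 191.14, 191.81, 191.29]
predicted = [184.877028, 184.974056, 185.071084, 185.168112, 185.26514]

errors = [a - p for a, p in zip(actual, predicted)]

# Mean absolute error: average size of the miss, in price units
mae = sum(abs(e) for e in errors) / len(errors)

# Root mean squared error: penalizes larger misses more heavily
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
```

The same calculation could be applied over the full test set using `test['Adj Close']` and `future_forecast['Prediction']`.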


In [34]:
pd.concat([test,future_forecast],axis=1).iplot()





In [35]:
future_forecast2 = future_forecast

In [36]:
pd.concat([data,future_forecast2],axis=1).iplot()



