# ARIMA grid search 

Grid search over the ARIMA hyperparameters (p, d, q) for a time series.<br><br>
Parts of the code are adapted from: <a href="https://machinelearningmastery.com/grid-search-arima-hyperparameters-with-python/">machinelearningmastery.com</a>

#### Import libraries

In [2]:
#from libraries import *
%run libraries.py

In [3]:
ts = load_data()
ts

timestamp,close
2020-07-13,9200.00
2020-07-14,9116.00
2020-07-15,9156.00
2020-07-16,9044.02
2020-07-17,9087.00
...,...
2021-11-20,57372.00
2021-11-21,58518.88
2021-11-22,55600.00
2021-11-23,55250.00


`p` is the order of the autoregressive (AR) term: the number of lags of Y used as predictors.<br>
`d` is the differencing order: the number of times the series is differenced to make it stationary.<br>
`q` is the order of the moving average (MA) term: the number of lagged forecast errors included in the ARIMA model.
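
To make the role of `d` concrete, here is a minimal sketch of differencing with NumPy. The five prices are the first rows of the table above; the variable names are illustrative and not part of the notebook.

```python
import numpy as np

# First five closing prices from the table above.
close = np.array([9200.00, 9116.00, 9156.00, 9044.02, 9087.00])

# d = 1: differences between consecutive observations.
first_diff = np.diff(close, n=1)

# d = 2: differences of the first differences.
second_diff = np.diff(close, n=2)

print(first_diff)
print(second_diff)
```

Each differencing pass shortens the series by one observation, so a series of length n has n - d values left after d passes.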

In [4]:
# evaluate an ARIMA model for a given order (p,d,q)
def evaluate_arima_model(X, arima_order):
	# prepare training dataset
	train_size = int(len(X) * 0.66)
	train, test = X[0:train_size], X[train_size:]
	history = [x for x in train]
	# make predictions
	predictions = list()
	for t in range(len(test)):
		model = ARIMA(history, order=arima_order)
		model_fit = model.fit()
		yhat = model_fit.forecast()[0]
		predictions.append(yhat)
		history.append(test[t])
	# calculate out of sample error
	rmse = sqrt(mean_squared_error(test, predictions))
	return rmse

# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(dataset, p_values, d_values, q_values):
	dataset = dataset.astype('float32')
	best_score, best_cfg = float("inf"), None
	for p in p_values:
		for d in d_values:
			for q in q_values:
				order = (p,d,q)
				try:
					rmse = evaluate_arima_model(dataset, order)
					if rmse < best_score:
						best_score, best_cfg = rmse, order
					print('ARIMA%s RMSE=%.3f' % (order,rmse))
				except Exception:
					# skip orders that fail to fit; a bare `except:` would also swallow KeyboardInterrupt
					continue
	print('Best ARIMA%s RMSE=%.3f' % (best_cfg, best_score))
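
The search above prints its results and silently skips any order that fails. As a possible variation, the sketch below returns the best order and records the failures instead of hiding them. `grid_search_orders` and `toy_score` are hypothetical names introduced for illustration; `toy_score` is a stand-in scorer so the example runs without fitting any model.

```python
from itertools import product
from math import inf


def grid_search_orders(score_fn, p_values, d_values, q_values):
    """Score every (p, d, q) combination.

    Returns (best_order, best_score, failures), where failures is a list
    of (order, error message) pairs for combinations that raised.
    """
    best_order, best_score = None, inf
    failures = []
    for order in product(p_values, d_values, q_values):
        try:
            score = score_fn(order)
        except Exception as exc:
            # Record the failure instead of discarding it.
            failures.append((order, repr(exc)))
            continue
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score, failures


def toy_score(order):
    """Stand-in scorer (assumption): lowest at (0, 2, 1), fails for d > 2."""
    p, d, q = order
    if d > 2:
        raise ValueError("differencing order too high for this toy scorer")
    return abs(p) + abs(d - 2) + abs(q - 1)


best_order, best_score, failures = grid_search_orders(
    toy_score, range(3), [0, 1, 2, 3], range(3)
)
print(best_order, best_score, len(failures))
```

In the notebook, the scorer could be `lambda order: evaluate_arima_model(ts.values, order)`, which keeps the walk-forward evaluation unchanged.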

#### Evaluate models with each parameter in the range 0-2

In [5]:
p_values = range(0, 3)
d_values = range(0, 3)
q_values = range(0, 3)
evaluate_models(ts.values, p_values, d_values, q_values)

ARIMA(0, 0, 0) RMSE=17035.704
ARIMA(0, 0, 1) RMSE=9317.318
ARIMA(0, 0, 2) RMSE=6481.092
ARIMA(0, 1, 0) RMSE=1688.545
ARIMA(0, 1, 1) RMSE=1696.007
ARIMA(0, 1, 2) RMSE=1695.224
ARIMA(0, 2, 0) RMSE=2395.226
ARIMA(0, 2, 1) RMSE=1684.978
ARIMA(0, 2, 2) RMSE=1702.462
ARIMA(1, 0, 0) RMSE=1693.760
ARIMA(1, 0, 1) RMSE=1701.168
ARIMA(1, 0, 2) RMSE=1701.120
ARIMA(1, 1, 0) RMSE=1697.010
ARIMA(1, 1, 1) RMSE=1695.053
ARIMA(1, 1, 2) RMSE=1698.356
ARIMA(1, 2, 0) RMSE=2147.467
ARIMA(1, 2, 1) RMSE=1694.500
ARIMA(1, 2, 2) RMSE=1703.231
ARIMA(2, 0, 0) RMSE=1701.990
ARIMA(2, 0, 1) RMSE=1701.362
ARIMA(2, 0, 2) RMSE=1707.447
ARIMA(2, 1, 0) RMSE=1696.038
ARIMA(2, 1, 1) RMSE=1695.488
ARIMA(2, 1, 2) RMSE=1698.798
ARIMA(2, 2, 0) RMSE=2025.422
ARIMA(2, 2, 1) RMSE=1688.074
ARIMA(2, 2, 2) RMSE=1700.197
Best ARIMA(0, 2, 1) RMSE=1684.978
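
The best score (1684.978) is only about 3.6 below that of ARIMA(0, 1, 0) (1688.545). A random-walk model, that is ARIMA(0, 1, 0) with no constant term, forecasts the next value as the last observed one, so it serves as a natural baseline. The sketch below computes that persistence RMSE with the same 66% split; `persistence_rmse` is a hypothetical helper, and the sample is the first five rows of the table rather than the full series.

```python
import numpy as np


def persistence_rmse(series, train_fraction=0.66):
    """One-step-ahead RMSE when each forecast is the previous observation."""
    series = np.asarray(series, dtype=float)
    train_size = int(len(series) * train_fraction)  # must be at least 1
    actual = series[train_size:]
    # The forecast for each test point is the value observed just before it.
    predicted = series[train_size - 1:-1]
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# First five closing prices from the table (illustration only).
sample = np.array([9200.00, 9116.00, 9156.00, 9044.02, 9087.00])
print(persistence_rmse(sample))
```

If the persistence RMSE on the full series is close to the grid-search results, the extra AR and MA terms are adding little beyond "tomorrow equals today".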


#### Try a model with MA order q = 1 and differencing order d = 100

In [6]:
evaluate_models(ts.values, [0], [100], [1])

ARIMA(0, 100, 1) RMSE=5842013984032947214741933920508634151857724761929519375461646336.000
Best ARIMA(0, 100, 1) RMSE=5842013984032947214741933920508634151857724761929519375461646336.000


#### Try another grid search with larger differencing orders

In [6]:
p_values = [1]
d_values = [10, 30, 50, 70, 90]
q_values = [0]
evaluate_models(ts.values, p_values, d_values, q_values)

ARIMA(1, 10, 0) RMSE=215909.057
ARIMA(1, 30, 0) RMSE=12661235518727.168
ARIMA(1, 50, 0) RMSE=3438293265844003328.000
ARIMA(1, 70, 0) RMSE=2672301613192388963218039127408640.000
Best ARIMA(1, 10, 0) RMSE=215909.057
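
Note that `d = 90` was requested but has no line in the output: that fit raised an exception, which `evaluate_models` skips without reporting.

One way to see why the error explodes as `d` grows: differencing `d` times applies the operator (1 - B)^d, whose weights are binomial coefficients with alternating signs, and mapping a forecast back to the original scale combines past values with weights of the same size. The sketch below only prints the largest such weight for a few values of `d`. `max_difference_weight` is a hypothetical helper; this illustrates the scale of the problem and is not a description of statsmodels internals.

```python
from math import comb


def max_difference_weight(d):
    """Largest absolute coefficient in the expansion of (1 - B)**d."""
    return max(comb(d, k) for k in range(d + 1))


for d in (1, 2, 10, 30, 50, 70, 100):
    print(d, f"{max_difference_weight(d):.3e}")
```

With weights this large and alternating in sign, tiny rounding or estimation errors in the inputs are magnified enormously, which is consistent with the RMSE values above.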


### The best model is still AR order p = 0, differencing order d = 2, MA order q = 1
## ARIMA(0, 2, 1)