# Predicting Tomorrow's SP500 Close with XGBoost

In this experiment, I attempt to build a model that predicts the next trading day's closing price of the SP500. The idea is that you can run the model after the market closes today and get an estimate of what to expect tomorrow.

I will be using the XGBoost machine learning algorithm, trained on market data going back to 1992, with the open, high, low, close, and adjusted close of the following assets:

- SP500
- VIX Index
- 5 yr Treasury Note
- 10 yr Treasury Bond
- 30 yr Treasury Bond

I will diagnose the model's performance with cross validation, using the root mean squared error (rmse) as the main metric.
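
For reference, the rmse is the square root of the average squared difference between predicted and actual values, so it is expressed in the same units as the index itself (points). A minimal sketch with made-up numbers, not taken from the dataset; the `rmse` helper is only for illustration:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Illustrative values only (not real SP500 closes).
actual = [100.0, 102.0, 101.0]
predicted = [101.0, 100.0, 101.0]

# The errors are -1, 2 and 0, so the mean squared error is 5/3.
example_rmse = rmse(actual, predicted)
print(example_rmse)
```

Because the errors are squared before averaging, a few large misses weigh much more heavily than many small ones.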

Procedure:

1. Retrieve Data
2. Clean and Format Data for Machine Learning
3. Fit Model
4. Measure Error Metrics with Cross Validation rmse

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import xgboost as xgb
from sklearn.metrics import mean_squared_error
color_pal = sns.color_palette()
plt.style.use('fivethirtyeight')

In [3]:
df = pd.read_csv('clean_historical_data.csv')

In [4]:
df['ds'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S%z').apply(lambda x: x.strftime('%Y-%m-%d'))

In [5]:
df['ds'] = pd.to_datetime(df['ds'], format='%Y-%m-%d')

In [6]:
df.Volume_GSPC = df.Volume_GSPC.astype(float)

# Defining the Target:

We want to predict tomorrow's closing price of the SPX, so each observation needs tomorrow's price as its target to learn from.

In [7]:
df['y'] = df['Adj Close_GSPC'].shift(-1)

In [8]:
df = df.drop(df.index[-1])
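
To make the two cells above concrete, here is a small sketch on a toy frame (the values and dates are invented). It shows how `shift(-1)` aligns each row with the following day's close and why the last row has to go:

```python
import pandas as pd

# Toy closes for four consecutive trading days (illustrative values).
toy = pd.DataFrame(
    {"Adj Close_GSPC": [10.0, 11.0, 12.0, 13.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)

# shift(-1) pulls the next row's value up, so each row's target is tomorrow's close.
toy["y"] = toy["Adj Close_GSPC"].shift(-1)

# The final row has no "tomorrow", so its target is NaN and the row is dropped.
toy = toy.dropna(subset=["y"])
print(toy)
```

Dropping on `NaN` in `y` has the same effect here as dropping the last index label, since only the last row lacks a target.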

In [9]:
df.apply(pd.isnull).sum()

Date              0
Adj Close_FVX     0
Adj Close_GSPC    0
Adj Close_TNX     0
Adj Close_TYX     0
Adj Close_VIX     0
Close_FVX         0
Close_GSPC        0
Close_TNX         0
Close_TYX         0
Close_VIX         0
High_FVX          0
High_GSPC         0
High_TNX          0
High_TYX          0
High_VIX          0
Low_FVX           0
Low_GSPC          0
Low_TNX           0
Low_TYX           0
Low_VIX           0
Open_FVX          0
Open_GSPC         0
Open_TNX          0
Open_TYX          0
Open_VIX          0
Volume_GSPC       0
ds                0
y                 0
dtype: int64

# Time Series Cross Validation:

A robust way of measuring the model's performance by combining multiple train/test splits.

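### Disclaimer: For standardization, I will model the splits on the ones the original Prophet Cross Validation used:

- training periods of 730 days
- testing periods of 365 days
- gaps of 180 days
- a total of 10 splits

Here, we are using scikit-learn's `TimeSeriesSplit` with `n_splits=38`, `test_size=37`, and `gap=180`, where the test size and gap are counted in rows (trading days) rather than calendar days.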

In [10]:
from sklearn.model_selection import TimeSeriesSplit

In [33]:
tss = TimeSeriesSplit(n_splits=38, test_size=37, gap=180)
df = df.sort_index()
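
To see what `TimeSeriesSplit` does with `test_size` and `gap`, here is a small sketch on ten toy rows. The sizes are chosen for readability and are not the ones used in this notebook. It also shows `max_train_size`, which the notebook does not set, as one possible way to keep the training window at a fixed length:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # 10 toy observations in time order

# Expanding training window: each fold trains on everything before the gap.
toy_tss = TimeSeriesSplit(n_splits=3, test_size=2, gap=1)
folds = [(train.tolist(), test.tolist()) for train, test in toy_tss.split(X)]
for train, test in folds:
    print("train:", train, "test:", test)

# Fixed-length training window: keep only the last 3 rows before the gap.
capped_tss = TimeSeriesSplit(n_splits=3, test_size=2, gap=1, max_train_size=3)
capped_folds = [(train.tolist(), test.tolist()) for train, test in capped_tss.split(X)]
for train, test in capped_folds:
    print("capped train:", train, "test:", test)
```

Without `max_train_size`, later folds train on more history than earlier ones, which is worth keeping in mind when comparing fold scores.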

In [34]:
df.index = df.ds

In [35]:
df.columns

Index(['Date', 'Adj Close_FVX', 'Adj Close_GSPC', 'Adj Close_TNX',
       'Adj Close_TYX', 'Adj Close_VIX', 'Close_FVX', 'Close_GSPC',
       'Close_TNX', 'Close_TYX', 'Close_VIX', 'High_FVX', 'High_GSPC',
       'High_TNX', 'High_TYX', 'High_VIX', 'Low_FVX', 'Low_GSPC', 'Low_TNX',
       'Low_TYX', 'Low_VIX', 'Open_FVX', 'Open_GSPC', 'Open_TNX', 'Open_TYX',
       'Open_VIX', 'Volume_GSPC', 'ds', 'y'],
      dtype='object')

In [36]:
fold = 0
preds = []
scores = []
for train_idx, val_idx in tss.split(df):
    train = df.iloc[train_idx]
    test = df.iloc[val_idx]


    FEATURES = ['Adj Close_FVX', 'Adj Close_GSPC', 'Adj Close_TNX',
       'Adj Close_TYX', 'Adj Close_VIX', 'Close_FVX', 'Close_GSPC',
       'Close_TNX', 'Close_TYX', 'Close_VIX', 'High_FVX', 'High_GSPC',
       'High_TNX', 'High_TYX', 'High_VIX', 'Low_FVX', 'Low_GSPC', 'Low_TNX',
       'Low_TYX', 'Low_VIX', 'Open_FVX', 'Open_GSPC', 'Open_TNX', 'Open_TYX',
       'Open_VIX', 'Volume_GSPC']
    TARGET = 'y'

    X_train = train[FEATURES]
    y_train = train[TARGET]

    X_test = test[FEATURES]
    y_test = test[TARGET]

    reg = xgb.XGBRegressor(base_score=0.5, booster='gbtree',
                           n_estimators=1000,
                           early_stopping_rounds=50,
                           # 'reg:linear' is the deprecated name for this objective
                           objective='reg:squarederror',
                           max_depth=3,
                           learning_rate=0.01)
    reg.fit(X_train, y_train,
            eval_set=[(X_train, y_train), (X_test, y_test)],
            verbose=100)

    y_pred = reg.predict(X_test)
    preds.append(y_pred)
    score = np.sqrt(mean_squared_error(y_test, y_pred))
    scores.append(score)

[0]	validation_0-rmse:1200.55913	validation_1-rmse:2528.39148
[100]	validation_0-rmse:441.91619	validation_1-rmse:1215.29627
[200]	validation_0-rmse:163.18610	validation_1-rmse:700.45377
[300]	validation_0-rmse:61.29240	validation_1-rmse:483.29091
[400]	validation_0-rmse:25.26848	validation_1-rmse:376.14078
[500]	validation_0-rmse:14.47845	validation_1-rmse:328.99382
[600]	validation_0-rmse:12.14810	validation_1-rmse:309.63080
[700]	validation_0-rmse:11.64587	validation_1-rmse:304.88310
[800]	validation_0-rmse:11.46284	validation_1-rmse:303.00949
[900]	validation_0-rmse:11.35578	validation_1-rmse:301.95830
[999]	validation_0-rmse:11.26635	validation_1-rmse:301.30342
[0]	validation_0-rmse:1208.94996	validation_1-rmse:2617.87149
[100]	validation_0-rmse:445.09556	validation_1-rmse:1278.85783
[200]	validation_0-rmse:164.43302	validation_1-rmse:735.81981
[300]	validation_0-rmse:61.76279	validation_1-rmse:508.30918
[400]	validation_0-rmse:25.44313	validation_1-rmse:398.67201
[500]	validation

In [37]:
print(f'Average score across folds: {np.mean(scores):0.4f}')
print(f'Fold scores:{scores}')

Average score across folds: 289.7991
Fold scores:[301.3034161897235, 297.78150556061775, 390.163057963001, 281.7444984956114, 252.32805994280477, 296.33012037181396, 309.4045851154075, 94.18136786254182, 45.71531834362878, 19.484942051726406, 65.6047084722639, 60.172055176811305, 81.96837894196496, 70.60831703221987, 170.45083157934383, 332.22850301268284, 316.36515814340777, 77.50219939828962, 116.53709413092515, 301.4694486324399, 287.15404190998333, 259.94441952040734, 481.5077978058535, 615.947341763415, 830.3291874722062, 811.0039432824394, 974.3822834006806, 795.9176348862125, 814.9632963085899, 563.7202106366569, 257.1811065901165, 83.63640770532615, 68.49483181874585, 64.30455849549166, 59.79563835384092, 65.54191557204474, 52.058630080185644, 45.14018184518335]
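
One thing to keep in mind when reading these scores: the training loop passes the test fold in `eval_set` while `early_stopping_rounds` is set, so the number of boosting rounds is chosen on the same data that is then scored. A possible alternative, sketched here on toy data with a hypothetical helper `split_fit_validation` that is not part of the notebook, is to hold out the most recent slice of each training fold for early stopping and leave the test fold untouched:

```python
import numpy as np
import pandas as pd

def split_fit_validation(train, val_fraction=0.1):
    """Split a time-ordered frame into an earlier fit part and a later validation part.

    Hypothetical helper for illustration; val_fraction is an assumed default.
    """
    n_val = max(1, int(len(train) * val_fraction))
    return train.iloc[:-n_val], train.iloc[-n_val:]

# Toy training fold: 20 rows in time order (illustrative values).
toy_train = pd.DataFrame({"x": np.arange(20.0), "y": np.arange(1.0, 21.0)})
fit_part, val_part = split_fit_validation(toy_train, val_fraction=0.2)

print(len(fit_part), len(val_part))

# Inside the loop, the validation part would take the test fold's place in eval_set:
#   reg.fit(fit_part[FEATURES], fit_part[TARGET],
#           eval_set=[(val_part[FEATURES], val_part[TARGET])])
```

The split is chronological rather than random, so the validation rows always come after the rows used for fitting.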


# Conclusion:

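Let's compare the results of the XGBoost model and Prophet. Before doing that, these are the parameters that the two cross validation runs were standardized on:

- Splits: 10
- Training Length: 730 days
- Testing Length: 37 Days
- Gap: 180 Days

Note that the XGBoost run above departs from this in two ways: it used 38 splits, and `TimeSeriesSplit` was given no `max_train_size`, so its training window grows with each fold instead of staying at 730 days.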

Here are the results for the testing lengths of 37 days:

- XGBoost rmse: 289.80 (average across folds)
- Prophet rmse: 16

Shortcomings: Although the inputs to the two cross validation procedures were standardized, the procedures themselves are different, so the comparison is not exact. Nevertheless, the Prophet model appears to have performed better, since not even the best XGBoost fold (rmse of about 19.5) reached the average Prophet rmse of 16.
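
The claim about the best fold can be checked directly from the fold scores printed earlier. The sketch below copies those scores, rounded to two decimals, and summarizes their spread. The Prophet figure is the average quoted above, not something recomputed here:

```python
import numpy as np

# Fold scores copied from the XGBoost cross validation output, rounded to 2 decimals.
fold_scores = np.array([
    301.30, 297.78, 390.16, 281.74, 252.33, 296.33, 309.40, 94.18,
    45.72, 19.48, 65.60, 60.17, 81.97, 70.61, 170.45, 332.23,
    316.37, 77.50, 116.54, 301.47, 287.15, 259.94, 481.51, 615.95,
    830.33, 811.00, 974.38, 795.92, 814.96, 563.72, 257.18, 83.64,
    68.49, 64.30, 59.80, 65.54, 52.06, 45.14,
])
prophet_rmse = 16  # average Prophet rmse quoted in the conclusion

n_below = int((fold_scores < prophet_rmse).sum())

print(f"folds: {len(fold_scores)}")
print(f"mean: {fold_scores.mean():.2f}")
print(f"median: {np.median(fold_scores):.2f}")
print(f"min: {fold_scores.min():.2f}, max: {fold_scores.max():.2f}")
print(f"folds below the Prophet average: {n_below}")
```

The wide gap between the smallest and largest fold scores suggests that the average alone hides a lot of variation between market periods.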

# Next Steps:

1. Remove Outliers
2. Create Lag predictors
3. Create hyper-predictors
4. Add more assets
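
As a starting point for step 2, lag predictors could be built with `shift`. This is only a sketch under assumptions: `add_lag_features` is a hypothetical helper, the lags of 1 and 2 days are arbitrary, and the values are toy data rather than real closes:

```python
import pandas as pd

def add_lag_features(frame, column, lags):
    """Return a copy of frame with lagged copies of column (hypothetical helper)."""
    out = frame.copy()
    for lag in lags:
        # A positive shift moves past values forward, so no future data leaks in.
        out[f"{column}_lag{lag}"] = out[column].shift(lag)
    return out

# Toy closes in time order (illustrative values).
toy = pd.DataFrame({"Close_GSPC": [10.0, 11.0, 12.0, 13.0, 14.0]})
lagged = add_lag_features(toy, "Close_GSPC", lags=[1, 2])

# The first rows have no history to look back on, so they are dropped.
lagged = lagged.dropna()
print(lagged)
```

The new column names would then need to be added to `FEATURES` before refitting.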