<a href="https://colab.research.google.com/github/yankikalfa/SAIS-ML-for-Finance/blob/main/Automatic_CV_and_Evaluation_In_Class_Assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# In-Class Assignment: Automatic CV and Evaluation

The purpose of this assignment is to build a complete forecasting pipeline using linear and non-linear ML methods. You will write two functions that recursively forecast excess returns and keep track of the coefficients and the optimal hyperparameters selected at each step.

The first function covers linear ML methods; the second covers tree-based methods.

## Objectives:
* Write a function that forecasts excess returns using automatic time series cross validation for Elastic Net.
* Write a function that forecasts excess returns using automatic time series cross validation for Random Forests with Randomized Search Cross Validation.
* Recursively forecast Excess Returns for the past 10 years
  * Using Elastic Net
  * Using Random Forest
* Use 5 fold time series cross validation to tune hyperparameters
* Chart hyperparameters selected at each point
* Chart coefficients selected at each point
* Compare your forecasts against the prevailing mean benchmark
* Run Diebold-Mariano tests
* Chart Out of Sample $R^2$
* Chart Cumulative change in SSE 
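
To make the "5 fold time series cross validation" objective concrete, the sketch below prints the folds that scikit-learn's `TimeSeriesSplit` produces. The 12-observation sample is a made-up stand-in for the real data; the point is only that every training window ends before its validation window begins.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve time-ordered observations stand in for the real sample (illustrative only).
n_obs = 12
tscv = TimeSeriesSplit(n_splits=5)

folds = []
for train_idx, test_idx in tscv.split(np.arange(n_obs).reshape(-1, 1)):
    folds.append((train_idx.tolist(), test_idx.tolist()))
    print(f"train={train_idx.tolist()} test={test_idx.tolist()}")
```

Each fold extends the training window forward in time, so no validation observation is ever used to fit the model that is scored on it.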

## Hyperparameters:
### Elastic Net
* $\alpha \in [0.00001, 0.1]$
* Number of $\alpha$ values: 1000
* $\rho = 0.5$

### Random Forest
* Criterion: Mean Squared Error
* Number of parallel trees: [100,300,500]
* Max Depth: [5,10,20]
* Max features at nodes: [sqrt, 0.3] 
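
One possible way to express these settings in scikit-learn is sketched below. Several choices are assumptions rather than requirements of the assignment: the 1000 penalties are spaced evenly with `np.linspace` (a log-spaced grid would be equally defensible), $\rho$ is mapped to `l1_ratio`, and `n_iter=5` and the `random_state` values are arbitrary. The criterion name `"squared_error"` applies to scikit-learn 1.0 and later.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Five-fold time series cross validation, shared by both models.
tscv = TimeSeriesSplit(n_splits=5)

# Assumption: 1000 evenly spaced penalties between the two listed bounds.
alpha_grid = np.linspace(0.00001, 0.1, 1000)
# Assumption: rho in the list above corresponds to scikit-learn's l1_ratio.
enet = ElasticNetCV(alphas=alpha_grid, l1_ratio=0.5, cv=tscv)

# Random forest search space, taken from the list above.
rf_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [5, 10, 20],
    "max_features": ["sqrt", 0.3],
}
rf_search = RandomizedSearchCV(
    RandomForestRegressor(criterion="squared_error", random_state=0),
    param_distributions=rf_grid,
    n_iter=5,  # assumption: number of sampled combinations (18 exist in total)
    cv=tscv,
    scoring="neg_mean_squared_error",
    random_state=0,
)
```

After fitting, `enet.alpha_` and `rf_search.best_params_` hold the selected hyperparameters, which is what the charts of selected values would track over time.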





# Libraries and Data

In [None]:
! pip install statsmodels -U



In [None]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
from sklearn.linear_model import LassoCV, ElasticNetCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import TimeSeriesSplit, RandomizedSearchCV
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import r2_score, mean_squared_error
import warnings
warnings.filterwarnings("ignore")
plt.style.use('bmh')
plt.rcParams["figure.figsize"] = (16,8)

In [None]:
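df = pd.read_csv('GWdata.csv')
# Parse the yyyymm values as dates and use them as the index
df['yyyymm'] = pd.to_datetime(df['yyyymm'],format='%Y%m', errors='coerce')
df.set_index('yyyymm',inplace=True)
# Excess return: CRSP_SPvw return minus the risk-free rate
df['er'] = df['CRSP_SPvw'] - df['Rfree']
df.drop(['CRSP_SPvw','Rfree','Index'],axis=1,inplace=True)
# Target from 1927-01 onward; predictors through 2021-11.
# After the index reset, row i of X is paired with row i of Y.
Y = df.loc['1927-01-01':,'er'].reset_index(drop=True)
X = df.loc[:'2021-11-01', df.columns!='er'].reset_index(drop=True)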

# Forecasting Functions
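
As a starting point, here is a minimal sketch of an expanding-window forecasting loop for Elastic Net. The function name `recursive_enet_forecast`, its arguments, and the choice to refit a `MinMaxScaler` on each estimation sample are illustrative assumptions, not the required design. It assumes that row `t` of `X` already holds the predictors available for forecasting row `t` of `Y`. The demo uses synthetic data and a short grid so it runs quickly.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import MinMaxScaler


def recursive_enet_forecast(X, Y, n_oos, alphas, l1_ratio=0.5, n_splits=5):
    """Expanding-window one-step forecasts (hypothetical helper for illustration)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n_obs, n_feat = X.shape
    forecasts = np.empty(n_oos)
    chosen_alpha = np.empty(n_oos)
    coefs = np.empty((n_oos, n_feat))
    for i, t in enumerate(range(n_obs - n_oos, n_obs)):
        # Fit the scaler on the estimation sample only, to avoid look-ahead.
        scaler = MinMaxScaler().fit(X[:t])
        model = ElasticNetCV(alphas=alphas, l1_ratio=l1_ratio,
                             cv=TimeSeriesSplit(n_splits=n_splits))
        model.fit(scaler.transform(X[:t]), Y[:t])
        # Forecast observation t with a model that has seen rows 0..t-1 only.
        forecasts[i] = model.predict(scaler.transform(X[t:t + 1]))[0]
        chosen_alpha[i] = model.alpha_
        coefs[i] = model.coef_
    return forecasts, chosen_alpha, coefs


# Synthetic demo data (not the assignment data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(80, 3))
Y_demo = 0.5 * X_demo[:, 0] + rng.normal(scale=0.1, size=80)
f_demo, a_demo, c_demo = recursive_enet_forecast(
    X_demo, Y_demo, n_oos=5, alphas=np.linspace(0.00001, 0.1, 20))
```

The three returned arrays line up by forecast date, so the selected penalties and the coefficient paths can be charted directly against the forecast period. A random forest version could follow the same loop with `RandomizedSearchCV` in place of `ElasticNetCV`.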

In [None]:
# Your code here

# Model Evaluation

## Elastic Net

### Graph of Lambda

In [None]:
# Your code here

### Graph of Coefficients with Lambda

In [None]:
# Your code here

In [None]:
# Your code here

## Random Forests

In [None]:
# Your code here

### Graph of Hyperparameters for Random Forest

In [None]:
# Your code here

### Feature Importance Graphs for Random Forests

In [None]:
# Your code here

### What are the 3 most important features according to the Random Forest?


Answer here:

# Benchmarking
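
The prevailing mean benchmark forecasts each period with the average of all returns observed before it. A minimal sketch, with a hypothetical helper name and a toy series in place of the real excess returns:

```python
import numpy as np


def prevailing_mean_forecast(y, n_oos):
    """Forecast each of the last n_oos observations with the mean of all earlier ones."""
    y = np.asarray(y, dtype=float)
    start = len(y) - n_oos
    return np.array([y[:t].mean() for t in range(start, len(y))])


# Toy series (illustrative only).
y_demo = np.array([1.0, 3.0, 2.0, 6.0])
pm_demo = prevailing_mean_forecast(y_demo, n_oos=2)
print(pm_demo)  # [2. 2.]
```

The forecast for the third observation averages the first two values, and the forecast for the fourth averages the first three, so no forecast uses the value it is predicting.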

In [None]:
# Your code here

# Forecast Evaluation

In [None]:
# Your code here

## Forecast Comparison Graph

In [None]:
# Your code here

## Forecast Error Comparison

In [None]:
# Your code here

## Diebold Mariano Test
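
For one-step-ahead forecasts, a simple version of the Diebold-Mariano statistic is the t-statistic of the mean squared-error differential between the benchmark and the model. The sketch below makes two assumptions worth stating: it uses the plain sample variance of the differential (no autocorrelation correction, which longer horizons would need), and it reports a one-sided p-value from the standard normal approximation. The function name and the synthetic data are illustrative.

```python
import math
import numpy as np


def diebold_mariano(y_true, f_model, f_bench):
    """DM statistic and one-sided p-value for H1: the model beats the benchmark."""
    y_true, f_model, f_bench = (
        np.asarray(a, dtype=float) for a in (y_true, f_model, f_bench))
    # Loss differential: positive values mean the model has the smaller squared error.
    d = (y_true - f_bench) ** 2 - (y_true - f_model) ** 2
    dm_stat = d.mean() / math.sqrt(d.var(ddof=1) / len(d))
    # Upper-tail probability of a standard normal.
    p_value = 0.5 * math.erfc(dm_stat / math.sqrt(2.0))
    return dm_stat, p_value


# Synthetic demo: the "model" knows the signal, the "benchmark" always predicts zero.
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
y_demo = signal + rng.normal(scale=0.5, size=200)
dm_stat, dm_p = diebold_mariano(y_demo, f_model=signal, f_bench=np.zeros(200))
dm_rev, dm_p_rev = diebold_mariano(y_demo, f_model=np.zeros(200), f_bench=signal)
```

A large positive statistic with a small p-value is evidence that the model's squared errors are smaller than the benchmark's; swapping the two forecasts flips the sign of the statistic.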

In [None]:
# Your code here

### Does the ENET beat the Prevailing Mean?

Answer here:

In [None]:
# Your code here

### Does the Random Forest beat the Prevailing Mean?

Answer here:



## OoS $R^2$

Compute out of sample $R^2$ values for Elastic Net and Random Forest:
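
The out-of-sample $R^2$ relative to the prevailing mean is commonly defined as $R^2_{OoS} = 1 - \frac{\sum_t (y_t - \hat{y}_t)^2}{\sum_t (y_t - \bar{y}_t)^2}$, where $\hat{y}_t$ is the model forecast and $\bar{y}_t$ is the benchmark forecast. A minimal sketch with a hypothetical helper name and toy numbers:

```python
import numpy as np


def oos_r2(y_true, f_model, f_bench):
    """Out-of-sample R^2 of a model relative to a benchmark forecast."""
    y_true, f_model, f_bench = (
        np.asarray(a, dtype=float) for a in (y_true, f_model, f_bench))
    sse_model = np.sum((y_true - f_model) ** 2)
    sse_bench = np.sum((y_true - f_bench) ** 2)
    return 1.0 - sse_model / sse_bench


# Toy numbers (illustrative only): model SSE = 1, benchmark SSE = 2.
y_demo = np.array([1.0, 2.0, 3.0])
r2_demo = oos_r2(y_demo, f_model=[1.0, 2.0, 2.0], f_bench=[2.0, 2.0, 2.0])
print(r2_demo)  # 0.5
```

A positive value means the model has a lower sum of squared errors than the benchmark; a negative value means the benchmark wins.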

In [None]:
# Your code here

In [None]:
# Your code here

Comment on the OoS $R^2$ values of both Random Forest and Elastic Net. Compare your results to the DM test.

Answer here:


## Cumulative $\Delta$ SSE 
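
The cumulative $\Delta$ SSE is the running sum of the benchmark's squared errors minus the model's squared errors. A minimal sketch with a hypothetical helper name and toy numbers:

```python
import numpy as np


def cumulative_delta_sse(y_true, f_model, f_bench):
    """Running sum of (benchmark squared error - model squared error)."""
    y_true, f_model, f_bench = (
        np.asarray(a, dtype=float) for a in (y_true, f_model, f_bench))
    return np.cumsum((y_true - f_bench) ** 2 - (y_true - f_model) ** 2)


# Toy numbers (illustrative only).
dsse_demo = cumulative_delta_sse(
    [1.0, 2.0, 3.0], f_model=[1.0, 2.0, 2.0], f_bench=[2.0, 2.0, 2.0])
print(dsse_demo)  # [1. 1. 1.]
```

When plotted against the forecast dates, a rising line marks periods where the model outperforms the benchmark and a falling line marks periods where it underperforms.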

In [None]:
# Your code here

In [None]:
# Your code here

In [None]:
# Your code here