### Stage 1

Run multi-stage feature selection (using Recursive Feature Elimination, RFE) and score the resulting model on the non-COVID external test set. For the ensemble model code, see experiment 1 in `[Stage 2] experiments.ipynb`.

In [None]:
import util
import model
import config

### Load non-COVID data (group 1)

In [None]:
data, x, y, xgb_data = util.load_data(config.NONCOVID_XGB_TRAIN_DATA_LOC)

### Grid search CV

Multi-stage grid search:
1. Generate random CV folds.
2. Use `xgb.cv` to find the initial optimal number of boosting rounds for XGBoost (using default parameters).
3. Using the optimal number of trees from the previous step, run a Grid Search CV on the following parameters:
   - max_depth and min_child_weight
   - gamma
   - subsample and colsample_bytree
4. Run a second, finer-grained Grid Search CV around the best parameters from the previous step.
5. Fit a model on all data using the best parameters.
6. Run Recursive Feature Elimination (RFE) using the model from (5).

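Repeat (1)-(6) in three stages, each starting from the features retained by the previous stage, to find the optimal subset of features. A final grid search on the selected features then reports the score.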
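
Steps 3 and 4 amount to a coarse-to-fine search: score every combination on a coarse grid, then search a narrower grid centred on the winner. The sketch below illustrates that idea for one parameter group (`max_depth`, `min_child_weight`) using only the standard library. `toy_cv_score` is a hypothetical stand-in for the mean cross-validated score that `model.run_xgb_grid_search` would compute with XGBoost, and the grid values and step sizes are illustrative assumptions, not the ones used in this notebook.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Score every combination in `grid` and return the best (params, score).

    `grid` maps parameter names to candidate values; higher scores are better.
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def refine_grid(best_params, steps):
    """Build a finer grid centred on the best values from a coarser search."""
    return {
        name: [value - steps[name], value, value + steps[name]]
        for name, value in best_params.items()
    }


def toy_cv_score(params):
    """Hypothetical stand-in for a mean CV score; peaks at depth 5, weight 3."""
    return -((params["max_depth"] - 5) ** 2) - (params["min_child_weight"] - 3) ** 2


# Step 3: coarse grid.
coarse = {"max_depth": [3, 6, 9], "min_child_weight": [1, 4, 7]}
best_coarse, _ = grid_search(toy_cv_score, coarse)

# Step 4: finer grid around the coarse winner.
fine = refine_grid(best_coarse, {"max_depth": 1, "min_child_weight": 1})
best_fine, best_score = grid_search(toy_cv_score, fine)

print(best_coarse)  # {'max_depth': 6, 'min_child_weight': 4}
print(best_fine)    # {'max_depth': 5, 'min_child_weight': 3}
```

In the real search the same pattern would presumably be repeated for `gamma` and for (`subsample`, `colsample_bytree`), each time holding the previously tuned parameters fixed.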

In [None]:
# Stage 1

# Generate CV folds
cv_folds = util.generate_cv_folds(data, util.CV_N_FOLDS, util.CV_N_REPEATS, random_state=31)

# Run XGB grid search
booster, best_params = model.run_xgb_grid_search(xgb_data, x, y, cv_folds)

# Run recursive feature elimination
x_reduced_1, xgb_data_reduced_1 = model.recursive_feature_selection(booster, x, y, cv_folds, step=1, force_select_features=config.FEATURES_FORCE_SELECT_ALL)
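
`util.generate_cv_folds` is not shown in this notebook. Assuming it produces repeated k-fold splits as (train indices, test indices) pairs, a minimal unstratified sketch could look like the following; `generate_repeated_kfold` is a hypothetical name. Seeding the RNG with `random_state` makes the folds reproducible, which is why each stage passes an explicit seed. A list of index pairs in this shape is also what `xgb.cv` accepts through its `folds` argument; for stratified splits, scikit-learn's `RepeatedStratifiedKFold(n_splits=..., n_repeats=..., random_state=...)` is a common alternative.

```python
import random


def generate_repeated_kfold(n_samples, n_folds, n_repeats, random_state=None):
    """Return a list of (train_idx, test_idx) pairs for repeated k-fold CV.

    Each repeat reshuffles the sample indices with a seeded RNG, then splits
    them into `n_folds` roughly equal test folds (no stratification).
    """
    rng = random.Random(random_state)
    folds = []
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for k in range(n_folds):
            # Every n_folds-th shuffled index goes into test fold k.
            test_idx = sorted(indices[k::n_folds])
            test_set = set(test_idx)
            train_idx = [i for i in range(n_samples) if i not in test_set]
            folds.append((train_idx, test_idx))
    return folds


folds = generate_repeated_kfold(10, n_folds=5, n_repeats=2, random_state=31)
print(len(folds))  # 10 splits: 5 folds x 2 repeats
```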


In [None]:
# Stage 2

# Generate CV folds
cv_folds = util.generate_cv_folds(data, util.CV_N_FOLDS, util.CV_N_REPEATS, random_state=32)

# Run XGB grid search
booster, best_params = model.run_xgb_grid_search(xgb_data_reduced_1, x_reduced_1, y, cv_folds)

# Run recursive feature elimination
x_reduced_2, xgb_data_reduced_2 = model.recursive_feature_selection(booster, x_reduced_1, y, cv_folds, step=1, force_select_features=config.FEATURES_FORCE_SELECT_ALL)
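
Step 2 relies on early stopping: `xgb.cv` keeps adding boosting rounds until the mean validation metric across folds has not improved for `early_stopping_rounds` consecutive rounds. The sketch below reproduces that stopping rule on a precomputed list of per-round metric means so it runs without XGBoost. `best_num_rounds` and the AUC-like values are hypothetical, and the metric and patience that `model.run_xgb_grid_search` actually uses are not shown in this notebook.

```python
def best_num_rounds(test_metric_means, early_stopping_rounds, maximize=True):
    """Return the number of boosting rounds to keep, mimicking early stopping.

    `test_metric_means[i]` is the mean validation metric across CV folds after
    round i + 1.
    """
    if not test_metric_means:
        raise ValueError("need at least one round of results")
    best_round, best_value = 0, None
    for i, value in enumerate(test_metric_means):
        if best_value is None:
            improved = True
        elif maximize:
            improved = value > best_value
        else:
            improved = value < best_value
        if improved:
            best_round, best_value = i, value
        elif i - best_round >= early_stopping_rounds:
            break  # no improvement for `early_stopping_rounds` rounds: stop
    return best_round + 1  # convert 0-based index to a round count


auc = [0.70, 0.75, 0.78, 0.79, 0.785, 0.788, 0.789, 0.80]
print(best_num_rounds(auc, early_stopping_rounds=3))  # 4
print(best_num_rounds(auc, early_stopping_rounds=5))  # 8
```

With a patience of 3, the search stops before round 8 and keeps 4 rounds even though round 8 would have scored higher; a larger patience trades runtime for a lower risk of stopping too early.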


In [None]:
# Stage 3

# Generate CV folds (note: same seed as Stage 2, random_state=32)
cv_folds = util.generate_cv_folds(data, util.CV_N_FOLDS, util.CV_N_REPEATS, random_state=32)

# Run XGB grid search
booster, best_params = model.run_xgb_grid_search(xgb_data_reduced_2, x_reduced_2, y, cv_folds)

# Run recursive feature elimination
x_reduced_3, xgb_data_reduced_3 = model.recursive_feature_selection(booster, x_reduced_2, y, cv_folds, step=1, force_select_features=config.FEATURES_FORCE_SELECT_ALL)
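
`model.recursive_feature_selection` receives the tuned booster, the CV folds, `step=1` and a list of features that must never be dropped. Its internals are not shown; one plausible reading is cross-validated RFE, sketched below under that assumption. `importance_fn` and `score_fn` are hypothetical callbacks: in the notebook they would refit the booster and read its feature importances, and compute the mean CV score over `cv_folds`. Here they use fixed toy weights so the logic runs on its own. scikit-learn's `RFECV(estimator, step=1, cv=...)` is the closest off-the-shelf equivalent.

```python
def rfe_cv(features, importance_fn, score_fn, step=1, force_keep=(), min_features=1):
    """Cross-validated RFE sketch: drop the least important features `step`
    at a time, score every subset, and return the best (subset, score).

    Features in `force_keep` are never candidates for removal.
    """
    selected = list(features)
    forced = set(force_keep)
    best_subset, best_score = list(selected), score_fn(selected)
    while len(selected) > min_features:
        importances = importance_fn(selected)  # refit on the current subset
        candidates = sorted(
            (f for f in selected if f not in forced), key=lambda f: importances[f]
        )
        n_drop = min(step, len(selected) - min_features, len(candidates))
        if n_drop == 0:
            break  # only forced features remain
        for f in candidates[:n_drop]:
            selected.remove(f)
        score = score_fn(selected)
        if score > best_score:
            best_subset, best_score = list(selected), score
    return best_subset, best_score


# Toy stand-ins (hypothetical): fixed importances and a size-penalised score.
WEIGHTS = {"f1": 0.30, "f2": 0.25, "f3": 0.15, "f4": 0.05, "f5": 0.02, "f6": 0.01}
importance_fn = lambda feats: {f: WEIGHTS[f] for f in feats}
score_fn = lambda feats: sum(WEIGHTS[f] for f in feats) - 0.08 * len(feats)

subset, score = rfe_cv(list(WEIGHTS), importance_fn, score_fn, step=1, force_keep=["f6"])
print(subset)  # ['f1', 'f2', 'f3', 'f6']
```

Note that `f6` survives despite having the lowest importance, because it is force-kept.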


In [None]:
# Final fitting

# Generate CV folds
cv_folds = util.generate_cv_folds(data, util.CV_N_FOLDS, util.CV_N_REPEATS, random_state=33)

# Run XGB grid search and report score
booster, best_params = model.run_xgb_grid_search(xgb_data_reduced_3, x_reduced_3, y, cv_folds)
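
The three stage cells differ only in their seed and in which reduced feature set they consume, so they could be written as a loop. The sketch below shows that structure with a hypothetical `select_stage` callback (here a toy that drops the last feature); the commented lines show how one stage of this notebook would plausibly map onto it. The seeds mirror the ones used above, and listing them in one place makes it easy to see that Stages 2 and 3 both use `random_state=32`.

```python
def run_staged_selection(features, seeds, select_stage):
    """Run one selection stage per seed, feeding each stage the previous output.

    Returns the final feature list and the feature list after every stage.
    """
    history = [list(features)]
    for seed in seeds:
        # In this notebook, one stage would plausibly be (not run here):
        #   cv_folds = util.generate_cv_folds(data, util.CV_N_FOLDS, util.CV_N_REPEATS, random_state=seed)
        #   booster, best_params = model.run_xgb_grid_search(xgb_data, x, y, cv_folds)
        #   x, xgb_data = model.recursive_feature_selection(
        #       booster, x, y, cv_folds, step=1,
        #       force_select_features=config.FEATURES_FORCE_SELECT_ALL)
        features = select_stage(features, seed)
        history.append(list(features))
    return features, history


def drop_last_feature(features, seed):
    """Toy stand-in for grid search + RFE: ignores the seed, drops one feature."""
    return features[:-1]


final, history = run_staged_selection(["f1", "f2", "f3", "f4", "f5"], [31, 32, 32], drop_last_feature)
print(final)         # ['f1', 'f2']
print(len(history))  # 4
```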
