Description
I would like to perform nested resampling with a GraphLearner that uses XGBoost as its base learner. In addition, I want to tune the stopping point (`nrounds`) via internal tuning with early stopping. This works as expected with a standalone XGBoost learner, but it fails when the learner is wrapped in a GraphLearner.
What would be a suitable approach to enable internal tuning of early stopping within a GraphLearner in mlr3?
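For reference, this is roughly the standalone setup that works for me (a minimal sketch of the working configuration, not the failing example):

```r
library(mlr3verse)  # loads mlr3, mlr3tuning, mlr3learners

task = as_task_classif(iris, target = "Species")

lrn_xgb = lrn("classif.xgboost")
lrn_xgb$param_set$set_values(
  eta = to_tune(0.001, 0.1, logscale = TRUE),
  nrounds = to_tune(upper = 500, internal = TRUE),  # internal tuning of nrounds
  early_stopping_rounds = 10,
  eval_metric = "mlogloss"
)
lrn_xgb$validate = "test"  # use the inner test set as validation data

at = auto_tuner(
  tuner = tnr("grid_search"),
  learner = lrn_xgb,
  resampling = rsmp("cv", folds = 2),
  measure = msr("internal_valid_score", select = "mlogloss", minimize = TRUE),
  term_evals = 10L
)
at$train(task)  # trains without error for the standalone learner
```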
Below is a minimal reproducible example of the failing GraphLearner setup:
```r
library(mlr3verse)  # loads mlr3, mlr3pipelines, mlr3tuning, mlr3learners

task = as_task_classif(iris, target = "Species")

po_pca = po("pca")
lrn_xgb = lrn("classif.xgboost")
lrn_xgb$param_set$set_values(
  eta = to_tune(0.001, 0.1, logscale = TRUE),
  nrounds = to_tune(upper = 500, internal = TRUE),
  early_stopping_rounds = 10,
  eval_metric = "mlogloss"
)
# setting validate on the wrapped learner, as in the standalone case;
# this is what the error below complains about
lrn_xgb$validate = "test"

graph_learner = as_learner(po_pca %>>% lrn_xgb)

inner_fold = rsmp("cv", folds = 2)
outer_fold = rsmp("cv", folds = 3)

at = auto_tuner(
  tuner = tnr("grid_search"),
  learner = graph_learner,
  resampling = inner_fold,
  measure = msr("internal_valid_score",
    select = "mlogloss", minimize = TRUE),
  term_evals = 10L,
  store_benchmark_result = TRUE,
  store_models = TRUE,
  store_tuning_instance = TRUE
)

design = benchmark_grid(
  tasks = task,
  learners = at,
  resamplings = outer_fold
)
bmr = benchmark(design, store_models = TRUE)
```
Running this produces the following error:
```
Error: Validate field of PipeOp 'classif.xgboost' must either be NULL or 'predefined'. To configure how the validation data is created, set the $validate field of the GraphLearner, e.g. using set_validate().
```
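Following the hint in the message, my guess is that validation has to be configured on the GraphLearner rather than on the wrapped learner, roughly like this (an untested sketch; the prefixed score id `classif.xgboost.mlogloss` is an assumption on my part):

```r
lrn_xgb$validate = NULL  # leave validate unset on the wrapped learner
graph_learner = as_learner(po_pca %>>% lrn_xgb)
set_validate(graph_learner, validate = "test")  # configure validation on the GraphLearner

# if the internal validation scores are prefixed with the PipeOp id,
# the tuning measure would presumably need the prefixed name (assumption):
measure = msr("internal_valid_score",
  select = "classif.xgboost.mlogloss", minimize = TRUE)
```

Is this the intended approach, and does it interact correctly with internal tuning of `nrounds` under nested resampling?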