
tune_bayes() does not work with a finalized mtry (tune_grid does) #432

@jacekkotowski

Description


The problem

I'm having trouble tuning xgboost parameters with tune_bayes(). Without mtry in the tuning set, the function works. Once mtry is added to the parameter set and then finalized, tune_grid() (with grid or random parameter selection) still works without problems, but tune_bayes() throws an error.
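For context, mtry()'s upper bound depends on the number of predictors, so it starts out unknown and has to be finalized against the data. A minimal sketch of what finalize() does, using dials with mtcars as a stand-in dataset (not part of my reprex):

```r
library(dials)

# mtry() ships with an unknown upper bound ("Range: [1, ?]");
# finalize() fills it in from the number of predictors supplied.
mtry()
finalize(mtry(), mtcars[, -1])   # upper bound becomes ncol(mtcars[, -1]) = 10
```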

Reproducible example

library(tidymodels)  # provides %>%, parameters(), finalize(), tune_grid(), tune_bayes()
doParallel::registerDoParallel()

xgboost_set <-
  parameters(bike_rf_wkfl) %>%
  update(mtry = finalize(mtry(), bike_training))
# or, with the range entered by hand:
# xgboost_set <-
#   parameters(bike_rf_wkfl) %>%
#   update(mtry = mtry(c(2, 8)))


# this will work
bike_rf_initial <-
  bike_rf_wkfl %>%
  tune_grid(
    resamples = bike_folds,
    param_info = xgboost_set,
    metrics = bike_metrics,
    grid = 9
  )

# this will not work
bike_rf_rs <-
  bike_rf_wkfl %>%
  tune_bayes(
    initial = 9,
    resamples = bike_folds,
    param_info = xgboost_set,
    metrics = metric_set(mape, rsq)
  )

Created on 2021-11-18 by the reprex package (v2.0.1)

Error message from tune_bayes()

x Gaussian process model: Error in fit_gp(mean_stats %>% dplyr::select(-.iter), pset = param_info, : argument is missing, with no default
Error in eval(expr, p) : no loop for break/next, jumping to top level
x Optimization stopped prematurely; returning current results.
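The "argument is missing" error is raised inside fit_gp(), the internal that fits the Gaussian process surrogate over the initial results. As a diagnostic sketch (my assumption: the GP step chokes on something about the finalized parameter set; has_unknowns() is a dials helper that only checks the set itself, not the GP fit), one can at least confirm the finalized set no longer carries unknown ranges:

```r
library(dials)
library(purrr)

# Each row of the parameter set tibble holds a param object in the
# `object` column; after finalize(), none should report unknown ranges.
map_lgl(xgboost_set$object, has_unknowns)
```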

My full workflow looks like this:

bike_all<-
  read_csv("dane/train.csv", col_types = cols()) %>% 
  select(- casual, - registered)

# Create data split object
bike_split <- 
  initial_time_split(bike_all, 
                prop = .80)

# Create the training data
bike_training <- bike_split %>% 
  training()

# Create the test data
bike_testing <- bike_split %>% 
  testing()

bike_recipe <- recipe(count ~ ., data = bike_training) %>%
  step_mutate(datetime_hr = as.factor(lubridate::hour(datetime))) %>%
  step_date(datetime, features = c("doy", "dow", "month", "year"), abbr = TRUE) %>%
  step_log(windspeed, base = 10, offset = 1) %>%
  update_role("datetime", new_role = "id_variable") %>%
  step_dummy(all_nominal(), -all_outcomes(), one_hot = TRUE)

bike_folds <- 
  timetk::time_series_cv(
    bike_training,
    assess = "1 months",
    initial = "11 months",
    skip = "1 months",
    slice_limit = 5, 
    cumulative = TRUE)


# Despite the name, this is an xgboost boosted-tree model
rf_model <- boost_tree(
  trees = 500,
  tree_depth = tune(),
  min_n = tune(),
  mtry = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  learn_rate = tune()
) %>%
  set_engine('xgboost',
             objective = 'count:poisson') %>%
  set_mode('regression')

# Create workflow
bike_rf_wkfl <- 
  workflow() %>% 
  # Add model
  add_model(rf_model) %>% 
  # Add recipe
  add_recipe(bike_recipe) 


# Create custom metrics function
bike_metrics <- metric_set(mape, rsq)


I am using a bike-rental dataset (attached as train.csv).
