
Error when using parallel processing for tune_grid() and nnetar_reg() from modeltime package #272

Closed
MxNl opened this issue Sep 3, 2020 · 5 comments

MxNl commented Sep 3, 2020

The problem

I am getting the following error when trying parallelized tuning following these instructions:

x id, out_id, in_id, data: internal: Error in rlang::env_get(mod_env, items): argument "default" is missing, with no default
All models failed in tune_grid(). See the .notes column.

I have read related issues such as #159, #157, #60, and #59, but the proposed solutions didn't help.
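As an aside, this is how I have been inspecting the per-model messages the error points to (a minimal sketch; `res` is a placeholder name for the failed `tune_grid()` result, not an object from the code below):

```r
library(dplyr)
library(tidyr)

# res is a hypothetical tune_grid() result; each element of .notes
# is a tibble of messages collected while fitting that resample
res %>%
  select(id, .notes) %>%
  unnest(.notes)
```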

Reproducible example

import::from(janitor, clean_names, get_dupes)
library(forecast)
library(timetk)
library(modeltime)
library(tidymodels)
library(tidyverse)

data("data_buoy_gaps", package = "forecastML")

## Minor Preparations
data <- 
  data_buoy_gaps %>% 
  select(date, wind_spd, air_temperature, sea_surface_temperature)

## Modelling
train_test_splits <- data %>% 
  initial_time_split(prop = 0.9)

data_train <- 
  train_test_splits %>% 
  training()

data_test <- 
  train_test_splits %>% 
  testing()

## Resampling with `rsample`
n_samples_train <- 
  data_train %>% 
  nrow()

n_initial <- 
  (n_samples_train * 0.5) %>% 
  floor()

n_slices <- 5

n_slice <- 
  ((n_samples_train - n_initial) / n_slices) %>% 
  floor()

resampling_strategy_cv5fold <- 
  data_train %>%
  time_series_cv(
    initial = n_initial,
    assess = n_slice,
    skip = n_slice,
    cumulative = TRUE
  )

# Preprocessing with `recipe`
buoy_gaps_recipe <-
  recipe(wind_spd ~ ., data = data_train) %>% 
  # update_role(date, new_role = "ID") %>%
  step_normalize(all_predictors(), -date)

# Defining a Learner with `parsnip` and `modeltime`
tune_nnetar_model <-
  nnetar_reg(
    seasonal_period = 12,
    non_seasonal_ar = tune(),
    seasonal_ar = tune(),
    hidden_units = tune(),
    num_networks = 20,
    penalty = tune(),
    epochs = tune()
  ) %>%
  set_engine("nnetar", 
             scale.inputs = FALSE) %>%
  set_mode("regression")

# Tuning
n_levels <- 2

# named nn_grid so it doesn't shadow tune::tune_grid()
nn_grid <- grid_regular(
  non_seasonal_ar(range = c(1L, 5L)),
  seasonal_ar(range = c(1L, 5L)),
  hidden_units(),
  # num_networks(),
  penalty(),
  epochs(),
  levels = n_levels
)

nn_grid %>% 
  nrow()

# Workflow
nnetar_workflow <- 
  workflow() %>% 
  add_model(tune_nnetar_model) %>% 
  add_recipe(buoy_gaps_recipe)

# Parallelize Tuning
library(doParallel)
library(doFuture)
all_cores <- parallel::detectCores(logical = FALSE)

registerDoFuture()
cl <- makeCluster(all_cores)
plan(cluster, workers = cl)

nnetar_resampling <- 
  nnetar_workflow %>% 
  tune_grid(
    resamples = resampling_strategy_cv5fold,
    grid = nn_grid,
    metrics = metric_set(rmse, mae))

I would appreciate any help. Thanks a lot!
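(For completeness, this is how I release the workers once tuning has finished; a minimal sketch for the doFuture cluster backend registered above.)

```r
# shut the PSOCK workers down and return to sequential evaluation
parallel::stopCluster(cl)
future::plan(future::sequential)
```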

juliasilge (Member) commented:

I don't believe there is a problem with tuning nnetar_reg() with parallel processing. I tried this out on a simpler, smaller dataset here:

library(timetk)
library(modeltime)
library(tidymodels)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(dteday, cnt) %>%
  set_names(c("date", "value"))

bike_transactions_tbl
#> # A tibble: 731 x 2
#>    date       value
#>    <date>     <dbl>
#>  1 2011-01-01   985
#>  2 2011-01-02   801
#>  3 2011-01-03  1349
#>  4 2011-01-04  1562
#>  5 2011-01-05  1600
#>  6 2011-01-06  1606
#>  7 2011-01-07  1510
#>  8 2011-01-08   959
#>  9 2011-01-09   822
#> 10 2011-01-10  1321
#> # … with 721 more rows

bike_splits <- initial_time_split(bike_transactions_tbl, prop = 0.9)
data_train  <- training(bike_splits)
data_test   <- testing(bike_splits)

resampling_strategy <- 
  data_train %>%
  time_series_cv(
    initial = "6 months",
    assess = "3 months",
    skip = "3 months",
    cumulative = TRUE
  )
#> Using date_var: date

resampling_strategy %>% 
  plot_time_series_cv_plan(date, value, 
                           .facet_ncol = 2,
                           .line_alpha = 0.5,
                           .interactive = FALSE)

tune_nnetar_model <-
  nnetar_reg(
    non_seasonal_ar = tune(),
    epochs = tune()
  ) %>%
  set_engine("nnetar", 
             scale.inputs = FALSE) %>%
  set_mode("regression")

nn_grid <- grid_regular(
  non_seasonal_ar(range = c(1L, 5L)),
  epochs(),
  levels = 2
)

simple_rec <- recipe(value ~ date, data = data_train)

nnetar_workflow <- 
  workflow() %>% 
  add_model(tune_nnetar_model) %>% 
  add_recipe(simple_rec)

nnetar_workflow
#> ══ Workflow ════════════════════════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: nnetar_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────────────────────────
#> Neural Network Auto Regression (NNETAR) Model Specification (regression)
#> 
#> Main Arguments:
#>   non_seasonal_ar = tune()
#>   epochs = tune()
#> 
#> Engine-Specific Arguments:
#>   scale.inputs = FALSE
#> 
#> Computational engine: nnetar

doParallel::registerDoParallel()

nnetar_workflow %>% 
  tune_grid(
    resamples = resampling_strategy,
    grid = nn_grid,
    metrics = metric_set(rmse, mae))
#> 
#> Attaching package: 'forecast'
#> The following object is masked from 'package:yardstick':
#> 
#>     accuracy
#> # Tuning results
#> # NA 
#> # A tibble: 5 x 4
#>   splits           id     .metrics         .notes          
#>   <list>           <chr>  <list>           <list>          
#> 1 <split [567/90]> Slice1 <tibble [8 × 6]> <tibble [0 × 1]>
#> 2 <split [477/90]> Slice2 <tibble [8 × 6]> <tibble [0 × 1]>
#> 3 <split [387/90]> Slice3 <tibble [8 × 6]> <tibble [0 × 1]>
#> 4 <split [297/90]> Slice4 <tibble [8 × 6]> <tibble [0 × 1]>
#> 5 <split [207/90]> Slice5 <tibble [8 × 6]> <tibble [0 × 1]>

Created on 2020-09-03 by the reprex package (v0.3.0.9001)

Want to see if you can get this smaller example to work? Also, do you want to check out your resampling strategy and see if that is what you want to do (perhaps with plot_time_series_cv_plan())? It looked somewhat strange to me.

MxNl (Author) commented Sep 4, 2020

Yes, I do get the same error when running your example. You are right, I should have checked the resampling strategy, but since your reprex results in the same error message, I don't think it is the cause of the error. As soon as my workstation is free again, I will add my session info. Thanks for helping out!

juliasilge (Member) commented:

We recently made some changes to how tune handles packages in PSOCK clusters; want to try installing the current development version of tune and see if that solves the problem?

devtools::install_github("tidymodels/tune")
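After installing, a quick sanity check that the development build is the one actually in use (a sketch; the exact version string you see will differ):

```r
# restart R first so the freshly installed build is loaded,
# then confirm which version of tune is installed
packageVersion("tune")
```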

MxNl (Author) commented Sep 5, 2020

Thanks a lot, that helped! Now it runs perfectly!

MxNl closed this as completed Sep 5, 2020
github-actions bot commented Mar 6, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators Mar 6, 2021