
Error when using parallel processing for tune_grid() and nnetar_reg() from modeltime package #272

Closed
MxNl opened this issue Sep 3, 2020 · 5 comments

MxNl commented Sep 3, 2020

The problem

I am getting the following error when trying parallelized tuning following these instructions:

x id, out_id, in_id, data: internal: Error in rlang::env_get(mod_env, items): argument "default" is missing, with no default
All models failed in tune_grid(). See the .notes column.

I have read related issues such as #159, #157, #60, and #59, but the proposed solutions didn't help.
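As an aside, this is how I have been inspecting the per-model messages the error points to (a minimal sketch; `res` is a placeholder name for the failed `tune_grid()` result, not an object from the code below):

```r
library(dplyr)
library(tidyr)

# res is a hypothetical tune_grid() result; each element of .notes
# is a tibble of messages collected while fitting that resample
res %>%
  select(id, .notes) %>%
  unnest(.notes)
```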

Reproducible example

import::from(janitor, clean_names, get_dupes)
library(forecast)
library(timetk)
library(modeltime)
library(tidymodels)
library(tidyverse)

data("data_buoy_gaps", package = "forecastML")

## Minor Preparations
data <- 
  data_buoy_gaps %>% 
  select(date, wind_spd, air_temperature, sea_surface_temperature)

## Modelling
train_test_splits <- data %>% 
  initial_time_split(prop = 0.9)

data_train <- 
  train_test_splits %>% 
  training()

data_test <- 
  train_test_splits %>% 
  testing()

## Resampling with `rsample`
n_samples_train <- 
  data_train %>% 
  nrow()

n_initial <- 
  (n_samples_train * 0.5) %>% 
  floor()

n_slices <- 5

n_slice <- 
  ((n_samples_train - n_initial) / n_slices) %>% 
  floor()

resampling_strategy_cv5fold <- 
  data_train %>%
  time_series_cv(
    initial = n_initial,
    assess = n_slice,
    skip = n_slice,
    cumulative = TRUE
  )

# Preprocessing with `recipe`
buoy_gaps_recipe <-
  recipe(wind_spd ~ ., data = data_train) %>% 
  # update_role(date, new_role = "ID") %>%
  step_normalize(all_predictors(), -date)

# Defining a Learner with `parsnip` and `modeltime`
tune_nnetar_model <-
  nnetar_reg(
    seasonal_period = 12,
    non_seasonal_ar = tune(),
    seasonal_ar = tune(),
    hidden_units = tune(),
    num_networks = 20,
    penalty = tune(),
    epochs = tune()
  ) %>%
  set_engine("nnetar", 
             scale.inputs = FALSE) %>%
  set_mode("regression")

# Tuning
n_levels <- 2

# named nn_grid so it doesn't shadow tune::tune_grid()
nn_grid <- grid_regular(
  non_seasonal_ar(range = c(1L, 5L)),
  seasonal_ar(range = c(1L, 5L)),
  hidden_units(),
  # num_networks(),
  penalty(),
  epochs(),
  levels = n_levels
)

nn_grid %>% 
  nrow()

# Workflow
nnetar_workflow <- 
  workflow() %>% 
  add_model(tune_nnetar_model) %>% 
  add_recipe(buoy_gaps_recipe)

# Parallelize Tuning
library(doParallel)
library(doFuture)
all_cores <- parallel::detectCores(logical = FALSE)

registerDoFuture()
cl <- makeCluster(all_cores)
plan(cluster, workers = cl)

nnetar_resampling <- 
  nnetar_workflow %>% 
  tune_grid(
    resamples = resampling_strategy_cv5fold,
    grid = nn_grid,
    metrics = metric_set(rmse, mae))

I would appreciate any help. Thanks a lot!
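(For completeness, this is how I release the workers once tuning has finished; a minimal sketch for the doFuture cluster backend registered above.)

```r
# shut the PSOCK workers down and return to sequential evaluation
parallel::stopCluster(cl)
future::plan(future::sequential)
```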

juliasilge (Member) commented:

I don't believe there is a problem with tuning nnetar_reg() with parallel processing. I tried this out on a simpler, smaller dataset here:

library(timetk)
library(modeltime)
library(tidymodels)

bike_transactions_tbl <- bike_sharing_daily %>%
  select(dteday, cnt) %>%
  set_names(c("date", "value"))

bike_transactions_tbl
#> # A tibble: 731 x 2
#>    date       value
#>    <date>     <dbl>
#>  1 2011-01-01   985
#>  2 2011-01-02   801
#>  3 2011-01-03  1349
#>  4 2011-01-04  1562
#>  5 2011-01-05  1600
#>  6 2011-01-06  1606
#>  7 2011-01-07  1510
#>  8 2011-01-08   959
#>  9 2011-01-09   822
#> 10 2011-01-10  1321
#> # … with 721 more rows

bike_splits <- initial_time_split(bike_transactions_tbl, prop = 0.9)
data_train  <- training(bike_splits)
data_test   <- testing(bike_splits)

resampling_strategy <- 
  data_train %>%
  time_series_cv(
    initial = "6 months",
    assess = "3 months",
    skip = "3 months",
    cumulative = TRUE
  )
#> Using date_var: date

resampling_strategy %>% 
  plot_time_series_cv_plan(date, value, 
                           .facet_ncol = 2,
                           .line_alpha = 0.5,
                           .interactive = FALSE)

tune_nnetar_model <-
  nnetar_reg(
    non_seasonal_ar = tune(),
    epochs = tune()
  ) %>%
  set_engine("nnetar", 
             scale.inputs = FALSE) %>%
  set_mode("regression")

nn_grid <- grid_regular(
  non_seasonal_ar(range = c(1L, 5L)),
  epochs(),
  levels = 2
)

simple_rec <- recipe(value ~ date, data = data_train)

nnetar_workflow <- 
  workflow() %>% 
  add_model(tune_nnetar_model) %>% 
  add_recipe(simple_rec)

nnetar_workflow
#> ══ Workflow ════════════════════════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: nnetar_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────────────────────────
#> Neural Network Auto Regression (NNETAR) Model Specification (regression)
#> 
#> Main Arguments:
#>   non_seasonal_ar = tune()
#>   epochs = tune()
#> 
#> Engine-Specific Arguments:
#>   scale.inputs = FALSE
#> 
#> Computational engine: nnetar

doParallel::registerDoParallel()

nnetar_workflow %>% 
  tune_grid(
    resamples = resampling_strategy,
    grid = nn_grid,
    metrics = metric_set(rmse, mae))
#> 
#> Attaching package: 'forecast'
#> The following object is masked from 'package:yardstick':
#> 
#>     accuracy
#> # Tuning results
#> # NA 
#> # A tibble: 5 x 4
#>   splits           id     .metrics         .notes          
#>   <list>           <chr>  <list>           <list>          
#> 1 <split [567/90]> Slice1 <tibble [8 × 6]> <tibble [0 × 1]>
#> 2 <split [477/90]> Slice2 <tibble [8 × 6]> <tibble [0 × 1]>
#> 3 <split [387/90]> Slice3 <tibble [8 × 6]> <tibble [0 × 1]>
#> 4 <split [297/90]> Slice4 <tibble [8 × 6]> <tibble [0 × 1]>
#> 5 <split [207/90]> Slice5 <tibble [8 × 6]> <tibble [0 × 1]>

Created on 2020-09-03 by the reprex package (v0.3.0.9001)

Want to see if you can get this smaller example to work? Also, do you want to check out your resampling strategy and see if that is what you want to do (perhaps with plot_time_series_cv_plan())? It looked somewhat strange to me.

MxNl (Author) commented Sep 4, 2020

Yes, I do get the same error when running your example. You are right, I should have checked the resampling strategy, but since your reprex results in the same error message, I don't think it is the cause of the error. As soon as my workstation is free again, I will add my session info. Thanks for helping out!

juliasilge (Member) commented:

We recently made some changes to how tune handles packages in PSOCK clusters; want to try installing the current development version of tune and see if that solves the problem?

devtools::install_github("tidymodels/tune")
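After installing, a quick sanity check that the development build is the one actually in use (a sketch; the exact version string you see will differ):

```r
# restart R first so the freshly installed build is loaded,
# then confirm which version of tune is installed
packageVersion("tune")
```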

MxNl (Author) commented Sep 5, 2020

Thanks a lot, that helped! Now it runs perfectly!

MxNl closed this as completed Sep 5, 2020
github-actions bot commented Mar 6, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators Mar 6, 2021