
aorsf with doParallel cluster has interesting error  #85

@frankiethull

Hi Simon -

TLDR: I think doParallel clusters are causing an issue with the aorsf engine.

I was eager to test the new aorsf random forest engine after your blog post & new bonsai release to CRAN!

My goal was to benchmark against some bagged and boosted trees for one of my projects. When I swapped out my rand_forest engine and switched to aorsf, I kept getting a weird error: "parsnip could not locate an implementation for rand_forest regression model specifications using the aorsf engine." But I had updated bonsai and parsnip, and even switched R versions, as I thought I was losing my mind. To make matters even more confusing, I set up a simple reprex yesterday, and it worked (?!). So I thought something was going on with function masking or my environment, and I kept refreshing it and testing both my main script and the reproducible example.

The one thing I didn't have in my reprex yesterday was the pre-training setting: (cluster <- makePSOCKcluster(8); registerDoParallel(cluster)). Once I initialize this cluster in the reprex, there seems to be an issue with training the aorsf model (I think?). It's the only way I was able to reproduce the error in the reprex.
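A minimal sketch of why a PSOCK cluster could matter here, using only base R. Each PSOCK worker is a fresh R session, so a package attached in the main session is not automatically attached on the workers. In this sketch `splines` is just a stand-in for bonsai. That bonsai registers the aorsf engine with parsnip when it is loaded is an assumption, not something confirmed in this issue.

``` r
library(parallel)

# `splines` ships with R but is not attached by default; it stands in
# for bonsai here (an assumption made for illustration only).
library(splines)

cluster <- makePSOCKcluster(2)

# Is the package attached in the main session?
main_has_it <- "package:splines" %in% search()

# Is it attached on the workers? Each worker is a separate R session.
workers_before <- unlist(
  clusterEvalQ(cluster, "package:splines" %in% search())
)

# Attach it on every worker explicitly, then check again.
invisible(clusterEvalQ(cluster, library(splines)))
workers_after <- unlist(
  clusterEvalQ(cluster, "package:splines" %in% search())
)

stopCluster(cluster)

print(main_has_it)
print(workers_before)
print(workers_after)
```

If the workers report the package as missing before the explicit `library()` call, the same could be happening to bonsai, which would be consistent with `show_engines()` listing aorsf in the main session while the workers cannot find it.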

I am wondering if I should go about this differently: not run a cluster for aorsf? Maybe it is compatible with a different cluster library? I initialize clusters for bagged and boosted trees, so there will be one in my environment unless I run aorsf in a different script altogether (I am currently running a Quarto code chunk for each model). Open to feedback, a solution, or confirmation that I'm just losing my mind. Each time it says parsnip could not locate the implementation, yet show_engines() shows that it is there.
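Two possible workarounds to try are sketched below. Both assume the cause is that bonsai is not loaded on the cluster workers, which this issue does not confirm. The first attaches bonsai on every worker with `parallel::clusterEvalQ()`. The second passes `pkgs = "bonsai"` to `control_race()`, which asks tune to load that package during parallel processing. The name `race_ctrl` and the two-worker cluster are illustrative choices.

``` r
library(parallel)
library(doParallel)
library(finetune)

cluster <- makePSOCKcluster(2)
registerDoParallel(cluster)

# Option 1: attach bonsai on every worker so that, assuming bonsai
# registers the aorsf engine when loaded, the workers can see it too.
invisible(clusterEvalQ(cluster, library(bonsai)))

# Option 2: ask tune to load bonsai on the workers via the control object.
race_ctrl <- control_race(pkgs = "bonsai")

# `race_ctrl` would then replace `control_race()` in the
# `tune_race_anova()` call, e.g. `control = race_ctrl`.

stopCluster(cluster)
registerDoSEQ()
```

Either option alone may be enough. The second keeps the fix next to the tuning call, so it travels with the code if the cluster setup lives in another Quarto chunk.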

``` r
# setup libs, data, recipe ----------------------------------------------------------

library(doParallel) # the issue arose after creating a doParallel cluster (I think)
#> Warning: package 'doParallel' was built under R version 4.4.1
#> Loading required package: foreach
#> Warning: package 'foreach' was built under R version 4.4.1
#> Loading required package: iterators
#> Warning: package 'iterators' was built under R version 4.4.1
#> Loading required package: parallel

library(parsnip)
#> Warning: package 'parsnip' was built under R version 4.4.1
library(rsample)
#> Warning: package 'rsample' was built under R version 4.4.1
library(tune)
#> Warning: package 'tune' was built under R version 4.4.1
library(yardstick)
#> Warning: package 'yardstick' was built under R version 4.4.1
library(workflows)
#> Warning: package 'workflows' was built under R version 4.4.1
library(recipes)
#> Warning: package 'recipes' was built under R version 4.4.1
#> Loading required package: dplyr
#> Warning: package 'dplyr' was built under R version 4.4.1
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(dplyr)
library(bonsai)
#> Warning: package 'bonsai' was built under R version 4.4.1
#library(aorsf)
library(tidymodels)   # is the issue an underlying library masking something? I don't think so
#> Warning: package 'tidymodels' was built under R version 4.4.1
#> Warning: package 'broom' was built under R version 4.4.1
#> Warning: package 'dials' was built under R version 4.4.1
#> Warning: package 'scales' was built under R version 4.4.1
#> Warning: package 'ggplot2' was built under R version 4.4.1
#> Warning: package 'infer' was built under R version 4.4.1
#> Warning: package 'modeldata' was built under R version 4.4.1
#> Warning: package 'purrr' was built under R version 4.4.1
#> Warning: package 'tibble' was built under R version 4.4.1
#> Warning: package 'tidyr' was built under R version 4.4.1
#> Warning: package 'workflowsets' was built under R version 4.4.1
library(finetune)
#> Warning: package 'finetune' was built under R version 4.4.1
#library(dplyr)

training <- ChickWeight |> ungroup() |> tibble::as_tibble() |> mutate(Chick = as.numeric(Chick))

folds <- vfold_cv(data = training, v = 10)

rm_smth <- "Diet"

model_recipe <- 
   recipe(weight ~ ., training) |>
    step_rm(any_of(!!rm_smth)) |>
    step_dummy(all_nominal_predictors()) |>
    step_YeoJohnson(all_nominal_predictors()) 


# random forest spec and grid ----------------------------------------
forest_grid <-  expand.grid(
  trees = c(20, 50),
  mtry = c(2, 5, 7)
)

orf_spec <- rand_forest(
  trees = tune(),
  mtry = tune()) |>
  set_engine("aorsf") |> # issues arise for aorsf:
  set_mode("regression")


# pre training settings ---
cluster <- makePSOCKcluster(8)
registerDoParallel(cluster)

# model creation -------------------------------------------
orf_results <-
  finetune::tune_race_anova(
    workflow() |>
      add_recipe(model_recipe) |>
      add_model(orf_spec),
    resamples = folds,
    grid = forest_grid,
    control = control_race(),
    metrics = metric_set(yardstick::rmse)
  )
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Error in `test_parameters_gls()`:
#> ! There were no valid metrics for the ANOVA model.
# post training settings ---
stopCluster(cluster)
registerDoSEQ()


show_notes(.Last.tune.result)
#> unique notes:
#> ──────────────────────────────────────────────────────────────
#> Error:
#> ! parsnip could not locate an implementation for `rand_forest`
#>   regression model specifications using the `aorsf` engine.
parsnip::show_engines("rand_forest")
#> # A tibble: 10 × 2
#>    engine       mode          
#>    <chr>        <chr>         
#>  1 ranger       classification
#>  2 ranger       regression    
#>  3 randomForest classification
#>  4 randomForest regression    
#>  5 spark        classification
#>  6 spark        regression    
#>  7 partykit     regression    
#>  8 partykit     classification
#>  9 aorsf        classification
#> 10 aorsf        regression
# select_best(orf_results)
```

Created on 2024-06-27 with reprex v2.1.0
