
Ranger not working with sparse representation #694

@tolliam


Ranger models fit without any problems when using a sparse representation, but generating predictions produces an error about converting matrices to data frames. I guess this happens when the predictions are being converted back from ranger's output? Together with issue 691, I think this means glmnet is the only engine supported for sparse matrices.

library(tidyverse)
library(tidymodels)
#> Warning: package 'dials' was built under R version 4.1.2
#> Warning: package 'recipes' was built under R version 4.1.2
#> Warning: package 'workflowsets' was built under R version 4.1.2

data("small_fine_foods")
training_data
#> # A tibble: 4,000 × 3
#>    product    review                                                       score
#>    <chr>      <chr>                                                        <fct>
#>  1 B000J0LSBG "this stuff is  not stuffing  its  not good at all  save yo… other
#>  2 B000EYLDYE "I absolutely LOVE this dried fruit.  LOVE IT.  Whenever I … great
#>  3 B0026LIO9A "GREAT DEAL, CONVENIENT TOO.  Much cheaper than WalMart and… great
#>  4 B00473P8SK "Great flavor, we go through a ton of this sauce! I discove… great
#>  5 B001SAWTNM "This is excellent salsa/hot sauce, but you can get it for … great
#>  6 B000FAG90U "Again, this is the best dogfood out there.  One suggestion… great
#>  7 B006BXTCEK "The box I received was filled with teas, hot chocolates, a… other
#>  8 B002GWH5OY "This is delicious coffee which compares favorably with muc… great
#>  9 B003R0MFYY "Don't let these little tiny cans fool you.  They pack a lo… great
#> 10 B001EO5ZXI "One of the nicest, smoothest cup of chai I've made. Nice m… great
#> # … with 3,990 more rows

library(hardhat)
#> Warning: package 'hardhat' was built under R version 4.1.2
sparse_bp <- default_recipe_blueprint(composition = "dgCMatrix")

library(textrecipes)

text_rec <-
  recipe(score ~ review, data = training_data) %>%
  step_tokenize(review)  %>%
  step_tokenfilter(review, max_tokens = 1e3) %>%
  step_tfidf(review)

ranger_spec <-
  rand_forest(mode = "classification") %>%
  set_engine("ranger")

wf_sparse <- 
  workflow() %>%
  add_recipe(text_rec, blueprint = sparse_bp) %>%
  add_model(ranger_spec)

wf_default <- 
  workflow() %>%
  add_recipe(text_rec) %>%
  add_model(ranger_spec)

set.seed(123)

wf_default %>% 
  fit(training_data) %>% 
  predict(testing_data)
#> # A tibble: 1,000 × 1
#>    .pred_class
#>    <fct>      
#>  1 great      
#>  2 great      
#>  3 other      
#>  4 great      
#>  5 great      
#>  6 great      
#>  7 great      
#>  8 great      
#>  9 great      
#> 10 other      
#> # … with 990 more rows

wf_sparse %>% 
  fit(training_data) %>% 
  predict(testing_data)
#> Error in as.data.frame.default(new_data): cannot coerce class 'structure("dgCMatrix", package = "Matrix")' to a data.frame

Created on 2022-03-26 by the reprex package (v2.0.1)
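
The coercion step named in the error message can be looked at in isolation, without tidymodels or ranger. The following is only a minimal sketch using the Matrix package: the small matrix and its column names (`tfidf_a`, `tfidf_b`) are made up for illustration, and it does not show where in the prediction code path the coercion is attempted. The direct call is wrapped in `tryCatch()` so the script keeps running whether or not the installed Matrix version supports that coercion.

```r
library(Matrix)

# A tiny sparse matrix standing in for the tf-idf predictors (illustrative data only)
sparse_x <- sparseMatrix(
  i = c(1, 2), j = c(1, 2), x = c(0.5, 1.5),
  dims = c(2, 2), dimnames = list(NULL, c("tfidf_a", "tfidf_b"))
)
class(sparse_x)

# Direct coercion of the sparse matrix; any error message is captured instead of stopping
direct <- tryCatch(
  as.data.frame(sparse_x),
  error = function(e) conditionMessage(e)
)
direct

# Converting to a dense base matrix first gives as.data.frame() something it can handle
dense_df <- as.data.frame(as.matrix(sparse_x))
dense_df
```

Going through `as.matrix()` gives up the memory savings of the sparse representation, so this is a way to inspect the failure rather than a fix for the workflow.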

Labels: bug (an unexpected problem or unintended behavior)
