make sure all encodings are done correctly #11

Closed
EmilHvitfeldt opened this issue Aug 23, 2020 · 3 comments · Fixed by #291
Labels
feature a feature request or enhancement

Comments

EmilHvitfeldt (Member) commented Aug 23, 2020

Encodings here means the parsnip encodings, which are set with parsnip::set_encoding().

EmilHvitfeldt (Member, Author)

Wait for #5, #6, #7, #8

@hfrick hfrick added the feature a feature request or enhancement label Apr 16, 2021
hfrick (Member) commented Jan 10, 2024

All engines have a formula interface (at least, that is what we tell parsnip about glmnet), and most have the unsurprising encodings of predictor_indicators = "none", compute_intercept = FALSE, and remove_intercept = FALSE. This leaves the indicators and the intercept to the engine, which is sensible for an engine with a formula interface.
The exceptions are bag_tree(engine = "rpart") and proportional_hazards(engine = "glmnet"). The glmnet encodings for predictors and intercept are correct; the bagged tree should switch to predictor_indicators = "none".
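To make the difference between these settings concrete, here is a minimal base R sketch. It assumes that "traditional" indicators correspond to model.matrix() with its default treatment contrasts, that compute_intercept controls whether model.matrix() includes an intercept (and therefore drops a reference level), and that remove_intercept drops the "(Intercept)" column afterwards. The toy data frame `df` is hypothetical, and this is an illustration, not parsnip's internal code.

```r
# Hypothetical toy data: one factor and one numeric predictor
df <- data.frame(
  grp = factor(c("a", "b", "c", "a")),
  x   = c(1.5, 2.0, 3.1, 0.4)
)

# Like the glmnet row: traditional, compute_intercept = TRUE, remove_intercept = TRUE.
# With an intercept, model.matrix() drops the reference level "a";
# the intercept column is then removed.
mm <- model.matrix(~ grp + x, data = df)
glmnet_style <- mm[, colnames(mm) != "(Intercept)", drop = FALSE]
colnames(glmnet_style)  # returns c("grpb", "grpc", "x")

# Like the current bag_tree row: traditional, compute_intercept = FALSE.
# Without an intercept, model.matrix() keeps an indicator for every level of grp.
no_int <- model.matrix(~ grp + x - 1, data = df)
colnames(no_int)        # returns c("grpa", "grpb", "grpc", "x")

# predictor_indicators = "none": the factor is passed to the engine unchanged,
# and the engine's own formula interface decides how to encode it.
class(df$grp)           # returns "factor"
```

For a tree-based engine such as rpart, which can split on factors directly, leaving the factor untouched ("none") avoids handing it a set of indicator columns it does not need.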

In terms of sparsity:

  • only glmnet allows sparse input, which is generally correct for glmnet but might not hold in this case; see glmnet and sparse matrices #276
  • the mboost package supports sparse matrices, but not in mboost::blackboost(), which is what we are using
  • the rest do not, as far as I can tell
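For the sparsity point, a sparse predictor matrix is the kind of input an engine flagged allow_sparse_x = TRUE could accept instead of a dense matrix. The sketch below builds one with Matrix::sparse.model.matrix() on hypothetical toy data; it only illustrates the data structure and does not claim that parsnip constructs sparse input this way.

```r
library(Matrix)  # recommended package that ships with R

# Hypothetical toy data: one factor and one numeric predictor
df <- data.frame(
  grp = factor(c("a", "b", "c", "a")),
  x   = c(1.5, 2.0, 3.1, 0.4)
)

# Same treatment-contrast dummy coding as model.matrix(), stored sparsely
sx <- sparse.model.matrix(~ grp + x, data = df)

# Drop the intercept column, mirroring remove_intercept = TRUE
sx <- sx[, colnames(sx) != "(Intercept)", drop = FALSE]

class(sx)   # a column-compressed sparse "dgCMatrix"
nnzero(sx)  # only the non-zero entries are stored
```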
library(censored)
#> Loading required package: parsnip
#> Loading required package: survival
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)

mod_names <- get_from_env("models")

model_interface <-
  purrr::map_dfr(mod_names, ~ get_from_env(paste0(.x, "_fit")) %>%
                   mutate(model = .x)) %>% 
  mutate(interface = map_chr(value, 1)) %>% 
  select(engine, mode, model, interface)

model_encodings <-
  purrr::map_dfr(mod_names, ~ get_from_env(paste0(.x, "_encoding"))) %>% 
  #left_join(model_interface, by = join_by(model, engine, mode)) %>% 
  filter(mode == "censored regression") 

model_encodings %>% 
  #group_by(interface) %>%
  count(predictor_indicators, compute_intercept, remove_intercept, allow_sparse_x)
#> # A tibble: 3 × 5
#>   predictor_indicators compute_intercept remove_intercept allow_sparse_x     n
#>   <chr>                <lgl>             <lgl>            <lgl>          <int>
#> 1 none                 FALSE             FALSE            FALSE              9
#> 2 traditional          FALSE             FALSE            FALSE              1
#> 3 traditional          TRUE              TRUE             TRUE               1

model_encodings %>% 
  filter(predictor_indicators == "traditional")
#> # A tibble: 2 × 7
#>   model     engine mode  predictor_indicators compute_intercept remove_intercept
#>   <chr>     <chr>  <chr> <chr>                <lgl>             <lgl>           
#> 1 bag_tree  rpart  cens… traditional          FALSE             FALSE           
#> 2 proporti… glmnet cens… traditional          TRUE              TRUE            
#> # ℹ 1 more variable: allow_sparse_x <lgl>

model_encodings %>% 
  filter(allow_sparse_x)
#> # A tibble: 1 × 7
#>   model     engine mode  predictor_indicators compute_intercept remove_intercept
#>   <chr>     <chr>  <chr> <chr>                <lgl>             <lgl>           
#> 1 proporti… glmnet cens… traditional          TRUE              TRUE            
#> # ℹ 1 more variable: allow_sparse_x <lgl>

Created on 2024-01-10 with reprex v2.0.2


This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jan 25, 2024

2 participants