Error in cbind2(1, newx) %*% nbeta : invalid class 'NA' to dup_mMatrix_as_dgeMatrix #200
Comments
Honestly, I have no idea. I rewrote the prediction helper function to be a little simpler and rearranged the arguments (OCD :-/). I also added a performance metric below. We're working on model tuning right now that will make this a lot easier. The use of set.seed(42)
# Loading libraries -------------------------------------------------------
library(magrittr)
library(tidyverse)
#> Registered S3 method overwritten by 'rvest':
#> method from
#> read_xml.response xml2
library(tidymodels)
#> ── Attaching packages ──────────────────────────────────────────────────────── tidymodels 0.0.2 ──
#> ✔ broom 0.5.2 ✔ recipes 0.1.6
#> ✔ dials 0.0.2 ✔ rsample 0.0.5
#> ✔ infer 0.4.0.1 ✔ yardstick 0.0.3
#> ✔ parsnip 0.0.3
#> ── Conflicts ─────────────────────────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ scales::discard() masks purrr::discard()
#> ✖ tidyr::extract() masks magrittr::extract()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ recipes::fixed() masks stringr::fixed()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ purrr::set_names() masks magrittr::set_names()
#> ✖ yardstick::spec() masks readr::spec()
#> ✖ recipes::step() masks stats::step()
library(dials)
library(furrr)
#> Loading required package: future
# Loading input dataset ---------------------------------------------------
df_all <- iris %>%
  filter(Species != "setosa") %>%
  mutate(Species = factor(Species, levels = c("versicolor", "virginica")))
# Dividing the dataset ----------------------------------------------------
df_train_cv <- vfold_cv(df_all, v = 5, repeats = 1)
# Preparing the recipes ----------------------------------------------------
# I need to add a custom step over here on the missing patterns
en_rec <- df_all %>%
  recipe(Species ~ .) %>%
  step_pca(all_predictors(), num_comp = 2)
# Training models within resamples ----------------------------------------
fit_on_fold <- function(spec, prepped) {
  x <- juice(prepped, all_predictors())
  y <- juice(prepped, all_outcomes())
  fit_xy(spec, x, y)
}
en_engine <- logistic_reg(mode = "classification") %>%
  set_engine("glmnet")
en_grid <- grid_regular(penalty, mixture, levels = c(2, 2))
en_spec <- tibble(spec = merge(en_engine, en_grid)) %>% # combining model engine with different parameters
  mutate(model_id = row_number())
en_spec_cv <- crossing(df_train_cv, en_spec) # adding cross-validated folds
en_fits_cv <- en_spec_cv %>% # fitting different model specifications to different folds
  mutate(
    prepped = future_map(splits, prepper, en_rec),
    fit = future_map2(spec, prepped, fit_on_fold)
  )
predict_helper <- function(split, recipe, fit) {
  new_x <- bake(recipe, new_data = assessment(split), all_predictors())
  predict(fit, new_x, type = "prob") %>%
    bind_cols(assessment(split) %>% select(Species))
}
en_fits_cv_pred <- en_fits_cv %>%
  mutate(
    preds = future_pmap(list(splits, prepped, fit), predict_helper)
  )
indiv_estimates <-
  en_fits_cv_pred %>%
  unnest(preds) %>%
  group_by(id, model_id) %>%
  # or some other performance measure:
  mn_log_loss(truth = Species, .pred_virginica)
rs_estimates <-
  indiv_estimates %>%
  group_by(model_id, .metric, .estimator) %>%
  summarize(mean = mean(.estimate, na.rm = TRUE))
rs_estimates
#> # A tibble: 4 x 4
#> # Groups:   model_id, .metric [4]
#>   model_id .metric     .estimator  mean
#>      <int> <chr>       <chr>      <dbl>
#> 1        1 mn_log_loss binary     2.36
#> 2        2 mn_log_loss binary     0.938
#> 3        3 mn_log_loss binary     9.45
#> 4        4 mn_log_loss binary     0.691

Created on 2019-07-31 by the reprex package (v0.2.1)
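For reading the `mn_log_loss` values in the table, a minimal base-R sketch of mean log loss for a binary outcome may help. It is an illustration only and not yardstick's implementation; the function name `mean_log_loss` and its arguments are hypothetical. A model that always predicts 0.5 scores log(2), roughly 0.693, so estimates well above that usually mean some predictions were confident and wrong.

```r
# Mean log loss for a binary outcome, written in base R for illustration.
# `truth` is a two-level factor; `prob_event` is the predicted probability
# of `event_level`. All names here are hypothetical, not yardstick's API.
mean_log_loss <- function(truth, prob_event, event_level, eps = 1e-15) {
  stopifnot(is.factor(truth), length(truth) == length(prob_event))
  # Clamp probabilities away from 0 and 1 so log() stays finite
  p <- pmin(pmax(prob_event, eps), 1 - eps)
  is_event <- truth == event_level
  -mean(ifelse(is_event, log(p), log(1 - p)))
}

truth <- factor(
  c("versicolor", "virginica", "virginica", "versicolor"),
  levels = c("versicolor", "virginica")
)

# An uninformative model that always predicts 0.5 scores log(2)
baseline <- mean_log_loss(truth, rep(0.5, 4), event_level = "virginica")

# Mostly correct, fairly confident predictions score lower than the baseline
good <- mean_log_loss(truth, c(0.1, 0.9, 0.8, 0.2), event_level = "virginica")
```

Note that the probability column has to match the level treated as the event; passing the probability of the other level inflates the loss.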
Thanks @topepo for taking a look!
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
I took the example from this blog post and tried to replicate it using glmnet: https://www.alexpghayes.com/blog/implementing-the-super-learner-with-tidymodels/

I wanted to use binary classification, so I excluded one of the factor levels, but otherwise changed as little as possible to get it running. When I reach the part where I make predictions on each split's assessment set, I get the following error:

Error in cbind2(1, newx) %*% nbeta : invalid class 'NA' to dup_mMatrix_as_dgeMatrix
More specifically, it breaks in this part, where I try to make predictions on the hold-out set:
I also tried running the prediction with only one model fit, to rule out something breaking in the map, but the error persists:
The full code I'm running is the following:
I've been looking for help around the internet, but unfortunately I'm absolutely lost about where the root cause could be. Could anyone assist?
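A note for readers who land here with the same message: it has been reported elsewhere when glmnet's predict method receives `newx` as a data frame or tibble rather than a numeric matrix. Whether that is the cause in this thread is not established, and parsnip normally handles the conversion itself. As a debugging aid only, the sketch below shows one way to check and coerce predictors in base R before calling a function that expects a matrix. The helper name `as_numeric_matrix` is hypothetical.

```r
# Hypothetical helper: coerce predictors to a plain numeric matrix before
# passing them to a function that expects one (for example, the `newx`
# argument of glmnet's predict method). Not part of parsnip or glmnet.
as_numeric_matrix <- function(new_x) {
  if (is.matrix(new_x)) {
    stopifnot(is.numeric(new_x))
    return(new_x)
  }
  new_x <- as.data.frame(new_x)
  # Refuse non-numeric columns instead of silently building a character matrix
  is_num <- vapply(new_x, is.numeric, logical(1))
  if (!all(is_num)) {
    stop("Non-numeric columns: ", paste(names(new_x)[!is_num], collapse = ", "))
  }
  as.matrix(new_x)
}

# Usage with a few rows of the numeric iris columns
new_x <- as_numeric_matrix(iris[51:55, 1:4])
```

Checking `class(new_x)` and `str(new_x)` right before the failing `predict()` call is a quick way to see what the model actually receives.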
My session info below: