Error in yardstick_table(): ! truth and estimate must have the same levels in the same order. #327

Closed
deschen1 opened this issue Oct 12, 2022 · 5 comments · Fixed by #362
Labels
upkeep maintenance, infrastructure, and similar

Comments

@deschen1

deschen1 commented Oct 12, 2022

I'm doing as I'm told by the yardstick function and posting this issue here.

When using the iris data set, yardstick seems to get confused because my code turns the predicted class into a factor with levels 1, 2, 3 rather than the original factor with character levels ("setosa"...).

So the source of the error is clear, but I'm posting it here nonetheless because the function asked me to. Maybe you want to either allow conf_mat to work with non-identical factor levels or tighten this "posting request" so that it doesn't pop up for such an issue.

iris

model_recipe <- recipes::recipe(Species ~ ., data = iris)

# Create a workflow
model_final <- parsnip::naive_Bayes(Laplace = 1) |>
  parsnip::set_mode("classification") |>
  parsnip::set_engine("klaR",
                      prior = rep(1/3, 3),
                      usekernel = FALSE)

model_final_wf <- workflows::workflow() |>
  workflows::add_recipe(model_recipe) |>
  workflows::add_model(model_final)

train_fit <- model_final_wf |>
  generics::fit(data = iris)

# Add predictions, in the mutate for class_pred the new [1, 2, 3] factor is introduced
train_predictions <- predict(train_fit, iris, type = "prob") |> 
  dplyr::mutate(class_pred = as.factor(apply(dplyr::across(tidyselect::everything()), 1, which.max))) |> 
  dplyr::bind_cols(iris)

# See the problem
head(train_predictions)
# A tibble: 6 × 9
  .pred_setosa .pred_versicolor .pred_virginica class_pred Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>            <dbl>           <dbl> <fct>             <dbl>       <dbl>        <dbl>       <dbl> <fct>  
1         1            2.98e-18        2.15e-25 1                   5.1         3.5          1.4         0.2 setosa 
2         1            3.17e-17        6.94e-25 1                   4.9         3            1.4         0.2 setosa 
3         1            2.37e-18        7.24e-26 1                   4.7         3.2          1.3         0.2 setosa 
4         1            3.07e-17        8.69e-25 1                   4.6         3.1          1.5         0.2 setosa 
5         1            1.02e-18        8.89e-26 1                   5           3.6          1.4         0.2 setosa 
6         1.00         2.72e-14        4.34e-21 1                   5.4         3.9          1.7         0.4 setosa 

# Calculating confusion matrix
train_predictions |>
  yardstick::conf_mat(truth    = .data$Species,
                      estimate = .data$class_pred)

This gives error:

Error in `yardstick_table()`:
! `truth` and `estimate` must have the same levels in the same order.
This is an internal error that was detected in the yardstick package.
  Please report it at <https://github.com/tidymodels/yardstick/issues> with a reprex and the full backtrace.
Run `rlang::last_error()` to see where the error occurred.

And here's the full traceback:

<error/rlang_error>
Error in `yardstick_table()`:
! `truth` and `estimate` must have the same levels in the same order.
This is an internal error that was detected in the yardstick package.
  Please report it at <https://github.com/tidymodels/yardstick/issues> with a reprex and the full backtrace.
---
Backtrace:
 1. ├─yardstick::conf_mat(...)
 2. └─yardstick:::conf_mat.data.frame(...)
 3.   └─yardstick:::yardstick_table(truth = truth, estimate = estimate, case_weights = case_weights)
 4.     └─rlang::abort(...)
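
A minimal sketch of one way to avoid the mismatch in the reprex above. It assumes the probability columns come back in the same order as levels(iris$Species), which the thread does not confirm, and it uses a made-up probability matrix rather than the fitted model. The idea is to map the which.max index back to the original level labels instead of calling as.factor() on the index.

```r
# Hypothetical class probabilities for three rows; the columns are ASSUMED to
# follow the order of levels(iris$Species): setosa, versicolor, virginica.
species_levels <- levels(iris$Species)
probs <- matrix(
  c(0.9, 0.1, 0.0,
    0.2, 0.7, 0.1,
    0.1, 0.2, 0.7),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, paste0(".pred_", species_levels))
)

# Index of the most probable class in each row (1, 2, or 3)
idx <- apply(probs, 1, which.max)

# as.factor() on the index yields levels "1", "2", "3", which do not match Species
bad_pred <- as.factor(idx)

# Mapping the index back to the labels yields the same levels in the same order
class_pred <- factor(species_levels[idx], levels = species_levels)

identical(levels(bad_pred), species_levels)    # FALSE
identical(levels(class_pred), species_levels)  # TRUE
```

In the reprex itself, calling predict(train_fit, iris) with the default type = "class" should already return a .pred_class factor with the original levels, which would likely sidestep the manual conversion altogether.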
@DavisVaughan
Member

This is a real user error (not developer error) so it is possible we should un-mark this as an .internal error
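
For context, a rough sketch of what the .internal flag changes, assuming the check is raised through rlang::abort() (as the backtrace above suggests). The check_levels() helper and its internal argument are hypothetical and only illustrate the idea; this is not yardstick's actual code.

```r
library(rlang)

# Hypothetical level check for illustration; this is not yardstick's own code.
check_levels <- function(truth, estimate, internal = FALSE) {
  if (!identical(levels(truth), levels(estimate))) {
    abort(
      "`truth` and `estimate` must have the same levels in the same order.",
      .internal = internal
    )
  }
  invisible(TRUE)
}

truth <- factor(c("a", "b"))
estimate <- factor(c("1", "2"))

# Flagged as internal: rlang adds a footer asking the user to report the error
internal_err <- tryCatch(
  check_levels(truth, estimate, internal = TRUE),
  error = identity
)

# Plain user-facing error: no request to file an issue
user_err <- tryCatch(
  check_levels(truth, estimate),
  error = identity
)

print(internal_err)
print(user_err)
```

Dropping the flag keeps the same check and message but no longer tells users to open an issue for what is a mistake in their own input.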

@EmilHvitfeldt EmilHvitfeldt added the upkeep maintenance, infrastructure, and similar label Oct 25, 2022
@lucasquemelli

lucasquemelli commented Jan 22, 2023

I get the same error when I run the commands below, even though I pass both truth and estimate as factors:

### Predictions of the model
predict_estimates <- function(model_keras, x_test_tbl, y_test_vec){
  # Predicted Class
  yhat_keras_class_vec <- predict(object = model_keras, x = as.matrix(x_test_tbl)) %>%
    as.vector()

  # Predicted Class Probability
  yhat_keras_prob_vec <- predict(object = model_keras, x = as.matrix(x_test_tbl)) %>%
    `>`(0.5) %>%
    k_cast("int32") %>%
    as.vector()

  # Format test data and predictions for yardstick metrics
  # (the commas sit before the trailing comments so the call parses)
  estimates_keras_tbl <- tibble(
    truth      = as.factor(y_test_vec), # %>% fct_recode(yes = "1", no = "0")
    estimate   = as.factor(yhat_keras_class_vec), # %>% fct_recode(yes = "1", no = "0")
    class_prob = yhat_keras_prob_vec
  )

  estimates_keras_tbl
}
### Predictions from model on test data
estimates_keras_tbl <- predict_estimates(model_keras, x_test_tbl, y_test_vec)
### Judging model benchmark metrics
# Confusion Table
estimates_keras_tbl %>% conf_mat(truth, estimate)

The error:

Error in `yardstick_table()`:
! `truth` and `estimate` must have the same levels in the same order.
ℹ This is an internal error that was detected in the yardstick package.
  Please report it at <https://github.com/tidymodels/yardstick/issues> with a reprex and the full backtrace.
Backtrace:
 1. estimates_keras_tbl %>% conf_mat(truth, estimate)
 3. yardstick:::conf_mat.data.frame(., truth, estimate)
 4. yardstick:::yardstick_table(truth = truth, estimate = estimate, case_weights = case_weights)


For truth and estimate, I tried to use the classes instead of binaries, such as:

...
truth      = as.factor(y_test_vec),
estimate   = as.factor(yhat_keras_class_vec),
...

Yet, it has not worked anyway. Could anyone tell me what I should do?

@EmilHvitfeldt
Member

Hello @lucasquemelli! Sorry to hear you are also running into problems. Without being able to see the data, I would recommend that you use factor() with the levels = argument set explicitly, for safety.

truth      = factor(y_test_vec, levels = c(0, 1))
estimate   = factor(yhat_keras_class_vec, levels = c(0, 1))

If that doesn't fix it, can you show me what happens if you run

str(estimates_keras_tbl$estimate)
str(estimates_keras_tbl$truth)
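
To illustrate why explicit levels help, here is a small base R sketch with made-up vectors (the values are hypothetical, not the data from the comment above). as.factor() builds its levels only from the values that are present, so a prediction vector that never contains one of the classes ends up with fewer levels than truth.

```r
# Hypothetical labels and predictions; every prediction happens to be class 0
y_test_vec <- c(0, 1, 1, 0)
yhat_keras_class_vec <- c(0, 0, 0, 0)

# as.factor() only sees the observed values, so the level sets differ
levels(as.factor(y_test_vec))            # "0" "1"
levels(as.factor(yhat_keras_class_vec))  # "0"

# factor() with explicit levels gives both vectors the same levels in the same order
truth <- factor(y_test_vec, levels = c(0, 1))
estimate <- factor(yhat_keras_class_vec, levels = c(0, 1))
identical(levels(truth), levels(estimate))  # TRUE

# A plain base R cross-tabulation now has a full 2 x 2 layout
table(Prediction = estimate, Truth = truth)
```

Values that are not listed in levels become NA, so running str() on both columns, as suggested above, is a quick way to spot unexpected values such as raw probabilities.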

@lucasquemelli

Perfection! That solved my problem. You are a genius! Thank you so much!

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Feb 24, 2023