Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add switch to make second level the reference in step_bin2factor #142

Merged
merged 1 commit into from Apr 17, 2018

Conversation

@michaellevy
Copy link
Contributor

@michaellevy michaellevy commented Apr 6, 2018

First stab at addressing #141. I wrote the test differently than the others in the file because table orders by factor levels and I wanted to be explicit about what's being tested here.

@michaellevy
Copy link
Contributor Author

@michaellevy michaellevy commented Apr 6, 2018

Suggest making the default of the new argument FALSE. Binomial glms, e.g., assume the first level is the negative class, which leads to this backwardness with the current default.

suppressPackageStartupMessages({
  library(tidyverse)
  library(recipes)
})
d <- 
  tibble(y = rbinom(50, 1, .5),
         x1 = rnorm(50, mean = y),
         x2 = rnorm(50, mean = y))
d <- 
  recipe(d) %>% 
  step_bin2factor(y) %>% 
  prep(d) %>% 
  bake(d)
m <- glm(y ~ ., d, family = "binomial")
d$predicted_prob <- predict(m, d, type = "response")
ggplot(d, aes(x = y, y = predicted_prob)) + 
  geom_boxplot()

Created on 2018-04-06 by the reprex package (v0.2.0).

@topepo
Copy link
Collaborator

@topepo topepo commented Apr 7, 2018

Suggest making the default of the new argument FALSE.

No, I'm sticking to my guns on this one. Sorry about the rant.

This convention is old and misguided since it is based on encoding binary categorical data as 0/1. If people forgot about this and were asked "which factor level is generally more important?," nobody would say "the last one".

So in my quixotic efforts to change established human behavior, I'm going to say that we should treat categorical data as qualitative and forget about how it is eventually encoded (i.e., please make the default TRUE).

</rant>

@michaellevy
Copy link
Contributor Author

@michaellevy michaellevy commented Apr 11, 2018

Quixotic ranting is good. I'll save you my counter-rant because I see that this is baked into caret and so would be a massive change. The default is TRUE.

@topepo
Copy link
Collaborator

@topepo topepo commented Apr 11, 2018

👍

Can you add a unit test for the option and make a note in the NEWS.md file under "Other Changes"? Feel free to link to the PR or issue and use "(contributed by...)" or similar.

@topepo topepo merged commit 73e8447 into tidymodels:master Apr 17, 2018
3 checks passed
3 checks passed
codecov/patch 100% of diff hit (target 92.64%)
Details
codecov/project 92.64% (+<.01%) compared to a95175e
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
topepo added a commit that referenced this pull request Apr 17, 2018
@michaellevy
Copy link
Contributor Author

@michaellevy michaellevy commented Apr 18, 2018

Sorry, I missed that last ask. Thanks for letting me add this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.