Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent behavior for fit classification when the response isn't a factor #115

Closed
EmilHvitfeldt opened this issue Dec 14, 2018 · 4 comments

Comments

@EmilHvitfeldt
Copy link
Member

For whatever reason you might accidentally specify your response as something other then a factor. This gives rather informative error messages.

library(tidymodels)
#> ── Attaching packages ────────────────────────────────────────────────────────────────────── tidymodels 0.0.2 ──
#> ✔ broom     0.5.1     ✔ purrr     0.2.5
#> ✔ dials     0.0.2     ✔ recipes   0.1.4
#> ✔ dplyr     0.7.8     ✔ rsample   0.0.3
#> ✔ ggplot2   3.1.0     ✔ tibble    1.4.2
#> ✔ infer     0.4.0     ✔ yardstick 0.0.2
#> ✔ parsnip   0.0.1
#> ── Conflicts ───────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ rsample::fill()  masks tidyr::fill()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()

### factor

model_logistic <- logistic_reg(mode = "classification") %>%
  set_engine("glm") %>%
  fit(Species ~ ., data = iris)
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

predict_class(model_logistic, iris)
#>   [1] setosa     setosa     setosa     setosa     setosa     setosa    
#>   [7] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [13] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [19] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [25] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [31] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [37] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [43] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [49] setosa     setosa     versicolor versicolor versicolor versicolor
#>  [55] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [61] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [67] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [73] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [79] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [85] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [91] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [97] versicolor versicolor versicolor versicolor versicolor versicolor
#> [103] versicolor versicolor versicolor versicolor versicolor versicolor
#> [109] versicolor versicolor versicolor versicolor versicolor versicolor
#> [115] versicolor versicolor versicolor versicolor versicolor versicolor
#> [121] versicolor versicolor versicolor versicolor versicolor versicolor
#> [127] versicolor versicolor versicolor versicolor versicolor versicolor
#> [133] versicolor versicolor versicolor versicolor versicolor versicolor
#> [139] versicolor versicolor versicolor versicolor versicolor versicolor
#> [145] versicolor versicolor versicolor versicolor versicolor versicolor
#> Levels: setosa versicolor virginica

### logical

model_logistic <- logistic_reg(mode = "classification") %>%
  set_engine("glm") %>%
  fit(Species ~ ., data = iris %>% mutate(Species = Species == "setosa"))
#> Warning: glm.fit: algorithm did not converge

#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

predict_class(model_logistic, iris)
#> Warning in rep(yes, length.out = length(ans)): 'x' is NULL so the result
#> will be NULL
#> Error in ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]: replacement has length zero

### numeric

model_logistic <- logistic_reg(mode = "classification") %>%
  set_engine("glm") %>%
  fit(Species ~ ., data = iris %>% mutate(Species = as.numeric(Species)))
#> Error in eval(family$initialize): y values must be 0 <= y <= 1

### character

model_logistic <- logistic_reg(mode = "classification") %>%
  set_engine("glm") %>%
  fit(Species ~ ., data = iris %>% mutate(Species = as.character(Species)))
#> Error in eval(family$initialize): y values must be 0 <= y <= 1

Created on 2018-12-13 by the reprex package (v0.2.1)

When response is logical we manage to fit, but predict_class fails. And when the response is numeric and character it fails to fit in both.

I propose we check inside fit.model_spec that the response is of type factor when doing classification.

@EmilHvitfeldt EmilHvitfeldt changed the title Consistent behavoir when for fit classification when the response isn't a factor Consistent behavior for fit classification when the response isn't a factor Dec 14, 2018
@EmilHvitfeldt
Copy link
Member Author

I don't mind trying to fix it if you deem it necessary.

topepo added a commit that referenced this issue Feb 27, 2019
@topepo
Copy link
Member

topepo commented Feb 28, 2019

(sorry @EmilHvitfeldt I didn't see your comment to volunteer until I started)

@EmilHvitfeldt
Copy link
Member Author

No problem, I'm just happy it is being handled!

topepo added a commit that referenced this issue Mar 3, 2019
For classification problems, an error is thrown if the outcome is not a factor: 

```
Error: For classification models, the outcome should be a factor.
```

There are a lot of travis related changes too: 

  * To get travis to run tests, the modeling packages have to be formal dependencies so a bunch were added to Suggests. This _may_ be temporary; I may decide to remove these for the version sent to CRAN. `rstanarm` was excluded because compiling it (and its dependencies) exceeded the time allowed by travis. 

 * A variety of changes were made to the tests related to r-devel. It looks like any function directly accessed in the tests now need to be formal dependencies (not the case before). There is still some weirdness about this though and I've attributed it to being _devel_. Hopefully this will get smoothed out.
@topepo topepo mentioned this issue Mar 17, 2019
topepo added a commit that referenced this issue Mar 17, 2019
* changes to #115

* testing out adding modeling pacakges to suggests

* first attempt at getting around CXX14 travis issues

* second attempt at getting around CXX14 travis issues

* third attempt at getting around CXX14 travis issues (xenial)

* fourth attempt at getting around CXX14 travis issues

* fifth attempt at getting around CXX14 travis issues

* sizth attempt at getting around CXX14 travis issues

* abandoning rstanarm on travis

* cleaned the travis house to see what runs and fails

* no rstanarm and back to trusty

* trying to avoid travis failures for devtools

```
* installing *source* package ‘devtools’ ...
** package ‘devtools’ successfully unpacked and MD5 sums checked
** R
** inst
** byte-compile and prepare package for lazy loading
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
  there is no package called ‘magrittr’
ERROR: lazy loading failed for package ‘devtools’
* removing ‘/home/travis/R/Library/devtools’
* restoring previous ‘/home/travis/R/Library/devtools’
The downloaded source packages are in
	‘/tmp/RtmpUhu1j4/downloaded_packages’
Warning message:
In install.packages(c("devtools")) :
  installation of package ‘devtools’ had non-zero exit status
```

* back to original travis file

* reformatting

* R-devel complains about calling by namespace without loading

* seventh attempt at getting around CXX14 travis issues

* eigth attempt at getting around CXX14 travis issues

* bad backtick!

* moved makevars stuff to before_install

* adding C50 and rstanarm back into travis builds

* get binaries for stan packages

* CXX14FLAGS for rstan

* more binaries

* no stan

* no stan (I mean it this time)

* one last library call

* fixed test case

* more changes for r-devel

* removed specific top-level stan calls

* remove ancillary files

* avodi returning multiple logicals

* back to computing probability results

* added library calls

* namespace the predict function
@github-actions
Copy link

github-actions bot commented Mar 8, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants