predict() on an mlp with nnet double-names the output with `.pred_` #174

Closed · mouli3c3 opened this issue May 1, 2019 · 3 comments · Labels: bug

mouli3c3 commented May 1, 2019

This problem is similar to an already-closed issue (#107), but occurs with mlp() using the nnet engine.


library(tidymodels)
#> -- Attaching packages ------------------------------------------------- tidymodels 0.0.2 --
#> v broom     0.5.1       v purrr     0.3.2  
#> v dials     0.0.2       v recipes   0.1.5  
#> v dplyr     0.8.0.1     v rsample   0.0.4  
#> v ggplot2   3.1.0       v tibble    2.1.1  
#> v infer     0.4.0       v yardstick 0.0.3  
#> v parsnip   0.0.2
#> -- Conflicts ---------------------------------------------------- tidymodels_conflicts() --
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()
data(credit_data)

set.seed(7075)
data_split <- initial_split(credit_data, strata = "Status", prop = 0.75)

credit_train <- training(data_split)
credit_test  <- testing(data_split)
credit_rec <- 
  recipe(Status ~ ., data = credit_train) %>%
  step_knnimpute(Home, Job, Marital, Income, Assets, Debt) %>%
  step_dummy(all_nominal(), -Status) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) %>%
  prep(training = credit_train, retain = TRUE)

test_normalized <- bake(credit_rec, new_data = credit_test, all_predictors())

set.seed(57974)
nnet_fit <- set_engine(mlp("classification", hidden_units = 10), "nnet") %>%
  fit(Status ~ ., data = juice(credit_rec))

glm_fit <- set_engine(logistic_reg(), "glm") %>%
  fit(Status ~ ., data = juice(credit_rec))

# Issue with predict() on nnet
glimpse(predict(nnet_fit, new_data = test_normalized, type = "prob"))
#> Observations: 1,113
#> Variables: 2
#> $ .pred_.pred_bad  <dbl> 0.5608545, 0.7023505, 0.3303682, 0.4221877, 0...
#> $ .pred_.pred_good <dbl> 0.4391455, 0.2976495, 0.6696318, 0.5778123, 0...

# Normal with predict() on glm (no issue)
glimpse(predict(glm_fit, new_data = test_normalized, type = "prob"))
#> Observations: 1,113
#> Variables: 2
#> $ .pred_bad  <dbl> 0.04675355, 0.94317298, 0.24316454, 0.06970005, 0.0...
#> $ .pred_good <dbl> 0.95324645, 0.05682702, 0.75683546, 0.93029995, 0.9...
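
As a stopgap, the duplicated prefix can be stripped by hand after predicting; a minimal sketch (the `preds` name and the regex are just for illustration):

preds <- predict(nnet_fit, new_data = test_normalized, type = "prob")
# Drop the leading duplicate so ".pred_.pred_bad" becomes ".pred_bad"
names(preds) <- sub("^\\.pred_\\.pred_", ".pred_", names(preds))
glimpse(preds)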
topepo added the bug label May 1, 2019

topepo (Collaborator) commented May 1, 2019

SIDM (same issue, different model)

patr1ckm (Contributor) commented Oct 29, 2019

I can confirm that this can be closed. Running:

data(credit_data)
nnet_fit <- set_engine(mlp("classification", hidden_units = 10), "nnet") %>%
  fit(Status ~ ., data = credit_data)

glm_fit <- set_engine(logistic_reg(), "glm") %>%
  fit(Status ~ ., data = credit_data)

Produces:

> glimpse(predict(nnet_fit, new_data = credit_data, type = "prob"))
Observations: 4,454
Variables: 2
$ .pred_V1 <dbl> 0.3419620, 0.3419620, 0.3392285, 0.3387520, 0.4335137, 0.2995662, 0.2995662, 0.3010878, 0.4102205, 0.5224852, 0.33…
$ .pred_V2 <dbl> 0.6580380, 0.6580380, 0.6607715, 0.6612480, 0.5664863, 0.7004338, 0.7004338, 0.6989122, 0.5897795, 0.4775148, 0.66…

> glimpse(predict(glm_fit, new_data = credit_data, type = "prob"))
Observations: 4,454
Variables: 2
$ .pred_bad  <dbl> 0.24860098, 0.11323173, 0.56131606, 0.21922027, 0.14454134, 0.03888827, 0.04857814, 0.03515797, 0.23389520, 0.80…
$ .pred_good <dbl> 0.75139902, 0.88676827, 0.43868394, 0.78077973, 0.85545866, 0.96111173, 0.95142186, 0.96484203, 0.76610480, 0.19…

However, note that the column names now differ by engine: the nnet fit returns generic `.pred_V1` / `.pred_V2` names, while the glm fit uses the outcome levels (`.pred_bad` / `.pred_good`).

mouli3c3 (Author) commented Oct 29, 2019

I see that `nnet_fit$lvl` and `glm_fit$lvl` both list the target levels as "bad" and "good". I'm not sure whether it is intended behavior for predict() to produce different column names for different models.
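
Since the levels are stored on both fits, a hypothetical fix-up for the nnet output would be to rebuild the names from `$lvl`; a minimal sketch, assuming the probability columns come out in the same order as the factor levels:

preds <- predict(nnet_fit, new_data = credit_data, type = "prob")
# Rename .pred_V1 / .pred_V2 using the stored outcome levels c("bad", "good");
# assumes column order matches level order
names(preds) <- paste0(".pred_", nnet_fit$lvl)
glimpse(preds)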
