new tidy method for glmnet has two intercept rows #349

@juliasilge

I was planning on answering this person's question on Stack Overflow, but I noticed something kind of weird is going on with intercepts and parsnip vs. workflows.

  • When the fitted model is pulled from the workflow and then tidied there are two 2️⃣ rows for (Intercept)
  • When the parsnip model is fit directly, there is one row for (Intercept), as expected

library(tidymodels)
#> ── Attaching packages ─────────────────────────────────────────────────── tidymodels 0.1.1 ──
#> ✓ broom     0.7.0          ✓ recipes   0.1.13    
#> ✓ dials     0.0.8          ✓ rsample   0.0.7     
#> ✓ dplyr     1.0.0          ✓ tibble    3.0.3     
#> ✓ ggplot2   3.3.2          ✓ tidyr     1.1.0     
#> ✓ infer     0.5.3          ✓ tune      0.1.1     
#> ✓ modeldata 0.0.2          ✓ workflows 0.1.2     
#> ✓ parsnip   0.1.2.9000     ✓ yardstick 0.0.7     
#> ✓ purrr     0.3.4
#> ── Conflicts ────────────────────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()

set.seed(1234)
train <- tibble(y = factor(sample(c(0,1), 1000, replace = TRUE)),
                x1 = as.numeric(y) + rnorm(1000),
                x2 = rnorm(1000),
                x3 = rnorm(1000)
)

lr_mod <- logistic_reg(penalty = 0.03, mixture = 1) %>%
  set_mode("classification") %>%
  set_engine("glmnet")

lr_wf <- workflow() %>%
  add_model(lr_mod) %>%
  add_formula(y ~ .)

## there are *two* rows for intercept
lr_wf %>% 
  fit(train) %>%
  pull_workflow_fit() %>%
  tidy()
#> Loading required package: Matrix
#> 
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#> 
#>     expand, pack, unpack
#> Loaded glmnet 4.0-2
#> # A tibble: 5 x 3
#>   term        estimate penalty
#>   <chr>          <dbl>   <dbl>
#> 1 (Intercept)   -1.31     0.03
#> 2 (Intercept)    0        0.03
#> 3 x1             0.843    0.03
#> 4 x2             0        0.03
#> 5 x3             0        0.03

## no second intercept row
lr_mod %>%
  fit(y ~ ., data = train) %>%
  tidy()
#> # A tibble: 4 x 3
#>   term        estimate penalty
#>   <chr>          <dbl>   <dbl>
#> 1 (Intercept)   -1.31     0.03
#> 2 x1             0.843    0.03
#> 3 x2             0        0.03
#> 4 x3             0        0.03

Created on 2020-07-25 by the reprex package (v0.3.0.9001)

I haven't dug into this yet; it might belong on workflows.
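
Until the cause is tracked down, the duplicate can at least be flagged programmatically. The sketch below is an illustration only: `count_term_rows()` is a hypothetical helper (it is not part of parsnip, workflows, or broom), and the two data frames are typed in by hand from the printed output above instead of coming from a fresh fit.

```r
# Hypothetical helper (not from any package): count the rows of a tidied
# model that share a given term label.
count_term_rows <- function(tidied, term = "(Intercept)") {
  sum(tidied$term == term)
}

# Hand-copied from the workflow-based output above (values as printed).
wf_tidied <- data.frame(
  term = c("(Intercept)", "(Intercept)", "x1", "x2", "x3"),
  estimate = c(-1.31, 0, 0.843, 0, 0),
  penalty = 0.03,
  stringsAsFactors = FALSE
)

# Hand-copied from the direct parsnip fit output above.
direct_tidied <- data.frame(
  term = c("(Intercept)", "x1", "x2", "x3"),
  estimate = c(-1.31, 0.843, 0, 0),
  penalty = 0.03,
  stringsAsFactors = FALSE
)

wf_n <- count_term_rows(wf_tidied)
direct_n <- count_term_rows(direct_tidied)

# Terms that appear more than once in the workflow-based output.
dup_terms <- unique(wf_tidied$term[duplicated(wf_tidied$term)])
```

In a live session the same helper could be applied to the result of `tidy()` on each fit; a count above one for `(Intercept)` would reproduce the behavior reported here.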

Labels: bug (an unexpected problem or unintended behavior), next release 🚀
