Range of penalty() in Ridge Regression does not cover lambda chosen by cv.glmnet #140

andrjohns · 2020-09-08T06:29:59Z

Hi,

I'm trying to use the tidymodels ecosystem to run a ridge regression, and so am using the grid_regular function to create the search space of lambda values to evaluate. However, these lambda values are always upper-bounded by 1, and when I run some data through cv.glmnet, the lambdas tested and chosen are greater than one:

library(tidymodels)
#> -- Attaching packages ----------------------------------------------------------------- tidymodels 0.1.1 --
#> v broom     0.7.0      v recipes   0.1.13
#> v dials     0.0.8      v rsample   0.0.7 
#> v dplyr     1.0.2      v tibble    3.0.3 
#> v ggplot2   3.3.2      v tidyr     1.1.2 
#> v infer     0.5.3      v tune      0.1.1 
#> v modeldata 0.0.2      v workflows 0.1.3 
#> v parsnip   0.1.3      v yardstick 0.0.7 
#> v purrr     0.3.4
#> -- Conflicts -------------------------------------------------------------------- tidymodels_conflicts() --
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()
library(glmnet)
#> Loading required package: Matrix
#> 
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#> 
#>     expand, pack, unpack
#> Loaded glmnet 4.0-2

dat = cbind(structure(rnorm(200*16),dim=c(200,16)),
            structure(sample(0:1,200*14,replace = T),dim=c(200,14))) %>%
    as.matrix()

mod = linear_reg(mode="regression",penalty=tune(),mixture=0) %>%
    set_engine("glmnet")

range(grid_regular(parameters(mod),levels=10))
#> [1] 1e-10 1e+00

fit1 = cv.glmnet(dat[,2:30],dat[,1],alpha=0)
range(fit1$lambda)
#> [1]   0.01971351 197.13506375
fit1$lambda.min
#> [1] 13.27537

^{Created on 2020-09-08 by the reprex package (v0.3.0)}

Should I be using this functionality differently for ridge regression?

The text was updated successfully, but these errors were encountered:

topepo · 2020-09-08T15:29:15Z

The range of lambda is data-driven and is affected by the choice of alpha. For that reason (and a few others), we use a default range of:

> penalty()
Amount of Regularization (quantitative)
Transformer:  log-10 
Range (transformed scale): [-10, 0]

which works well in 99% of the cases.

You can change the range if you need to go higher:

library(tidymodels)
#> ── Attaching packages ────────────────────────────────────────── tidymodels 0.1.1 ──
#> ✓ broom     0.7.0          ✓ recipes   0.1.13    
#> ✓ dials     0.0.8.9000     ✓ rsample   0.0.7     
#> ✓ dplyr     1.0.2          ✓ tibble    3.0.3     
#> ✓ ggplot2   3.3.2          ✓ tidyr     1.1.2     
#> ✓ infer     0.5.2          ✓ tune      0.1.1.9000
#> ✓ modeldata 0.0.2          ✓ workflows 0.1.3     
#> ✓ parsnip   0.1.3.9000     ✓ yardstick 0.0.7     
#> ✓ purrr     0.3.4
#> ── Conflicts ───────────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()

mod  <- 
  linear_reg(mode = "regression", penalty = tune(), mixture = 0) %>%
  set_engine("glmnet")

mod %>% 
  parameters() %>% 
  update(penalty = penalty(log10(c(0.02, 198)))) %>% 
  grid_regular(levels = 10) %>% 
  summary()
#>     penalty        
#>  Min.   :  0.0200  
#>  1st Qu.:  0.2232  
#>  Median :  2.2556  
#>  Mean   : 30.9259  
#>  3rd Qu.: 21.5277  
#>  Max.   :198.0000

^{Created on 2020-09-08 by the reprex package (v0.3.0)}

(this also works with the CRAN versions of these packages; the versions above are what I have currently installed)

andrjohns · 2020-09-09T00:57:48Z

Excellent, thanks for the help Max!

github-actions · 2021-03-06T00:55:21Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

andrjohns closed this as completed Sep 9, 2020

github-actions bot locked and limited conversation to collaborators Mar 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Range of penalty() in Ridge Regression does not cover lambda chosen by cv.glmnet #140

Range of penalty() in Ridge Regression does not cover lambda chosen by cv.glmnet #140

andrjohns commented Sep 8, 2020 •

edited

topepo commented Sep 8, 2020

andrjohns commented Sep 9, 2020

github-actions bot commented Mar 6, 2021

Range of penalty() in Ridge Regression does not cover lambda chosen by cv.glmnet #140

Range of penalty() in Ridge Regression does not cover lambda chosen by cv.glmnet #140

Comments

andrjohns commented Sep 8, 2020 • edited

topepo commented Sep 8, 2020

andrjohns commented Sep 9, 2020

github-actions bot commented Mar 6, 2021

andrjohns commented Sep 8, 2020 •

edited