
glmnet lambdas #116

Closed
rlesca01 opened this issue Feb 6, 2015 · 8 comments
@rlesca01
Contributor

rlesca01 commented Feb 6, 2015

I noticed that when running glmnet with caret, it seems to use fixed lambdas. A huge portion of the time I get 0.1 as the selected lambda. There may very well be a good reason for this, but using discrete lambdas is often not faster than running the full regularization path, and I've also seen cases where I get better performance with cv.glmnet.

Natively glmnet uses this logic to decide the minimum lambda (as a fraction of the maximum):

ifelse(nobs<nvars,0.01,0.0001)
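
For context, the full default path then runs log-linearly from the largest lambda (the one that zeroes every coefficient) down to that value times the ratio above. A minimal base-R sketch of this, not the actual Fortran, with a made-up lambda_max and example dimensions:

```r
## Sketch (not glmnet's actual Fortran): the default path runs log-linearly
## from lambda_max (the smallest lambda that zeroes all coefficients) down
## to lambda_max * lambda.min.ratio, over nlambda = 100 values.
nobs <- 500; nvars <- 200                    # example dimensions
lambda.min.ratio <- ifelse(nobs < nvars, 0.01, 0.0001)
lambda_max <- 1.5                            # placeholder value
lambda_path <- exp(seq(log(lambda_max),
                       log(lambda_max * lambda.min.ratio),
                       length.out = 100))
range(lambda_path)                           # 0.00015 to 1.5
```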

And currently this is the code in the caret glmnet grid function:

expand.grid(alpha = seq(0.1, 1, length = len),
            lambda = seq(.1, 3, length = 3 * len))

This is just some benchmarking I did with a built-in dataset.

library(caret)
library(glmnet)

data(mdrr)

# ridge specifying a lambda
system.time(
 tmp <- glmnet(data.matrix(mdrrDescr),
               mdrrClass,
               alpha=0,
               lambda=0.001,
               family="binomial"))
#    user  system elapsed 
#    0.86    0.00    0.89

# ridge with default lambdas used (100)
system.time(
 tmp <- glmnet(data.matrix(mdrrDescr),
               mdrrClass,
               alpha=0,
               family="binomial"))
#   user  system elapsed 
#    0.25    0.00    0.24

Changing to alpha=1 takes 7.47 sec for the full set of lambdas, or 1.41 sec for lambda=0.001.

Anyway, I was just wondering if there is a reason for this, or if in the future we might be able to think about having glmnet in caret run with the native lambda sequence. I know that the obvious workaround to get a more glmnet-like set of lambdas is to specify tuneGrid in train manually.

Thanks for the feedback,
Reynald

@zachmayer
Collaborator

What happens if you take the default lambdas (100) and explicitly pass them to glmnet?

One idea would be to have caret use glmnet's default lambdas in the default tuneGrid.

@rlesca01
Contributor Author

rlesca01 commented Feb 6, 2015

If you run glmnet, pull out the lambdas, and plug them back in to run it again, it works fine and takes exactly the same amount of time.

The problem is that the full lambda sequence is determined in Hastie's underlying Fortran code.

I've just been using tuneGrid and guessing at a good sequence if I'm going to use glmnet with caret. Setting a higher tuneLength expands alpha as well, which isn't desirable.
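
For anyone hitting the same issue, here is a sketch of that workaround: fix alpha and tune only lambda over a log-spaced sequence, so a larger grid doesn't expand alpha as well. The endpoints below are guesses, not glmnet's computed path; cv.glmnet on your own data is a better guide.

```r
## Workaround sketch: fix alpha (here 1, the lasso) and tune only lambda.
## The endpoints 1e-4 to 1 are guesses, not glmnet's computed path.
lassoGrid <- expand.grid(alpha = 1,
                         lambda = 10^seq(-4, 0, length.out = 25))
## then: train(x, y, method = "glmnet", tuneGrid = lassoGrid, ...)
```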

Thanks

@zachmayer
Collaborator

I wonder if we could write a function that would approximate Hastie's underlying fortran code and take the guesswork out of finding a good lambda sequence. If we can do that, I think we can put it into caret.

@topepo
Owner

topepo commented Feb 10, 2015

The default code that you mention:

expand.grid(alpha = seq(0.1, 1, length = len),
            lambda = seq(.1, 3, length = 3 * len))

is really for the default grid. Instead of going the fortran route, we can do something along the lines of the rpart models: fit an initial model and pull off lambdas from that list.

While this isn't that computationally efficient, note that glmnet calls different worker functions depending on the distribution (e.g. elnet for Gaussian, fishnet for Poisson, etc.) that generate the lambdas and fits. For all we know, each of these has different math to get the lambda path.

One solution might be to have the grid function do something like this:

if(tuneLength > 5) {
   run initial glmnet and get path
   pick off tuneLength values of the lambda path
} else use default grid
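
That conditional might be sketched in R like this; glmnetGridSketch is a made-up name, and the first branch requires the glmnet package:

```r
## Hypothetical sketch of the conditional grid above; glmnetGridSketch is a
## made-up name and the len > 5 branch requires the glmnet package.
glmnetGridSketch <- function(x, y, len) {
  if (len > 5) {
    ## initial fit just to recover a data-driven lambda path
    init <- glmnet::glmnet(as.matrix(x), y)
    keep <- round(seq(1, length(init$lambda), length.out = len))
    expand.grid(alpha = seq(0.1, 1, length = len),
                lambda = init$lambda[keep])
  } else {
    ## fall back to the current fixed default grid
    expand.grid(alpha = seq(0.1, 1, length = len),
                lambda = seq(.1, 3, length = 3 * len))
  }
}
```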

If you are going to fit a granular grid of lambda values, the computational expense of the initial glmnet fit becomes less painful. I will see if there are options that can be used on the initial fit to shorten the time (e.g. setting max iterations = 1, etc).

@topepo
Owner

topepo commented Feb 10, 2015

I just did some testing on moderately sized data sets.

First, doing an initial glmnet fit didn't cost much time compared to the overall process. This will not always be the case but it is not too bad.

Second, it fits a lot of bad models, at least for the three disparate cases that I tried. The path includes a lot of models with really large lambda values that regularize the crap out of the coefficients. That doesn't mean it will never work, but I would never try values of lambda in the thousands. There is code below for one example from APM (Applied Predictive Modeling). The others that I tried use Pfizer data.

Third, this does expose a design flaw in the package. I should have passed ... to the grid function. Since I didn't, there is a disconnect between the model that might be fitted to create the grid and the models that are created by the fit function. We could add this to the existing functions easily but it would break backward compatibility for custom models out there.

Overall, I think this would be a good idea for making the default grid. It might make sense, for small values of tuneLength (say < 5), to avoid excessively large values of lambda.

library(AppliedPredictiveModeling)
library(caret)
library(glmnet)

data(solubility)

set.seed(100)
indx <- createFolds(solTrainY, returnTrain = TRUE)
ctrl <- trainControl(method = "cv", index = indx)

init <- glmnet(as.matrix(solTrainXtrans), solTrainY)

enetGrid <- expand.grid(lambda = init$lambda[seq(1, length(init$lambda), by = 5)], 
                        # originally c(0, 0.01, .1),
                        alpha = seq(.05, 1, length = 20))
## now:
## > summary(init$lambda[seq(1, length(init$lambda), by = 5)])
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0001954 0.0018140 0.0166700 0.1811000 0.1511000 1.3470000

set.seed(100)
gnetTune <- train(solTrainXtrans, solTrainY,
                  method = "glmnet",
                  tuneGrid = enetGrid,
                  trControl = ctrl,
                  preProc = c("center", "scale"))
## > system.time(glmnet(as.matrix(solTrainXtrans), solTrainY))[3]/gnetTune$times$everything[3]*100
## elapsed 
## 1.506544 
## 
## the initial fit would have taken < 2% of the total fitting time

@topepo
Owner

topepo commented Feb 10, 2015

Ugh. I forgot about alpha.

I'll try this on some other examples too, but here is the APM example again:

> library(plyr)
> alphas <- (1:9)/10
> 
> for(i in seq(along = alphas)) {
+   init <- glmnet(as.matrix(solTrainXtrans), solTrainY, alpha = alphas[i])
+   tmp <- data.frame(alpha = alphas[i], lambda = init$lambda)
+   out <- if(i == 1) tmp else rbind(out, tmp)
+ }
> 
> ddply(out, .(alpha), function(x) summary(x$lambda))
  alpha      Min.  1st Qu.  Median   Mean 3rd Qu.   Max.
1   0.1 0.0013470 0.013480 0.13490 1.5160  1.3480 13.470
2   0.2 0.0006735 0.006741 0.06743 0.7581  0.6741  6.735
3   0.3 0.0004490 0.004494 0.04495 0.5054  0.4494  4.490
4   0.4 0.0003368 0.003370 0.03371 0.3790  0.3370  3.368
5   0.5 0.0002694 0.002696 0.02697 0.3032  0.2696  2.694
6   0.6 0.0003575 0.003186 0.02833 0.2660  0.2525  2.245
7   0.7 0.0004879 0.003870 0.03067 0.2406  0.2430  1.924
8   0.8 0.0003890 0.003159 0.02559 0.2082  0.2078  1.684
9   0.9 0.0001497 0.001498 0.01498 0.1685  0.1498  1.497
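
The near-exact 1/alpha scaling of the Max. column falls out of the path's entry point: the smallest lambda that zeroes all coefficients is roughly max_j |x_j'(y - mean(y))| / (n * alpha), so halving alpha doubles lambda_max. A base-R illustration on simulated data (the formula is a sketch, not glmnet's exact Fortran):

```r
## Why the largest lambda in the table scales as 1/alpha: the entry point of
## the path is (roughly) lambda_max = max_j |<x_j, y - mean(y)>| / (n * alpha),
## so the numerator cancels when comparing two alphas.
set.seed(1)
n <- 100; p <- 20
x <- scale(matrix(rnorm(n * p), n, p))
y <- rnorm(n)
lambda_max <- function(alpha)
  max(abs(crossprod(x, y - mean(y)))) / (n * alpha)
lambda_max(0.1) / lambda_max(0.2)   # approximately 2
```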

@topepo
Owner

topepo commented Feb 11, 2015

I'm going to check in code that uses the function below with alpha = 0.5 to split the difference. Also, I trim away the largest and smallest values to make the grid less extreme.

Test as you see fit...

function(x, y, len = NULL) {
  ## figure out the outcome type so the right glmnet family is used
  numLev <- if(is.character(y) || is.factor(y)) length(levels(y)) else NA
  if(!is.na(numLev)) {
    fam <- ifelse(numLev > 2, "multinomial", "binomial")
  } else fam <- "gaussian"
  ## single fit at alpha = 0.5 to get a data-driven lambda path
  init <- glmnet(as.matrix(x), y, 
                 family = fam, 
                 nlambda = len + 2, 
                 alpha = .5)
  ## trim the largest and smallest values so the grid is less extreme
  lambda <- unique(init$lambda)
  lambda <- lambda[-c(1, length(lambda))]
  lambda <- lambda[1:min(length(lambda), len)]
  expand.grid(alpha = seq(0.1, 1, length = len),
              lambda = lambda)
}

topepo added a commit that referenced this issue Feb 11, 2015
@topepo
Owner

topepo commented Feb 16, 2015

added to the package

@topepo topepo closed this as completed Feb 16, 2015