survival probabilities for glmnet proportional hazards model #33

Closed
topepo opened this issue Mar 15, 2021 · 3 comments
Labels
feature a feature request or enhancement

Comments

topepo commented Mar 15, 2021

We want to get survival probabilities for glmnet objects that use family = "cox". This page has details on the Cox model with glmnet and describes the special approach that is required for this model implementation.

There are a few complications that arise when building a wrapper around this model:

No formulas

glmnet() has no formula method, so something simple like Surv(time, event) ~ x_1 + x_2 isn't possible. In their examples, they pass the contents of the Surv() object as a matrix (or an object of class Surv) to the glmnet() function. I believe that the censored package already deals with this.

The main consequence of having no formula method is that, for stratification, we cannot use the canonical strata() term in the formula (e.g., Surv(time, event) ~ x_1 + x_2 + strata(var)). Instead, glmnet has a separate function that is applied to the outcome object to store the stratification variable. The syntax looks like stratifySurv(surv_object, strata_var).
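
As a rough sketch of the two points above, the snippet below fits an unstratified and a stratified Cox model with glmnet on simulated data. The data, the variable names (x, y, strata_var), and the choice of two integer-coded strata are assumptions made up for illustration; only glmnet(), Surv(), and stratifySurv() come from the packages themselves.

```r
library(survival)
library(glmnet)

# Simulated data, for illustration only
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), ncol = 5, dimnames = list(NULL, paste0("x_", 1:5)))
time <- rexp(n, rate = exp(0.5 * x[, 1]))
event <- rbinom(n, 1, 0.7)
strata_var <- rep(1:2, length.out = n)

# The outcome is passed as a Surv object instead of through a formula
y <- Surv(time, event)
fit <- glmnet(x, y, family = "cox")

# Stratification is stored on the outcome object, not in a formula term
y_strat <- stratifySurv(y, strata_var)
fit_strat <- glmnet(x, y_strat, family = "cox")
```

Both fits return the full regularization path, so a penalty value still has to be chosen at prediction time.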

Retain the training set

As in the survival package, survival probabilities for the Cox model are best computed by using the survfit() method on the model object. While glmnet does have a survfit() method for its model objects, it requires the original training set data to work.

We'll have to attach x and y data to the fitted glmnet object when the model is fit.
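
One possible way to do this is sketched below. The element name training_data and the helper survfit_with_training() are hypothetical and are not part of glmnet or censored; the call to survfit() with s, x, y, and newx follows the pattern shown in the glmnet Cox vignette. The data are simulated for illustration.

```r
library(survival)
library(glmnet)

# Simulated data, for illustration only
set.seed(2)
n <- 200
x <- matrix(rnorm(n * 5), ncol = 5)
y <- Surv(rexp(n, rate = exp(0.5 * x[, 1])), rbinom(n, 1, 0.7))

fit <- glmnet(x, y, family = "cox")

# Hypothetical element name: keep the training data with the fitted model
fit$training_data <- list(x = x, y = y)

# Hypothetical helper that hands the stored data back to survfit()
survfit_with_training <- function(object, new_x, penalty) {
  survfit(
    object,
    s = penalty,
    x = object$training_data$x,
    y = object$training_data$y,
    newx = new_x
  )
}

# Survival curves for three new observations at a single penalty value
sf <- survfit_with_training(fit, new_x = x[1:3, ], penalty = 0.05)
```

Storing the data inside the model object increases its size, which is the trade-off for being able to predict without asking the user for the training set again.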

Predictions over penalty values

Like other glmnet objects, we can make predictions over many values of lambda for the same model object. For survival probabilities, this is also the case. When making such predictions, we also need to specify the time points. As a result, the standard nested tibble that censored produces will have a row for each combination of .time and lambda and would look something like:

# A tibble: 4 x 3
  .time .pred_survival penalty
  <dbl>          <dbl>   <dbl>
1     1          0.966    0.01
2    10          0.448    0.01
3     1          0.951    0.1 
4    10          0.421    0.1 

The initial work here is to make a function similar to censored::cph_survival_prob() to get the data in this format. It looks like glmnet's survfit() method produces a list of survfit objects, one for each value of lambda. It may not be too complex to get these predictions and then reformat them (which we have code to do for the results of survival:::survfit.coxph()).
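
A minimal sketch of that reformatting step follows, assuming that survfit() on a glmnet Cox fit returns a list of survfit objects when more than one value of s is given. It is not the censored implementation: the data are simulated, the .row column is an added assumption to tell observations apart, and the result is a flat data frame rather than the nested tibble that censored produces.

```r
library(survival)
library(glmnet)

# Simulated data, for illustration only
set.seed(3)
n <- 200
x <- matrix(rnorm(n * 5), ncol = 5)
y <- Surv(rexp(n, rate = exp(0.5 * x[, 1])), rbinom(n, 1, 0.7))
fit <- glmnet(x, y, family = "cox")

penalties <- c(0.01, 0.1)
times <- c(0.5, 1)
new_x <- x[1:2, ]

# One survfit object per penalty value
sf_list <- survfit(fit, s = penalties, x = x, y = y, newx = new_x)

# Evaluate each curve at the requested times and stack the results
reformat_one <- function(sf, penalty) {
  s <- summary(sf, times = times, extend = TRUE)
  surv <- as.matrix(s$surv)  # rows are time points, columns are observations
  data.frame(
    .row = rep(seq_len(ncol(surv)), each = nrow(surv)),
    .time = rep(s$time, times = ncol(surv)),
    .pred_survival = as.vector(surv),
    penalty = penalty
  )
}

res <- do.call(rbind, Map(reformat_one, sf_list, penalties))
```

Here res has one row per combination of observation, time point, and penalty value; nesting by .row would give the per-observation structure described above.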

Note: recall that, when a glmnet model is fit via parsnip, we require a single penalty value (even though the model produces all of the coefficients for the entire path of penalty values). For this reason, predict.model_fit() will only produce predictions for a single penalty value. This function should produce the above output without the penalty column. The multi_predict() method will have an argument for the penalty values and its results will look like the tibble above.

@hfrick hfrick added the feature a feature request or enhancement label Apr 16, 2021
hfrick commented Apr 16, 2021

see also #29 on multi_predict()

hfrick commented Jul 7, 2021

Closed via #46, #61, and #70

@hfrick hfrick closed this as completed Jul 7, 2021
github-actions bot commented Nov 5, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 5, 2021