We want to get survival probabilities for glmnet objects that use family = "cox". This page has details on the Cox model with glmnet and describes the special approach that is required for this model implementation.
There are a few complications that arise when building a wrapper around this model:
No formulas
glmnet() does not have a formula method, so something simple like Surv(time, event) ~ x_1 + x_2 isn't possible. In their examples, they pass the contents of the Surv() object as a matrix (or an object of class Surv) to the glmnet() function. I believe that the censored package already deals with this.
The main consequence of having no formula method is that, for stratification, we cannot use the canonical strata() function in the formula (e.g., Surv(time, event) ~ x_1 + x_2 + strata(var)). Instead, a different function is applied to the outcome object to store the stratification variable. The syntax looks like stratifySurv(surv_object, strata_var).
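As a minimal sketch of the two calls described above, assuming glmnet 4.1 or later (where Surv objects and stratifySurv() are supported) and using simulated data whose names (x, time, event, strata_var) are purely illustrative:

```r
library(survival)
library(glmnet)

# Simulated data; all names and values here are illustrative assumptions
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0("x_", 1:5)))
time <- rexp(n, rate = 0.1 * exp(x[, 1]))
event <- rbinom(n, 1, 0.7)

# No formula: the predictors go in as a matrix and the outcome as a Surv object
y <- Surv(time, event)
fit <- glmnet(x, y, family = "cox")

# Stratification is stored on the outcome rather than written as strata() in a formula
strata_var <- rep(1:2, length.out = n)
y_strat <- stratifySurv(y, strata_var)
fit_strat <- glmnet(x, y_strat, family = "cox")
```

Both fits are ordinary glmnet objects holding the full penalty path; the only difference is the class of the outcome that was passed in.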
Retain the training set
As with the survival package, survival probabilities for the Cox model are best computed by using the survfit() method on the model object. While glmnet does have a survfit() method for its Cox models, it requires the original training set data to work.
We'll have to attach x and y data to the fitted glmnet object when the model is fit.
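A rough sketch of what this could look like, assuming glmnet 4.1 or later. The element name training_data is a hypothetical choice for illustration, not something glmnet or censored defines, and the data are simulated:

```r
library(survival)
library(glmnet)

# Simulated data; names and values are illustrative assumptions
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0("x_", 1:5)))
y <- Surv(rexp(n, rate = 0.1 * exp(x[, 1])), rbinom(n, 1, 0.7))

fit <- glmnet(x, y, family = "cox")

# Hypothetical: keep the training data alongside the fit so it is available later
fit$training_data <- list(x = x, y = y)

# survfit() on a glmnet Cox model needs the training x and y passed back in,
# along with a single penalty value (s) and the new predictor values (newx)
sf <- survfit(
  fit,
  s = 0.05,
  x = fit$training_data$x,
  y = fit$training_data$y,
  newx = x[1:3, ]
)
```

Calling survfit(fit, s = 0.05) without x and y is an error, which is why the training set has to travel with the model object.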
Predictions over penalty values
Like other glmnet objects, we can make predictions over many values of lambda for the same model object. For survival probabilities, this is also the case. When making such predictions, we also need to specify the time points. As a result, the standard nested tibble that censored produces will have a row for each combination of .time and lambda and would look something like:
The initial work here is to make a function similar to censored::cph_survival_prob() to get the data in this format. It looks like survival:::survfit.coxph() produces a list of survfit objects, one for each value of lambda. It may not be too complex to get these predictions and then reformat them (which we have code to do for the results of survival:::survfit.coxph()).
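One possible shape for that reformatting step is sketched below. It assumes glmnet 4.1 or later, where survfit() on a glmnet Cox fit returns a list of survfit objects when s has more than one value. The helper to_long(), the simulated data, and the column names (.row, .time, penalty, .pred_survival) are assumptions for illustration and are not the censored implementation:

```r
library(survival)
library(glmnet)

# Simulated data; names and values are illustrative assumptions
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0("x_", 1:5)))
y <- Surv(rexp(n, rate = 0.1 * exp(x[, 1])), rbinom(n, 1, 0.7))
fit <- glmnet(x, y, family = "cox")

penalties <- c(0.01, 0.05)
eval_times <- c(5, 10)
new_x <- x[1:3, ]

# With several penalty values, the result is a list with one survfit object per value
sf_list <- survfit(fit, s = penalties, x = x, y = y, newx = new_x)

# Hypothetical helper: one row per new observation, time point, and penalty
to_long <- function(sf, penalty) {
  sm <- summary(sf, times = eval_times, extend = TRUE)
  # Rows are time points, columns are new observations
  surv <- matrix(sm$surv, nrow = length(eval_times))
  data.frame(
    .row = rep(seq_len(ncol(surv)), each = nrow(surv)),
    .time = rep(eval_times, times = ncol(surv)),
    penalty = penalty,
    .pred_survival = as.vector(surv)
  )
}

res <- do.call(rbind, Map(to_long, sf_list, penalties))
```

The long data frame can then be nested by .row to give one list element per new observation, with a row for each combination of time point and penalty.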
Note: recall that, when a glmnet model is fit via parsnip, we require a single penalty value (even though the model produces all of the coefficients for the entire path of penalty values). For this reason, predict.model_fit() will only produce predictions for a single penalty value. This function should produce the above output without the penalty column. The multi_predict() method will have an argument for the penalty values and its results will look like the tibble above.
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.