Skip to content

Conversation

@hfrick
Copy link
Member

@hfrick hfrick commented Sep 19, 2022

Part of #146

This refactors the calculation of the survival probabilities for mboost models based on their survFit objects.

This changes behavior for the edge cases of predicting before or after event times. Do you think this is the right behavior? At time Inf it currently returns the survival probability at the last event but I'm inclined to think it should always be 0. Similarly, should we make return a probability of 1 for -Inf? And if so, do we make sure this also happens for other engines?

# what we used previously:
floor_surv_mboost <- function(x, time) {
  ind <- purrr::map_int(time, ~ max(which(.x > c(-Inf, unname(x$time)))))
  t(unname(rbind(1, x$surv))[ind, ])
}

library(censored)
#> Loading required package: parsnip
#> Loading required package: survival
library(mboost)
#> Loading required package: parallel
#> Loading required package: stabs
lung2 <- lung[-14, ]
lung_pred <- lung2[1:3,]

lung2$time[1] <- -3
mboost_object <- blackboost(Surv(time, status) ~ age + ph.ecog,
                           data = lung2, family = CoxPH())
survFit_object <- survFit(mboost_object, newdata = lung_pred)


# prediction time before 1st event time ---------------------------------

# this falls back to survival prob = 1 if the first event happened after time 0

# if the first event happens before 0, this used to also fall back to prob 1
floor_surv_mboost(survFit_object, -5)
#>      [,1] [,2] [,3]
#> [1,]    1    1    1

# in the new version, this now falls back to the survival prob at that first event
# not "anchoring" at time 0 (since the first event is before that)
survival_prob_mboost(mboost_object, new_data = lung_pred, time = -5) %>%
  tidyr::unnest(cols = .pred)
#> # A tibble: 3 × 2
#>   .time .pred_survival
#>   <dbl>          <dbl>
#> 1    -5          0.996
#> 2    -5          0.995
#> 3    -5          0.998

# this is also what happens in the survival package
survfit_object <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung2) %>%
  survfit(newdata = lung_pred)
summary(survfit_object, time = -5)$surv
#>              1         2        3
#> [1,] 0.9951563 0.9970903 0.997481


# infinite prediction times ----------------------------------------------

# this used to error
floor_surv_mboost(survFit_object, c(-Inf, Inf))
#> Warning in max(which(.x > c(-Inf, unname(x$time)))): no non-missing arguments to
#> max; returning -Inf
#> Error: Can't coerce element 1 from a double to a integer

# now it falls back to the survival prob at the first and last event respectively
survival_prob_mboost(mboost_object, new_data = lung_pred, time = c(-Inf, Inf)) %>%
  tidyr::unnest(cols = .pred)
#> # A tibble: 6 × 2
#>   .time .pred_survival
#>   <dbl>          <dbl>
#> 1  -Inf         0.996 
#> 2   Inf         0.0366
#> 3  -Inf         0.995 
#> 4   Inf         0.0226
#> 5  -Inf         0.998 
#> 6   Inf         0.212

# this is different from the survival package which removes the infinite values
summary(survfit_object, time = c(-Inf, Inf), extend = TRUE)$surv
#> Error in findrow(fit, times, extend): no points selected for one or more curves, consider using the extend argument

# should -Inf fall back to 1 and Inf to 0 instead?

Created on 2022-09-19 by the reprex package (v2.0.1)

tests: engine can fit and predict with missings but needs seed then for reproducibility
the PR to mboost to fix this is not yet merged
@hfrick hfrick requested a review from topepo September 19, 2022 13:49
@hfrick
Copy link
Member Author

hfrick commented Sep 21, 2022

update: I'll make this return a survival probability of 1 for time -Inf and 0 for time Inf

Copy link
Member

@topepo topepo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the one trivial comment, it looks good

ret
}

survival_curve_to_prob <- function(time, event_times, survival_prob) {
Copy link
Member

@topepo topepo Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't limited to mboost, maybe put it in a more general location.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It does work more generally but it's only used by mboost. I'll keep it there for now and rethink moving it at the end of the big refactor!

@hfrick hfrick merged commit 12b9a6d into main Sep 23, 2022
@hfrick hfrick deleted the survival-prob-mboost branch September 23, 2022 10:03
@github-actions
Copy link

github-actions bot commented Oct 8, 2022

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants