![All-test](http://drive.google.com/uc?export=view&id=1bLQ3nhDbZrCCqy_WCxxckOne2lgVvn3l)

# 4.3 Frailty Models {.unnumbered}


Frailty models are extensions of standard survival analysis techniques, such as the Cox proportional hazards (PH) model, designed to handle unobserved heterogeneity or clustering in time-to-event data. In survival analysis, we often study the time until an event occurs (e.g., death, failure, or recurrence), but not all factors influencing the risk (hazard) may be observed or measurable. Frailty models introduce a random effect, called "frailty," to account for this unobserved variation.


### Key Concepts


- **Basic Idea**: In a standard Cox PH model, the hazard rate for an individual is $h(t | X) = h_0(t) \exp(\beta^T X)$, where $h_0(t)$ is the baseline hazard, $X$ are covariates, and $\beta$ are coefficients. Frailty models modify this to $h(t | X, u) = u \cdot h_0(t) \exp(\beta^T X)$, where $u$ is the frailty term—a non-negative random variable with mean 1 (for identifiability) and variance $\theta$ (which measures the degree of heterogeneity). Higher frailty ($u > 1$) means higher risk, and vice versa.

- **Unobserved Heterogeneity**: If important covariates are omitted, the composition of the risk set changes over time: "frailer" individuals tend to experience the event earlier, leaving progressively more "robust" survivors. Ignoring this can bias estimates, attenuate hazard ratios toward 1, and create apparent time-dependence in the population hazard even when no individual's hazard changes over time.

- **Dependence and Clustering**: Frailty induces positive correlation between event times within clusters (e.g., families, hospitals) or for recurrent events in the same individual.

- **Distributions for Frailty**: Common choices include gamma (constant dependence), inverse Gaussian (intermediate dependence), positive stable (early dependence), log-normal, or compound Poisson (allows a non-susceptible subpopulation).

- **Effects**:

  - **Selection**: Over time, the average frailty among survivors decreases.
  
  - **Marginal vs. Conditional**: The population-averaged (marginal) hazard differs from the individual (conditional) hazard due to averaging over frailties.
  
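  - **Cross-Ratio**: The hazard of one cluster member at time $t$ given that another member had the event at time $s$, divided by the same hazard given that the other member was still event-free at $s$. It measures how one event affects the hazard of another, and for gamma frailty it equals the constant $1 + \theta$.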
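
The selection effect can be checked with a short simulation. The sketch below assumes gamma frailty with variance $\theta = 0.5$ and a constant baseline hazard $h_0(t) = 0.1$. Both values are arbitrary illustrative choices. For gamma frailty, the mean frailty among those still event-free at time $t$ is $1/(1 + \theta H_0(t))$, where $H_0$ is the cumulative baseline hazard.

```r
# Selection effect under gamma frailty: a minimal simulation sketch
set.seed(2024)
n      <- 100000
theta  <- 0.5   # assumed frailty variance (illustrative)
lambda <- 0.1   # assumed constant baseline hazard, so H0(t) = lambda * t

# Frailty u ~ Gamma(shape = 1/theta, rate = 1/theta): mean 1, variance theta
u <- rgamma(n, shape = 1 / theta, rate = 1 / theta)

# Given u, the hazard is u * lambda, so event times are exponential
event_time <- rexp(n, rate = u * lambda)

times <- c(0, 5, 10, 20)

# Average frailty among individuals still event-free at each time
simulated <- sapply(times, function(t) mean(u[event_time > t]))

# Closed form for gamma frailty: E[u | T > t] = 1 / (1 + theta * H0(t))
theory <- 1 / (1 + theta * lambda * times)

round(rbind(simulated = simulated, theory = theory), 3)
```

The theoretical values fall from 1 at $t = 0$ to 0.5 at $t = 20$. The survivors are increasingly the low-frailty individuals, even though no individual's hazard changes over time.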


### Types of Frailty Models


- **Individual (Univariate) Frailty Models**: These apply to independent survival data, where each individual has their own frailty capturing unobserved individual-specific effects. They can explain departures from proportional hazards caused by omitted covariates, but they are hard to identify without strong assumptions (e.g., without covariates, the frailty distribution and the baseline hazard cannot be separated). They are less common in practice for non-clustered data but useful for modeling heterogeneity in large populations.

- **Shared Frailty Models**: These are for clustered data (e.g., siblings, patients in the same center) or recurrent events (e.g., multiple infections in one patient). The frailty is shared within the cluster or across events for the same individual, inducing dependence. For recurrent events, the "cluster" is often the individual, so the frailty is shared across their multiple events but individual-specific relative to the population. This is the most common type for datasets like recurrent disease episodes.
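
A simulation shows how a shared frailty induces dependence. The sketch below generates pairs of event times that share one gamma frailty, for example two patients in the same center. It assumes an exponential baseline hazard and a frailty variance of $\theta = 1$. Under gamma frailty the pair follows a Clayton copula, whose Kendall's tau is $\theta / (\theta + 2)$.

```r
# Dependence induced by a shared gamma frailty: a minimal simulation sketch
set.seed(42)
n_pairs <- 3000
theta   <- 1     # assumed frailty variance (illustrative)
lambda  <- 0.1   # assumed constant baseline hazard

# One frailty per pair, shared by both members
u <- rgamma(n_pairs, shape = 1 / theta, rate = 1 / theta)

# Given the shared frailty, the two event times are independent
t1 <- rexp(n_pairs, rate = u * lambda)
t2 <- rexp(n_pairs, rate = u * lambda)

tau_hat    <- cor(t1, t2, method = "kendall")
tau_theory <- theta / (theta + 2)   # Kendall's tau for gamma frailty (Clayton copula)

c(estimated = tau_hat, theoretical = tau_theory)
```

Kendall's tau does not depend on `lambda`. Moving `theta` toward 0 drives tau toward 0, i.e., toward independent event times.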

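Frailty models can be estimated semi-parametrically (e.g., a non-parametric baseline hazard via the EM algorithm or penalized likelihood) or parametrically (e.g., a Weibull baseline). Testing for the presence of frailty ($H_0: \theta = 0$) is non-standard because $\theta = 0$ lies on the boundary of the parameter space. As a result, the likelihood-ratio statistic follows a 50:50 mixture of $\chi^2_0$ and $\chi^2_1$ rather than a $\chi^2_1$ distribution. Packages in R such as `survival`, `frailtypack`, `frailtyEM`, and `coxme` support fitting these models.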
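
Because $\theta = 0$ sits on the boundary, the mixture p-value is half the naive $\chi^2_1$ p-value for any positive statistic. The helper below is a minimal sketch. The function name `boundary_lrt_pvalue` is hypothetical and not part of any package.

```r
# p-value for H0: theta = 0 under a 50:50 mixture of chi-square(0) and chi-square(1).
# boundary_lrt_pvalue() is a hypothetical helper written for this example.
boundary_lrt_pvalue <- function(lrt_stat) {
  # chi-square(0) is a point mass at 0, so for a positive statistic only the
  # chi-square(1) component contributes to the upper tail
  ifelse(lrt_stat <= 0, 1, 0.5 * pchisq(lrt_stat, df = 1, lower.tail = FALSE))
}

lrt_stat <- 3.2   # illustrative statistic, not from a fitted model
naive    <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)
mixture  <- boundary_lrt_pvalue(lrt_stat)
c(naive = naive, mixture = mixture)
```

With this statistic the naive p-value is about 0.074 and the mixture p-value about 0.037. Ignoring the boundary can therefore flip a conclusion at the 5% level.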



## Implement Frailty Models in R


We'll fit:

- A standard Cox PH model (no frailty, assuming independence).
- An individual frailty model (univariate). This is not standard for recurrent data, so we approximate it by giving each observation its own frailty. That treats each event as independent and has limited identifiability.
- A shared frailty model (standard for recurrent events, with frailty shared across events per patient).

For individual frailty in recurrent data, it's conceptually tricky since events are correlated; we'll approximate it using a Gaussian random effect on a per-event basis (unique ID per row), but this is more illustrative than practical. In practice, shared frailty is preferred for this dataset.
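
A simulation can check the shared frailty approach before it is applied to real data. The sketch below generates recurrent events in counting-process format, where each simulated patient has one gamma frailty shared across all of their events. It then fits `coxph()` with `frailty(id)`. The number of patients, follow-up length, baseline hazard, covariate effect, and $\theta = 0.5$ are all illustrative assumptions. With enough patients, the estimated frailty variance and coefficient should land near the simulated values.

```r
library(survival)

set.seed(123)
n_pat     <- 500
theta     <- 0.5    # true frailty variance (assumed)
lambda    <- 0.1    # constant baseline hazard (assumed)
beta      <- -0.5   # true log hazard ratio for binary covariate x (assumed)
follow_up <- 10

sim_patient <- function(i) {
  x    <- rbinom(1, 1, 0.5)
  u    <- rgamma(1, shape = 1 / theta, rate = 1 / theta)  # shared by all events of patient i
  rate <- lambda * u * exp(beta * x)
  t0  <- 0
  out <- NULL
  repeat {
    t_next <- t0 + rexp(1, rate)              # constant hazard: exponential gap times
    status <- as.integer(t_next <= follow_up)
    t1     <- min(t_next, follow_up)
    out <- rbind(out, data.frame(id = i, x = x, start = t0, stop = t1, status = status))
    if (status == 0) break
    t0 <- t1
  }
  out
}

sim <- do.call(rbind, lapply(seq_len(n_pat), sim_patient))

# Shared gamma frailty: one frailty per patient id
fit_shared <- coxph(Surv(start, stop, status) ~ x + frailty(id), data = sim)

theta_hat <- fit_shared$history[[1]]$theta   # estimated frailty variance
beta_hat  <- unname(coef(fit_shared)["x"])
c(theta_hat = theta_hat, beta_hat = beta_hat)
```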


### Install Required R Packages


The following R packages are required to run this notebook. If any of them are not installed, you can install them using the code below:


In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Enable the %%R cell magic used by the R cells below (requires the rpy2 package)
%load_ext rpy2.ipython

In [None]:
%%R
packages <-c(
		 'tidyverse',
		 'survival',
		 'survminer',
		 'ggsurvfit',
		 'tidycmprsk',
		 'ggfortify',
		 'timereg',
		 'cmprsk',
		 'riskRegression',
		 'reda',
		 'frailtypack',
		 'coxme'
		 )




# Install any missing packages
new_packages <- packages[!(packages %in% installed.packages()[, "Package"])]
if (length(new_packages)) install.packages(new_packages)



In [None]:
%%R
# Verify installation
cat("Installed packages:\n")
print(sapply(packages, requireNamespace, quietly = TRUE))

### Load Packages

In [None]:
%%R
# Load packages with suppressed messages
invisible(lapply(packages, function(pkg) {
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
}))

In [None]:
%%R
# Check loaded packages
cat("Successfully loaded packages:\n")
print(search()[grepl("package:", search())])

### Data


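This tutorial uses the `bladder1` dataset from the `survival` package in R. It is the full data set from a bladder cancer trial, with all recurrences for 118 patients across three treatment arms (up to 9 observed recurrences per patient). The data are in counting-process format, with one row per at-risk interval. Columns used here include:
- `id`: Patient ID (cluster for shared frailty).
- `treatment`: Treatment arm (placebo, pyridoxine, or thiotepa).
- `number`: Initial number of tumors (8 = 8 or more).
- `size`: Size of the largest initial tumor (cm).
- `start`: Start time of the interval.
- `stop`: End time of the interval (event or censoring time).
- `status`: End-of-interval code (0 = censored, 1 = recurrence, 2 = death from bladder disease, 3 = death from other or unknown causes).
- `enum`: Event number (observation number within patient).

The widely analysed 85-patient subset (placebo vs. thiotepa, first four recurrences, with columns `rx` and `event`) is provided separately as `bladder` and, in counting-process format, as `bladder2`.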



In [None]:
%%R
# Load bladder1 dataset
data(bladder1)
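
Before modelling, it helps to tabulate the status codes and the number of recurrences per patient. `status` has more than two values, and patients contribute different numbers of rows. A minimal sketch using base R and `survival`:

```r
library(survival)

# Status codes: 0 = censored, 1 = recurrence, 2 and 3 = deaths
table(bladder1$status)

# Rows per treatment arm and number of distinct patients
table(bladder1$treatment)
n_patients <- length(unique(bladder1$id))
n_patients

# Observed recurrences per patient (rows with status == 1)
recurrences <- tapply(bladder1$status == 1, bladder1$id, sum)
table(recurrences)

# Zero-length intervals, which Surv() cannot use in counting-process format
sum(bladder1$stop <= bladder1$start)
```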

### Fit a Standard Cox PH Model (No Frailty)


This assumes independence across all observations (ignores clustering by patient).


In [None]:
%%R
# Fit a Cox model in counting-process format (ignores clustering by patient).
# Surv() converts status codes 2/3 (deaths) and intervals with stop <= start
# to NA (with a warning), and coxph() then drops those rows.
cox_no_frailty <- coxph(Surv(start, stop, status) ~ treatment + number + size,
                        data = bladder1)

# Summary
summary(cox_no_frailty)

# Account for clustering with robust (sandwich) SEs, but no frailty
cox_robust <- coxph(Surv(start, stop, status) ~ treatment + number + size + cluster(id),
                    data = bladder1)
summary(cox_robust)  # Same coefficients as above; only the SEs are adjusted
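
To see what `cluster(id)` changes, the model-based and robust standard errors can be placed side by side. This sketch refits the robust model on the valid rows only (status 0 or 1 and `stop > start`, mirroring the filtering used later for `frailtypack`). It reads the model-based variance from the `naive.var` component that `coxph()` stores for robust fits.

```r
library(survival)

# Keep usable intervals: recurrence (1) or censored (0), positive length
bl <- subset(bladder1, stop > start & status %in% c(0, 1))

fit <- coxph(Surv(start, stop, status) ~ treatment + number + size + cluster(id),
             data = bl)

se_table <- cbind(
  HR        = exp(coef(fit)),
  naive_se  = sqrt(diag(fit$naive.var)),  # assumes rows are independent
  robust_se = sqrt(diag(vcov(fit)))       # sandwich estimate, clustered by id
)
round(se_table, 3)
```

Robust SEs that are clearly larger than the naive ones typically point to within-patient correlation that the independence model ignores.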

### Fit an Individual Frailty Model


For illustration, we'll create a unique ID per observation (row) and fit a Gaussian frailty (normal random effects on log-scale). This treats each event as having its own independent frailty, ignoring clustering—useful for overdispersion but not ideal for recurrent data (may lead to convergence issues or poor identifiability).


In [None]:
%%R
# Create unique ID per event/observation
bladder1$unique_id <- 1:nrow(bladder1)
# Fit with Gaussian frailty (per-observation random effect)
indiv_frailty <- coxph(Surv(start, stop, status) ~ treatment + number + size + frailty(unique_id, dist = "gauss"), 
                       data = bladder1)
# Summary
summary(indiv_frailty)

In [None]:
%%R
# Likelihood-ratio comparison with the no-frailty model
anova(cox_no_frailty, indiv_frailty)
# A small p-value (e.g., < 0.05) suggests observation-level heterogeneity;
# the chi-square p-value is conservative because theta = 0 lies on the boundary.

### Fit a Shared Frailty Model


This is the appropriate model for recurrent events: frailty shared across events for each patient (cluster = id), using gamma distribution (default).


In [None]:
%%R
# Fit shared frailty (gamma distribution)
shared_frailty_gamma <- coxph(Surv(start, stop, status) ~ treatment + number + size + frailty(id), 
                              data = bladder1)
# Summary
summary(shared_frailty_gamma)
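
Beyond `summary()`, the estimated frailty variance and the per-patient predicted frailties can be pulled from the fitted object. The sketch below relies on two components that `coxph()` stores for frailty fits: `history[[1]]$theta` and `frail`, the latter on the log-hazard scale. It refits on the valid rows so it runs on its own.

```r
library(survival)

bl <- subset(bladder1, stop > start & status %in% c(0, 1))
fit_gamma <- coxph(Surv(start, stop, status) ~ treatment + number + size + frailty(id),
                   data = bl)

theta_hat <- fit_gamma$history[[1]]$theta   # estimated frailty variance
log_frail <- fit_gamma$frail                # predicted log-frailty per patient

theta_hat
summary(exp(log_frail))   # predicted frailties as hazard multipliers
```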


Alternative: Gaussian frailty (log-normal):



In [None]:
%%R
# Alternative: Gaussian frailty (log-normal)
shared_frailty_gauss <- coxph(Surv(start, stop, status) ~ treatment + number + size + frailty(id, dist = "gauss"), 
                              data = bladder1)
summary(shared_frailty_gauss)  # Similar, but variance on log-scale.

In [None]:
%%R
# Plot the fitted curve for a patient with average covariate values
plot(survfit(shared_frailty_gamma), xlab = "Time", ylab = "Survival")

### Advanced Options with frailtypack


For more flexibility (e.g., parametric baseline, joint models), use `frailtypack`. Install: `install.packages("frailtypack")`.



In [None]:
%%R
# Identify problematic rows: zero-length intervals and deaths (status 2 or 3)
invalid_intervals <- bladder1[bladder1$stop <= bladder1$start, ]
invalid_status <- bladder1[!bladder1$status %in% c(0, 1), ]
cat("Rows with stop <= start:", nrow(invalid_intervals), "\n")
cat("Rows with status not in {0, 1}:", nrow(invalid_status), "\n")

# Keep valid intervals (status 0 = censored, 1 = recurrence) and add the
# gap time (length of each interval), used by gap-time models such as PWP-GT
bladder1_clean <- bladder1 %>%
  filter(stop > start, status %in% c(0, 1)) %>%
  mutate(gaptime = stop - start)

# Optional: restrict to the first 4 events per patient for PWP-type models
bladder_trunc <- bladder1_clean %>%
  filter(enum <= 4)

# Verify
head(bladder_trunc)

In [None]:
%%R
library(frailtypack)

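# Shared gamma frailty with a spline baseline hazard (penalized likelihood);
# recurrentAG = TRUE treats (start, stop] as Andersen-Gill counting-process intervals
shared_pack <- frailtyPenal(
  Surv(start, stop, status) ~ cluster(id) + treatment + number + size,
  data = bladder1_clean,
  recurrentAG = TRUE,
  n.knots = 8,
  kappa = 1e5,      # smoothing parameter for the spline baseline hazard
  hazard = "Splines"
)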

# Summary
print(shared_pack)


- **No Frailty**: Assumes independence; may underestimate SEs.

- **Individual Frailty**: Captures per-event variation but ignores correlation; variance $\theta$ small if little heterogeneity.

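- **Shared Frailty**: Accounts for patient-level correlation; $\theta > 0$ suggests unobserved patient factors affect recurrence risk. The `treatment` coefficients compare pyridoxine and thiotepa with placebo on the log-hazard scale. A coefficient of -0.51, for example, corresponds to exp(-0.51) ≈ 0.60, a roughly 40% lower recurrence hazard. A positive coefficient for `number` means that more initial tumors increase the risk.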
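
Converting a log-hazard coefficient into a hazard ratio and a percent change is simple arithmetic. This sketch uses the -0.51 coefficient quoted above together with a hypothetical standard error of 0.2, not taken from any fitted model, to build a Wald 95% interval.

```r
beta <- -0.51   # coefficient quoted in the text
se   <- 0.20    # hypothetical standard error, for illustration only

hr       <- exp(beta)
ci       <- exp(beta + c(lower = -1, upper = 1) * qnorm(0.975) * se)
pct_diff <- 100 * (hr - 1)   # percent change in hazard vs. the reference group

round(c(HR = hr, ci, pct_change = pct_diff), 3)
```

The hazard ratio comes out at about 0.600 and the percent change at about -40, matching the "roughly 40%" reduction described above.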


### Using `coxme` for Mixed-Effects Cox Models

In [None]:
%%R
library(coxme)
# Fit mixed-effects Cox with frailty (random intercept by id)
coxme_frailty <- coxme(Surv(start, stop, status) ~ treatment + number + size + (1 | id), 
                       data = bladder1)
summary(coxme_frailty)

**Interpretation**: The fixed effects are similar to those of the standard Cox model, but their SEs account for the frailty. The random-effect variance (e.g., 0.45) is the variance of the patient-level log-frailty; a higher variance means stronger clustering.

**Convergence**: `coxme` may take longer to fit. If it has convergence problems, the fitting options can be adjusted through `control = coxme.control(...)` (see `?coxme.control`).
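
For a Gaussian (log-normal) frailty, the variance lives on the log-hazard scale, so it is easier to read after converting it back to hazard multipliers. Take the example value of 0.45 from above, which is not a fitted result. A patient one standard deviation above average would have a hazard about exp(√0.45) ≈ 1.96 times that of an average patient. The central 95% of the frailty distribution would span multipliers of roughly 0.27 to 3.72. The sketch below reproduces that arithmetic.

```r
sigma2 <- 0.45          # example random-effect variance from the text
sigma  <- sqrt(sigma2)  # SD of the log-frailty

one_sd_multiplier <- exp(sigma)
range_95 <- exp(c(lower = -1, upper = 1) * qnorm(0.975) * sigma)

round(c(one_sd = one_sd_multiplier, range_95), 3)
```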



#### Test for Frailty Significance

In [None]:
%%R
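# Likelihood-ratio test: no-frailty Cox model vs. coxme random-intercept model
loglik_null  <- logLik(cox_no_frailty)[1]
loglik_frail <- logLik(coxme_frailty)[1]
lrt_stat <- 2 * (loglik_frail - loglik_null)
# The variance = 0 null lies on the boundary, so use the 50:50 mixture of
# chi-square(0) and chi-square(1): halve the chi-square(1) upper-tail p-value
p_value <- 0.5 * pchisq(lrt_stat, df = 1, lower.tail = FALSE)
p_value
# Example: lrt_stat = 10 gives p ≈ 0.0008 (≈ 0.0016 with a plain chi-square(1))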

### Summary


Frailty models extend traditional survival analysis methods, such as the Cox proportional hazards model, by incorporating a random effect (frailty) to account for unobserved heterogeneity and clustering in time-to-event data. This heterogeneity arises from unmeasured factors that influence the hazard rate, leading to biased estimates if ignored. Key types include individual frailty models, which assign unique frailties to each subject or observation to capture personal-level variation (though less common and harder to identify in non-clustered data), and shared frailty models, which apply a common frailty within clusters (e.g., families) or across recurrent events for the same individual, inducing dependence and better handling correlated outcomes like repeated tumor recurrences.

In the R tutorial using the `bladder1` dataset from the `survival` package, we analyzed recurrent bladder cancer events from 118 patients. The dataset, in counting-process format, includes covariates such as treatment arm (`treatment`), initial tumor number (`number`), and initial tumor size (`size`). We fitted three kinds of model:

- a standard Cox model, ignoring clustering;
- an individual frailty model, with a Gaussian frailty per observation for illustrative overdispersion;
- shared frailty models, gamma or Gaussian, clustered by patient ID.

Results showed significant frailty variance in the shared model, indicating unobserved patient-specific factors, with treatment reducing hazard and initial tumors increasing it. Advanced fitting was demonstrated with `frailtypack` (penalized likelihood with a spline baseline hazard) and `coxme` (Gaussian random intercepts).


Frailty models are essential for robust survival analysis in the presence of unobserved heterogeneity or dependent events, preventing underestimation of variability and improving model fit, as seen in the bladder cancer example where shared frailty revealed clustering effects. They enable more accurate inference in fields like medicine, epidemiology, and reliability engineering, though assumptions about frailty distribution (e.g., gamma for constant dependence) must be carefully chosen and tested. In practice, shared frailty is particularly valuable for recurrent or clustered data, while individual frailty suits broader heterogeneity exploration. Overall, integrating frailty enhances interpretability, such as through selection effects where frailer individuals exit early, leaving robust survivors, and supports better decision-making in risk assessment.


## Resources 


- **Book: Frailty Models in Survival Analysis** by Andreas Wienke (2010). This comprehensive text covers univariate and multivariate frailty models, with emphasis on real-data applications and statistical techniques.
  
- **Tutorial Paper: A Tutorial on Frailty Models** by Theodor A. Balan and Hein Putter (2020). An accessible guide illustrating frailty concepts, selection effects, and implementation for survival outcomes.

- **Book: Applied Survival Analysis Using R** by Dirk F. Moore (2016). Focuses on practical survival analysis in R, including frailty models, with code examples and integration of packages like `survival`.

- **R Package Documentation**: 
  - `survival` package help for frailty terms in `coxph()` (available via `?frailty` in R).
  - `frailtypack` package on CRAN: Provides advanced tools for frailty models, including penalized and joint models (https://cran.r-project.org/web/packages/frailtypack/index.html).
  - `coxme` package for mixed-effects Cox models with frailty (https://cran.r-project.org/web/packages/coxme/index.html).



