<a href="https://colab.research.google.com/github/zia207/Survival_Analysis_R/blob/main/Colab_Notebook/02_07_04_00_survival_analysis_recurrent_event_models_r.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![All-test](http://drive.google.com/uc?export=view&id=1bLQ3nhDbZrCCqy_WCxxckOne2lgVvn3l)

# Recurrent Event Models


Recurrent event models are statistical techniques used in survival analysis to study situations where the same event can occur multiple times for an individual subject over the observation period. These models extend traditional survival methods to account for the dependency and correlation between repeated events, such as multiple asthma attacks, recurrent infections, or repeated hospitalizations. Common approaches include the following:


### Andersen-Gill (AG) Model


The AG model is an extension of the Cox PH model that treats a subject's recurrent events as conditionally independent given the covariates and corrects the standard errors for intra-subject correlation with a robust variance estimator (clustering by subject ID). The hazard for every event depends on the covariates and on time, which is usually measured from study start, so a subject remains in the risk set after each event.

-   **Key Features**:
    -   **Data Format**: Uses the counting process format `(tstart, tstop, event)`, where each row represents a time interval for a subject, with `tstart` and `tstop` defining the interval and `event` indicating whether an event occurred at `tstop` (1 = event, 0 = censored).
    -   **Assumption**: Baseline hazard is the same for all events within a subject, and events are conditionally independent given covariates. The robust variance accounts for correlation between events.
    -   **Use Case**: When events are frequent and correlation between events is not the primary focus (e.g., repeated asthma attacks).

-   **Hazard Function**:

$$
h_i(t) = h_0(t) \exp(\beta' X_i(t)),
$$

where $h_0(t)$ is the baseline hazard, $X_i(t)$ are covariates (possibly time-dependent), and $\beta$ are the regression coefficients. The `cluster(id)` term adjusts standard errors for intra-subject correlation.

- **Strengths**: Flexible for time-dependent covariates; straightforward to implement.

- **Limitations**: Assumes the same baseline hazard for all events; may not fully capture event dependence.
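
As a rough illustration of the AG setup described above, the sketch below fits the model with `coxph()` from the `survival` package. The choice of the `bladder2` dataset (shipped with `survival` in counting-process layout) and of the covariates `rx`, `size`, and `number` is an assumption made for this example, not something prescribed by the text.

```r
library(survival)

# bladder2 is in counting-process (AG) layout: one row per at-risk interval,
# with columns start, stop, event, and the subject identifier id.
ag_fit <- coxph(
  Surv(start, stop, event) ~ rx + size + number + cluster(id),
  data = bladder2
)

# With cluster(id), the coefficient table reports the naive se(coef) next to
# the robust se that allows for correlated rows from the same subject.
summary(ag_fit)$coefficients
```

Comparing the `se(coef)` and `robust se` columns gives a quick sense of how much the within-subject correlation matters for a given covariate.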


### Prentice-Williams-Peterson (PWP) Model


The PWP model accounts for event order by stratifying the hazard function by event number (e.g., first event, second event) or conditioning on the time since the previous event. Two versions exist:

-   **PWP-Total Time (PWP-TT)**: Time is measured from the study start (like AG), but the hazard is stratified by event number.

-   **PWP-Gap Time (PWP-GT)**: Time is reset to zero after each event, modeling the hazard for the time to the next event (gap time).

-   **Key Features**:

    -   **Data Format**: Similar to AG, uses `(tstart, tstop, event)`, but includes a `stratum` variable indicating event number (e.g., 1 for first event, 2 for second).
    -   **Assumption**: The hazard function differs by event number (PWP-TT) or resets after each event (PWP-GT); accounts for event order explicitly.
    -   **Use Case**: When event order matters (e.g., first vs. subsequent cancer recurrences).

-   **Hazard Function**:

    -   PWP-TT:

$$
h_{ik}(t) = h_{0k}(t) \exp(\beta' X_i(t))
$$

where $h_{0k}(t)$ is the baseline hazard for the $k$-th event.

    -   PWP-GT:

$$
h_{ik}(u) = h_{0k}(u) \exp(\beta' X_i(u))
$$

where $u = t - t_{k-1}$ is the gap time since the previous event, with $t_{k-1}$ the time of the $(k-1)$-th event.

-   **Strengths**: Explicitly models event order or gap times; better for ordered events with distinct hazards.
-   **Limitations**: Requires sufficient events per stratum; more complex to interpret than AG.
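
The two PWP variants differ only in the time scale passed to `Surv()`, which the following sketch tries to make concrete. It assumes the `bladder2` dataset from the `survival` package, whose `enum` column holds the event number; the covariates and the helper column `gap` are illustrative choices.

```r
library(survival)

# PWP total time: same (start, stop] intervals as AG, but strata(enum)
# gives each event number its own baseline hazard.
pwp_tt <- coxph(
  Surv(start, stop, event) ~ rx + size + number + strata(enum) + cluster(id),
  data = bladder2
)

# PWP gap time: the clock restarts after each event, so the response is the
# interval length (stop - start) instead of the interval endpoints.
bladder_gap <- transform(bladder2, gap = stop - start)
pwp_gt <- coxph(
  Surv(gap, event) ~ rx + size + number + strata(enum) + cluster(id),
  data = bladder_gap
)

# Side-by-side coefficients under the two time scales.
rbind(total_time = coef(pwp_tt), gap_time = coef(pwp_gt))
```

Later strata contain few subjects, so in practice it is worth checking `table(bladder2$enum, bladder2$event)` before trusting stratum-specific results.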


### Frailty Models


Frailty models account for unobserved heterogeneity or clustering in survival data by introducing a random effect, termed "frailty." The frailty represents unobserved factors that influence the hazard of an event, such as unmeasured patient characteristics or clustering effects (e.g., patients within hospitals). These models are particularly useful for handling correlated survival times, such as in recurrent event data, clustered data (e.g., family studies), or multi-center trials.

-   **Hazard Function**:

$$
  h_{ij}(t) = z_j h_0(t) \exp(\beta' X_{ij}(t)),
$$
where:
- $h_0(t)$: Baseline hazard (unspecified in semi-parametric models like Cox).
- $z_j$: Frailty term for cluster $j$, typically gamma-distributed with mean 1 and variance $\theta$, or log-normal with $\log z_j \sim N(0, \sigma^2)$.
- $X_{ij}(t)$: Covariates (possibly time-dependent).
- $\beta$: Regression coefficients.

-   **Assumption**: Frailty terms are independent and identically distributed across clusters, capturing unobserved heterogeneity.

-   **Use Case**: Modeling recurrent events (e.g., repeated tumor recurrences), clustered data (e.g., patients within hospitals), or family studies where unobserved genetic factors influence survival.

-   **Strengths**: Accounts for unobserved heterogeneity; models correlation within clusters.

-   **Limitations**: More complex estimation; requires assumptions about frailty distribution; interpretation of frailty effects can be challenging.
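
A shared gamma frailty model of the kind described above can be sketched with the `frailty()` term in `coxph()`. Treating each subject as its own cluster and using the `bladder2` dataset from the `survival` package are assumptions made for this example.

```r
library(survival)

# Gamma frailty shared by all intervals from the same subject. frailty(id)
# takes the place of cluster(id): the random effect itself models the
# within-subject correlation.
frail_fit <- coxph(
  Surv(start, stop, event) ~ rx + size + number +
    frailty(id, distribution = "gamma"),
  data = bladder2
)

# The printout lists the fixed effects and the estimated frailty variance.
print(frail_fit)
```

Setting `distribution = "gaussian"` instead fits a normal random effect on the log-hazard scale, which corresponds to the log-normal frailty mentioned above. A frailty variance near zero suggests little unexplained heterogeneity between subjects.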


### Marginal Models (e.g., Wei-Lin-Weissfeld)


Marginal models, such as the Wei-Lin-Weissfeld (WLW) approach, analyze recurrent event data by treating each ordered event (first, second, and so on) as a separate marginal process, with every subject considered at risk for every event from study entry. This method uses robust variance estimation to adjust for intra-subject correlation without explicitly modeling the dependence structure.

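-   **Hazard Function**:

$$
h_{ik}(t) = h_{0k}(t) \exp(\beta_k' X_i(t)),
$$

where:
- $h_{0k}(t)$: Baseline hazard for the $k$-th event.
- $X_i(t)$: Covariates (possibly time-dependent).
- $\beta_k$: Regression coefficients for the $k$-th event; a common $\beta$ is often estimated by constraining $\beta_k = \beta$.

- **Key Features**:
    - **Data Format**: One row per subject per event number, with time measured from study entry, an `event` indicator, and a `stratum` variable for the event number; every subject contributes a row to every stratum.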

- **Assumption**: Treats each event type as a separate process; uses robust variance to account for correlation.
- **Use Case**: When interested in overall effects of covariates on recurrent events without focusing on event order or gap times.
- **Strengths**: Simple to implement; flexible for time-dependent covariates.
- **Limitations**: May not capture event dependence as explicitly as PWP models.
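
The marginal approach can be sketched as a stratified Cox model with a robust variance. The example assumes the `bladder` dataset from the `survival` package, which is laid out with four rows per subject (one per event number, `enum`), and fits a common coefficient vector across strata as a simplification.

```r
library(survival)

# bladder has four rows per subject (enum = 1..4). Each row carries the time
# from study entry (stop) and whether that event number was observed.
wlw_fit <- coxph(
  Surv(stop, event) ~ rx + size + number + strata(enum) + cluster(id),
  data = bladder
)

# Every subject appears in every stratum, regardless of earlier events.
table(bladder$enum)
summary(wlw_fit)$coefficients
```

Because subjects are at risk for the third event before experiencing the second, the risk sets here are larger than in the PWP fit, which is the main structural difference between the two stratified models.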


### Choosing the Right Model


The choice of recurrent event model depends on the research question, data structure, and assumptions about event dependence. Key considerations include:

- **Event Dependence**: If event order matters, consider PWP models; if not, AG or marginal models may suffice.
- **Data Structure**: Ensure data is in the appropriate format (e.g., counting process format for AG and PWP).
- **Covariates**: Consider whether covariates are time-dependent and how they influence the hazard.
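
To make the data structure consideration concrete, the base R sketch below converts one subject's event times into counting-process rows and adds the event-number and gap-time columns that PWP models use. The function `to_counting_process()` and the example subject are hypothetical and exist only for illustration.

```r
# Build (tstart, tstop, event) rows for one subject from their event times.
# Hypothetical helper, not part of any package.
to_counting_process <- function(id, event_times, end_of_followup) {
  event_times <- sort(event_times)
  tstart <- c(0, event_times)                 # each interval opens at the previous event
  tstop  <- c(event_times, end_of_followup)   # and closes at the next event or censoring
  event  <- c(rep(1L, length(event_times)), 0L)
  out <- data.frame(id = id, tstart = tstart, tstop = tstop, event = event)
  out <- out[out$tstop > out$tstart, ]        # drop a zero-length final interval
  out$stratum <- seq_len(nrow(out))           # event number, used by PWP models
  out$gap <- out$tstop - out$tstart           # gap time, used by PWP-GT
  out
}

# Hypothetical subject: events at days 5 and 12, follow-up ends at day 20.
cp <- to_counting_process(id = 1, event_times = c(5, 12), end_of_followup = 20)
print(cp)
```

The resulting three rows cover the intervals (0, 5], (5, 12], and (12, 20], with the last one censored. AG uses `tstart`, `tstop`, and `event` directly, PWP-TT adds `stratum`, and PWP-GT replaces the interval endpoints with `gap`.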


## Key Differences Between Models



Marginal models differ from the Andersen-Gill (AG), Prentice-Williams-Peterson (PWP), and frailty models in their handling of dependence, risk sets, interpretation of effects, and assumptions about event processes. The table below highlights the key differences.


| Aspect                  | Marginal Models (e.g., WLW) | Andersen-Gill (AG) | Prentice-Williams-Peterson (PWP) | Frailty Models |
|-------------------------|-----------------------------|--------------------|----------------------------------|----------------|
| **Focus/Interpretation** | Population-averaged effects on marginal rates/means | Conditional intensity given covariates | Conditional hazards stratified by event order | Subject-specific effects accounting for unobserved heterogeneity |
| **Handling Dependence** | Robust variance estimator (sandwich); no explicit modeling | Assumes dependence captured by time-dependent covariates; optional robust variance | Conditions on event history via stratification; assumes Markov process | Explicit via random effects (frailty); quantifies correlation |
| **Risk Set**            | Unconditional (all subjects at risk for each stratum, ignoring order) | Conditional on survival to t; non-stratified | Conditional on prior events; stratified by order | Conditional, with frailty adjusting baseline per subject |
| **Time Scale**          | Total time (since entry) | Total time | Total time or gap time (since last event) | Typically total time; can extend to gap |
| **Assumptions**         | Proportional hazards per stratum; unspecified dependence | Independence of events given covariates; common baseline | Event-specific baselines/effects; small risk sets in later strata | Frailty distribution (e.g., gamma); conditional independence given frailty |
| **When to Use**         | When dependence is complex/unknown; focus on overall rates (e.g., clinical trials with composites) | Simple cases assuming independence via covariates; no strong order effects | When effects vary by recurrence or history matters (e.g., immunity in infections) | Heterogeneous populations with unmeasured factors (e.g., varying susceptibility) |
| **Advantages**          | Robust to dependence misspecification; parsimonious | Straightforward; good power if assumptions hold | Handles changing effects/hazards; respects order | Quantifies heterogeneity; subject-specific insights |
| **Disadvantages**       | May overestimate covariate effects in later strata; no order/heterogeneity insights | Less robust to unmeasured dependence or order | Instability with few events in later strata; may require truncating or pooling later strata | Sensitive to frailty distribution; computationally intensive |







## Comparison with Survival Analysis with Time-Dependent Covariates


Recurrent event models and survival analysis with time-dependent covariates are both extensions of survival analysis (often based on the Cox model), and they can overlap, since recurrent event models frequently incorporate time-dependent covariates. Their primary distinctions lie in purpose, data handling, and assumptions. Below is a comparison:

| Aspect                  | Recurrent Event Models                                                                 | Survival Analysis with Time-Dependent Covariates                                      |
|-------------------------|----------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|
| **Primary Focus**      | Multiple occurrences of the same event per subject (e.g., repeated infections).       | Time to a single event, with predictors that change over time (e.g., varying blood pressure). |
| **Event Handling**     | Accounts for correlations and dependencies between repeated events using stratification, frailty, or counting processes. | Assumes independent or single events; does not inherently model recurrence.          |
| **Data Structure**     | Multiple rows per subject for each event interval; supports counting multiple events. | Multiple rows per subject for covariate change intervals; typically one event per subject. |
| **Common Models**      | Andersen-Gill, PWP-TT/GT, frailty, marginal models.                                    | Extended Cox PH model with time-varying covariates.                                   |
| **Assumptions**        | Handles intra-subject correlation (e.g., via clustering or random effects); events are dependent. | Proportional hazards with time-varying covariates; events are independent across subjects. |
| **Applications**       | Longitudinal studies with repeated outcomes, like chronic diseases or hospitalizations. | Studies where predictors evolve, but the outcome is a one-time event (e.g., progression-free survival with changing biomarkers). |
| **Challenges**         | Model selection based on event dependence (e.g., PWP-TT for conditional risks); potential for bias if correlations are ignored. | Avoiding bias from future-dependent covariates; interpretation complexity with varying effects. |
| **Overlap/Extensions** | Can include time-dependent covariates; uses similar interval encoding.                | Can be adapted for recurrent events by combining with recurrent models (e.g., AG model with time-varying predictors). |

In summary, recurrent event models are specialized for repeated events and their interdependencies, whereas survival analysis with time-dependent covariates emphasizes dynamic predictors for typically singular events. Choosing between them depends on whether the research question involves recurrence or time-varying influences.



## Summary and Key Takeaways


- Recurrent event models extend traditional survival analysis to handle multiple occurrences of the same event per subject, accounting for dependencies between events.
- Common recurrent event models include Andersen-Gill (AG), Prentice-Williams-Peterson (PWP), frailty models, and marginal models (e.g., Wei-Lin-Weissfeld).
- Each model has unique assumptions, data structures, and applications, making them suitable for different research
  contexts.
- Recurrent event models can incorporate time-dependent covariates, but their primary focus is on
  event recurrence rather than solely on changing predictors.
- Choosing the appropriate model depends on the research question, data characteristics, and assumptions about event dependence.
- Understanding the distinctions between recurrent event models and survival analysis with time-dependent covariates is crucial for accurate modeling and interpretation in survival analysis studies.


## Resources



1. **Book: "The Statistical Analysis of Recurrent Events" by Cook & Lawless (2007)**  
   - Covers PWP models with theory and R examples.  
   - Access: Springer or libraries (ISBN: 978-0-387-69809-0).

2. **R "survival" Package Vignette (CRAN)**  
   - Guides PWP model fitting with `coxph()` and datasets like `bladder`.  
   - Access: `vignette("survival")` or CRAN website.

3. **Article: Cook & Lawless (2002)**  
   - Reviews PWP-TT/GT models and applications.  
   - Access: PubMed or Sage Journals (DOI: 10.1191/0962280202sm295ra).

4. **UCLA Tutorial: Recurrent Event Analysis in R**  
   - Step-by-step R code for PWP models using `survival`.  
   - Access: https://stats.idre.ucla.edu/r/seminars/recurrent-events/.

5. **Book: "Survival Analysis" by Klein & Moeschberger (2003, Ch. 12)**  
   - Explains PWP models and counting processes.  
   - Access: Springer or libraries (ISBN: 978-0-387-95399-1).