![All-test](http://drive.google.com/uc?export=view&id=1bLQ3nhDbZrCCqy_WCxxckOne2lgVvn3l)

# 2. Semi-parametric Survival Analysis {.unnumbered}


Semi-parametric survival analysis methods, such as the Cox proportional hazards model and its extensions, provide flexible tools for modeling time-to-event data without making strong parametric assumptions about the baseline hazard function. In this section, we will introduce the key concepts, mathematical formulations, and practical applications of these semi-parametric methods.


## Overview


**Semi-parametric methods** combine elements of both parametric and non-parametric approaches in statistical modeling. They make **partial assumptions** about the data structure:

- **Parametric component**: Specifies the relationship between covariates and the outcome (e.g., linear relationship, proportional hazards)  
- **Non-parametric component**: Makes minimal or no assumptions about the underlying distribution of certain aspects (e.g., baseline hazard function, error distribution)

This hybrid approach offers **flexibility** (avoiding strong distributional assumptions) while retaining much of the **interpretability** and **efficiency** of parametric models.



## Cox Proportional Hazards Model

### Basic Concept  


The **Cox Proportional Hazards Model** is the most widely used semi-parametric survival model. It models how covariates scale the hazard multiplicatively while leaving the form of the baseline hazard unspecified.


### Mathematical Formulation  


$$
h(t \mid \mathbf{X}) = h_0(t) \cdot \exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)
$$

Where:

- $h(t \mid \mathbf{X})$ = hazard at time $t$ given covariates $\mathbf{X}$
- $h_0(t)$ = baseline hazard function (unspecified, non-parametric)
- $\exp(\boldsymbol{\beta}^\top \mathbf{X}) = \exp(\beta_1 X_1 + \cdots + \beta_p X_p)$ = parametric component representing covariate effects


### Key Assumptions  


1. **Proportional Hazards**: Hazard ratios between any two individuals are constant over time  
2. **Log-linearity**: Covariates enter linearly on the log-hazard scale, so a one-unit increase in $X_k$ multiplies the hazard by $\exp(\beta_k)$
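
To see why the first assumption follows from the model form, take the ratio of the hazards for two individuals with covariate vectors $\mathbf{X}_a$ and $\mathbf{X}_b$ (these two labels are introduced here only for illustration):

$$
\frac{h(t \mid \mathbf{X}_a)}{h(t \mid \mathbf{X}_b)}
= \frac{h_0(t) \cdot \exp(\boldsymbol{\beta}^\top \mathbf{X}_a)}{h_0(t) \cdot \exp(\boldsymbol{\beta}^\top \mathbf{X}_b)}
= \exp\big(\boldsymbol{\beta}^\top (\mathbf{X}_a - \mathbf{X}_b)\big)
$$

The baseline hazard $h_0(t)$ cancels, so the ratio does not depend on $t$. If the two individuals differ only by one unit in $X_k$, the ratio reduces to $\exp(\beta_k)$, the hazard ratio for that covariate.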


### Estimation  


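The coefficients $\boldsymbol{\beta}$ are estimated by maximizing the **partial likelihood** (Cox, 1972), in which $h_0(t)$ cancels and therefore never needs to be estimated. Assuming no tied event times: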

$$
L(\boldsymbol{\beta}) = \prod_{i: \, \text{event at } t_i} \frac{\exp(\boldsymbol{\beta}^\top \mathbf{X}_i)}{\sum_{j \in \mathcal{R}(t_i)} \exp(\boldsymbol{\beta}^\top \mathbf{X}_j)}
$$
where $\mathcal{R}(t_i)$ is the risk set at time $t_i$ (all individuals still under observation and event-free just before $t_i$).
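
As a minimal sketch of the formula above, the log partial likelihood can be evaluated directly on a small dataset. The function name, argument layout, and toy data below are assumptions made for illustration, and the code assumes no tied event times; it is not how production libraries implement the computation.

```python
import math


def log_partial_likelihood(beta, times, events, covariates):
    """Log of the Cox partial likelihood, assuming no tied event times.

    beta       : list of p coefficients
    times      : observed time for each individual
    events     : 1 if the event was observed, 0 if censored
    covariates : one list of p covariate values per individual
    """
    # Linear predictor beta^T X_i for every individual
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]

    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored individuals contribute only through risk sets
        # Risk set R(t_i): everyone whose observed time is at least t_i
        risk_sum = sum(
            math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i
        )
        total += eta[i] - math.log(risk_sum)
    return total


# Hypothetical toy data: four individuals, one binary covariate
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
covariates = [[1.0], [0.0], [1.0], [0.0]]

ll_at_zero = log_partial_likelihood([0.0], times, events, covariates)
```

With $\boldsymbol{\beta} = 0$ every individual has the same weight, so each event contributes minus the log of its risk-set size (here 4, 2, and 1). Note that $h_0(t)$ never appears in the code: only the ordering of times and the covariates matter.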



## Time-Dependent Cox Model

### Purpose  


Extends the basic Cox model to handle **time-varying covariates**: variables whose values change over time.


### Mathematical Formulation  


$$
h(t \mid \mathbf{X}(t)) = h_0(t) \cdot \exp\big(\beta_1 X_1(t) + \beta_2 X_2(t) + \cdots + \beta_p X_p(t)\big)
$$


### Key Features  


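- Covariates $\mathbf{X}(t)$ can change value during follow-up
- Coefficients $\boldsymbol{\beta}$ are **still assumed constant over time**: a given covariate value has the same multiplicative effect on the hazard whenever it occurs, although the hazard ratio between two individuals can change as their covariate values change
- Requires data restructuring into **start-stop format**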
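
As a sketch of how estimation changes, the partial likelihood keeps the same form as in the basic model, but every covariate is evaluated at the event time $t_i$ (again assuming no tied event times):

$$
L(\boldsymbol{\beta}) = \prod_{i: \, \text{event at } t_i} \frac{\exp\big(\boldsymbol{\beta}^\top \mathbf{X}_i(t_i)\big)}{\sum_{j \in \mathcal{R}(t_i)} \exp\big(\boldsymbol{\beta}^\top \mathbf{X}_j(t_i)\big)}
$$

This is the reason for the start-stop layout: at each event time, the model needs the current covariate value of everyone in the risk set.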


### Example Applications  


- **Treatment changes**: Drug dosage adjustments over time  
- **Biomarker evolution**: CD4 count in HIV patients  
- **Time-dependent exposures**: Employment or marital status  


### Data Structure Example  


| Patient | Start | Stop | Event | Covariate |
|--------|-------|------|-------|-----------|
| 1      | 0     | 6    | 0     | 10        |
| 1      | 6     | 12   | 1     | 15        |
| 2      | 0     | 8    | 0     | 8         |
| 2      | 8     | 15   | 0     | 12        |
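
The table above can be produced from per-patient measurement histories. The sketch below is one possible way to do it; the function names and the input layout (a list of `(time, value)` measurements per patient) are assumptions for illustration, and it assumes the first measurement is at time 0 and none falls after the end of follow-up.

```python
def to_start_stop(patient_id, measurements, end_time, event):
    """Split one patient's follow-up into (start, stop] intervals.

    measurements : list of (time, value) pairs, sorted by time, first at time 0
    end_time     : time of event or censoring
    event        : 1 if follow-up ended with the event, 0 if censored
    """
    rows = []
    for k, (start, value) in enumerate(measurements):
        is_last = k == len(measurements) - 1
        # An interval ends at the next measurement, or at the end of follow-up
        stop = end_time if is_last else measurements[k + 1][0]
        rows.append({
            "patient": patient_id,
            "start": start,
            "stop": stop,
            # The event can only occur in the final interval
            "event": event if is_last else 0,
            "covariate": value,
        })
    return rows


def risk_set(rows, t):
    """Rows at risk at time t: those whose interval satisfies start < t <= stop."""
    return [r for r in rows if r["start"] < t <= r["stop"]]


# Reproduces the four rows of the table above
rows = (
    to_start_stop(1, [(0, 10), (6, 15)], end_time=12, event=1)
    + to_start_stop(2, [(0, 8), (8, 12)], end_time=15, event=0)
)

# Covariate values in force when patient 1 has the event at t = 12
values_at_event = [r["covariate"] for r in risk_set(rows, 12)]
```

Each patient contributes at most one row to any risk set, and that row carries the covariate value in force at that time (15 for patient 1 and 12 for patient 2 at $t = 12$).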




## Stratified Cox Model

### Purpose  


Addresses violations of the **proportional hazards assumption** by allowing the baseline hazard to differ across strata while maintaining common covariate effects.


### Mathematical Formulation  


$$
h_g(t \mid \mathbf{X}) = h_{0g}(t) \cdot \exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)
$$

Where:

- $g = 1, 2, \dots, G$ indexes the stratum
- $h_{0g}(t)$ = baseline hazard specific to stratum $g$
- **Same $\boldsymbol{\beta}$ coefficients** across all strata


### When to Use  

- **Non-proportional hazards** for a particular variable  
- **Different baseline risks** across groups (e.g., hospitals, age categories)  
- **Matching** in case-control studies  


### Key Characteristics  

- **No assumption** about the relationship between $h_{0g}(t)$ across strata  
- Covariate effects ($\boldsymbol{\beta}$) are **assumed identical** across strata  
- Cannot estimate the effect of the stratification variable itself  
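
These characteristics follow from how the stratified partial likelihood is built: risk sets are formed within each stratum, and the per-stratum contributions are combined using a shared $\boldsymbol{\beta}$. A minimal sketch, assuming no tied event times within a stratum (the function name and toy data are hypothetical):

```python
import math
from collections import defaultdict


def stratified_log_partial_likelihood(beta, times, events, covariates, strata):
    """Stratified Cox log partial likelihood, assuming no ties within a stratum.

    strata : stratum label for each individual; risk sets never cross strata
    """
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]

    # Group individual indices by stratum label
    groups = defaultdict(list)
    for idx, label in enumerate(strata):
        groups[label].append(idx)

    total = 0.0
    for members in groups.values():
        for i in members:
            if events[i] != 1:
                continue
            # Risk set restricted to the same stratum, so h_0g(t) cancels
            risk_sum = sum(
                math.exp(eta[j]) for j in members if times[j] >= times[i]
            )
            total += eta[i] - math.log(risk_sum)
    return total


# Hypothetical toy data: two strata of two individuals each
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
covariates = [[1.0], [0.0], [1.0], [0.0]]
strata = ["A", "A", "B", "B"]

ll_stratified = stratified_log_partial_likelihood(
    [0.0], times, events, covariates, strata
)
```

Because individuals are only ever compared with others in the same stratum, the stratum label is constant within every risk set and drops out, which is why no coefficient can be estimated for the stratification variable.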




## Key Differences Between Models


| Feature | Basic Cox | Time-Dependent Cox | Stratified Cox |
|--------|-----------|--------------------|----------------|
| **Baseline Hazard** | Single $h_0(t)$ | Single $h_0(t)$ | Multiple $h_{0g}(t)$ |
| **Covariates** | Fixed over time | Can vary over time | Fixed over time |
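| **PH Assumption** | Required for all covariates | Coefficients constant over time; hazard ratios can change as $\mathbf{X}(t)$ changes | Required for modeled covariates *within* each stratum; not for the stratifying variable |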
| **Primary Purpose** | Standard survival analysis | Handle time-varying exposures | Address PH violations |
| **Effect Estimation** | Single $\boldsymbol{\beta}$ | Single $\boldsymbol{\beta}$ for time-varying covariates | Single $\boldsymbol{\beta}$ across strata |
| **Data Structure** | Standard survival format | Start-stop format required | Standard format + strata indicator |



## Practical Considerations

### Model Selection Guidelines  

1. **Start with basic Cox model**  
2. **Test proportional hazards assumption** (e.g., Schoenfeld residuals)  
3. **If PH violated**:  
   - Use a **stratified model** if the violation is due to a categorical variable whose effect does not need to be estimated  
   - Use **time-dependent coefficients** (e.g., $\beta(t)X$) if a continuous covariate violates PH  
4. **If covariates change over time**: Use **time-dependent Cox model**
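
For step 2, the Schoenfeld residual at each event time is the covariate value of the individual who had the event minus the risk-weighted average of that covariate over the risk set. The sketch below handles a single covariate and assumes no tied event times; the function name and toy data are hypothetical, and in practice `beta` would be the fitted coefficient rather than a supplied value.

```python
import math


def schoenfeld_residuals(beta, times, events, x):
    """Schoenfeld residuals for a single covariate, assuming no tied event times.

    Returns (event_time, residual) pairs sorted by event time.
    """
    residuals = []
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # residuals are defined only at observed events
        at_risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in at_risk]
        # Expected covariate value in the risk set under the model
        expected = sum(w * x[j] for w, j in zip(weights, at_risk)) / sum(weights)
        residuals.append((t_i, x[i] - expected))
    return sorted(residuals)


# Hypothetical toy data: four individuals, one binary covariate
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
x = [1.0, 0.0, 1.0, 0.0]

residuals = schoenfeld_residuals(0.0, times, events, x)
```

Under proportional hazards these residuals should show no systematic trend when plotted against time; a clear trend suggests that the effect of the covariate changes over follow-up.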


### Advantages of Semi-Parametric Approach  

- **Robustness**: No need to specify $h_0(t)$  
- **Efficiency**: More efficient than fully non-parametric methods  
- **Interpretability**: Hazard ratios have clear clinical meaning  
- **Flexibility**: Adaptable to complex covariate structures  


### Limitations  

- **PH assumption** can be restrictive  
- **No absolute survival probabilities from $\boldsymbol{\beta}$ alone**: predicting survival requires a separate estimate of $h_0(t)$ (e.g., the Breslow estimator)  
- **Interpretation complexity** with time-dependent covariates  
- **Increased computational burden** with model extensions  
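
The survival-probability limitation is usually handled after fitting by estimating the cumulative baseline hazard $H_0(t)$, for example with the Breslow estimator, and then computing $S(t \mid \mathbf{X}) = \exp\big(-H_0(t) \exp(\boldsymbol{\beta}^\top \mathbf{X})\big)$. A minimal sketch under the assumption that $\boldsymbol{\beta}$ is already known (function names and toy data are hypothetical):

```python
import math


def breslow_cumulative_hazard(beta, times, events, covariates):
    """Breslow estimate of the cumulative baseline hazard at each event time."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    event_times = sorted({t for t, d in zip(times, events) if d == 1})

    cumulative = []
    running = 0.0
    for t_i in event_times:
        # Number of events at t_i divided by the total risk score in the risk set
        n_events = sum(1 for t, d in zip(times, events) if d == 1 and t == t_i)
        risk_sum = sum(
            math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i
        )
        running += n_events / risk_sum
        cumulative.append((t_i, running))
    return cumulative


def survival_probability(cumulative, beta, x, t):
    """S(t | x) = exp(-H0(t) * exp(beta^T x)), with H0 a step function."""
    h0_t = 0.0
    for event_time, value in cumulative:
        if event_time <= t:
            h0_t = value
    risk_score = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return math.exp(-h0_t * risk_score)


# Hypothetical toy data; beta = 0 stands in for a fitted coefficient
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
covariates = [[1.0], [0.0], [1.0], [0.0]]

H0 = breslow_cumulative_hazard([0.0], times, events, covariates)
s_at_5 = survival_probability(H0, [0.0], [1.0], 5.0)
```

The estimate of $H_0(t)$ is a step function that jumps only at observed event times, so predicted survival curves are also step functions.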


## Summary and Conclusion


Semi-parametric survival analysis methods, particularly the Cox proportional hazards model and its extensions, are essential tools for analyzing time-to-event data. They strike a balance between flexibility and interpretability by avoiding strong parametric assumptions about the baseline hazard function while allowing for meaningful covariate effects. Next, we will explore practical implementations of these models using R and Python, including data preparation, model fitting, diagnostics, and interpretation of results.


## Resources