# Notebook 4: XGBoost Accelerated Failure Time (AFT)

## Model Overview

XGBoost AFT is a gradient boosting model for survival analysis:
- **Objective**: `survival:aft` (Accelerated Failure Time)
- **Loss**: Negative log-likelihood of AFT distribution
- **Handles censoring**: Via interval regression (lower/upper bounds)
- **Handles NaN**: Native missing value handling

## Configuration
- **Features**: 83 unfixed, with missing values left as NaN (XGBoost handles them natively)
- **Distribution**: Normal/Logistic/Extreme
- **Tuning**: Optuna (100 trials)
- **Evaluation**: `concordance_index_ipcw` from sksurv
- **CV Score**: 0.6964 weighted C-index (BEST single model)

In [None]:
import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.model_selection import StratifiedKFold
from sksurv.metrics import concordance_index_ipcw
from sksurv.util import Surv
import optuna
from optuna.samplers import TPESampler

# Paths
TRAIN_PATH = '/your_path/SurvivalPrediction/data'

## 1. Load Data

In [48]:
# Load 83-feature UNFIXED dataset (XGBoost handles NaN)
X_train_with_id = pd.read_csv(f'{TRAIN_PATH}/X_train_83features_with_id.csv')
target = pd.read_csv(f'{TRAIN_PATH}/target_train_clean_aligned.csv')

# Align to target
X_train_with_id = X_train_with_id.set_index('ID').loc[target['ID']].reset_index()
X_train = X_train_with_id.drop(columns=['ID'])

y_time = target['OS_YEARS'].values
y_event = target['OS_STATUS'].values.astype(bool)
n_samples = len(X_train)

# Create structured array for IPCW C-index
y_surv = Surv.from_arrays(event=y_event, time=y_time)

print(f"Features: {X_train.shape[1]}")
print(f"Samples: {n_samples}")
print(f"Events: {y_event.sum()} ({y_event.mean()*100:.1f}%)")
print(f"NaN values: {X_train.isna().sum().sum()} (XGBoost handles these natively)")

Features: 83
Samples: 3120
Events: 1600 (51.3%)
NaN values: 240 (XGBoost handles these natively)


In [49]:
# Risk groups for weighted C-index
def define_risk_groups(X):
    risk_factors = pd.DataFrame(index=X.index)
    risk_factors['high_blast'] = (X['BM_BLAST'] > 10).astype(int)
    risk_factors['has_TP53'] = (X['has_TP53'] > 0).astype(int)
    risk_factors['low_hb'] = (X['HB'] < 10).astype(int)
    risk_factors['low_plt'] = (X['PLT'] < 50).astype(int)
    risk_factors['high_cyto'] = (X['cyto_risk_score'] >= 3).astype(int)
    n_risk_factors = risk_factors.sum(axis=1)
    return {
        'test_like': n_risk_factors >= 1,
        'high_risk': n_risk_factors >= 2,
    }

risk_groups = define_risk_groups(X_train)

# Stratification variable for CV
has_tp53 = (X_train['has_TP53'] > 0).astype(int).values
strat_var = pd.Series([f"{int(e)}_{int(t)}" for e, t in zip(y_event, has_tp53)])

print(f"Test-like subgroup: {risk_groups['test_like'].sum()} samples")
print(f"High-risk subgroup: {risk_groups['high_risk'].sum()} samples")

Test-like subgroup: 2192 samples
High-risk subgroup: 772 samples


## 2. AFT Model Explanation

### Accelerated Failure Time Model

AFT models assume: log(T) = f(X) + σε

Where:
- T = survival time
- f(X) = XGBoost prediction (tree ensemble)
- σ = scale parameter
- ε = error term from specified distribution

### Distribution Options
- **normal**: ε ~ Normal(0,1) → log-normal survival times
- **logistic**: ε ~ Logistic(0,1) → log-logistic survival times  
- **extreme**: ε ~ Extreme Value → Weibull survival times
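To make the effect of the distribution choice concrete, the sketch below computes quantiles of T from log(T) = f(X) + σε for each error distribution. It uses only the standard library. The value of `f_x` is a hypothetical tree output, and the extreme-value case assumes the minimum (Gumbel) form with CDF F(z) = 1 - exp(-exp(z)), which is the form that yields Weibull survival times.

```python
import math
from statistics import NormalDist


def error_quantile(dist, p):
    """Quantile of the standardized error term for each AFT distribution."""
    if dist == "normal":
        return NormalDist().inv_cdf(p)
    if dist == "logistic":
        return math.log(p / (1.0 - p))
    if dist == "extreme":
        # Assumed minimum extreme value form: F(z) = 1 - exp(-exp(z))
        return math.log(-math.log(1.0 - p))
    raise ValueError(f"unknown distribution: {dist}")


def survival_time_quantile(f_x, sigma, dist, p):
    """p-th quantile of T when log(T) = f_x + sigma * eps."""
    return math.exp(f_x + sigma * error_quantile(dist, p))


f_x, sigma = math.log(3.0), 1.0  # hypothetical prediction and scale
for dist in ("normal", "logistic", "extreme"):
    q25, q50, q75 = (
        survival_time_quantile(f_x, sigma, dist, p) for p in (0.25, 0.5, 0.75)
    )
    print(f"{dist:8s} 25%={q25:.2f}  median={q50:.2f}  75%={q75:.2f}")
```

With the same f(X) and σ, the normal and logistic errors are symmetric around zero, so their median survival time equals exp(f(X)); the extreme-value error is skewed, which shifts the median below exp(f(X)).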

### Handling Censoring
XGBoost AFT takes each label as an interval `[lower_bound, upper_bound]`, which covers both observed and right-censored times:
- **Events**: lower_bound = upper_bound = observed_time
- **Censored**: lower_bound = observed_time, upper_bound = +∞
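The two kinds of label enter the likelihood differently: an observed event contributes the density of T at the exact time, while a censored sample contributes the probability that T falls inside its interval. The sketch below shows this for the normal distribution using the textbook AFT likelihood. The function name `aft_nll_normal` is hypothetical, and XGBoost's own implementation may differ in details such as numerical clamping.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def aft_nll_normal(pred_log_time, lower, upper, sigma=1.0):
    """Negative log-likelihood of one label under a normal AFT model.

    pred_log_time plays the role of f(X); lower and upper are the label bounds.
    """
    z_lower = (math.log(lower) - pred_log_time) / sigma
    if lower == upper:
        # Observed event: density of T at the exact time
        density = STD_NORMAL.pdf(z_lower) / (sigma * lower)
        return -math.log(density)
    # Censored: probability mass of the interval [lower, upper]
    if math.isinf(upper):
        cdf_upper = 1.0
    else:
        cdf_upper = STD_NORMAL.cdf((math.log(upper) - pred_log_time) / sigma)
    return -math.log(cdf_upper - STD_NORMAL.cdf(z_lower))


# A sample censored at 2 years is penalized less when the model predicts a longer time
print(aft_nll_normal(math.log(1.0), 2.0, math.inf))
print(aft_nll_normal(math.log(5.0), 2.0, math.inf))
```

For a right-censored sample the loss only asks that the predicted time be late enough; it never penalizes a prediction for being too far beyond the censoring time.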

In [50]:
# Prepare AFT labels
y_lower = y_time.copy()
y_upper = np.where(y_event, y_time, np.inf)  # Censored → upper bound is infinity

print("AFT Label Bounds:")
print(f"  Events: lower=upper (exact time)")
print(f"  Censored: lower=time, upper=inf (right censored)")
print(f"\n  Events: {(y_upper != np.inf).sum()}")
print(f"  Censored: {(y_upper == np.inf).sum()}")

AFT Label Bounds:
  Events: lower=upper (exact time)
  Censored: lower=time, upper=inf (right censored)

  Events: 1600
  Censored: 1520


## 3. Evaluation Metric

We use `concordance_index_ipcw` from scikit-survival, which:
- Handles censoring via Inverse Probability of Censoring Weighting (IPCW)
- Uses a time truncation parameter τ (tau=7.0 years)
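As a rough illustration of what the metric counts, the sketch below computes a truncated concordance without the IPCW weights. A pair is comparable when the earlier sample had an observed event before τ, and it is concordant when that sample also has the higher risk score. The function `truncated_cindex` is a hypothetical teaching aid, not scikit-survival's implementation, whose weighting and handling of ties and τ may differ.

```python
def truncated_cindex(times, events, risk, tau):
    """Unweighted concordance over comparable pairs, truncated at tau."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        # Sample i anchors a pair only if its event was observed before tau
        if not events[i] or times[i] >= tau:
            continue
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1.0, 2.0, 3.0, 8.0]
events = [True, False, True, True]
print(truncated_cindex(times, events, [4, 3, 2, 1], tau=7.0))  # perfectly ordered risk
print(truncated_cindex(times, events, [1, 2, 3, 4], tau=7.0))  # perfectly reversed risk
```

IPCW keeps the same pair logic but reweights each comparable pair by the inverse probability of remaining uncensored, so that heavy censoring does not bias the estimate.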

Weighted formula: 0.3 × overall + 0.4 × test_like + 0.3 × high_risk
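As a quick arithmetic check of the formula, the snippet below plugs in the three C-indices reported in this notebook's CV results (0.7214, 0.6967, 0.6709). The dictionary names are illustrative only.

```python
WEIGHTS = {"overall": 0.3, "test_like": 0.4, "high_risk": 0.3}
scores = {"overall": 0.7214, "test_like": 0.6967, "high_risk": 0.6709}

# 0.3 * 0.7214 + 0.4 * 0.6967 + 0.3 * 0.6709
weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
print(round(weighted, 4))
```

The test-like subgroup carries the largest weight, so a gain there moves the weighted score more than the same gain in the overall or high-risk C-index.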

In [51]:
def weighted_cindex_ipcw(risk, y_surv_all, risk_groups, tau=7.0):
    """
    Compute weighted C-index using concordance_index_ipcw (competition metric).
    
    Args:
        risk: Risk scores (higher = worse prognosis)
        y_surv_all: Structured survival array (event, time)
        risk_groups: Dict with 'test_like' and 'high_risk' boolean masks
        tau: Time truncation for IPCW
    
    Returns:
        Dict with overall, test_like, high_risk, and weighted C-indices
    """
    # Overall C-index
    c_overall = concordance_index_ipcw(y_surv_all, y_surv_all, risk, tau=tau)[0]

    # Test-like subgroup
    mask_test = risk_groups['test_like'].values
    y_surv_test = Surv.from_arrays(event=y_surv_all['event'][mask_test], time=y_surv_all['time'][mask_test])
    c_test = concordance_index_ipcw(y_surv_all, y_surv_test, risk[mask_test], tau=tau)[0]

    # High-risk subgroup
    mask_high = risk_groups['high_risk'].values
    y_surv_high = Surv.from_arrays(event=y_surv_all['event'][mask_high], time=y_surv_all['time'][mask_high])
    c_high = concordance_index_ipcw(y_surv_all, y_surv_high, risk[mask_high], tau=tau)[0]

    weighted = 0.3 * c_overall + 0.4 * c_test + 0.3 * c_high

    return {
        'overall': c_overall,
        'test_like': c_test,
        'high_risk': c_high,
        'weighted': weighted
    }

## 4. XGBoost AFT Training

In [52]:
def train_xgb_aft(X_tr, y_lower_tr, y_upper_tr, params, n_estimators):
    """
    Train XGBoost AFT model.
    
    Args:
        X_tr: Training features
        y_lower_tr: Lower bound of survival time
        y_upper_tr: Upper bound (inf for censored)
        params: XGBoost parameters
        n_estimators: Number of boosting rounds
    
    Returns:
        Trained XGBoost model
    """
    dtrain = xgb.DMatrix(X_tr)
    dtrain.set_float_info('label_lower_bound', y_lower_tr) # Minimum possible event time
    dtrain.set_float_info('label_upper_bound', y_upper_tr) # Maximum possible event time
    
    model = xgb.train(params, dtrain, num_boost_round=n_estimators, verbose_eval=False)
    return model

def predict_xgb_aft(model, X):
    """Predict risk score (negative predicted survival time)."""
    dtest = xgb.DMatrix(X)
    pred_time = model.predict(dtest)
    # Higher predicted time = lower risk, so negate
    return -pred_time

## 5. Global OOF Evaluation

**Global OOF (Out-Of-Fold)** evaluation:
1. Split data into K folds
2. For each fold, train on K-1 folds, predict on held-out fold
3. Collect all OOF predictions
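4. Compute a **single** C-index on all 3120 OOF predictions

Pooling yields one estimate over all comparable pairs, including pairs that span folds, rather than an average of K noisier per-fold estimates. This tends to matter most for the smaller subgroups (772 high-risk samples, roughly 150 per fold).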
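The mechanics of the four steps can be sketched with the standard library alone. The fold splitter below is unstratified and the "model" is a toy least-squares slope, both stand-ins chosen for brevity; `kfold_indices` and `pooled_oof_predictions` are hypothetical names, not functions from this notebook.

```python
import random


def kfold_indices(n_samples, n_splits, seed):
    """Shuffle indices and deal them into n_splits folds (no stratification)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_splits] for k in range(n_splits)]


def pooled_oof_predictions(X, y, fit, predict, n_splits=5, seed=42):
    """Return exactly one out-of-fold prediction per sample."""
    oof = [None] * len(X)
    for val_idx in kfold_indices(len(X), n_splits, seed):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in val_idx:
            oof[i] = predict(model, X[i])  # the model never saw sample i
    return oof


# Toy stand-in model: 1-D least-squares slope through the origin
def fit(X_tr, y_tr):
    return sum(x * t for x, t in zip(X_tr, y_tr)) / sum(x * x for x in X_tr)


def predict(slope, x):
    return slope * x


X = [float(i) for i in range(1, 21)]
y = [2.0 * x for x in X]
oof = pooled_oof_predictions(X, y, fit, predict)
print(len(oof), all(p is not None for p in oof))
```

The metric is then evaluated once on the full `oof` list instead of once per fold.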

In [53]:
def global_oof_evaluate(params, n_splits=5, seed=42):
    """Global OOF Cross-validation evaluation for XGBoost AFT."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof_preds = np.zeros(n_samples)
    
    X_arr = X_train.values
    n_estimators = params.get('n_estimators', 100)
    
    xgb_params = {
        'objective': 'survival:aft',
        'eval_metric': 'aft-nloglik',
        'aft_loss_distribution': params['aft_distribution'],
        'aft_loss_distribution_scale': 1.0,
        'tree_method': 'hist',
        'max_depth': params['max_depth'],
        'learning_rate': params['learning_rate'],
        'min_child_weight': params['min_child_weight'],
        'subsample': params['subsample'],
        'colsample_bytree': params['colsample_bytree'],
        'gamma': params['gamma'],
        'reg_alpha': params['reg_alpha'],
        'reg_lambda': params['reg_lambda'],
        'seed': seed,
    }
    
    for fold_idx, (train_idx, val_idx) in enumerate(skf.split(X_arr, strat_var)):
        X_tr, X_val = X_arr[train_idx], X_arr[val_idx]
        y_lower_tr = y_lower[train_idx]
        y_upper_tr = y_upper[train_idx]
        
        xgb_params['seed'] = seed + fold_idx
        model = train_xgb_aft(X_tr, y_lower_tr, y_upper_tr, xgb_params, n_estimators)
        oof_preds[val_idx] = predict_xgb_aft(model, X_val)
    
    # Global z-score normalization (rank-preserving, so the C-index itself is unchanged)
    oof_normalized = (oof_preds - oof_preds.mean()) / (oof_preds.std() + 1e-8)
    
    # Compute metrics using competition metric
    return weighted_cindex_ipcw(oof_normalized, y_surv, risk_groups)
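The z-score step rescales the risk scores without reordering them, and a concordance index depends only on that ordering. A minimal check with the standard library, using hypothetical risk values:

```python
from statistics import fmean, pstdev


def zscore(values, eps=1e-8):
    """Standardize with the population standard deviation (NumPy's default)."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / (sd + eps) for v in values]


def ranking(values):
    """Indices ordered from lowest to highest score."""
    return sorted(range(len(values)), key=values.__getitem__)


raw_risk = [-11.5, -0.2, -3.7, -6.1, -1.9]  # hypothetical negated predicted times
same_order = ranking(raw_risk) == ranking(zscore(raw_risk))
print(same_order)
```

Standardizing is still useful when scores from several models are later averaged, since it puts them on a common scale.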

## 6. Hyperparameter Tuning with Optuna

In [43]:
def objective(trial):
    """Optuna objective for XGBoost AFT hyperparameter tuning."""
    params = {
        'aft_distribution': trial.suggest_categorical('aft_distribution', ['normal', 'logistic', 'extreme']),
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'max_depth': trial.suggest_int('max_depth', 2, 8),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'min_child_weight': trial.suggest_int('min_child_weight', 10, 200),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1.0),
        'gamma': trial.suggest_float('gamma', 0, 5),
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 10, log=True),
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 10, log=True),
    }
    
    result = global_oof_evaluate(params)
    return result['weighted']

# Run the full Optuna study (100 trials; lower n_trials for a quick run)
print("Running Optuna hyperparameter tuning...")
print("(Set n_trials=100 for full tuning)\n")

sampler = TPESampler(seed=42)
study = optuna.create_study(direction='maximize', sampler=sampler)
study.optimize(objective, n_trials=100, show_progress_bar=True)

print(f"\nBest trial:")
print(f"  Weighted C-index: {study.best_value:.4f}")
print(f"  Params: {study.best_params}")

Running Optuna hyperparameter tuning...
(Set n_trials=100 for full tuning)




Best trial:
  Weighted C-index: 0.6964
  Params: {'aft_distribution': 'normal', 'n_estimators': 147, 'max_depth': 5, 'learning_rate': 0.026341881840794876, 'min_child_weight': 41, 'subsample': 0.9204349911732258, 'colsample_bytree': 0.5136953028400747, 'gamma': 2.8297130955076955, 'reg_alpha': 0.09570260464777246, 'reg_lambda': 0.4468062524655007}
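The search space above sets `log=True` for `learning_rate`, `reg_alpha`, and `reg_lambda`, so those ranges are explored on a logarithmic scale. The sketch below shows the idea with plain uniform sampling in log space. It illustrates the scaling only and is not Optuna's TPE algorithm; `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
draws = [sample_log_uniform(rng, 1e-8, 10.0) for _ in range(10_000)]

# The range 1e-8..10 spans 9 decades, so about 4/9 of draws fall below 1e-4
share_small = sum(d < 1e-4 for d in draws) / len(draws)
print(f"share below 1e-4: {share_small:.3f}")
```

On a linear scale almost no draws would land below 1e-4, so weak regularization would be barely explored.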


## 7. Best Model Configuration

From full 100-trial Optuna tuning with competition metric (`concordance_index_ipcw`):

In [54]:
# Best hyperparameters from full tuning (competition metric)
BEST_PARAMS = {
    'aft_distribution': 'normal',
    'n_estimators': 147,
    'max_depth': 5,
    'learning_rate': 0.026342,
    'min_child_weight': 41,
    'subsample': 0.920435,
    'colsample_bytree': 0.513695,
    'gamma': 2.829713,
    'reg_alpha': 0.095703,
    'reg_lambda': 0.446806,
}

print("Best Hyperparameters (from 100-trial tuning, competition metric):")
for k, v in BEST_PARAMS.items():
    print(f"  {k}: {v}")

# Evaluate with best params
result = global_oof_evaluate(BEST_PARAMS)
print(f"\nCV Results (concordance_index_ipcw):")
print(f"  Overall C-index: {result['overall']:.4f}")
print(f"  Test-like C-index: {result['test_like']:.4f}")
print(f"  High-risk C-index: {result['high_risk']:.4f}")
print(f"  Weighted C-index: {result['weighted']:.4f}")

Best Hyperparameters (from 100-trial tuning, competition metric):
  aft_distribution: normal
  n_estimators: 147
  max_depth: 5
  learning_rate: 0.026342
  min_child_weight: 41
  subsample: 0.920435
  colsample_bytree: 0.513695
  gamma: 2.829713
  reg_alpha: 0.095703
  reg_lambda: 0.446806

CV Results (concordance_index_ipcw):
  Overall C-index: 0.7214
  Test-like C-index: 0.6967
  High-risk C-index: 0.6709
  Weighted C-index: 0.6964


## 8. Train Final Model and Generate Predictions

In [32]:
# Train on full data
print("Training final model on full data...")

xgb_params = {
    'objective': 'survival:aft',
    'eval_metric': 'aft-nloglik',
    'aft_loss_distribution': BEST_PARAMS['aft_distribution'],
    'aft_loss_distribution_scale': 1.0,
    'tree_method': 'hist',
    'max_depth': BEST_PARAMS['max_depth'],
    'learning_rate': BEST_PARAMS['learning_rate'],
    'min_child_weight': BEST_PARAMS['min_child_weight'],
    'subsample': BEST_PARAMS['subsample'],
    'colsample_bytree': BEST_PARAMS['colsample_bytree'],
    'gamma': BEST_PARAMS['gamma'],
    'reg_alpha': BEST_PARAMS['reg_alpha'],
    'reg_lambda': BEST_PARAMS['reg_lambda'],
    'seed': 42,
}

final_model = train_xgb_aft(
    X_train.values, y_lower, y_upper,
    xgb_params, BEST_PARAMS['n_estimators']
)

print(f"Model trained with {BEST_PARAMS['n_estimators']} trees.")

Training final model on full data...
Model trained with 147 trees.


In [None]:
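# Feature importance. The model was trained on a NumPy array, so features
# appear under positional names (f0 = first column of X_train, f1 = second, ...)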
importance = final_model.get_score(importance_type='gain')
importance_df = pd.DataFrame([
    {'feature': k, 'importance': v} for k, v in importance.items()
]).sort_values('importance', ascending=False)

print("Top 20 Features by Gain:")
print(importance_df.head(20))

Top 20 Features by Gain:
   feature  importance
13     f16  102.165848
45     f80   73.549072
11     f14   68.856689
24     f28   41.682056
37     f72   29.986715
0       f0   28.927471
10     f13   24.852905
23     f27   23.491613
43     f78   20.367319
3       f3   19.378695
4       f4   17.108749
7      f10   16.551325
9      f12   14.190485
41     f76   13.537127
44     f79   13.203796
19     f23   12.672067
25     f29   12.590256
40     f75   12.580143
47     f82   12.473783
15     f19   12.048156
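The `fN` labels are positional because the model was trained on `X_train.values`, which carries no column names. A minimal sketch of translating them back is shown below. The column names and gains are hypothetical stand-ins; in the notebook the inputs would be `final_model.get_score(importance_type='gain')` and `list(X_train.columns)`.

```python
def map_positional_names(importance, column_names):
    """Translate keys such as 'f16' into column_names[16]."""
    return {column_names[int(key[1:])]: gain for key, gain in importance.items()}


# Hypothetical stand-ins for the real importance dict and column list
column_names = ["col_a", "col_b", "col_c", "col_d"]
importance = {"f2": 10.5, "f0": 3.25}

named = map_positional_names(importance, column_names)
ranked = sorted(named.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Features that were never used in a split are absent from the importance dictionary, which is why the table lists fewer than 83 rows.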


In [56]:
# Load test data and generate predictions
X_test_with_id = pd.read_csv(f'{TRAIN_PATH}/X_test_83features_with_id.csv')
test_ids = X_test_with_id['ID'].values
X_test = X_test_with_id.drop(columns=['ID'])[X_train.columns].values  # enforce training column order

# Predict
test_risk = predict_xgb_aft(final_model, X_test)

print(f"\nTest predictions:")
print(f"  Samples: {len(test_risk)}")
print(f"  Risk range: [{test_risk.min():.4f}, {test_risk.max():.4f}]")

# Create submission
submission = pd.DataFrame({
    'ID': test_ids,
    'risk_score': test_risk
})
# Save to outputs/submissions/
import os
os.makedirs('outputs/submissions', exist_ok=True)
output_path = 'outputs/submissions/submission_xgb_aft.csv'
submission.to_csv(output_path, index=False)


Test predictions:
  Samples: 1193
  Risk range: [-11.5427, -0.1924]


## Summary

### XGBoost AFT Model Results

| Metric | Value |
|--------|-------|
| Overall C-index | 0.7214 |
| Test-like C-index | 0.6967 |
| High-risk C-index | 0.6709 |
| **Weighted C-index** | **0.6964** |

### Key Findings
1. **Best single model** in our experiments
2. The normal error distribution (log-normal survival times) was selected by the tuning
3. Handles NaN natively - no imputation needed
4. 83 unfixed features outperform 128 fixed features

Public leaderboard score: 0.7534