# Question 1 — Baseline circulating T-cell biomarkers of survival (nivo/chemo)
This notebook splits patients in the **nivo/chemo** arm at the median of each baseline (C1D1) circulating T-cell population and compares overall survival between the High and Low groups with Kaplan–Meier curves and log-rank tests.

**Input files** (provided):
- `PICI0002_ph2_clinical.csv`
- `NatureMed_CyTOF_metadata.csv`
- `NatureMed_CyTOF_select_cell_populations.csv`


In [None]:
%pip install pandas numpy lifelines matplotlib

In [1]:
import pandas as pd
import numpy as np

from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

import matplotlib.pyplot as plt


## Load data

In [2]:
clinical = pd.read_csv('PICI0002_ph2_clinical.csv')
cytof_long = pd.read_csv('NatureMed_CyTOF_select_cell_populations.csv')
meta = pd.read_csv('NatureMed_CyTOF_metadata.csv')

print('clinical', clinical.shape)
print('cytof_long', cytof_long.shape)
print('meta', meta.shape)

display(clinical.head())
display(cytof_long.head())
display(meta.head())

clinical (108, 29)
cytof_long (1633, 3)
meta (509, 4)


Unnamed: 0,Deidentified.ID,Age,Sex,Race,Ethnicity,Arm,Arm Description,Actual Arm,Phase,Participant Dosed,...,Prior Radiation,Prior Chemo,ECOG at Screening,Tobacco History,clinical.observation.os,clinical.observation.os.event,clinical.observation.pfs,clinical.observation.pfs.event,clinical.observation.pfs.reason,Best Overall Response
0,2,67,F,White,Not Hispanic or Latino,B2,B2: GEM/NP/APX005M 0.3 MG/KG,B2,PHASE 1B,Y,...,N,Y,1,Former,218,True,55,True,:clinical-observation.event-reason/progressed,PROGRESSIVE DISEASE
1,3,69,F,White,Not Hispanic or Latino,B2,B2: GEM/NP/APX005M 0.3 MG/KG,B2,PHASE 1B,Y,...,N,N,1,Never,966,False,351,False,:clinical-observation.event-reason/censored-st...,PARTIAL RESPONSE
2,4,54,M,Black or African American,Not Hispanic or Latino,B2,B2: GEM/NP/APX005M 0.3 MG/KG,B2,PHASE 1B,Y,...,N,N,1,Never,611,True,280,True,:clinical-observation.event-reason/progressed,STABLE DISEASE
3,6,64,M,White,Not Hispanic or Latino,B2,B2: GEM/NP/APX005M 0.3 MG/KG,B2,PHASE 1B,Y,...,N,Y,1,Former,675,True,168,True,:clinical-observation.event-reason/progressed,STABLE DISEASE
4,7,60,M,White,Not Hispanic or Latino,B2,B2: GEM/NP/APX005M 0.3 MG/KG,B2,PHASE 1B,Y,...,N,N,0,Former,919,False,512,True,:clinical-observation.event-reason/progressed,STABLE DISEASE


|  | sample.id | value | paper.name |
|---|---|---|---|
| 0 | PICI0002_A01_K01427KE01_SPB_A03 | 8.7e-05 | CD1C+ CD141+ DC (% of leukocytes) |
| 1 | PICI0002_A16_K06268KE01_SPB_A01 | 6.7e-05 | CD1C+ CD141+ DC (% of leukocytes) |
| 2 | BK00972EB02 | 6.2e-05 | CD1C+ CD141+ DC (% of leukocytes) |
| 3 | BK00971EB01 | 2.4e-05 | CD1C+ CD141+ DC (% of leukocytes) |
| 4 | BK00975EB01 | 2.9e-05 | CD1C+ CD141+ DC (% of leukocytes) |


|  | Deidentified.ID | sample.id | ms | timepoint.id |
|---|---|---|---|---|
| 0 | 2 | PICI0002_A05_K01609KE01_SPB_A01 | Blood: CyTOF %leuk | C1D15 |
| 1 | 2 | PICI0002_A01_K00891KE01_SPB_A01 | Blood: CyTOF %par | C1D1 |
| 2 | 2 | PICI0002_A01_K00891KE01_SPB_A01 | Blood: CyTOF %leuk | C1D1 |
| 3 | 2 | PICI0002_A06_K01343KE01_SPB_A01 | Blood: CyTOF %leuk | C2D1 |
| 4 | 3 | PICI0002_A06_K00943CP01_SPB_A02 | Blood: CyTOF %leuk | C2D1 |


## Identify columns to use
Expected columns:
- patient identifier: `Deidentified.ID`
- treatment arm: `Arm` (A1 = nivo/chemo)
- survival time/event: `clinical.observation.os`, `clinical.observation.os.event`
- CyTOF sample identifier: `sample.id`
- CyTOF feature name: `paper.name`
- CyTOF frequency value: `value`
- timepoint: `timepoint.id` (baseline is `C1D1`)
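A small guard that fails early when any expected column is missing can make the notebook easier to rerun on a new data export. This is a sketch only: `require_columns` is a hypothetical helper, and the toy frame stands in for `clinical` (its rows are illustrative).

```python
import pandas as pd

def require_columns(df: pd.DataFrame, required: list, name: str) -> None:
    """Raise a KeyError listing every expected column that is missing from df."""
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise KeyError(f"{name} is missing columns: {missing}")

# Toy stand-in for the clinical table (illustrative rows only)
toy_clinical = pd.DataFrame({
    "Deidentified.ID": [9, 10],
    "Arm": ["A1", "A1"],
    "clinical.observation.os": [620, 253],
    "clinical.observation.os.event": [True, True],
})

# Passes silently when all columns are present
require_columns(
    toy_clinical,
    ["Deidentified.ID", "Arm", "clinical.observation.os", "clinical.observation.os.event"],
    "clinical",
)
```

On the real tables the same call would be made with `clinical`, `meta` (`Deidentified.ID`, `sample.id`, `timepoint.id`) and `cytof_long` (`sample.id`, `paper.name`, `value`).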


In [3]:
print('Clinical columns:', list(clinical.columns))
print('CyTOF columns:', list(cytof_long.columns))
print('Meta columns:', list(meta.columns))

print('Clinical arms:', clinical['Arm Description'].dropna().unique())
print('Meta timepoints:', sorted(meta['timepoint.id'].unique()))
print('CyTOF features (paper.name):')
print(sorted(cytof_long['paper.name'].unique()))

Clinical columns: ['Deidentified.ID', 'Age', 'Sex', 'Race', 'Ethnicity', 'Arm', 'Arm Description', 'Actual Arm', 'Phase', 'Participant Dosed', 'Received APX005M', 'Received Nivolumab', 'DLT Evaluable', 'Efficacy Population Flag', 'Cancer Type', 'Cancer Location', 'Stage at Initial Diagnosis', 'Stage at Enrollment', 'Prior Cancer Surgery', 'Prior Radiation', 'Prior Chemo', 'ECOG at Screening', 'Tobacco History', 'clinical.observation.os', 'clinical.observation.os.event', 'clinical.observation.pfs', 'clinical.observation.pfs.event', 'clinical.observation.pfs.reason', 'Best Overall Response']
CyTOF columns: ['sample.id', 'value', 'paper.name']
Meta columns: ['Deidentified.ID', 'sample.id', 'ms', 'timepoint.id']
Clinical arms: ['B2: GEM/NP/APX005M 0.3 MG/KG' 'C2: GEM/NP/NIVOLUMAB/APX005M 0.3 MG/KG'
 'PHASE II A1: GEM/NP/NIVOLUMAB'
 'PHASE II C2: GEM/NP/NIVOLUMAB/APX005M 0.3 MG/KG'
 'PHASE II B2: GEM/NP/APX005M 0.3 MG/KG']
Meta timepoints: ['C1D1', 'C1D15', 'C2D1', 'C4D1']
CyTOF features (p

## Reshape CyTOF to wide format (one row per sample)
The CyTOF file is in long format (sample × feature). Pivot to wide so each feature becomes a column.
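On a toy long table the reshape looks like this. The duplicate check is an addition to the notebook's approach: `aggfunc='first'` in `pivot_table` silently keeps one value when a (sample, feature) pair occurs more than once, so it can be worth confirming that no such pairs exist. Sample IDs and values below are made up.

```python
import pandas as pd

# Toy long-format table: one row per (sample.id, paper.name) pair
long_df = pd.DataFrame({
    "sample.id": ["S1", "S1", "S2", "S2", "S3"],
    "paper.name": ["NKT cells", "Tbet+ T cells", "NKT cells", "Tbet+ T cells", "NKT cells"],
    "value": [0.02, 0.18, 0.01, 0.25, 0.03],
})

# aggfunc='first' would hide duplicated (sample, feature) pairs, so check explicitly
n_dup = long_df.duplicated(subset=["sample.id", "paper.name"]).sum()
assert n_dup == 0, f"{n_dup} duplicated (sample, feature) pairs"

# DataFrame.pivot (unlike pivot_table) raises a ValueError on duplicates anyway
wide_df = long_df.pivot(index="sample.id", columns="paper.name", values="value").reset_index()
wide_df.columns.name = None  # drop the leftover 'paper.name' axis label
print(wide_df)
```

The leftover axis label is why the displayed wide table's header starts with `paper.name`; sample `S3` gets `NaN` for the feature it lacks, just as many CyTOF samples do.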

In [4]:
cytof_wide = cytof_long.pivot_table(index='sample.id', columns='paper.name', values='value', aggfunc='first').reset_index()
print('cytof_wide', cytof_wide.shape)
display(cytof_wide)

cytof_wide (263, 17)


paper.name,sample.id,CCR7+ CD11b+ CD27- B cells,CD14+ HLA-DRlo m-MDSC (% of leukocytes),CD141+ Cross Presenting DC,CD1C+ CD141+ DC (% of leukocytes),CD1c-CD141+ Cross Presenting DC,CD40+ pDC (% of pDC),Conventional DC,HLA-DR+ CCR7+ B cells (% of leukocytes),HLA-DR+ Non-Naive CD4 T Cells,HLA-DR+ Non-Naive CD8 T Cells,HLA-DR+ Plasmablasts (% of Plasmablasts),HLA-DR+ T cells (% of leukocytes),Ki-67+ T cells (% of CD3+ cells),NKT cells (% of leukocytes),Tbet+ T cells (% of CD3+ cells),Tbet+ TCRgd+ T cells (% of TCRgd T cells)
0,BK00033EB01,0.000810,,,,0.000137,,,,0.009640,0.010560,,,,,,
1,BK00052EB02,0.000197,,0.000229,,,,,,0.104875,0.142081,,,,,,
2,BK00072EB02,0.000477,,0.000180,,0.000090,,,,0.099729,0.153511,,,,,,
3,BK00083EB01,0.000652,,,,0.000065,,,,0.078415,0.142845,,,,,,
4,BK00092EB01,0.000298,0.089852,,0.000014,,0.000003,,0.059357,0.054455,0.052884,0.000225,0.028703,0.874194,0.044607,0.203980,0.847944
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
258,PICI0002_A06_K01818KE01_SPB_A03,0.028235,,,,,,0.011264,,0.116619,0.136009,,,,,,
259,PICI0002_A06_K06384KE01_SPB_A01,0.002927,,,,,,0.002560,,0.130154,0.259506,,,,,,
260,PICI0002_A11_K01942KE01_SPB_A02,0.000700,,,,,,,,0.152146,0.198609,,,,,,
261,PICI0002_A16_K06191KE01_SPB_A02,,,,,,0.000051,,,,,0.000147,0.066540,0.942225,0.009655,0.299116,0.564482


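## Merge CyTOF with metadata (adds patient + timepoint)

`meta` has one row per sample *and* measurement set (`ms` is `Blood: CyTOF %par` or `Blood: CyTOF %leuk`), so this left merge repeats most CyTOF samples once per `ms` value: the 263 wide rows become 509. The repeated rows carry identical feature values and are carried through into the baseline and survival tables below.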


In [6]:
cytof_full = cytof_wide.merge(meta, on='sample.id', how='left')
print('cytof_full', cytof_full.shape)

# Check merge completeness
print('Missing Deidentified.ID:', cytof_full['Deidentified.ID'].isna().sum())
print('Missing timepoint.id:', cytof_full['timepoint.id'].isna().sum())

display(cytof_full)

cytof_full (509, 20)
Missing Deidentified.ID: 0
Missing timepoint.id: 0


Unnamed: 0,sample.id,CCR7+ CD11b+ CD27- B cells,CD14+ HLA-DRlo m-MDSC (% of leukocytes),CD141+ Cross Presenting DC,CD1C+ CD141+ DC (% of leukocytes),CD1c-CD141+ Cross Presenting DC,CD40+ pDC (% of pDC),Conventional DC,HLA-DR+ CCR7+ B cells (% of leukocytes),HLA-DR+ Non-Naive CD4 T Cells,HLA-DR+ Non-Naive CD8 T Cells,HLA-DR+ Plasmablasts (% of Plasmablasts),HLA-DR+ T cells (% of leukocytes),Ki-67+ T cells (% of CD3+ cells),NKT cells (% of leukocytes),Tbet+ T cells (% of CD3+ cells),Tbet+ TCRgd+ T cells (% of TCRgd T cells),Deidentified.ID,ms,timepoint.id
0,BK00033EB01,0.000810,,,,0.000137,,,,0.009640,0.010560,,,,,,,92,Blood: CyTOF %par,C1D15
1,BK00033EB01,0.000810,,,,0.000137,,,,0.009640,0.010560,,,,,,,92,Blood: CyTOF %leuk,C1D15
2,BK00052EB02,0.000197,,0.000229,,,,,,0.104875,0.142081,,,,,,,96,Blood: CyTOF %leuk,C1D15
3,BK00052EB02,0.000197,,0.000229,,,,,,0.104875,0.142081,,,,,,,96,Blood: CyTOF %par,C1D15
4,BK00072EB02,0.000477,,0.000180,,0.000090,,,,0.099729,0.153511,,,,,,,94,Blood: CyTOF %par,C1D15
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
504,PICI0002_A11_K01942KE01_SPB_A02,0.000700,,,,,,,,0.152146,0.198609,,,,,,,47,Blood: CyTOF %par,C4D1
505,PICI0002_A16_K06191KE01_SPB_A02,,,,,,0.000051,,,,,0.000147,0.066540,0.942225,0.009655,0.299116,0.564482,65,Blood: CyTOF %leuk,C1D1
506,PICI0002_A16_K06191KE01_SPB_A02,,,,,,0.000051,,,,,0.000147,0.066540,0.942225,0.009655,0.299116,0.564482,65,Blood: CyTOF %par,C1D1
507,PICI0002_A16_K06268KE01_SPB_A01,0.010286,0.020036,,0.000067,,0.000019,,0.104581,0.225295,0.248609,0.001034,0.098679,0.928504,0.008597,0.177628,0.697401,9,Blood: CyTOF %par,C1D1
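One way to keep a single row per CyTOF sample is to drop the `ms` column from `meta` and de-duplicate it before merging, then let pandas enforce one-to-one matching with `merge(..., validate='one_to_one')`. This is a sketch on toy data; collapsing the `%par` and `%leuk` rows this way assumes that the values file (which has no `ms` column) holds one value per sample and feature.

```python
import pandas as pd

# Toy stand-ins: one wide CyTOF row per sample; sample S1 has two meta rows
cytof_wide_toy = pd.DataFrame({"sample.id": ["S1", "S2"], "NKT cells": [0.02, 0.01]})
meta_toy = pd.DataFrame({
    "Deidentified.ID": [2, 2, 3],
    "sample.id": ["S1", "S1", "S2"],
    "ms": ["Blood: CyTOF %par", "Blood: CyTOF %leuk", "Blood: CyTOF %leuk"],
    "timepoint.id": ["C1D1", "C1D1", "C1D1"],
})

# Keep one (patient, sample, timepoint) row per sample; 'ms' is the only column that differs
meta_one = meta_toy[["Deidentified.ID", "sample.id", "timepoint.id"]].drop_duplicates()

# validate='one_to_one' raises pandas.errors.MergeError if either side repeats a key
merged = cytof_wide_toy.merge(meta_one, on="sample.id", how="left", validate="one_to_one")
print(merged.shape)
```

Merging `meta_toy` directly with the same `validate` argument raises a `MergeError`, which is a cheap way to catch row multiplication as soon as it happens.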


## Filter to baseline (C1D1) and nivo/chemo arm (A1)
Mapping used in this trial dataset:
- **A1**: Gemcitabine/nab-paclitaxel + **nivolumab** (nivo/chemo)
- **B2**: Gem/nab-paclitaxel + **APX005M** (CD40 agonist; sotiga/chemo)
- **C2**: Gem/nab-paclitaxel + **nivolumab + APX005M** (triplet)
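Keeping this mapping in one dictionary also makes it easy to check whether any patient's assigned arm (`Arm`, which the filter below uses) differs from the arm actually received (`Actual Arm`). A sketch: `ARM_LABELS` is a hypothetical name, the short labels are informal, and the toy rows are illustrative.

```python
import pandas as pd

# Arm codes and informal labels, following the mapping listed above
ARM_LABELS = {
    "A1": "nivo/chemo",
    "B2": "sotiga/chemo",
    "C2": "nivo/sotiga/chemo (triplet)",
}

# Toy stand-in for `clinical` (rows are illustrative)
toy_clinical = pd.DataFrame({
    "Deidentified.ID": [2, 9, 10, 40],
    "Arm": ["B2", "A1", "A1", "C2"],
    "Actual Arm": ["B2", "A1", "A1", "C2"],
})

toy_clinical["arm_label"] = toy_clinical["Arm"].map(ARM_LABELS)
unmapped = toy_clinical.loc[toy_clinical["arm_label"].isna(), "Arm"].unique()
n_switched = (toy_clinical["Arm"] != toy_clinical["Actual Arm"]).sum()
print("unmapped arm codes:", list(unmapped), "| assigned != received:", n_switched)

# Same filter as the notebook: assigned arm A1
nivo_toy = toy_clinical[toy_clinical["Arm"] == "A1"]
```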


In [8]:
BASELINE_TP = 'C1D1'
NIVO_ARM = 'A1'

cytof_base = cytof_full[cytof_full['timepoint.id'] == BASELINE_TP].copy()
clinical_nivo = clinical[clinical['Arm'] == NIVO_ARM].copy()

print('cytof_base', cytof_base.shape)
print('clinical_nivo', clinical_nivo.shape)
display(cytof_base)
display(clinical_nivo)

cytof_base (164, 20)
clinical_nivo (34, 29)


Unnamed: 0,sample.id,CCR7+ CD11b+ CD27- B cells,CD14+ HLA-DRlo m-MDSC (% of leukocytes),CD141+ Cross Presenting DC,CD1C+ CD141+ DC (% of leukocytes),CD1c-CD141+ Cross Presenting DC,CD40+ pDC (% of pDC),Conventional DC,HLA-DR+ CCR7+ B cells (% of leukocytes),HLA-DR+ Non-Naive CD4 T Cells,HLA-DR+ Non-Naive CD8 T Cells,HLA-DR+ Plasmablasts (% of Plasmablasts),HLA-DR+ T cells (% of leukocytes),Ki-67+ T cells (% of CD3+ cells),NKT cells (% of leukocytes),Tbet+ T cells (% of CD3+ cells),Tbet+ TCRgd+ T cells (% of TCRgd T cells),Deidentified.ID,ms,timepoint.id
8,BK00092EB01,0.000298,0.089852,,0.000014,,0.000003,,0.059357,0.054455,0.052884,0.000225,0.028703,0.874194,0.044607,0.203980,0.847944,67,Blood: CyTOF %par,C1D1
9,BK00092EB01,0.000298,0.089852,,0.000014,,0.000003,,0.059357,0.054455,0.052884,0.000225,0.028703,0.874194,0.044607,0.203980,0.847944,67,Blood: CyTOF %leuk,C1D1
16,BK00125EB01,0.002677,0.066600,,0.000031,,0.000003,,0.067780,0.053034,0.069605,0.000242,0.029428,0.901677,0.007949,0.113270,0.469105,124,Blood: CyTOF %leuk,C1D1
17,BK00125EB01,0.002677,0.066600,,0.000031,,0.000003,,0.067780,0.053034,0.069605,0.000242,0.029428,0.901677,0.007949,0.113270,0.469105,124,Blood: CyTOF %par,C1D1
18,BK00147EB01,0.001098,0.013744,,0.000045,,0.000010,,0.066508,0.127584,0.138874,0.000301,0.087462,0.894343,0.026749,0.393437,0.941372,122,Blood: CyTOF %par,C1D1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
469,PICI0002_A01_K06378KE01_SPB_A01,0.001382,0.134093,,0.000062,,0.000000,,0.028603,0.035831,0.063643,0.000257,0.019728,0.993612,0.000924,0.037423,0.749263,16,Blood: CyTOF %leuk,C1D1
505,PICI0002_A16_K06191KE01_SPB_A02,,,,,,0.000051,,,,,0.000147,0.066540,0.942225,0.009655,0.299116,0.564482,65,Blood: CyTOF %leuk,C1D1
506,PICI0002_A16_K06191KE01_SPB_A02,,,,,,0.000051,,,,,0.000147,0.066540,0.942225,0.009655,0.299116,0.564482,65,Blood: CyTOF %par,C1D1
507,PICI0002_A16_K06268KE01_SPB_A01,0.010286,0.020036,,0.000067,,0.000019,,0.104581,0.225295,0.248609,0.001034,0.098679,0.928504,0.008597,0.177628,0.697401,9,Blood: CyTOF %par,C1D1


Unnamed: 0,Deidentified.ID,Age,Sex,Race,Ethnicity,Arm,Arm Description,Actual Arm,Phase,Participant Dosed,...,Prior Radiation,Prior Chemo,ECOG at Screening,Tobacco History,clinical.observation.os,clinical.observation.os.event,clinical.observation.pfs,clinical.observation.pfs.event,clinical.observation.pfs.reason,Best Overall Response
6,9,53,F,White,Not Hispanic or Latino,A1,PHASE II A1: GEM/NP/NIVOLUMAB,A1,PHASE II,Y,...,N,Y,0,Former,620,True,157,True,:clinical-observation.event-reason/progressed,PARTIAL RESPONSE
7,10,59,F,Asian,Not Hispanic or Latino,A1,PHASE II A1: GEM/NP/NIVOLUMAB,A1,PHASE II,Y,...,N,N,1,Never,253,True,161,True,:clinical-observation.event-reason/progressed,PARTIAL RESPONSE
8,11,65,F,Asian,Not Hispanic or Latino,A1,PHASE II A1: GEM/NP/NIVOLUMAB,A1,PHASE II,Y,...,N,N,0,Former,673,False,575,False,:clinical-observation.event-reason/censored-st...,PARTIAL RESPONSE
9,12,65,M,White,Not Hispanic or Latino,A1,PHASE II A1: GEM/NP/NIVOLUMAB,A1,PHASE II,Y,...,N,Y,1,Never,530,True,98,True,:clinical-observation.event-reason/progressed,STABLE DISEASE
13,16,60,M,White,Not Hispanic or Latino,A1,PHASE II A1: GEM/NP/NIVOLUMAB,A1,PHASE II,Y,...,N,N,1,Current,297,True,296,True,:clinical-observation.event-reason/dead,
15,18,69,M,White,Not Hispanic or Latino,A1,PHASE II A1: GEM/NP/NIVOLUMAB,A1,PHASE II,Y,...,N,N,1,Never,249,True,147,True,:clinical-observation.event-reason/progressed,STABLE DISEASE
16,19,75,M,White,Not Hispanic or Latino,A1,PHASE II A1: GEM/NP/NIVOLUMAB,A1,PHASE II,Y,...,N,Y,1,Former,194,True,193,True,:clinical-observation.event-reason/dead,NOT EVALUABLE
18,21,64,F,White,Not Hispanic or Latino,A1,PHASE II A1: GEM/NP/NIVOLUMAB,A1,PHASE II,Y,...,N,Y,0,Former,467,True,272,True,:clinical-observation.event-reason/progressed,PARTIAL RESPONSE
24,29,75,M,White,Not Hispanic or Latino,A1,PHASE II A1: GEM/NP/NIVOLUMAB,A1,PHASE II,Y,...,N,Y,0,Never,779,False,99,True,:clinical-observation.event-reason/progressed,PROGRESSIVE DISEASE
26,31,54,M,White,Not Hispanic or Latino,A1,PHASE II A1: GEM/NP/NIVOLUMAB,A1,PHASE II,Y,...,N,N,0,Never,726,True,725,True,:clinical-observation.event-reason/dead,PARTIAL RESPONSE


## Merge baseline CyTOF with survival
Merge on `Deidentified.ID`. Survival columns are:
- `clinical.observation.os` (time)
- `clinical.observation.os.event` (True = death, False = censored)
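lifelines accepts boolean or 0/1 event indicators, but converting them explicitly and checking the time column can catch export problems early, for example events stored as the strings `'True'`/`'False'`. A sketch with a hypothetical helper `clean_survival`; the toy values are illustrative.

```python
import pandas as pd

def clean_survival(df: pd.DataFrame, time_col: str, event_col: str) -> pd.DataFrame:
    """Return a copy with float survival times and 0/1 integer event flags (1 = death)."""
    out = df.copy()
    ev = out[event_col]
    if not (pd.api.types.is_bool_dtype(ev) or pd.api.types.is_numeric_dtype(ev)):
        # e.g. 'True'/'False' strings, or booleans mixed with missing values
        ev = ev.astype(str).str.strip().str.lower().map({"true": 1, "false": 0, "1": 1, "0": 0})
    if ev.isna().any():
        raise ValueError(f"missing or unrecognised values in {event_col}")
    ev = ev.astype(int)
    if not set(ev.unique()) <= {0, 1}:
        raise ValueError(f"{event_col} must contain only 0/1 (or True/False)")
    out[event_col] = ev

    times = pd.to_numeric(out[time_col], errors="raise").astype(float)
    if times.isna().any() or (times < 0).any():
        raise ValueError(f"missing or negative values in {time_col}")
    out[time_col] = times
    return out

toy = pd.DataFrame({"OS_time": [213, 508], "OS_event": [False, True]})
clean = clean_survival(toy, "OS_time", "OS_event")
print(clean)
```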


In [9]:
data = cytof_base.merge(
    clinical_nivo[['Deidentified.ID','clinical.observation.os','clinical.observation.os.event']],
    on='Deidentified.ID',
    how='inner'
)

# Rename for clarity
data = data.rename(columns={
    'clinical.observation.os': 'OS_time',
    'clinical.observation.os.event': 'OS_event'
})

print('analysis data', data.shape)
display(data[['Deidentified.ID','timepoint.id','OS_time','OS_event']])
print('OS_event counts:', data['OS_event'].value_counts(dropna=False).to_dict())

analysis data (50, 22)


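|  | Deidentified.ID | timepoint.id | OS_time | OS_event |
|---|---|---|---|---|
| 0 | 128 | C1D1 | 213 | False |
| 1 | 128 | C1D1 | 213 | False |
| 2 | 121 | C1D1 | 508 | True |
| 3 | 121 | C1D1 | 508 | True |
| 4 | 31 | C1D1 | 726 | True |
| 5 | 31 | C1D1 | 726 | True |
| 6 | 29 | C1D1 | 779 | False |
| 7 | 29 | C1D1 | 779 | False |
| 8 | 92 | C1D1 | 445 | True |
| 9 | 92 | C1D1 | 445 | True |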


OS_event counts: {True: 28, False: 22}
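These counts are per row, not per patient, and the rows displayed above show each patient twice (IDs 128, 121, 31, 29, 92). A quick check compares the number of rows with the number of distinct patients and, if needed, keeps one baseline row per patient before testing. This is a sketch on toy data (the feature values are made up); keeping the first row is only safe when a patient's rows agree on everything except `ms`, which the sketch verifies before dropping anything.

```python
import pandas as pd

toy = pd.DataFrame({
    "Deidentified.ID": [128, 128, 121, 121, 31],
    "ms": ["Blood: CyTOF %par", "Blood: CyTOF %leuk", "Blood: CyTOF %leuk",
           "Blood: CyTOF %par", "Blood: CyTOF %leuk"],
    "OS_time": [213, 213, 508, 508, 726],
    "OS_event": [False, False, True, True, True],
    "NKT cells (% of leukocytes)": [0.01, 0.01, 0.02, 0.02, 0.03],
})

n_rows = len(toy)
n_patients = toy["Deidentified.ID"].nunique()
print(f"{n_rows} rows for {n_patients} patients")

# Every extra row should be an exact copy apart from the measurement-set label
check_cols = [c for c in toy.columns if c != "ms"]
n_exact_dups = toy.duplicated(subset=check_cols).sum()
assert n_exact_dups == n_rows - n_patients, "some patients have differing baseline rows"

# Collapse to one row per patient
per_patient = toy.drop_duplicates(subset="Deidentified.ID", keep="first").reset_index(drop=True)
print(per_patient["OS_event"].value_counts().to_dict())
```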


## Select circulating T-cell features to test
This CyTOF subset file contains a limited set of preselected populations. For Question 1, I will test the T-cell-related features present here.

Edit `tcell_features` to add/remove populations from this file.
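The substring test in the next cell is case-sensitive, so populations whose names end in `T Cells` (capital C) are not picked up. A regular expression that accepts either capitalisation of "cells" is one alternative; the sketch below runs it over the feature names this file contains (as printed by the next cell). The pattern is an assumption about how these names are spelled, not a general T-cell classifier.

```python
import re

# Feature names as printed from this CyTOF subset file
all_features = [
    'CCR7+ CD11b+ CD27- B cells', 'CD14+ HLA-DRlo m-MDSC (% of leukocytes)',
    'CD141+ Cross Presenting DC', 'CD1C+ CD141+ DC (% of leukocytes)',
    'CD1c-CD141+ Cross Presenting DC', 'CD40+ pDC (% of pDC)', 'Conventional DC',
    'HLA-DR+ CCR7+ B cells (% of leukocytes)', 'HLA-DR+ Non-Naive CD4 T Cells',
    'HLA-DR+ Non-Naive CD8 T Cells', 'HLA-DR+ Plasmablasts (% of Plasmablasts)',
    'HLA-DR+ T cells (% of leukocytes)', 'Ki-67+ T cells (% of CD3+ cells)',
    'NKT cells (% of leukocytes)', 'Tbet+ T cells (% of CD3+ cells)',
    'Tbet+ TCRgd+ T cells (% of TCRgd T cells)',
]

# Capital 'T', then 'cell'/'cells' in either capitalisation ('T cells', 'T Cells', 'NKT cells')
pattern = re.compile(r"T [Cc]ells?")
tcell_like = [f for f in all_features if pattern.search(f)]
for f in tcell_like:
    print(' -', f)
```

Compared with the case-sensitive test, this adds the HLA-DR+ non-naive CD4 and CD8 populations.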

In [10]:
all_features = [c for c in data.columns if c not in ['sample.id','Deidentified.ID','ms','timepoint.id','OS_time','OS_event']]
print(all_features)

# Heuristic: keep features whose name contains 'T cell' (this also matches 'T cells').
# The match is case-sensitive, so names spelled 'T Cells' are not selected.
candidate = [c for c in all_features if 'T cell' in c]
print('Candidate T-cell features:')
for c in candidate:
    print(' -', c)

# Default set: expand/adjust as desired
tcell_features = list(candidate)
print('\nTesting features:', tcell_features)

['CCR7+ CD11b+ CD27- B cells', 'CD14+ HLA-DRlo m-MDSC (% of leukocytes)', 'CD141+ Cross Presenting DC', 'CD1C+ CD141+ DC (% of leukocytes)', 'CD1c-CD141+ Cross Presenting DC', 'CD40+ pDC (% of pDC)', 'Conventional DC', 'HLA-DR+ CCR7+ B cells (% of leukocytes)', 'HLA-DR+ Non-Naive CD4 T Cells', 'HLA-DR+ Non-Naive CD8 T Cells', 'HLA-DR+ Plasmablasts (% of Plasmablasts)', 'HLA-DR+ T cells (% of leukocytes)', 'Ki-67+ T cells (% of CD3+ cells)', 'NKT cells (% of leukocytes)', 'Tbet+ T cells (% of CD3+ cells)', 'Tbet+ TCRgd+ T cells (% of TCRgd T cells)']
Candidate T-cell features:
 - HLA-DR+ T cells (% of leukocytes)
 - Ki-67+ T cells (% of CD3+ cells)
 - NKT cells (% of leukocytes)
 - Tbet+ T cells (% of CD3+ cells)
 - Tbet+ TCRgd+ T cells (% of TCRgd T cells)

Testing features: ['HLA-DR+ T cells (% of leukocytes)', 'Ki-67+ T cells (% of CD3+ cells)', 'NKT cells (% of leukocytes)', 'Tbet+ T cells (% of CD3+ cells)', 'Tbet+ TCRgd+ T cells (% of TCRgd T cells)']


## Kaplan–Meier + log-rank (median split)
For each feature, split patients into High (>= median) vs Low (< median) and run a log-rank test.

If I split patients into “high” vs “low” based on this baseline immune feature, do they survive differently over time?
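For intuition about what `logrank_test` computes, here is a minimal two-group log-rank statistic written out with NumPy. It is a sketch, not a replacement for lifelines; `logrank_two_group` is a hypothetical name. At each distinct death time it accumulates observed minus expected deaths in group A and the hypergeometric variance, and the chi-square (1 df) tail probability is computed as `erfc(sqrt(x / 2))`, which is exact for one degree of freedom.

```python
import math
import numpy as np

def logrank_two_group(time_a, event_a, time_b, event_b):
    """Return (chi-square statistic, p-value) for the two-sample log-rank test."""
    t = np.concatenate([time_a, time_b]).astype(float)
    e = np.concatenate([event_a, event_b]).astype(bool)
    in_b = np.concatenate([np.zeros(len(time_a), bool), np.ones(len(time_b), bool)])

    o_minus_e = 0.0  # observed minus expected deaths in group A
    var = 0.0
    for ti in np.unique(t[e]):                 # distinct death times
        at_risk = t >= ti                      # still under observation just before ti
        n = at_risk.sum()
        n_a = (at_risk & ~in_b).sum()
        d = ((t == ti) & e).sum()              # deaths at ti, both groups
        d_a = ((t == ti) & e & ~in_b).sum()    # deaths at ti in group A
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if var == 0:
        return float("nan"), float("nan")
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))         # upper tail of chi-square with 1 df
    return chi2, p

chi2, p = logrank_two_group([1, 2], [1, 1], [3, 4], [1, 1])
print(f"chi2={chi2:.3f}, p={p:.3f}")
```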

In [11]:
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

kmf = KaplanMeierFitter()
results = []

for feature in tcell_features:
    # Drop missing feature values for this feature
    df = data.loc[~data[feature].isna(), ['OS_time','OS_event', feature]].copy()
    if df.shape[0] < 10:
        continue

    med = df[feature].median()
    high_mask = df[feature] >= med

    high = df.loc[high_mask]
    low = df.loc[~high_mask]

    lr = logrank_test(
        high['OS_time'], low['OS_time'],
        event_observed_A=high['OS_event'],
        event_observed_B=low['OS_event']
    )

    results.append({
        'feature': feature,
        'median': float(med),
        'logrank_p': float(lr.p_value),
        'n_high': int(high.shape[0]),
        'n_low': int(low.shape[0])
    })

results_df = pd.DataFrame(results).sort_values('logrank_p').reset_index(drop=True)
results_df.to_csv('results_question1_logrank.csv', index=False)

print('Log-rank results (sorted by p-value):')
display(results_df)


Log-rank results (sorted by p-value):


|  | feature | median | logrank_p | n_high | n_low |
|---|---|---|---|---|---|
| 0 | NKT cells (% of leukocytes) | 0.016406 | 4.6e-05 | 26 | 24 |
| 1 | Ki-67+ T cells (% of CD3+ cells) | 0.966289 | 0.000186 | 26 | 24 |
| 2 | Tbet+ T cells (% of CD3+ cells) | 0.179298 | 0.000211 | 26 | 24 |
| 3 | Tbet+ TCRgd+ T cells (% of TCRgd T cells) | 0.792079 | 0.001149 | 26 | 24 |
| 4 | HLA-DR+ T cells (% of leukocytes) | 0.066838 | 0.186362 | 26 | 24 |
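The p-values in this table are unadjusted, and five features were tested. One common option is a Benjamini-Hochberg false-discovery-rate adjustment; below is a small NumPy sketch (`bh_adjust` is a hypothetical name). If statsmodels is installed, `statsmodels.stats.multitest.multipletests(pvals, method='fdr_bh')` provides the same adjustment.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up), returned in input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

# Unadjusted log-rank p-values copied from the table above
raw = [4.6e-05, 0.000186, 0.000211, 0.001149, 0.186362]
print(bh_adjust(raw))
```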


In [14]:
alpha = 0.05
sig = results_df[results_df['logrank_p'] < alpha].copy()
print(f'Significant features (p < {alpha}):')
display(sig)

for _, row in sig.iterrows():
    feature = row['feature']
    pval = row['logrank_p']
    df = data.loc[~data[feature].isna(), ['OS_time','OS_event', feature]].copy()
    med = df[feature].median()
    high_mask = df[feature] >= med

    plt.figure()
    kmf.fit(
        df.loc[high_mask,'OS_time'],
        event_observed=df.loc[high_mask,'OS_event'],
        label='High (>= median)'
    )
    ax = kmf.plot()

    kmf.fit(
        df.loc[~high_mask,'OS_time'],
        event_observed=df.loc[~high_mask,'OS_event'],
        label='Low (< median)'
    )
    kmf.plot(ax=ax)

    plt.title(
        f"{feature} (baseline {BASELINE_TP}, arm {NIVO_ARM})\n"
        f"log-rank p={pval:.3g}"
    )
    plt.xlabel('Overall survival time')
    plt.ylabel('Survival probability')

    out = 'fig_km_' + ''.join(ch if ch.isalnum() else '_' for ch in feature) + '.png'
    plt.tight_layout()
    plt.savefig(out, dpi=200)
    plt.close()

print('Saved KM plots for significant features.')


Significant features (p < 0.05):


|  | feature | median | logrank_p | n_high | n_low |
|---|---|---|---|---|---|
| 0 | NKT cells (% of leukocytes) | 0.016406 | 4.6e-05 | 26 | 24 |
| 1 | Ki-67+ T cells (% of CD3+ cells) | 0.966289 | 0.000186 | 26 | 24 |
| 2 | Tbet+ T cells (% of CD3+ cells) | 0.179298 | 0.000211 | 26 | 24 |
| 3 | Tbet+ TCRgd+ T cells (% of TCRgd T cells) | 0.792079 | 0.001149 | 26 | 24 |


Saved KM plots for significant features.
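The saved curves are Kaplan–Meier product-limit estimates, $\hat S(t) = \prod_{t_i \le t} (1 - d_i / n_i)$, where $d_i$ is the number of deaths at event time $t_i$ and $n_i$ the number still at risk just before it. A NumPy sketch of that estimator, for intuition only (the notebook itself uses lifelines' `KaplanMeierFitter`; `km_curve` is a hypothetical name and the toy times are made up).

```python
import numpy as np

def km_curve(time, event):
    """Kaplan-Meier survival estimate evaluated at each distinct death time."""
    t = np.asarray(time, dtype=float)
    e = np.asarray(event).astype(bool)
    event_times = np.unique(t[e])
    surv = []
    s = 1.0
    for ti in event_times:
        n_at_risk = np.sum(t >= ti)          # censored at ti still count as at risk
        deaths = np.sum((t == ti) & e)
        s *= 1.0 - deaths / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Toy data: 1 = death, 0 = censored
times, s = km_curve([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
print(list(zip(times, s)))
```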
