# TTE-v1: Target Trial Emulation in Python

This notebook demonstrates a simplified approach to target trial emulation (TTE) in Python. In this example, we:

- Load and preview dummy data
- Estimate switching weights and censoring weights using logistic regression models
- Combine these weights and fit an outcome model using weighted least squares (WLS)
- Expand the dataset for follow-up analysis
- Fit a marginal structural model (MSM) to predict the outcome over follow-up time
- Plot the predicted survival difference over follow-up

The notebook follows the steps of the R-based methodology in simplified form and explains each one.

In [None]:
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

# Configure matplotlib for inline display in the notebook
%matplotlib inline


## Step 1: Load the Dummy Data

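We load the dummy data from a CSV file (assumed to be named `data_censored.csv`). This dataset contains information such as patient ID, time period, treatment, outcome, and other covariates. The later steps rely on the columns `treatment`, `outcome`, `censored`, `age`, `x1`, `x2`, and `x3`.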

In [None]:
# Load the dummy data into a pandas DataFrame
data = pd.read_csv("data_censored.csv")

# Preview the first few rows of the dataset
print("Data preview:")
print(data.head())
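
Since every later formula refers to columns by name, a quick check after loading can catch a mismatched file early. The following is a minimal sketch; the helper `missing_columns` and the example header are hypothetical and not part of the original notebook.

```python
# Columns referenced by the formulas later in this notebook
REQUIRED_COLUMNS = ["treatment", "outcome", "censored", "age", "x1", "x2", "x3"]


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent, in the required order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Example with a hypothetical header that lacks x3
example_header = ["id", "period", "treatment", "x1", "x2", "age", "outcome", "censored"]
print(missing_columns(example_header))  # ['x3']
```

With the real data, the same helper could be called as `missing_columns(data.columns)`; an empty list means all required columns are present.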


## Step 2: Fit Switching Weight Models

We estimate switching weights using two logistic regression models:

- **Numerator Model:** Predicts the treatment variable using `age`.
- **Denominator Model:** Predicts treatment using `age`, `x1`, and `x3`.

The switching weight for each observation is the ratio of the predicted probability from the numerator model to the predicted probability from the denominator model.

In [None]:
# Fit switching weight models

# Numerator model: Predict treatment using only age
switch_model_numer = smf.logit("treatment ~ age", data=data).fit(disp=False)

# Denominator model: Predict treatment using age, x1, and x3
switch_model_denom = smf.logit("treatment ~ age + x1 + x3", data=data).fit(disp=False)

# Calculate predicted probabilities from both models
data["switch_prob_numer"] = switch_model_numer.predict(data)
data["switch_prob_denom"] = switch_model_denom.predict(data)

# Compute the switching weight as the ratio of the numerator and denominator probabilities
data["switch_weight"] = data["switch_prob_numer"] / data["switch_prob_denom"]

# Display a preview of the switching weights
print("Switching weights preview:")
print(data[["switch_weight"]].head())
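
The cell above divides the predicted probabilities of `treatment == 1` for every row, whether or not that row was treated. Stabilized inverse probability weights are often built instead from the probability of the treatment each row actually received. The sketch below illustrates that variant on made-up numbers; the function name `stabilized_treatment_weight` is hypothetical, and this is an alternative to compare against, not the calculation this notebook performs.

```python
import numpy as np


def stabilized_treatment_weight(treated, p_numer, p_denom):
    """Stabilized weight using the probability of the treatment actually received.

    treated : 0/1 array of observed treatment
    p_numer : P(treatment = 1) from the numerator model
    p_denom : P(treatment = 1) from the denominator model
    """
    treated = np.asarray(treated, dtype=bool)
    p_numer = np.asarray(p_numer, dtype=float)
    p_denom = np.asarray(p_denom, dtype=float)
    # Treated rows use p; untreated rows use 1 - p
    numer = np.where(treated, p_numer, 1.0 - p_numer)
    denom = np.where(treated, p_denom, 1.0 - p_denom)
    return numer / denom


weights = stabilized_treatment_weight(
    treated=[1, 0],
    p_numer=[0.6, 0.6],
    p_denom=[0.8, 0.8],
)
print(weights)  # approximately [0.75, 2.0]
```

For the treated row the result matches the simple ratio (0.6 / 0.8), while the untreated row gets 0.4 / 0.2 instead, which shows where the two definitions diverge.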


## Step 3: Fit Censoring Weight Models

Next, we estimate censoring weights to account for potential informative censoring:

- **Numerator Model:** Uses `x2` to predict the censoring indicator (`censored`).
- **Denominator Model:** Uses both `x2` and `x1` to predict `censored`.

The censoring weight for each observation is the ratio of the predicted probability from the numerator model to the predicted probability from the denominator model.

In [None]:
# Fit censoring weight models

# Numerator model: Predict the censoring indicator using x2
censor_model_numer = smf.logit("censored ~ x2", data=data).fit(disp=False)

# Denominator model: Predict the censoring indicator using x2 and x1
censor_model_denom = smf.logit("censored ~ x2 + x1", data=data).fit(disp=False)

# Calculate predicted probabilities for censoring
data["cens_prob_numer"] = censor_model_numer.predict(data)
data["cens_prob_denom"] = censor_model_denom.predict(data)

# Compute the censoring weight as the ratio of the numerator and denominator probabilities
data["censor_weight"] = data["cens_prob_numer"] / data["cens_prob_denom"]

# Display a preview of the censoring weights
print("Censoring weights preview:")
print(data[["censor_weight"]].head())
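
Both models above predict the probability of being censored. Censoring weights are commonly expressed in terms of the probability of remaining uncensored, that is, one minus those predictions. The sketch below shows that form on made-up probabilities; `uncensored_weight` is a hypothetical helper, offered for comparison and not taken from the notebook.

```python
import numpy as np


def uncensored_weight(p_cens_numer, p_cens_denom):
    """Ratio of probabilities of remaining uncensored.

    Both inputs are predicted probabilities of being censored, as produced by
    models of the `censored` indicator.
    """
    p_cens_numer = np.asarray(p_cens_numer, dtype=float)
    p_cens_denom = np.asarray(p_cens_denom, dtype=float)
    return (1.0 - p_cens_numer) / (1.0 - p_cens_denom)


weights = uncensored_weight([0.10, 0.10], [0.25, 0.10])
print(weights)  # approximately [1.2, 1.0]
```

A row whose covariates make censoring more likely under the denominator model (0.25 versus 0.10) is weighted up, while a row with equal predictions keeps a weight of 1.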


## Step 4: Combine Weights

The overall weight for each observation is obtained by multiplying the switching weight and the censoring weight.

In [None]:
# Combine switching and censoring weights
data["weight"] = data["switch_weight"] * data["censor_weight"]

# Display a preview of the combined weights
print("Combined weights preview:")
print(data[["weight"]].head())
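
Products of ratio weights can occasionally become very large and dominate a weighted fit. A common optional safeguard is to truncate weights at chosen percentiles. The sketch below assumes this step is wanted; the helper `truncate_weights`, the percentile choices, and the example values are all hypothetical.

```python
import numpy as np


def truncate_weights(weights, lower_pct=1.0, upper_pct=99.0):
    """Clip weights to the given percentiles of their own distribution."""
    weights = np.asarray(weights, dtype=float)
    lo, hi = np.percentile(weights, [lower_pct, upper_pct])
    return np.clip(weights, lo, hi)


# Hypothetical combined weights with one extreme value
raw = np.array([0.8, 0.9, 1.0, 1.1, 1.2, 25.0])
clipped = truncate_weights(raw, lower_pct=0.0, upper_pct=80.0)
print(raw.max(), clipped.max())
```

The 80th percentile is used here only because the toy array is tiny; with real data the defaults (1st and 99th percentiles) are a more typical starting point.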


## Step 5: Fit Outcome Model using Weighted Regression

We fit an outcome model using weighted least squares (WLS). This model estimates the effect of treatment on the outcome while adjusting for the covariate `x2`.

The weights computed earlier are used to adjust for treatment switching and censoring.

In [None]:
# Fit the outcome model using weighted least squares
outcome_model = smf.wls("outcome ~ treatment + x2", data=data, weights=data["weight"]).fit()

# Print the coefficients of the outcome model
print("\nSimplified Outcome Model Coefficients:")
print(outcome_model.params)
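
To make the role of the weights concrete, weighted least squares can be written as the solution of the weighted normal equations, (XᵀWX)β = XᵀWy, where W is the diagonal matrix of weights. The sketch below solves them directly with NumPy on toy data; `wls_fit` is a hypothetical helper for illustration, not a replacement for `smf.wls`.

```python
import numpy as np


def wls_fit(X, y, w):
    """Solve the weighted normal equations (X' W X) beta = X' W y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    Xw = X * w[:, None]  # each row of X scaled by its weight
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)


# Toy data: intercept column plus one covariate, with y = 1 + 2 * x exactly
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
w = np.array([0.5, 1.0, 2.0, 1.5])

beta = wls_fit(X, y, w)
print(beta)  # approximately [1.0, 2.0]
```

Because the toy outcome is exactly linear, any positive weights recover the same coefficients; with noisy data the weights change how much each row pulls on the fit.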


## Step 6: Expand Data for Follow-Up

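To simulate patient follow-up over time, we expand the dataset by creating one copy of the original data for each follow-up time from 0 through 10. Each original row therefore appears 11 times, with the copies differing only in `followup_time`. This mimics the process of data expansion in a sequence of target trials.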

In [None]:
# Define follow-up times from 0 to 10
followup_times = np.arange(0, 11)

# Expand the dataset by creating a new copy for each follow-up time
expanded = pd.concat([data.assign(followup_time=t) for t in followup_times], ignore_index=True)

# Preview the expanded dataset
print("Expanded data preview:")
print(expanded.head())
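
The shape of the expansion can be checked on a tiny example. The sketch below reproduces the row ordering of the `pd.concat` call with plain NumPy arrays; the three row IDs are hypothetical.

```python
import numpy as np

row_ids = np.array([101, 102, 103])   # hypothetical row identifiers
followup_times = np.arange(0, 11)     # 0, 1, ..., 10

# One block containing every row per follow-up time, in the same order as the concat
expanded_ids = np.tile(row_ids, len(followup_times))
expanded_time = np.repeat(followup_times, len(row_ids))

print(len(expanded_ids))   # 33 rows: 3 original rows x 11 follow-up times
print(expanded_time[:4])   # [0 0 0 1]
```

The same check applies to the real data: `len(expanded)` should equal `len(data) * len(followup_times)`.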


## Step 7: Fit a Marginal Structural Model (MSM) on Expanded Data

Using the expanded data, we fit a marginal structural model (MSM) via weighted least squares. This model relates the outcome to treatment, follow-up time, and the covariate `x2`.

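In this model, the coefficient on `treatment` is the estimated treatment effect on the outcome, and the coefficient on `followup_time` captures a linear trend over follow-up.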

In [None]:
# Fit a marginal structural model (MSM) using the expanded dataset
msm_model = smf.wls("outcome ~ treatment + followup_time + x2", data=expanded, weights=expanded["weight"]).fit()

# Print the MSM model coefficients
print("\nSimplified MSM Model Coefficients:")
print(msm_model.params)
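
One property of this simplified expansion is worth checking: every copy of a row carries the same `outcome` and `weight`, so `followup_time` has nothing to explain and its fitted slope should be zero up to rounding. The sketch below illustrates this on synthetic data with a NumPy-based weighted fit; all values are made up, and the model omits `x2` for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical base data: 50 rows with a treatment flag, an outcome, and weights
n = 50
treatment = rng.integers(0, 2, size=n).astype(float)
outcome = 0.3 + 0.2 * treatment + rng.normal(scale=0.1, size=n)
weight = rng.uniform(0.5, 1.5, size=n)

# Replicate every row once per follow-up time, leaving outcome and weight unchanged
times = np.arange(0, 11, dtype=float)
treat_x = np.tile(treatment, len(times))
y_x = np.tile(outcome, len(times))
w_x = np.tile(weight, len(times))
time_x = np.repeat(times, n)

# Weighted least squares via scaling each row by sqrt(weight)
X = np.column_stack([np.ones_like(time_x), treat_x, time_x])
sw = np.sqrt(w_x)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y_x * sw, rcond=None)

print(beta)  # the last entry, the follow-up slope, is zero up to rounding
```

A time trend would only appear if the expanded rows carried outcomes observed at each follow-up time, which is an extension beyond what this notebook does.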


## Step 8: Predict Outcomes Over Follow-Up

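We predict the outcome over follow-up times (0 to 10) using the MSM. For each follow-up time, we set `followup_time` to that value for every row of the original data, predict with the MSM, and take the weighted average of the predictions using the combined weights. Treatment stays at its observed values, so each result is a weighted mean predicted outcome; this series is what the next step plots as the survival difference.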

In [None]:
# Define prediction times (follow-up times from 0 to 10)
pred_times = np.arange(0, 11)

# Initialize a list to store the weighted average predictions
predictions = []

# Loop through each follow-up time and predict the outcome
for t in pred_times:
    # Create a temporary copy of the original data and assign the current follow-up time
    temp = data.copy()
    temp["followup_time"] = t
    
    # Predict outcomes using the MSM model
    pred = msm_model.predict(temp)
    
    # Calculate the weighted average prediction for this follow-up time
    predictions.append(np.average(pred, weights=temp["weight"]))

# Display the predicted outcomes over follow-up times
print("Predicted outcomes over follow-up times:")
print(predictions)
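
The loop above leaves `treatment` at its observed values, so it averages predicted outcomes and does not contrast treatment arms. One way a difference between arms could be obtained is to predict once with treatment set to 1 and once with it set to 0, then average the difference. The sketch below does this for a linear model with the same terms as the MSM formula; the coefficients, covariate values, and weights are hypothetical.

```python
import numpy as np

# Hypothetical coefficients for: outcome ~ treatment + followup_time + x2
params = {"Intercept": 0.10, "treatment": -0.03, "followup_time": 0.002, "x2": 0.01}


def predict_linear(params, treatment, followup_time, x2):
    """Linear prediction with the same terms as the MSM formula in this notebook."""
    return (
        params["Intercept"]
        + params["treatment"] * treatment
        + params["followup_time"] * followup_time
        + params["x2"] * x2
    )


x2 = np.array([-1.0, 0.0, 0.5, 2.0])      # hypothetical covariate values
weight = np.array([0.8, 1.0, 1.2, 1.0])   # hypothetical combined weights

differences = []
for t in range(0, 11):
    treated = predict_linear(params, 1.0, t, x2)
    untreated = predict_linear(params, 0.0, t, x2)
    differences.append(np.average(treated - untreated, weights=weight))

print(differences[0])  # approximately -0.03, the treatment coefficient
```

In a linear model without interactions the contrast equals the treatment coefficient at every follow-up time; a model with a treatment-by-time interaction would let it vary.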


## Step 9: Plot the Predicted Survival Difference

Finally, we plot the predicted survival difference over follow-up time. The blue line represents the weighted average predictions. The red dashed lines are placeholder bounds drawn at ±0.1 around the predictions; they are not estimated confidence intervals.

In [None]:
# Create dummy lower and upper bounds for the plot (±0.1 for demonstration purposes)
lower_bound = [p - 0.1 for p in predictions]
upper_bound = [p + 0.1 for p in predictions]

# Plot the predicted survival difference over follow-up time
plt.figure(figsize=(8, 6))
plt.plot(pred_times, predictions, label="Survival Difference", color="blue")
plt.plot(pred_times, lower_bound, "r--", label="2.5% CI")
plt.plot(pred_times, upper_bound, "r--", label="97.5% CI")
plt.xlabel("Follow-up Time")
plt.ylabel("Survival Difference")
plt.title("Predicted Survival Difference Over Follow-up")
plt.legend()
plt.show()
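
The ±0.1 bounds are placeholders. One way to obtain data-driven bounds is a percentile bootstrap. The sketch below resamples rows and recomputes a weighted mean, which is a simplification: it assumes rows are independent and does not refit any model, whereas a fuller analysis would likely resample patients and repeat the whole pipeline. The helper name and the example values are hypothetical.

```python
import numpy as np


def bootstrap_weighted_mean_ci(values, weights, n_boot=1000, seed=0):
    """Percentile interval (2.5%, 97.5%) for a weighted mean, resampling rows."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(values)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        estimates[b] = np.average(values[idx], weights=weights[idx])
    lower, upper = np.percentile(estimates, [2.5, 97.5])
    return lower, upper


# Hypothetical predictions and weights
values = np.array([0.10, 0.20, 0.15, 0.30, 0.25, 0.05])
weights = np.array([1.0, 0.8, 1.2, 1.0, 0.9, 1.1])
lower, upper = bootstrap_weighted_mean_ci(values, weights)
print(lower, upper)
```

Applied per follow-up time, the returned pair could replace `lower_bound` and `upper_bound` in the plotting cell.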
