Causal Forest DML — Detect Feature Intervention Effect
===============================================================
**Outcome:** Blood urate (mg/dL) 

**Treatment:** Intervention on target feature

**Covariates:** Confounders (age, BMI, microbiome, SNPs, blood tests) 

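We estimate the conditional average treatment effect (CATE) in mg/dL of urate per unit change in the target feature.
Clinical target: change the target feature by m units and lower urate by ≥ n mg/dL. Responders are subjects whose CATE is < -n / m mg/dL per unit (for example, m = 1 and n = 0.3 give a cutoff of -0.3 mg/dL per unit).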
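
The cutoff arithmetic can be sketched in a few lines. This is a minimal illustration, assuming the effect is linear in the size of the intervention; the function names `responder_threshold` and `is_responder` are hypothetical and do not appear in the notebook.

```python
def responder_threshold(n_mg_dl: float, m_units: float) -> float:
    """Per-unit CATE cutoff for a urate drop of n mg/dL from an m-unit increase."""
    if m_units <= 0:
        raise ValueError("m_units must be positive in this sketch")
    # A drop of n mg/dL is a change of -n, spread over m units of the feature.
    return -n_mg_dl / m_units


def is_responder(cate_per_unit: float, n_mg_dl: float, m_units: float) -> bool:
    """True when the per-unit effect is below the cutoff (strict comparison)."""
    return cate_per_unit < responder_threshold(n_mg_dl, m_units)


# Example: target a 0.3 mg/dL drop from a 1-unit increase.
cutoff = responder_threshold(0.3, 1)
print(cutoff, is_responder(-0.4, 0.3, 1), is_responder(-0.2, 0.3, 1))
```

A larger intervention relaxes the per-unit cutoff: with m = 2 the same 0.3 mg/dL target only requires a CATE below -0.15 mg/dL per unit.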

The Causal Forest model was trained on HPP data and downloaded from their server. The training procedure is shown in the first part, but the training data itself cannot be provided for privacy reasons.

### Training

Choose the target feature to intervene on and use the remaining confounding variables as the training covariates. Then train the Causal Forest (CF), using Random Forest regressors as the nuisance models for the outcome and the treatment.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.ensemble import RandomForestRegressor
from econml.dml import CausalForestDML
import joblib

# load train data
df = pd.read_csv("data/urtate_data.csv")

# target feature to intervene on (here a dietary variable)
target = "Vegetables"
X_train = df.drop(columns=[target, "bt__urate_float_value"])
y_train = df["bt__urate_float_value"]
T_train = df[target]


# nuisance models: model_y predicts the outcome, model_t predicts the treatment
model_y = RandomForestRegressor(n_estimators=600, max_depth=10,
                                min_samples_leaf=10, max_features=0.5,
                                n_jobs=-1, random_state=9527)
model_t = RandomForestRegressor(n_estimators=600, max_depth=10,
                                min_samples_leaf=10, max_features=0.5,
                                n_jobs=-1, random_state=9527)

# fit the causal forest; the treatment is continuous and every remaining
# column is passed as X, so the effect may vary with any covariate
cf = CausalForestDML(model_y=model_y, model_t=model_t,
                     discrete_treatment=False,
                     n_estimators=900,
                     min_samples_leaf=10,
                     max_depth=10,
                     max_features=0.5,
                     cv=5,
                     random_state=9527)
cf.fit(y_train, T_train, X=X_train)

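# save model to the path that the Application section loads from
import os
os.makedirs("models", exist_ok=True)
joblib.dump(cf, "models/CausalForest_model.joblib")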
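
Because the training data cannot be shared, the idea behind the DML step may be easier to follow on synthetic data. The sketch below is not the notebook's model: it assumes linear nuisance models and a single constant effect, and it uses only NumPy. It shows the residual-on-residual regression with cross-fitting that double machine learning builds on. All data, coefficients, and the helper `fit_predict_linear` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic confounders, a treatment that depends on them, and an outcome
# with a known constant effect of the treatment (all values are invented).
X = rng.normal(size=(n, 3))
T = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)
true_effect = -0.4
y = true_effect * T + X @ np.array([1.0, 0.5, -0.7]) + rng.normal(size=n)


def fit_predict_linear(X_fit, target_fit, X_pred):
    """Ordinary least squares with an intercept: fit on one split, predict on another."""
    A_fit = np.column_stack([np.ones(len(X_fit)), X_fit])
    A_pred = np.column_stack([np.ones(len(X_pred)), X_pred])
    coef, *_ = np.linalg.lstsq(A_fit, target_fit, rcond=None)
    return A_pred @ coef


# Cross-fitting: each fold's residuals come from models trained on the other folds.
folds = np.array_split(rng.permutation(n), 5)
y_res = np.empty(n)
t_res = np.empty(n)
for held_out in folds:
    train = np.setdiff1d(np.arange(n), held_out)
    y_res[held_out] = y[held_out] - fit_predict_linear(X[train], y[train], X[held_out])
    t_res[held_out] = T[held_out] - fit_predict_linear(X[train], T[train], X[held_out])

# Final stage: regress outcome residuals on treatment residuals.
theta_hat = (t_res @ y_res) / (t_res @ t_res)
print(f"estimated effect: {theta_hat:.3f} (true effect: {true_effect})")
```

In the training cell above, the Random Forest regressors play the role of the nuisance models, and the causal forest replaces the single slope with an effect that varies across subjects.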

### Application

Load the CF model and the dataset of subjects you want to classify. Compute the CATE, set the threshold for the clinical target, and label each subject as a responder or non-responder.

In [None]:
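# imports and target are repeated so this cell runs without the training cell
import pandas as pd
import joblib

target = "Vegetables"

# load test data (subjects we want to classify)
df_test = pd.read_csv("data/urtate_data_test.csv")

X_test = df_test.drop(columns=[target, "bt__urate_float_value"])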

# load Causal Forest model
cf = joblib.load("models/CausalForest_model.joblib")

# compute CATE and ATE
cate_hat = cf.effect(X_test)                    # mg/dL per +1 unit
ate_hat = cf.ate(X_test)

# classify responders: the clinical target is a urate drop of 0.3 mg/dL
# from a 1-unit increase in the target feature
clinic_target = -0.3                             # desired change in urate (mg/dL)
intervention = 1                                 # change in target feature (units)
threshold_prop = clinic_target / intervention    # CATE cutoff (mg/dL per unit)
responders = (cate_hat < threshold_prop).ravel()

print(f"ATE: {ate_hat:.3f} mg/dL per unit of {target}")
print(f"Responders (CATE < {threshold_prop:.3f} mg/dL per unit of {target}): "
      f"{responders.sum()} / {len(responders)}")
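
The rule above uses only the point estimate of the CATE. A more cautious variant, sketched below, also uses an interval around each estimate. With EconML such bounds can be obtained from `cf.effect_interval(X_test, alpha=0.05)`, which returns a lower and an upper array. The sketch uses invented stand-in arrays so it runs without the model, and the three-way labelling and the function name `classify_with_interval` are assumptions for illustration, not part of the original analysis.

```python
import numpy as np


def classify_with_interval(lower, upper, threshold):
    """Label subjects by where their CATE interval sits relative to the cutoff."""
    lower = np.asarray(lower, dtype=float).ravel()
    upper = np.asarray(upper, dtype=float).ravel()
    labels = np.full(lower.shape, "uncertain", dtype=object)
    labels[upper < threshold] = "responder"        # whole interval below the cutoff
    labels[lower >= threshold] = "non-responder"   # whole interval at or above it
    return labels


# Stand-in bounds (made up); with the fitted model one could use instead:
# lower, upper = cf.effect_interval(X_test, alpha=0.05)
lower_demo = np.array([-0.70, -0.60, -0.25])
upper_demo = np.array([-0.32, -0.05, 0.05])

labels = classify_with_interval(lower_demo, upper_demo, threshold=-0.3)
print(list(labels))
```

Subjects labelled "uncertain" have an interval that straddles the cutoff, so the data do not clearly place them on either side of the clinical target.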