Causal Forest DML — detect diet effect (daily proportion units)
===============================================================
**Outcome :** blood urate (mg/dL) 

**Treatment :** diet_ratio_per_day = Δ target diet proportion of total daily intake (0‑1)

**Covariates:** confounders (age, BMI, microbiome, SNPs, blood tests) 

We estimate the CATE in mg/dL per +1 diet proportion.
Clinical target: raise the target diet proportion by m and lower urate by
≥ n mg/dL.  Responders are subjects whose upper 90 % CI bound of the slope is still
< -n / m mg/dL per proportion.
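
As a worked instance of the responder rule: the scoring code further down derives the threshold as `clinic_target / intervention`, which is -n / m with m = 0.05 and n = 0.3. The sketch below reproduces that arithmetic. The array `ci_up_example` is made up for illustration and is not model output.

```python
import numpy as np

m = 0.05   # planned increase in diet proportion (+5 percentage points)
n = 0.3    # required urate reduction in mg/dL

# Slope needed to reach the target: -n / m mg/dL per +1 proportion.
threshold = -n / m   # approximately -6 mg/dL per proportion

# Hypothetical upper 90% CI bounds of the per-subject slope (illustrative only).
ci_up_example = np.array([-9.0, -6.5, -4.0, 1.2])

# A subject counts as a responder only if the whole interval lies below the threshold.
is_responder = ci_up_example < threshold

print(f"threshold = {threshold:.1f} mg/dL per proportion")
print(is_responder)
```

Using the upper bound rather than the point estimate makes the rule conservative: a subject is flagged only when even the least favourable end of the interval meets the target.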

The Causal Forest model was trained on HPP data and downloaded from their server. The training process is shown in the first code cell below, but the data cannot be provided for privacy reasons.
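
Because the real data cannot be shared, the sketch below builds a synthetic stand-in with the same column layout the code cells expect (`Vegetables` as treatment, `bt__urate_float_value` as outcome). Everything in it is an assumption made for smoke-testing the pipeline: the helper `make_synthetic_diet_urate`, the covariate names `age` and `bmi`, and all simulated values are hypothetical and carry no clinical meaning.

```python
import numpy as np
import pandas as pd


def make_synthetic_diet_urate(n_subjects=500, seed=0):
    """Return a simulated DataFrame shaped like the notebook's input CSV.

    All values are random draws; the covariate names are hypothetical.
    """
    rng = np.random.default_rng(seed)
    age = rng.uniform(40, 70, n_subjects)
    bmi = rng.normal(26, 4, n_subjects)
    # Treatment: change in vegetable proportion, loosely dependent on age.
    vegetables = np.clip(
        0.002 * (age - 55) + rng.normal(0, 0.05, n_subjects), -0.5, 0.5
    )
    # Simulated heterogeneous slope (mg/dL per +1 proportion), stronger at higher BMI.
    slope = -4.0 - 0.3 * (bmi - 26)
    urate = 5.5 + 0.05 * (bmi - 26) + slope * vegetables + rng.normal(0, 0.5, n_subjects)
    return pd.DataFrame({
        "age": age,
        "bmi": bmi,
        "Vegetables": vegetables,
        "bt__urate_float_value": urate,
    })


df_synth = make_synthetic_diet_urate()
# Uncomment to feed the training cell with simulated data:
# df_synth.to_csv("diet_urate_dataset_train.csv", index=False)
print(df_synth.shape)
```

Results obtained on such simulated data only confirm that the code runs; they say nothing about the real diet effect.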

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.ensemble import RandomForestRegressor
from econml.dml import CausalForestDML
import joblib

# load train data
df_train = pd.read_csv("diet_urate_dataset_train.csv")

# target diet
target_diet = "Vegetables"
X_train = df_train.drop(columns=[target_diet, "bt__urate_float_value"])
y_train = df_train["bt__urate_float_value"]
T_train = df_train[target_diet]


# fitting causal forest
model_y = RandomForestRegressor(n_estimators=600, max_depth=10,
                                min_samples_leaf=10, max_features=0.5,
                                n_jobs=-1, random_state=9527)
model_t = RandomForestRegressor(n_estimators=600, max_depth=10,
                                min_samples_leaf=10, max_features=0.5,
                                n_jobs=-1, random_state=9527)

cf = CausalForestDML(model_y=model_y, model_t=model_t,
                     discrete_treatment=False,
                     n_estimators=900,
                     min_samples_leaf=10,
                     max_depth=10,
                     max_features = 0.5,
                     cv=5,
                     random_state=9527)
cf.fit(y_train, T_train, X=X_train)

# save model
joblib.dump(cf, "CausalForest_model.joblib")

In [None]:
# load test data (subjects we want to classify) 
df_test = pd.read_csv("diet_urate_dataset_test.csv")

# target diet
target_diet = "Vegetables"
X_test = df_test.drop(columns=[target_diet, "bt__urate_float_value"])

# load Causal Forest model
cf = joblib.load("CausalForest_model.joblib")

# compute CATE & CI 
cate_hat = cf.effect(X_test)                    # mg/dL per +1 proportion
ci_low, ci_up = cf.effect_interval(X_test, alpha=0.10) # 90% CI

ate_hat = cf.ate(X_test)
lb_ate, ub_ate = cf.ate_interval(X_test, alpha=0.10) 

# define the intervention and classify responders
clinic_target = -0.3    # mg/dL target
intervention = 0.05  # proportion increase
threshold_prop = clinic_target / intervention   
responders = (ci_up < threshold_prop).ravel()

print(f"ATE : {ate_hat:.3f} mg/dL per proportion  "
      f"(90% CI {lb_ate:.3f}, {ub_ate:.3f})")
print(f"Responders (< {clinic_target} mg/dL at +{intervention * 100:.0f} pp): "
      f"{responders.sum()} / {len(responders)}")
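
The slopes above are in mg/dL per +1 proportion, so the expected urate change for the planned intervention is the slope times the proportion increase. A minimal sketch of that conversion follows, using made-up arrays in place of `cate_hat`, `ci_low` and `ci_up`. The `*_example` values and the `summary` column names are illustrative assumptions, not model output.

```python
import numpy as np
import pandas as pd

intervention = 0.05    # proportion increase, as in the cell above
clinic_target = -0.3   # mg/dL target, as in the cell above

# Hypothetical per-subject slopes and 90% CI bounds (mg/dL per +1 proportion).
cate_example = np.array([-10.0, -7.0, -3.0])
ci_low_example = np.array([-12.0, -9.0, -6.0])
ci_up_example = np.array([-8.0, -5.0, 0.0])

# Scale slopes to the expected change in mg/dL for this intervention size.
summary = pd.DataFrame({
    "expected_change_mg_dl": cate_example * intervention,
    "change_ci_low": ci_low_example * intervention,
    "change_ci_up": ci_up_example * intervention,
})

# Mathematically the same rule as ci_up < clinic_target / intervention,
# because intervention is positive.
summary["responder"] = summary["change_ci_up"] < clinic_target
print(summary)
```

Reporting the change in mg/dL rather than the raw slope puts the per-subject estimates on the same scale as the clinical target.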