Add capability to predict the outcomes to causal tree/forest #590

Closed
winston-zillow opened this issue Dec 19, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@winston-zillow

While we use CausalML to predict treatment effects, one often also wants the predicted outcome values under control and/or treatment for the same covariates. One could build a separate prediction tree/forest for this purpose, but that approach is more inconvenient and expensive, and it is hard to ensure that the prediction model agrees with the causal model. (It seems that the nodes of CausalTree/CausalRandomForest already contain the necessary values, e.g. ct_y_sum and ct_count; what is currently lacking is a way to aggregate them at the API level.)
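To make the aggregation idea concrete, here is a minimal sketch of how per-leaf sums and counts could be turned into outcome estimates. It is not CausalML's actual node layout or API: `LeafStats`, `leaf_outcomes`, `forest_outcomes`, and the treatment-side fields `tr_y_sum` and `tr_count` are hypothetical names introduced for illustration, and only `ct_y_sum` and `ct_count` are taken from the description above.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Iterable, Tuple


@dataclass
class LeafStats:
    """Hypothetical per-leaf statistics (assumed, not CausalML's real node fields)."""
    ct_y_sum: float  # sum of outcomes of control units in the leaf
    ct_count: int    # number of control units in the leaf
    tr_y_sum: float  # assumed treatment-side counterpart of ct_y_sum
    tr_count: int    # assumed treatment-side counterpart of ct_count


def leaf_outcomes(leaf: LeafStats) -> Tuple[float, float, float]:
    """Return (control mean, treatment mean, effect) for a single leaf."""
    y0 = leaf.ct_y_sum / leaf.ct_count if leaf.ct_count else float("nan")
    y1 = leaf.tr_y_sum / leaf.tr_count if leaf.tr_count else float("nan")
    return y0, y1, y1 - y0


def forest_outcomes(leaves: Iterable[LeafStats]) -> Tuple[float, float, float]:
    """Average the per-tree leaf estimates for one unit.

    `leaves` holds the leaf that the unit falls into in each tree.
    """
    per_tree = [leaf_outcomes(leaf) for leaf in leaves]
    y0 = fmean(t[0] for t in per_tree)
    y1 = fmean(t[1] for t in per_tree)
    return y0, y1, y1 - y0
```

Because the outcomes and the effect come from the same leaves, the effect always equals the difference of the two outcome estimates, which is the agreement a separately trained prediction model cannot guarantee.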

winston-zillow added the enhancement label Dec 19, 2022
@jeongyoonlee
Collaborator

Hi @winston-zillow, once you train a causalml model, you can predict for both the control and treatment units with the same covariates. Is this different from what you describe here?

If so, can you elaborate on what you'd like to achieve? I'd appreciate it if you could provide pseudocode with the APIs you have in mind.

@winston-zillow
Author

winston-zillow commented Jan 25, 2023

@jeongyoonlee I mean that the predict method outputs the effects, i.e. the difference in outcome between control and treatment, correct?

import pandas as pd

# tree1 and tree2 are already-fitted causal trees; predict() returns the estimated effects (ITE)
tree1_ite_pred = tree1.predict(df_test[feature_names].values)
tree2_ite_pred = tree2.predict(df_test[feature_names].values)

df_result = pd.DataFrame(
    {
        'tree_mse_ite': tree1_ite_pred,
        'tree_causal_mse_ite': tree2_ite_pred,
        'outcome': df_test['outcome'],  # <== at inference, we also want to estimate this
        'is_treated': df_test['treatment'],
        'treatment_effect': df_test['treatment_effect'],
    }
)

But during inference, given a unit with covariates, we also want the estimated outcome using the same trained model.

The GRF in EconML has the predict_full() method, which also estimates the counterfactual outcomes along with the effects, as shown in the attached screenshot for a model built as follows:

# Code for EconML predict_full()
from econml.grf import CausalForest
est = CausalForest(criterion='het', n_estimators=400, min_samples_leaf=5, max_depth=None,
                   min_var_fraction_leaf=None, min_var_leaf_on_val=True,
                   min_impurity_decrease = 0.0, max_samples=0.45, min_balancedness_tol=.45,
                   warm_start=False, inference=True, fit_intercept=True, subforest_size=4,
                   honest=True, verbose=0, n_jobs=-1, random_state=1235)
est.fit(X, T, y)
effect_and_Y0 = est.predict_full(X_test, alpha=0.01)

Is this clear? Is there a way to do the same in CausalML already?
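As a rough illustration of what one would do with such a combined output, the sketch below splits an array of per-unit estimates into the effect, the control outcome, and the implied treatment outcome. It uses NumPy only and does not call EconML. The helper name `split_full_prediction` is hypothetical, and the column layout (effect first, control-outcome intercept second, single binary treatment) is an assumption that should be checked against the library's documentation before relying on it.

```python
import numpy as np


def split_full_prediction(theta_full):
    """Split an (n_samples, 2) array into effect, Y0 and Y1 estimates.

    Assumed layout (not verified against EconML): column 0 is the treatment
    effect and column 1 is the intercept, i.e. the estimated control outcome.
    """
    theta_full = np.asarray(theta_full, dtype=float)
    if theta_full.ndim != 2 or theta_full.shape[1] != 2:
        raise ValueError("expected an array of shape (n_samples, 2)")
    effect = theta_full[:, 0]
    y0 = theta_full[:, 1]
    # For a single binary treatment, the treated outcome is baseline plus effect.
    y1 = y0 + effect
    return effect, y0, y1
```

With an array like the `effect_and_Y0` result above, `effect, y0, y1 = split_full_prediction(effect_and_Y0)` would give both counterfactual outcomes from the same fitted model, provided the assumed layout holds.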

[Screenshot: Screen Shot 2023-01-25 at 10 29 32 AM]

@jeongyoonlee
Collaborator

I'm closing this issue as it has been addressed in #623.
