# `causalml` - Meta-Learner Example Notebook

# Introduction
CausalML is a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research. It provides a standard interface that allows users to estimate the Conditional Average Treatment Effect (CATE) or Individual Treatment Effect (ITE) from experimental or observational data. Essentially, it estimates the causal impact of intervention T on outcome Y for users with observed features X, without strong assumptions on the model form. The package currently supports the following methods:
- Tree-based algorithms
    - Uplift tree/random forests on KL divergence, Euclidean Distance, and Chi-Square
    - Uplift tree/random forests on Contextual Treatment Selection
- Meta-learner algorithms
    - S-learner
    - T-learner
    - X-learner
    - R-learner
    
In this notebook, we will generate some synthetic data to demonstrate how to use the various Meta-Learner algorithms in order to estimate Individual Treatment Effects (and Average Treatment Effects with confidence intervals).
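To make the meta-learner idea concrete before using the package, here is a minimal sketch of a T-learner written with plain NumPy least squares. It is an illustration only, not CausalML's implementation: the helper names (`fit_ols`, `predict_ols`, `t_learner_cate`) and the toy data are assumptions made for this example.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) for y ~ X."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def t_learner_cate(X, treatment, y):
    """Fit one outcome model per arm and return their difference for each row."""
    coef_c = fit_ols(X[treatment == 0], y[treatment == 0])
    coef_t = fit_ols(X[treatment == 1], y[treatment == 1])
    return predict_ols(coef_t, X) - predict_ols(coef_c, X)

# Hypothetical toy data: the true effect is 1 + 2 * x0
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(2000, 2))
w_toy = rng.integers(0, 2, size=2000)
tau_toy = 1.0 + 2.0 * X_toy[:, 0]
y_toy = X_toy[:, 1] + w_toy * tau_toy + rng.normal(scale=0.1, size=2000)

cate_hat = t_learner_cate(X_toy, w_toy, y_toy)
print('mean absolute error of the CATE estimate:', np.abs(cate_hat - tau_toy).mean())
```

An S-learner would instead fit a single model with the treatment flag as an extra feature and take the difference between its predictions with the flag set to 1 and to 0.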

In [13]:
%load_ext autoreload
%autoreload 2

In [26]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
from xgboost import XGBRegressor
import warnings

from causalml.inference.meta import LRSLearner
from causalml.inference.meta import XGBTLearner, MLPTLearner
from causalml.inference.meta import BaseXLearner, BaseRLearner, BaseSLearner, BaseTLearner
from causalml.match import NearestNeighborMatch, create_table_one
from causalml.propensity import ElasticNetPropensityModel
from causalml.dataset import *
from causalml.features import OUTCOME_COL, TREATMENT_COL, SCORE_COL, INFERENCE_FEATURES
from causalml.features import MATCHING_COVARIATES, PROPENSITY_FEATURES, PROPENSITY_FEATURE_TRANSFORMATIONS
from causalml.features import INFERENCE_FEATURE_TRANSFORMATIONS, load_data
from causalml.metrics import *

warnings.filterwarnings('ignore')
plt.style.use('fivethirtyeight')

%matplotlib inline

### 1. Generate synthetic data
- We have implemented 4 modes of generating synthetic data (specified by input parameter `mode`). Refer to the References section for more detail on these data generation processes.

In [69]:
# Generate synthetic data using mode 1
y, X, treatment, tau, b, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)
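As a rough guide to the six returned arrays, the sketch below builds a toy data set with the same shape of output. The function `toy_synthetic_data` and its formulas for `b`, `e`, and `tau` are hypothetical and are not the package's data generation processes. The reading of `b` as the expected outcome and `e` as the treatment propensity, and the outcome equation `y = b + (w - 0.5) * tau + noise`, are assumptions based on the Nie and Wager setup.

```python
import numpy as np

def toy_synthetic_data(n=1000, p=5, sigma=1.0, seed=0):
    """Hypothetical generator returning six arrays in the order y, X, w, tau, b, e."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, p))                   # features
    b = np.sin(np.pi * X[:, 0] * X[:, 1])          # baseline (expected outcome)
    e = np.clip(0.2 + 0.6 * X[:, 2], 0.1, 0.9)     # propensity of receiving treatment
    tau = (X[:, 0] + X[:, 1]) / 2                  # individual treatment effect
    w = rng.binomial(1, e)                         # treatment flag drawn from e
    y = b + (w - 0.5) * tau + sigma * rng.normal(size=n)  # observed outcome
    return y, X, w, tau, b, e

y_s, X_s, w_s, tau_s, b_s, e_s = toy_synthetic_data(n=1000, p=5)
print(X_s.shape, y_s.shape, w_s.mean())
```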

### 3. Calculate Propensity Scores
Although the `synthetic_data` function returns the true propensity scores (`e`), in practice these values are usually not directly observable (unless a separate model has been independently developed for the treatment flag, in which case we could use that). We have developed a lightweight propensity model that lets you specify the features from which to estimate treatment propensity.

*Note that propensity scores are only used for X Learner and R Learner.*
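To illustrate where the propensity score enters, here is a minimal X-learner sketch following the usual description of the method (Künzel et al.): fit an outcome model per arm, impute individual effects, fit an effect model per arm, then blend the two effect models using the propensity score as the weight. The helper names and the linear models are assumptions for this example, not CausalML's code.

```python
import numpy as np

def _design(X):
    return np.column_stack([np.ones(len(X)), X])

def _ols(X, y):
    coef, *_ = np.linalg.lstsq(_design(X), y, rcond=None)
    return coef

def x_learner_cate(X, w, y, p):
    """X-learner with linear models; p is the propensity score for each row."""
    X0, y0 = X[w == 0], y[w == 0]
    X1, y1 = X[w == 1], y[w == 1]
    # Stage 1: outcome models for control and treated units
    mu0, mu1 = _ols(X0, y0), _ols(X1, y1)
    # Stage 2: imputed treatment effects, then one effect model per arm
    d1 = y1 - _design(X1) @ mu0
    d0 = _design(X0) @ mu1 - y0
    tau1, tau0 = _ols(X1, d1), _ols(X0, d0)
    # Stage 3: propensity-weighted combination of the two effect models
    D = _design(X)
    return p * (D @ tau0) + (1 - p) * (D @ tau1)

# Hypothetical toy data with a known constant propensity of 0.3
rng = np.random.default_rng(2)
n = 3000
X_toy = rng.normal(size=(n, 2))
p_toy = np.full(n, 0.3)
w_toy = rng.binomial(1, p_toy)
tau_toy = 0.5 + X_toy[:, 0]
y_toy = X_toy[:, 1] + w_toy * tau_toy + rng.normal(scale=0.1, size=n)

tau_x = x_learner_cate(X_toy, w_toy, y_toy, p_toy)
print('mean absolute error:', np.abs(tau_x - tau_toy).mean())
```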

In [30]:
# Predict p_hat because e would not be directly observed in real-life
p_model = ElasticNetPropensityModel()
p_hat = p_model.fit_predict(X, treatment)
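For intuition about what a propensity model does, the following sketch fits an unregularized logistic regression by gradient descent and clips the predicted probabilities away from 0 and 1. This is a stand-in for illustration only: `ElasticNetPropensityModel` uses elastic net regularization, and the learning rate, iteration count, and clipping bounds here are assumptions.

```python
import numpy as np

def fit_logistic(X, w, lr=0.5, n_iter=2000):
    """Gradient descent on the mean logistic loss; returns coefficients (intercept first)."""
    design = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-design @ beta))
        beta -= lr * design.T @ (prob - w) / len(w)
    return beta

def predict_propensity(beta, X, clip=(1e-3, 1 - 1e-3)):
    """Predicted treatment probabilities, clipped to avoid extreme weights."""
    design = np.column_stack([np.ones(len(X)), X])
    return np.clip(1.0 / (1.0 + np.exp(-design @ beta)), *clip)

# Hypothetical toy data: treatment depends on the first feature only
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(5000, 2))
e_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * X_toy[:, 0])))
w_toy = rng.binomial(1, e_true)

beta_hat = fit_logistic(X_toy, w_toy)
p_hat_toy = predict_propensity(beta_hat, X_toy)
print('coefficients:', beta_hat)
print('mean absolute error vs. true propensity:', np.abs(p_hat_toy - e_true).mean())
```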

### 4. Calculate Average Treatment Effect (ATE)
A meta-learner can be instantiated by calling a base learner class and providing an sklearn/xgboost regressor class as input. Alternatively, we have provided some ready-to-use learners that have already inherited their respective base learner class capabilities. This is more abstracted and allows these tools to be quickly and readily usable.

In [37]:
# Ready-to-use S-Learner using LinearRegression
learner_s = LRSLearner()
ate_s = learner_s.estimate_ate(X, treatment, y)
print(ate_s)
print('ATE estimate: {:.03f}'.format(ate_s[0]))
print('ATE lower bound: {:.03f}'.format(ate_s[1]))
print('ATE upper bound: {:.03f}'.format(ate_s[2]))

(0.5884572436573214, 0.4385237264356304, 0.7383907608790125)
ATE estimate: 0.588
ATE lower bound: 0.439
ATE upper bound: 0.738
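The tuple above is ordered as (estimate, lower bound, upper bound). As a point of comparison, a simple difference-in-means estimate with a normal-approximation interval can be sketched as follows. This is not how `estimate_ate` computes its bounds, and it is only valid when treatment is randomized; the function name and toy data are assumptions.

```python
import numpy as np

def diff_in_means_ate(y, treatment, z=1.96):
    """Difference in means with an approximate 95% interval (randomized data only)."""
    y_t, y_c = y[treatment == 1], y[treatment == 0]
    ate = y_t.mean() - y_c.mean()
    # Standard error of a difference between two independent sample means
    se = np.sqrt(y_t.var(ddof=1) / len(y_t) + y_c.var(ddof=1) / len(y_c))
    return ate, ate - z * se, ate + z * se

# Hypothetical randomized toy data with a true ATE of 0.5
rng = np.random.default_rng(3)
w_toy = rng.integers(0, 2, size=4000)
y_toy = 0.5 * w_toy + rng.normal(size=4000)

ate, lb, ub = diff_in_means_ate(y_toy, w_toy)
print('ATE estimate: {:.03f}'.format(ate))
print('ATE lower bound: {:.03f}'.format(lb))
print('ATE upper bound: {:.03f}'.format(ub))
```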


In [55]:
learner_t.estimate_ate(X, treatment, y)

ValueError: feature_names mismatch: ['f0', 'f1', 'f2', 'f3', 'f4', 'f5'] ['f0', 'f1', 'f2', 'f3', 'f4']
expected f5 in input data

In [38]:
# Ready-to-use T-Learner using XGB
learner_t = XGBTLearner()
ate_t = learner_t.estimate_ate(X, treatment, y)
print(ate_t)

# Calling the Base Learner class and feeding in a specified model
learner_t = BaseTLearner(XGBRegressor())
ate_t = learner_t.estimate_ate(X, treatment, y)
print(ate_t)

ValueError: feature_names mismatch: ['f0', 'f1', 'f2', 'f3', 'f4', 'f5'] ['f0', 'f1', 'f2', 'f3', 'f4']
expected f5 in input data

In [None]:
# Collect ITE predictions from every learner/model combination
preds_dict = {}
for base_learner, label_l in zip([BaseSLearner, BaseTLearner, BaseXLearner, BaseRLearner], ['S', 'T', 'X', 'R']):
    for model, label_m in zip([LinearRegression, XGBRegressor], ['LR', 'XGB']):
        learner = base_learner(model())
        key = '{} Learner ({})'.format(label_l, label_m)
        try:
            # Learners that use propensity scores (X and R) accept `p`
            preds_dict[key] = learner.fit_predict(X=X, p=p_hat, treatment=treatment, y=y)
        except TypeError:
            # Learners that do not take `p` (S and T)
            preds_dict[key] = learner.fit_predict(X=X, treatment=treatment, y=y)

# Validating the Meta-Learners' Accuracy

### 1. Train Test Split

In [None]:
valid_size=0.2
X_tr, X_val, y_tr, y_val, treatment_tr, treatment_val, tau_tr, tau_val, b_tr, b_val, e_tr, e_val = \
    train_test_split(X, y, treatment, tau, b, e, test_size=valid_size, random_state=123, shuffle=True)

# Summary
-  With the synthetic data split 20% for validation and 80% for training, X Learner (XGB) and R Learner (XGB) are still the best-performing meta-learners (using MSE, Absolute % Error of ATE, and KL Divergence as measures of training and validation performance).
-  For synthetic data Method 2 (randomized trial) and Method 4 (unrelated treatment and control groups), when the sample size is small (8k for training and 2k for validation) and only 10 simulations are run, training performance is better than validation performance on all three metrics; with a larger sample size (40k for training and 10k for validation) and 100 simulations, training and validation results become very close.
-  For synthetic data Method 1 and Method 3, training and validation already show similar results even with the smaller sample size.
-  Looking at the AUUC values of the cumulative gains of the model estimates, training and validation results are still consistent.
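The three accuracy measures named above can be sketched as follows for a vector of predicted effects `tau_hat` and true effects `tau`. These are plausible definitions for illustration, not necessarily the exact ones used by the package's summary functions; in particular the histogram binning and the smoothing constant `eps` in the KL divergence are assumptions.

```python
import numpy as np

def mse(tau_hat, tau):
    """Mean squared error of the individual effect estimates."""
    return float(np.mean((tau_hat - tau) ** 2))

def abs_pct_error_of_ate(tau_hat, tau):
    """Absolute relative error of the mean predicted effect vs. the true ATE."""
    return float(abs(tau_hat.mean() / tau.mean() - 1.0))

def kl_divergence_hist(tau_hat, tau, bins=50, eps=1e-6):
    """KL(actual || predicted) between histograms on a shared set of bin edges."""
    lo = min(tau_hat.min(), tau.min())
    hi = max(tau_hat.max(), tau.max())
    edges = np.linspace(lo, hi, bins + 1)
    p = np.histogram(tau, bins=edges)[0].astype(float) + eps
    q = np.histogram(tau_hat, bins=edges)[0].astype(float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical example: predictions that are biased upward by 0.1
tau_true = np.linspace(0.5, 1.5, 1000)
tau_pred = tau_true + 0.1

print('MSE:', mse(tau_pred, tau_true))
print('Abs % error of ATE:', abs_pct_error_of_ate(tau_pred, tau_true))
print('KL divergence:', kl_divergence_hist(tau_pred, tau_true))
```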

# Methodology
Using the methods outlined in ['Quasi-Oracle Estimation of Heterogeneous Treatment Effects' (Nie X. and Wager S., 2018)](https://arxiv.org/pdf/1712.04912.pdf), we compare the S/T/X/R meta-learners using a linear model (sklearn) and a boosted tree model (xgboost) to understand which learners and models perform well in different scenarios.

- Method 1: synthetic data with difficult nuisance components and an easy treatment effect
- Method 2: synthetic data of a randomized trial
- Method 3: synthetic data with easy propensity and a difficult baseline
- Method 4: synthetic data with unrelated treatment and control groups

For each method, we run `k` simulations; each simulation generates `n` samples and holds out 20% of them as the validation data set.
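The simulation procedure can be sketched as a loop that generates data, splits it, fits on the training part only, and records an error metric for both parts. The function `holdout_simulations` and the toy generator and learner passed to it are hypothetical stand-ins, not the package's `get_synthetic_summary_holdout`.

```python
import numpy as np

def holdout_simulations(generate, fit, predict, n=1000, k=10, valid_size=0.2, seed=0):
    """Run k simulations; return a (k, 2) array of [train MSE, validation MSE]."""
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(k):
        y, X, w, tau = generate(n, rng)
        idx = rng.permutation(n)                    # shuffled row indices
        n_val = int(round(n * valid_size))
        val, tr = idx[:n_val], idx[n_val:]
        model = fit(X[tr], w[tr], y[tr])            # fit on training rows only
        rows.append((np.mean((predict(model, X[tr]) - tau[tr]) ** 2),
                     np.mean((predict(model, X[val]) - tau[val]) ** 2)))
    return np.array(rows)

def toy_generate(n, rng):
    """Hypothetical randomized trial with effect 1 + x0."""
    X = rng.normal(size=(n, 2))
    w = rng.integers(0, 2, size=n)
    tau = 1.0 + X[:, 0]
    y = X[:, 1] + w * tau + rng.normal(scale=0.5, size=n)
    return y, X, w, tau

def toy_fit(X, w, y):
    """OLS with treatment interactions: y ~ [1, X, w, w * X]."""
    D = np.column_stack([np.ones(len(X)), X, w, w[:, None] * X])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef

def toy_predict(coef, X):
    """CATE = coefficient on w plus X times the interaction coefficients."""
    p = X.shape[1]
    return coef[1 + p] + X @ coef[2 + p:]

mse_table = holdout_simulations(toy_generate, toy_fit, toy_predict, n=1000, k=5)
print('mean train MSE: {:.4f}, mean validation MSE: {:.4f}'.format(*mse_table.mean(axis=0)))
```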

## Method 1

In [None]:
# Get the synthetic data
preds_dict_train, preds_dict_valid = get_synthetic_preds_holdout(simulate_nuisance_and_easy_treatment, n=10000, valid_size=0.2)

In [None]:
# Check that the data was split correctly
print(preds_dict_train['generated_data']['y'].shape, 
      preds_dict_valid['generated_data']['y'].shape
     )

In [None]:
train_summary_1,validation_summary_1  = get_synthetic_summary_holdout(simulate_nuisance_and_easy_treatment, n=10000, valid_size=0.2, k=10)

In [None]:
train_summary_1

In [None]:
validation_summary_1

In [None]:
scatter_plot_summary_holdout(train_summary_1, validation_summary_1, k=10, label=['Train', 'Validation'], drop_learners=[], drop_cols=[])

In [None]:
%time train_summary_1_50k_10, validation_summary_1_50k_10  = get_synthetic_summary_holdout(simulate_nuisance_and_easy_treatment, n=50000, valid_size=0.2, k=10)

In [None]:
#with 50k samples and 10 simulations
scatter_plot_summary_holdout(train_summary_1_50k_10, validation_summary_1_50k_10, k=10, label=['Train', 'Validation'], drop_learners=[], drop_cols=[])

In [None]:
%time train_summary_1_50k, validation_summary_1_50k  = get_synthetic_summary_holdout(simulate_nuisance_and_easy_treatment, n=50000, valid_size=0.2, k=100)

In [None]:
#with 50k samples and 100 simulations
scatter_plot_summary_holdout(train_summary_1_50k, validation_summary_1_50k, k=100, label=['Train', 'Validation'], drop_learners=[], drop_cols=[])

In [None]:
bar_plot_summary_holdout(train_summary_1, validation_summary_1, k=10, drop_learners=['S Learner (LR)'], drop_cols=[])

In [None]:
# Single simulation (50k samples)
synthetic_preds_holdout_train_1, synthetic_preds_holdout_valid_1 = get_synthetic_preds_holdout(simulate_nuisance_and_easy_treatment, n=50000, valid_size=0.2)

In [None]:
# Distribution plot for a single simulation of training data
distr_plot_single_sim(synthetic_preds_holdout_train_1, kind='kde', linewidth=2, bw_method=0.5,
           drop_learners=['S Learner (LR)', 'S Learner (XGB)'])

In [None]:
# Distribution plot for a single simulation of validation data
distr_plot_single_sim(synthetic_preds_holdout_valid_1, kind='kde', linewidth=2, bw_method=0.5,
           drop_learners=['S Learner (LR)', 'S Learner (XGB)'])

In [None]:
# Scatter Plots for a Single Simulation of Training Data
scatter_plot_single_sim(synthetic_preds_holdout_train_1)

In [None]:
# Scatter Plots for a Single Simulation of Validation Data
scatter_plot_single_sim(synthetic_preds_holdout_valid_1)

In [None]:
# Cumulative Gain AUUC values for a Single Simulation of Training Data
get_synthetic_auuc(synthetic_preds_holdout_train_1, drop_learners=['S Learner (LR)'])

In [None]:
# Cumulative Gain AUUC values for a Single Simulation of Validation Data
get_synthetic_auuc(synthetic_preds_holdout_valid_1, drop_learners=['S Learner (LR)'])
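When the true effects are known, as they are for synthetic data, one simple way to define a cumulative gain curve is to rank units by predicted effect and accumulate their true effects. The sketch below uses that definition and takes the mean of the normalized curve as the AUUC. It is an assumption-laden illustration and may differ from how `get_synthetic_auuc` computes its values.

```python
import numpy as np

def cumulative_gain(tau_hat, tau):
    """Cumulative true effect when units are ranked by predicted effect (descending)."""
    order = np.argsort(-tau_hat)
    return np.cumsum(tau[order])

def auuc(tau_hat, tau, normalize=True):
    """Mean height of the cumulative gain curve; higher means a better ranking."""
    gain = cumulative_gain(tau_hat, tau)
    if normalize:
        # Assumes the total gain is non-zero
        gain = gain / abs(gain[-1])
    return float(gain.mean())

# Hypothetical example: a perfect ranking vs. a fully reversed ranking
tau_true = np.arange(1.0, 6.0)          # true effects 1, 2, 3, 4, 5
auuc_perfect = auuc(tau_true, tau_true)
auuc_reversed = auuc(-tau_true, tau_true)
print('perfect ranking:', auuc_perfect)
print('reversed ranking:', auuc_reversed)
```

A model whose ranking puts high-effect units first accumulates gain earlier, so its curve sits higher and its AUUC is larger.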

## Method 2

In [None]:
%time train_summary_2, validation_summary_2  = get_synthetic_summary_holdout(simulate_randomized_trial, n=10000, valid_size=0.2, k=10)

In [None]:
train_summary_2

In [None]:
validation_summary_2

In [None]:
scatter_plot_summary_holdout(train_summary_2, validation_summary_2, k=10, label=['Train', 'Validation'], drop_learners=['S Learner (LR)'], drop_cols=[])

In [None]:
#with 50k samples and 10 simulations
%time train_summary_2_50k_10, validation_summary_2_50k_10  = get_synthetic_summary_holdout(simulate_randomized_trial, n=50000, valid_size=0.2, k=10)

In [None]:
scatter_plot_summary_holdout(train_summary_2_50k_10, validation_summary_2_50k_10, k=10, label=['Train', 'Validation'], drop_learners=['S Learner (LR)'], drop_cols=[])

In [None]:
#with 50k samples and 100 simulations
%time train_summary_2_50k_100, validation_summary_2_50k_100  = get_synthetic_summary_holdout(simulate_randomized_trial, n=50000, valid_size=0.2, k=100)

In [None]:
scatter_plot_summary_holdout(train_summary_2_50k_100, validation_summary_2_50k_100, k=100, label=['Train', 'Validation'], drop_learners=['S Learner (LR)'], drop_cols=[])

In [None]:
bar_plot_summary_holdout(train_summary_2, validation_summary_2, k=10, drop_learners=['S Learner (LR)','S Learner (XGB)'], drop_cols=[])

In [None]:
# Single simulation (50k samples)
synthetic_preds_holdout_train_2, synthetic_preds_holdout_valid_2 = get_synthetic_preds_holdout(simulate_randomized_trial, n=50000, valid_size=0.2)

In [None]:
# Distribution plot for a single simulation of training data
distr_plot_single_sim(synthetic_preds_holdout_train_2, kind='kde', linewidth=2, bw_method=0.5, 
           drop_learners=['S Learner (LR)', 'S Learner (XGB)'])

In [None]:
# Distribution plot for a single simulation of validation data
distr_plot_single_sim(synthetic_preds_holdout_valid_2, kind='kde', linewidth=2, bw_method=0.5, 
           drop_learners=['S Learner (LR)', 'S Learner (XGB)'])

In [None]:
# Scatter Plots for a Single Simulation for Training Data
scatter_plot_single_sim(synthetic_preds_holdout_train_2)

In [None]:
# Scatter Plots for a Single Simulation for Validation Data
scatter_plot_single_sim(synthetic_preds_holdout_valid_2)

In [None]:
# Cumulative Gain AUUC values for a Single Simulation of Training Data
get_synthetic_auuc(synthetic_preds_holdout_train_2, drop_learners=['S Learner (LR)'])

In [None]:
# Cumulative Gain AUUC values for a Single Simulation of Validation Data
get_synthetic_auuc(synthetic_preds_holdout_valid_2, drop_learners=['S Learner (LR)'])

## Method 3

In [None]:
%time train_summary_3, validation_summary_3  = get_synthetic_summary_holdout(simulate_easy_propensity_difficult_baseline, n=10000, valid_size=0.2, k=10)

In [None]:
train_summary_3

In [None]:
validation_summary_3

In [None]:
scatter_plot_summary_holdout(train_summary_3, validation_summary_3, k=10, label=['Train', 'Validation'], drop_learners=['X Learner (LR)', 'T Learner (LR)'], drop_cols=[])

In [None]:
#with 50k samples and 10 simulations
%time train_summary_3_50k_10, validation_summary_3_50k_10  = get_synthetic_summary_holdout(simulate_easy_propensity_difficult_baseline, n=50000, valid_size=0.2, k=10)

In [None]:
scatter_plot_summary_holdout(train_summary_3_50k_10, validation_summary_3_50k_10, k=10, label=['Train', 'Validation'], drop_learners=['X Learner (LR)', 'T Learner (LR)'], drop_cols=[])

In [None]:
#with 50k samples and 100 simulations
%time train_summary_3_50k_100, validation_summary_3_50k_100  = get_synthetic_summary_holdout(simulate_easy_propensity_difficult_baseline, n=50000, valid_size=0.2, k=100)

In [None]:
scatter_plot_summary_holdout(train_summary_3_50k_100, validation_summary_3_50k_100, k=100, label=['Train', 'Validation'], drop_learners=['X Learner (LR)', 'T Learner (LR)'], drop_cols=[])

In [None]:
bar_plot_summary_holdout(train_summary_3, validation_summary_3, k=10, drop_learners=[], drop_cols=[])

In [None]:
# Single simulation (50k samples)
synthetic_preds_holdout_train_3, synthetic_preds_holdout_valid_3 = get_synthetic_preds_holdout(simulate_easy_propensity_difficult_baseline, n=50000, valid_size=0.2)

The distribution and scatter plots (`distr_plot_single_sim()` and `scatter_plot_single_sim()`) are not applicable for Method 3, since the actual treatment effects all take the same value.

## Method 4

In [None]:
%time train_summary_4, validation_summary_4 = get_synthetic_summary_holdout(simulate_unrelated_treatment_control, n=10000, valid_size=0.2, k=10)

In [None]:
train_summary_4

In [None]:
validation_summary_4

In [None]:
%time train_summary_4_50k, validation_summary_4_50k  = get_synthetic_summary_holdout(simulate_unrelated_treatment_control, n=50000, valid_size=0.2, k=10)

In [None]:
%time train_summary_4_50k_100, validation_summary_4_50k_100  = get_synthetic_summary_holdout(simulate_unrelated_treatment_control, n=50000, valid_size=0.2, k=100)

In [None]:
#with 10k samples and 10 simulations
scatter_plot_summary_holdout(train_summary_4, validation_summary_4, k=10, label=['Train', 'Validation'], drop_learners=['S Learner (LR)'], drop_cols=[])

In [None]:
#with 50k samples and 10 simulations
scatter_plot_summary_holdout(train_summary_4_50k, validation_summary_4_50k, k=10, label=['Train', 'Validation'], drop_learners=['S Learner (LR)'], drop_cols=[])

In [None]:
#with 50k samples and 100 simulations
scatter_plot_summary_holdout(train_summary_4_50k_100, validation_summary_4_50k_100, k=100, label=['Train', 'Validation'], drop_learners=['S Learner (LR)'], drop_cols=[])

In [None]:
bar_plot_summary_holdout(train_summary_4, validation_summary_4, k=10, drop_learners=['S Learner (LR)'], drop_cols=[])

In [None]:
# Single simulation (50k samples)
synthetic_preds_holdout_train_4, synthetic_preds_holdout_valid_4 = get_synthetic_preds_holdout(simulate_unrelated_treatment_control, n=50000, valid_size=0.2)

In [None]:
# Distribution plot for a single simulation of training data
distr_plot_single_sim(synthetic_preds_holdout_train_4, kind='kde', linewidth=2, bw_method=0.5, 
           drop_learners=['S Learner (LR)'])

In [None]:
# Distribution plot for a single simulation of validation data
distr_plot_single_sim(synthetic_preds_holdout_valid_4, kind='kde', linewidth=2, bw_method=0.5, 
           drop_learners=['S Learner (LR)'])

In [None]:
# Scatter Plots for a Single Simulation for Training
scatter_plot_single_sim(synthetic_preds_holdout_train_4)

In [None]:
# Scatter Plots for a Single Simulation for Validation
scatter_plot_single_sim(synthetic_preds_holdout_valid_4)

In [None]:
# Cumulative Gain AUUC values for a Single Simulation of Training Data
get_synthetic_auuc(synthetic_preds_holdout_train_4, drop_learners=['S Learner (LR)'])

In [None]:
# Cumulative Gain AUUC values for a Single Simulation of Validation Data
get_synthetic_auuc(synthetic_preds_holdout_valid_4, drop_learners=['S Learner (LR)'])

# TO REMOVE

In [24]:
def get_synthetic_preds_holdout(synthetic_data_func, n=1000, valid_size=0.2):
    """Generate predictions for synthetic data using specified function (single simulation) for train and holdout

    Args:
        synthetic_data_func (function): synthetic data generation function
        n (int, optional): number of samples
        valid_size (float, optional): validation/hold-out data size

    Returns:
        (tuple): synthetic training and validation data dictionaries:

          - preds_dict_train (dict): synthetic training data dictionary
          - preds_dict_valid (dict): synthetic validation data dictionary
    """
    # Import placed after the docstring so the string above is recognized as the docstring
    from sklearn.model_selection import train_test_split

    y, X, w, tau, b, e = synthetic_data_func(n=n)

    X_train, X_val, y_train, y_val, w_train, w_val, tau_train, tau_val, b_train, b_val, e_train, e_val = \
        train_test_split(X, y, w, tau, b, e, test_size=valid_size, random_state=40, shuffle=True)

    preds_dict_train = {}
    preds_dict_valid = {}

    preds_dict_train[KEY_ACTUAL] = tau_train
    preds_dict_valid[KEY_ACTUAL] = tau_val

    preds_dict_train['generated_data'] = {
        'y': y_train,
        'X': X_train,
        'w': w_train,
        'tau': tau_train,
        'b': b_train,
        'e': e_train}
    preds_dict_valid['generated_data'] = {
        'y': y_val,
        'X': X_val,
        'w': w_val,
        'tau': tau_val,
        'b': b_val,
        'e': e_val}

    # Predict p_hat because e would not be directly observed in real-life
    p_model = ElasticNetPropensityModel()
    p_hat_train = p_model.fit_predict(X_train, w_train)
    p_hat_val = p_model.fit_predict(X_val, w_val)

    for base_learner, label_l in zip([BaseSLearner, BaseTLearner, BaseXLearner, BaseRLearner],['S', 'T', 'X', 'R']):
        for model, label_m in zip([LinearRegression, XGBRegressor],['LR', 'XGB']):
            # The R Learner needs p_hat at fit time, so it is handled in the else branch
            if label_l != 'R':
                learner = base_learner(model())
                # Fit the model on training data only
                learner.fit(X=X_train, treatment=w_train, y=y_train)
                try:
                    preds_dict_train['{} Learner ({})'.format(
                        label_l, label_m)] = learner.predict(X=X_train, p=p_hat_train).flatten()
                    preds_dict_valid['{} Learner ({})'.format(
                        label_l, label_m)] = learner.predict(X=X_val, p=p_hat_val).flatten()
                except TypeError:
                    preds_dict_train['{} Learner ({})'.format(
                        label_l, label_m)] = learner.predict(X=X_train, treatment=w_train, y=y_train).flatten()
                    preds_dict_valid['{} Learner ({})'.format(
                        label_l, label_m)] = learner.predict(X=X_val, treatment=w_val, y=y_val).flatten()
            else:
                learner = base_learner(model())
                learner.fit(X=X_train, p=p_hat_train, treatment=w_train, y=y_train)
                preds_dict_train['{} Learner ({})'.format(
                    label_l, label_m)] = learner.predict(X=X_train).flatten()
                preds_dict_valid['{} Learner ({})'.format(
                    label_l, label_m)] = learner.predict(X=X_val).flatten()


    return preds_dict_train, preds_dict_valid