# Preprocessing Only Workflow

In this workflow, CFRL takes an offline trajectory and preprocesses it using 
`SequentialPreprocessor`. The final output of the workflow is the preprocessed (debiased) 
offline trajectory. This workflow is appropriate when the user does not want to train policies with 
CFRL and would rather use the preprocessed trajectory to train a counterfactually fair policy 
with another reinforcement learning library or application that better fits their needs.

We begin by importing the libraries needed for this demonstration.

In [6]:
# Need this temporarily to import CFRL before it is officially published to PyPI
import sys
sys.path.append("E:/learning/university/MiSIL/CFRL Python Package/CFRL")

In [None]:
import pandas as pd
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from cfrl.reader import read_trajectory_from_dataframe, convert_trajectory_to_dataframe
from cfrl.preprocessor import SequentialPreprocessor
np.random.seed(1) # ensure reproducibility
torch.manual_seed(1) # ensure reproducibility

<torch._C.Generator at 0x17211bb14f0>

## Data Loading

In this demonstration, we use an offline trajectory generated from a `SyntheticEnvironment` using some pre-specified transition rules. Although the trajectory is synthesized, we treat it as if it came from an unknown environment, for pedagogical convenience.

The trajectory contains 500 individuals (i.e., $N=500$) and 10 transitions (i.e., $T=10$). The actions are binary (0 or 1) and were sampled by a random policy that selects 0 or 1 with equal probability. The trajectory is stored in tabular format in a `.csv` file. The sensitive attribute is bivariate, stored in columns `z1` and `z2`; its valid values are $[0, 0]$, $[1, 0]$, $[0, 1]$, and $[1, 1]$. The state variable is trivariate, stored in columns `state1`, `state2`, and `state3`. Actions are stored in the column `action` and rewards in the column `reward`. The table also includes an extra, irrelevant column `timestamp`.
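
As a small aside, the four valid sensitive attribute values are simply every combination of two binary components. The sketch below enumerates them and flags rows that fall outside that set. The helper `invalid_rows` and the toy rows are hypothetical and are not part of CFRL.

```python
from itertools import product

# Every combination of two binary components, in lexicographic order.
z_space = [list(z) for z in product([0, 1], repeat=2)]

def invalid_rows(z_rows, allowed):
    """Return the (row index, value) pairs whose value is not in `allowed`."""
    return [(i, z) for i, z in enumerate(z_rows) if list(z) not in allowed]

# Hypothetical (z1, z2) rows; the last one is deliberately out of range.
toy_rows = [(0, 0), (1, 0), (1, 1), (2, 0)]
bad = invalid_rows(toy_rows, z_space)
```

A check like this can be run on the `z1` and `z2` columns before training, since a value outside the set passed as `z_space` would likely indicate a data problem.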

We can load and view the tabular data.

In [8]:
trajectory = pd.read_csv('../data/sample_data_large_multi.csv')
trajectory

Unnamed: 0.1,Unnamed: 0,ID,timestamp,z1,z2,action,reward,state1,state2,state3
0,0,1.0,1.0,0.0,0.0,,,2.124345,-0.111756,-0.028172
1,1,1.0,2.0,0.0,0.0,1.0,3.380339,-0.071876,0.545250,-0.020279
2,2,1.0,3.0,0.0,0.0,0.0,1.849111,-1.084077,-1.696634,-1.179136
3,3,1.0,4.0,0.0,0.0,0.0,-4.421291,-2.317520,-1.787875,-2.148363
4,4,1.0,5.0,0.0,0.0,1.0,-5.142691,-2.936506,-3.603797,-3.590126
...,...,...,...,...,...,...,...,...,...,...
5495,5495,500.0,7.0,0.0,0.0,0.0,-12.563265,-4.024293,-6.587401,-3.859436
5496,5496,500.0,8.0,0.0,0.0,0.0,-14.073520,-5.952644,-5.854450,-4.218220
5497,5497,500.0,9.0,0.0,0.0,0.0,-16.691358,-5.687570,-6.008377,-5.618730
5498,5498,500.0,10.0,0.0,0.0,0.0,-18.394408,-7.551435,-6.816310,-6.740886


We now read the trajectory from the tabular format into Trajectory Arrays.

In [9]:
zs, states, actions, rewards, ids = read_trajectory_from_dataframe(
                                                data=trajectory, 
                                                z_labels=['z1', 'z2'], 
                                                state_labels=['state1', 'state2', 'state3'], 
                                                action_label='action', 
                                                reward_label='reward', 
                                                id_label='ID', 
                                                T=10
                                                )
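
To make the reshaping idea concrete, here is a dependency-free sketch that groups a long-format table into per-individual sequences ordered by time. It is only an illustration of the idea and not CFRL's implementation: the toy table, the single state column, and the helper `group_by_individual` are all hypothetical, and the actual array layout returned by `read_trajectory_from_dataframe` may differ. Following the table shown above, the sketch assumes that each individual's first row carries a state but no action or reward.

```python
import csv
import io
from collections import defaultdict

# Hypothetical two-individual table in the same long format (one state column).
raw = """ID,timestamp,z1,action,reward,state1
1,1,0,,,2.1
1,2,0,1,3.4,-0.1
1,3,0,0,1.8,-1.1
2,1,1,,,0.5
2,2,1,0,0.9,0.7
2,3,1,1,2.2,1.3
"""

def group_by_individual(text):
    """Group long-format rows into per-individual sequences ordered by time."""
    rows = sorted(csv.DictReader(io.StringIO(text)),
                  key=lambda r: (int(r["ID"]), int(r["timestamp"])))
    out = defaultdict(lambda: {"z": None, "states": [], "actions": [], "rewards": []})
    for r in rows:
        rec = out[int(r["ID"])]
        rec["z"] = int(r["z1"])
        rec["states"].append(float(r["state1"]))
        if r["action"] != "":  # assumed: the first row holds only the initial state
            rec["actions"].append(int(r["action"]))
            rec["rewards"].append(float(r["reward"]))
    return dict(out)

grouped = group_by_individual(raw)
```

In this toy layout each individual ends up with one more state than actions or rewards, because the initial state has no preceding action.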

## Preprocessor Training

Before preprocessing the trajectory, we first need to train a preprocessor. To mitigate overfitting, we train the preprocessor on a random subset of 250 individuals and preprocess only the remaining 250. We now form these two sets.

In [10]:
(
    zs_train, zs_prepro, 
    states_train, states_prepro, 
    actions_train, actions_prepro, 
    rewards_train, rewards_prepro, 
    ids_train, ids_prepro
) = train_test_split(zs, states, actions, rewards, ids, test_size=0.5)
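
The split happens at the level of individuals: `train_test_split` applies one shared random partition along the first axis of every array it receives, so each individual's sensitive attributes, states, actions, rewards, and ID stay together. The sketch below illustrates that idea without any dependencies. It is not scikit-learn's implementation, and the helper `split_individuals` and the toy lists are hypothetical.

```python
import random

def split_individuals(arrays, test_fraction=0.5, seed=1):
    """Apply one shared random partition of individuals to every array.

    `arrays` is a list of equal-length sequences whose i-th entries all
    belong to individual i. Returns (train_parts, held_out_parts).
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of individuals")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    n_test = round(n * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    train = [[a[i] for i in train_idx] for a in arrays]
    held_out = [[a[i] for i in test_idx] for a in arrays]
    return train, held_out

# Hypothetical toy data: 6 individuals, each with an ID and a short state list.
toy_ids = [1, 2, 3, 4, 5, 6]
toy_states = [[float(i), float(i) + 0.5] for i in toy_ids]

(ids_tr, states_tr), (ids_pp, states_pp) = split_individuals([toy_ids, toy_states])
```

Because the same index partition is reused for every array, the k-th training ID always lines up with the k-th training state sequence.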

We now use the training set to train a `SequentialPreprocessor`.

In [11]:
sp = SequentialPreprocessor(z_space=[[0, 0], [0, 1], [1, 0], [1, 1]], 
                            num_actions=2, 
                            cross_folds=1, 
                            mode='single', 
                            reg_model='nn')
sp.train_preprocessor(zs=zs_train, xs=states_train, actions=actions_train, rewards=rewards_train)

100%|██████████| 1000/1000 [00:26<00:00, 38.00it/s]


(array([[[  0.29730609,   0.64565182,  -1.68102797, ...,   2.47447229,
            2.52162402,   0.375634  ],
         [ -1.35034815,  -2.25309663,  -1.63382889, ...,   2.82107386,
            1.81303627,   1.2983024 ],
         [ -3.42806811,  -4.52740321,  -3.80843044, ...,   1.68164783,
            0.83195773,   0.71035158],
         ...,
         [ -6.80947176,  -7.44296927,  -6.91436753, ...,   4.73078212,
            3.0231462 ,   3.61140981],
         [ -8.76876788,  -7.18474406,  -8.88386579, ...,   2.99160762,
            4.22119409,   2.35490637],
         [ -9.04944974,  -9.08450819,  -9.40940367, ...,   3.65938634,
            3.81831216,   2.64957179]],
 
        [[  0.30006061,   0.50738898,   0.77566408, ...,   2.47722681,
            2.38336118,   2.83232604],
         [ -1.27571934,  -1.32202527,  -2.06041935, ...,   3.12821848,
            2.43734542,   1.80595765],
         [ -3.04574095,  -4.47713707,  -2.44819125, ...,   2.59558333,
            1.63423945,   3.7662

## Data Preprocessing

We now preprocess the remaining data that are not in the training set.

In [12]:
states_tilde, rewards_tilde = sp.preprocess_multiple_steps(zs=zs_prepro, 
                                                           xs=states_prepro, 
                                                           actions=actions_prepro, 
                                                           rewards=rewards_prepro)

## Data Exporting

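Finally, we convert the preprocessed trajectory back into the tabular format so that it is easier to store and manage. For simplicity, we name the new state columns `state1`, ..., `state12`, which are the default labels provided by `convert_trajectory_to_dataframe()` (so we do not need to specify the `state_labels` argument). Note that the preprocessed state has 12 components rather than the original 3, which is consistent with one 3-dimensional block for each of the four values in `z_space`.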

In [13]:
preprocessed_trajectory = convert_trajectory_to_dataframe(
                                        zs=zs_prepro, 
                                        states=states_tilde, 
                                        actions=actions_prepro, 
                                        rewards=rewards_tilde, 
                                        ids=ids_prepro, 
                                        z_labels=['z1', 'z2'], 
                                        action_label='action', 
                                        reward_label='reward', 
                                        id_label='ID', 
                                        T_label='time_step'
                                        )
preprocessed_trajectory

Unnamed: 0,ID,time_step,z1,z2,action,reward,state1,state2,state3,state4,state5,state6,state7,state8,state9,state10,state11,state12
0,305.0,1.0,1.0,0.0,,,-0.317405,0.840769,0.844372,0.459064,2.174768,1.944196,0.454466,1.855285,2.255392,1.859761,2.716742,2.901034
1,305.0,2.0,1.0,0.0,0.0,5.943142,0.591839,2.273749,-2.964740,2.850154,2.890590,-0.839674,2.435313,2.773919,-1.033562,3.989861,4.840905,1.169578
2,305.0,3.0,1.0,0.0,1.0,6.963574,-1.556345,1.535565,-4.969775,0.561685,2.874042,-1.193027,0.170335,2.534116,-1.045698,3.024292,5.545709,0.956615
3,305.0,4.0,1.0,0.0,0.0,3.574153,-1.244262,-2.983129,-3.577484,1.766711,-0.838226,0.978989,0.223361,-0.293243,0.088610,4.176708,2.635336,3.516911
4,305.0,5.0,1.0,0.0,1.0,2.344962,-2.481352,-1.894014,-3.168303,1.894036,3.159167,1.359637,1.239327,0.528893,0.176984,5.560400,5.763371,4.474270
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2745,473.0,7.0,1.0,0.0,1.0,4.258193,-4.470952,-4.731453,-4.694901,0.682225,2.624084,1.801081,1.912245,0.770409,0.296687,6.751650,6.337820,5.818833
2746,473.0,8.0,1.0,0.0,0.0,1.616660,-7.318408,-6.230546,-6.384940,-0.229876,-0.117347,-0.397952,-1.036336,-0.033916,-1.039807,4.842526,5.311596,4.712371
2747,473.0,9.0,1.0,0.0,1.0,0.549662,-8.321184,-7.139310,-6.335348,-1.684079,0.020363,0.473644,-2.017924,-1.247170,-0.657567,4.685763,6.224120,5.881523
2748,473.0,10.0,1.0,0.0,1.0,-1.210735,-8.549094,-8.033413,-10.365494,-0.373798,-0.309958,-4.348989,-1.900819,-1.737821,-3.682403,5.174072,5.993990,3.188451


## Alternative: Preprocessing All Individuals

Sometimes the number of individuals in the trajectory is small. If we then preprocess only a subset of individuals, the resulting preprocessed trajectory might be too small to be useful for policy learning. In this case, we can preprocess all individuals directly with the `train_preprocessor()` function by setting `cross_folds` to a relatively large number.

When `cross_folds=K` with `K` greater than 1, `train_preprocessor()` internally divides the training data into `K` folds. For each $i=1,\dots,K$, it trains a transition dynamics model on all folds other than the $i$-th one, and this model is then used to preprocess the data in the $i$-th fold. This results in `K` folds of preprocessed data, each processed by a model trained on the other folds. These `K` folds of preprocessed data are then combined and returned by `train_preprocessor()`. This method allows us to preprocess all individuals in the trajectory while reducing overfitting.
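
The fold logic described above can be sketched in a few lines. This is only an illustration of the out-of-fold pattern under simplifying assumptions and not CFRL's implementation: the "model" here is just the mean of the training values, "preprocessing" subtracts that mean, and the function `cross_fit` and the toy data are hypothetical.

```python
import random

def cross_fit(values, k, seed=0):
    """Out-of-fold 'preprocessing' with a toy model.

    The toy model is the mean of the training values; each value is
    'preprocessed' by subtracting the mean fitted on the other k-1 folds.
    """
    n = len(values)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of values")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k disjoint folds of indices

    processed = [None] * n
    for fold in folds:
        held_out = set(fold)
        train = [values[j] for j in range(n) if j not in held_out]
        model_mean = sum(train) / len(train)  # "train" on the other folds
        for j in fold:                        # "preprocess" this fold only
            processed[j] = values[j] - model_mean
    return processed, folds

toy_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
processed, folds = cross_fit(toy_values, k=5)
```

Every value is processed exactly once, and never by a model that saw it during training, which is the property that lets all individuals be preprocessed while limiting overfitting.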

To use this functionality, we first initialize a `SequentialPreprocessor` with `cross_folds` greater than 1. We use `cross_folds=5` here.

In [14]:
sp_cf5 = SequentialPreprocessor(z_space=[[0, 0], [0, 1], [1, 0], [1, 1]], 
                                num_actions=2, 
                                cross_folds=5, 
                                mode='single', 
                                reg_model='nn')

We now simultaneously train the preprocessor and preprocess all individuals in the trajectory using the procedure described above.

In [15]:
states_tilde_cf5, rewards_tilde_cf5 = sp_cf5.train_preprocessor(zs=zs, 
                                                                xs=states, 
                                                                actions=actions, 
                                                                rewards=rewards)

100%|██████████| 1000/1000 [00:42<00:00, 23.31it/s]
100%|██████████| 1000/1000 [00:55<00:00, 18.07it/s]
100%|██████████| 1000/1000 [00:59<00:00, 16.84it/s]
100%|██████████| 1000/1000 [01:05<00:00, 15.20it/s]
100%|██████████| 1000/1000 [01:07<00:00, 14.80it/s]


We can now convert the preprocessed trajectory into the tabular format.

In [16]:
preprocessed_trajectory_cf5 = convert_trajectory_to_dataframe(
                                            zs=zs, 
                                            states=states_tilde_cf5, 
                                            actions=actions, 
                                            rewards=rewards_tilde_cf5, 
                                            ids=ids, 
                                            z_labels=['z1', 'z2'], 
                                            action_label='action', 
                                            reward_label='reward', 
                                            id_label='ID', 
                                            T_label='time_step'
                                            )
preprocessed_trajectory_cf5

Unnamed: 0,ID,time_step,z1,z2,action,reward,state1,state2,state3,state4,state5,state6,state7,state8,state9,state10,state11,state12
0,1.0,1.0,0.0,0.0,,,2.124345,-0.111756,-0.028172,2.854524,0.967020,1.023984,2.882485,0.818889,1.347807,4.161159,1.819790,2.100349
1,1.0,2.0,0.0,0.0,1.0,7.684757,-0.071876,0.545250,-0.020279,1.269826,2.283655,1.815975,1.899828,2.761103,2.621160,3.501701,4.554760,4.560613
2,1.0,3.0,0.0,0.0,0.0,8.859339,-1.084077,-1.696634,-1.179136,1.435078,0.405164,1.464013,1.862992,1.097608,2.235863,4.394303,3.478711,4.615738
3,1.0,4.0,0.0,0.0,0.0,4.483335,-2.317520,-1.787875,-2.148363,0.453051,1.441981,0.532031,0.668264,1.575517,0.895652,4.077064,4.734199,4.109906
4,1.0,5.0,0.0,0.0,1.0,4.834683,-2.936506,-3.603797,-3.590126,0.177945,-0.097932,0.550886,0.778961,0.301101,0.614142,4.460872,4.074027,5.026805
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5495,500.0,7.0,0.0,0.0,0.0,2.348154,-4.024293,-6.587401,-3.859436,1.645661,-0.593785,1.090324,0.046822,-2.116011,0.087280,6.270967,3.157844,6.165725
5496,500.0,8.0,0.0,0.0,0.0,2.167277,-5.952644,-5.854450,-4.218220,-0.390749,0.205909,1.725421,-1.423516,-1.086781,0.193406,4.805896,4.637991,6.856439
5497,500.0,9.0,0.0,0.0,0.0,0.265278,-5.687570,-6.008377,-5.618730,0.789887,-0.186370,0.329568,-0.072005,-1.124341,-0.382315,5.892154,5.007221,5.725949
5498,500.0,10.0,0.0,0.0,0.0,0.074094,-7.551435,-6.816310,-6.740886,-0.995104,-0.590139,-0.443089,-1.327496,-1.099913,-1.872504,4.568780,5.049233,4.987872
