# WATE

**Subclassification Method => weighting differences in means by stratum-specific weights**

Question: Did being seated in first class improve your odds of survival when the Titanic sank?

Variables:
- D: is First Class
- Y: Survived?
- W: is woman?
- C: is child?

WATE Assumptions:
- Common Support Assumption => every stratum contains observations in both the treatment and the control group
- Backdoor Criterion => conditioning on W and C blocks every backdoor path from D to Y
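
As a sketch of what "weighting differences in means by stratum-specific weights" amounts to, the estimator can be written as below. The notation is an assumption introduced here for illustration: $k$ indexes the $K$ age-by-sex strata, $\bar{Y}^{1,k}$ and $\bar{Y}^{0,k}$ are the survival rates of first-class and other passengers in stratum $k$, $N_C^k$ is the number of control observations in stratum $k$, and $N_C$ is the total number of control observations. The weights follow the control-group shares used in the code further down.

$$
\hat{\delta}_{WATE} = \sum_{k=1}^{K} \left( \bar{Y}^{1,k} - \bar{Y}^{0,k} \right) \times \frac{N_C^k}{N_C}
$$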

https://mixtape.scunning.com/05-matching_and_subclassification

### Reading the data

In [None]:
import numpy as np 
import pandas as pd 
import statsmodels.api as sm 
import statsmodels.formula.api as smf 
from itertools import combinations 
# import plotnine as p

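# read data
import ssl
# Workaround for certificate errors: disables HTTPS certificate verification for this session
ssl._create_default_https_context = ssl._create_unverified_context
def read_data(file): 
    """Load a Stata file from the Mixtape GitHub repository."""
    return pd.read_stata("https://github.com/scunning1975/mixtape/raw/master/" + file)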

df = read_data("titanic.dta")

### Data Exploration

In [None]:
for col in df.columns:
    print(col, df[col].unique())

### Method 1: Computing Simple Difference in Outcomes (SDO)

In [None]:
Y1 = df[df['class'] == '1st class']['survived']
# Y0 = df[df['class'].isin(['2nd class', '3rd class'])]['survived']
Y0 = df[df['class'] != '1st class']['survived']


In [None]:
EY1 = sum([1 for val in Y1 if val == 'yes']) / len(Y1)


In [None]:
EY0 = sum([1 for val in Y0 if val == 'yes']) / len(Y0)


In [None]:
print(f"SDO: {EY1 - EY0}")
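
The same difference can be computed without explicit loops, because the mean of a boolean Series is the share of `True` values. Below is a minimal sketch on a small made-up frame; the `toy` data and the `sdo` helper are illustrative assumptions, not part of the Titanic dataset or the notebook above.

```python
import pandas as pd

# Hypothetical toy data using the same column names and labels as titanic.dta
toy = pd.DataFrame({
    "class": ["1st class", "1st class", "2nd class", "3rd class", "crew", "crew"],
    "survived": ["yes", "no", "yes", "no", "no", "no"],
})

def sdo(df: pd.DataFrame) -> float:
    """Simple difference in outcomes: first-class survival rate minus everyone else's."""
    treated = df["class"] == "1st class"
    survived = df["survived"] == "yes"
    # The mean of a boolean Series is the fraction of True values
    return survived[treated].mean() - survived[~treated].mean()

toy_sdo = sdo(toy)
print(f"SDO on toy data: {toy_sdo}")
```

On this toy frame the first-class survival rate is 0.5 and the rate for everyone else is 0.25, so the SDO is 0.25.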

### Method 2: Subclassification - Considering Age and Sex (deprecated)

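Weighted average of survival rates across strata, computed separately for each class:
- Row - stratum: young male, young female, adult male, adult female
- Column - mean: share that survived (Y)
- Column - count: number of observations in the stratum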

In [None]:
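dff = df.copy()
# Collapse every non-first-class group (2nd class, 3rd class, crew) into one control group
dff['class'] = np.where(dff['class'] == '1st class', '1st class', 'other class')
# Encode the outcome as 0/1 so that its mean is a survival rate
# (comparing to 'yes' also works when pandas loads the column as a categorical)
dff['survived'] = (dff['survived'] == 'yes').astype(int)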

In [None]:
# Count and survival rate for each class x age x sex cell
t = dff.groupby(['class', 'age', 'sex']).agg({'survived': ['count', 'mean']}).reset_index()
# Flatten the MultiIndex columns: keep the statistic name where present, otherwise the group key
t.columns = [col[1] if col[1] else col[0] for col in t.columns.values]

In [None]:
t1 = t.pivot_table(values=['mean', 'count'], columns=['class'], index=['age', 'sex']).reset_index()


In [None]:
t1

In [None]:
def compute_subclass_average(df: pd.DataFrame, subclass: str) -> float:
    """Count-weighted average survival rate across strata for one class."""
    # Use the frame passed in rather than the global t1
    N = df[('count', subclass)].sum()
    EY = (df[('count', subclass)] * df[('mean', subclass)]).sum() / N
    return EY

In [None]:
EY1 = compute_subclass_average(t1, '1st class')
EY0 = compute_subclass_average(t1, 'other class')
# EY2 = compute_subclass_average(t1, 'crew')

In [None]:
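# Each class is weighted by its own stratum counts, so these averages
# reproduce the raw class survival rates and the difference equals the SDO.
print(f"Weighted survival rate, 1st class: {EY1}")
print(f"Weighted survival rate, other classes: {EY0}")
print(f"Difference: {EY1 - EY0}")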

### Method 2: Weighted Average Treatment Effect (WATE)

In [None]:
dff = df.copy()
dff['d'] = 0
dff.loc[dff['class'] == '1st class', 'd'] = 1

dff['age_d'] = 0
dff.loc[dff['age'] == 'adults', 'age_d'] = 1

dff['sex_d'] = 0
dff.loc[dff['sex'] == 'man', 'sex_d'] = 1

dff['survived_d'] = 0
dff.loc[dff['survived'] == 'yes', 'survived_d'] = 1

In [None]:
ey0 = dff.loc[dff['d'] == 0, 'survived_d'].mean()
ey1 = dff.loc[dff['d'] == 1, 'survived_d'].mean()
print(f"Simple Difference in Outcomes (SDO): {ey1 - ey0}")

In [None]:
# Stratum label s: 1 = female children, 2 = female adults, 3 = male children, 4 = male adults
dff['s'] = 0
dff.loc[(dff['age_d'] == 0) & (dff['sex_d'] == 0), 's'] = 1
dff.loc[(dff['age_d'] == 1) & (dff['sex_d'] == 0), 's'] = 2
dff.loc[(dff['age_d'] == 0) & (dff['sex_d'] == 1), 's'] = 3
dff.loc[(dff['age_d'] == 1) & (dff['sex_d'] == 1), 's'] = 4
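
Once strata are labelled, the common support assumption can be checked by counting treated and control observations per stratum. This is a minimal sketch on made-up data: the `toy` frame is hypothetical and only borrows the `s` and `d` column names from the notebook.

```python
import pandas as pd

# Hypothetical toy data: `s` is the stratum label, `d` the treatment indicator
toy = pd.DataFrame({
    "s": [1, 1, 2, 2, 3, 3, 4],
    "d": [1, 0, 1, 0, 1, 0, 0],
})

# Count control (d == 0) and treated (d == 1) observations in each stratum
support = pd.crosstab(toy["s"], toy["d"])

# A stratum has common support only if both counts are positive
has_support = (support > 0).all(axis=1)
print(support)
print(has_support)
```

Here stratum 4 contains no treated observation, so it fails the check. Under the same column names, `pd.crosstab(dff['s'], dff['d'])` would give the corresponding table for the Titanic data.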

In [None]:
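def compute_wate(df: pd.DataFrame, n_control: int) -> float:
    """Return one stratum's contribution to the weighted average treatment effect."""
    # Difference in survival rates between first class and everyone else in this stratum
    df1 = df.loc[df['d'] == 1, 'survived_d']
    df0 = df.loc[df['d'] == 0, 'survived_d']
    diff = df1.mean() - df0.mean()
    # Weight the stratum by its share of all control (non-first-class) observations
    weight = df[df['d'] == 0].shape[0] / n_control
    return weight * diff

# Total number of control observations, passed in explicitly instead of read as a global
obs = dff.loc[dff['d'] == 0].shape[0]
wate = dff.groupby('s')[['d', 'survived_d']].apply(compute_wate, n_control=obs).sum()
print(f"Weighted Average Treatment Effect: {wate}")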

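The SDO suggests that first-class passengers were about 35 percentage points more likely to survive than everyone else. Once the comparison is made within age and sex strata and the differences are weighted (WATE), the estimated effect falls to about 18 percentage points.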
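
To see why the two estimates can diverge, here is a minimal sketch on made-up data in which the treated group is mostly women and women survive more often regardless of treatment. The `toy` frame, its numbers, and the `make_rows` helper are hypothetical; only the column names `s`, `d` and `survived_d` follow the notebook.

```python
import pandas as pd

def make_rows(s: str, d: int, n: int, n_survived: int) -> pd.DataFrame:
    """Build n rows for stratum s and treatment d, of which n_survived survive."""
    outcomes = [1] * n_survived + [0] * (n - n_survived)
    return pd.DataFrame({"s": s, "d": d, "survived_d": outcomes})

# Hypothetical data: (stratum, treated?, group size, survivors)
toy = pd.concat([
    make_rows("women", 1, 4, 4),
    make_rows("women", 0, 2, 1),
    make_rows("men", 1, 2, 1),
    make_rows("men", 0, 8, 2),
], ignore_index=True)

# Simple difference in outcomes, ignoring strata
treated_mean = toy.loc[toy["d"] == 1, "survived_d"].mean()
control_mean = toy.loc[toy["d"] == 0, "survived_d"].mean()
toy_sdo = treated_mean - control_mean

# Survival rate per stratum (rows) and treatment status (columns 0 and 1)
rates = toy.pivot_table(index="s", columns="d", values="survived_d", aggfunc="mean")

# Each stratum's share of all control observations
n_control = toy.loc[toy["d"] == 0].groupby("s").size()
weights = n_control / n_control.sum()

# Weighted sum of within-stratum differences
toy_wate = ((rates[1] - rates[0]) * weights).sum()

print(f"SDO: {toy_sdo:.3f}, WATE: {toy_wate:.3f}")
```

On this toy data the SDO is about 0.53 while the WATE is 0.30: the within-stratum differences are 0.50 for women and 0.25 for men, and men make up 80% of the control group.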