# Dependence Networks

From [http://openonlinecourses.com/causalanalysis/ReviewIndependence.asp](http://openonlinecourses.com/causalanalysis/ReviewIndependence.asp).

## Question 1.1 complete independence

Load data.

In [1]:
import pandas as pd

df = pd.DataFrame({
    'MD': ['George'] * 4 + ['Smith'] * 4,
    'RN': ['Jim', 'Jim', 'Jill', 'Jill'] * 2,
    'Complaint': ['Yes', 'No'] * 4,
    'Observed': [53, 424, 11, 37, 0, 18, 4, 139]
})

df

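| | MD | RN | Complaint | Observed |
|---|---|---|---|---|
| 0 | George | Jim | Yes | 53 |
| 1 | George | Jim | No | 424 |
| 2 | George | Jill | Yes | 11 |
| 3 | George | Jill | No | 37 |
| 4 | Smith | Jim | Yes | 0 |
| 5 | Smith | Jim | No | 18 |
| 6 | Smith | Jill | Yes | 4 |
| 7 | Smith | Jill | No | 139 |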


Compute the expected counts under complete independence (stored here in the `Predicted` column).

In [2]:
from functools import reduce

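def get_count(field, val):
    # Marginal total of Observed for one level of one factor.
    return df[df[field] == val]['Observed'].sum()

def get_expected(r):
    # Marginal totals for this row's MD, RN and Complaint levels.
    counts = [get_count(f, r[f]) for f in ['MD', 'RN', 'Complaint']]
    n = df['Observed'].sum()
    # Complete independence: product of the three marginals divided by n^2.
    expected = reduce(lambda a, b: a * b, counts) / (n * n)
    return expected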
    
df['Predicted'] = df.apply(get_expected, axis=1)
df

| | MD | RN | Complaint | Observed | Predicted |
|---|---|---|---|---|---|
| 0 | George | Jim | Yes | 53 | 37.551318 |
| 1 | George | Jim | No | 424 | 341.275213 |
| 2 | George | Jill | Yes | 11 | 14.489498 |
| 3 | George | Jill | No | 37 | 131.683971 |
| 4 | Smith | Jim | Yes | 0 | 11.515737 |
| 5 | Smith | Jim | No | 18 | 104.657732 |
| 6 | Smith | Jill | Yes | 4 | 4.443446 |
| 7 | Smith | Jill | No | 139 | 40.383084 |
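
As a worked check of the first row, the complete-independence formula can be written out by hand. The subscript notation ($Y_{i++}$ for a one-way marginal total) is an assumption added here for illustration; the totals come from the `Observed` column ($n = 686$, George $= 525$, Jim $= 495$, Yes $= 68$).

$$
\hat{E}_{ijk} = \frac{Y_{i++}\,Y_{+j+}\,Y_{++k}}{n^{2}},
\qquad
\hat{E}_{\text{George},\,\text{Jim},\,\text{Yes}} = \frac{525 \times 495 \times 68}{686^{2}} \approx 37.55
$$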


Compute the $\chi^2$ value.

In [3]:
import numpy as np

def diff_sq(r):
    return np.power(r['Observed'] - r['Predicted'], 2) / r['Predicted']

chi_sq = df.apply(diff_sq, axis=1).sum()
chi_sq

419.4679873479184
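
The same statistic can also be computed without row-wise `apply`. The following is a minimal NumPy sketch, not the notebook's own method; the array layout (axes ordered MD, RN, Complaint) and the variable names are assumptions chosen to match the row order of the data frame.

```python
import numpy as np

# Observed counts as a 2x2x2 array with axes (MD, RN, Complaint),
# in the same row order as the data frame above.
obs = np.array([53, 424, 11, 37, 0, 18, 4, 139], dtype=float).reshape(2, 2, 2)
n = obs.sum()

# One-way marginal totals for each factor.
md = obs.sum(axis=(1, 2))
rn = obs.sum(axis=(0, 2))
complaint = obs.sum(axis=(0, 1))

# Expected counts under complete independence: product of marginals / n^2.
expected = np.einsum('i,j,k->ijk', md, rn, complaint) / n**2

# Pearson chi-square statistic, summed over all eight cells.
chi_sq = ((obs - expected) ** 2 / expected).sum()
print(chi_sq)
```

Working on the full array at once avoids re-filtering the data frame for every row, which matters little for eight cells but scales better for larger tables.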

Compute the degrees of freedom.

In [4]:
def get_dof():
    # Number of levels of each factor (I, J, K).
    num_values = [len(df[f].unique()) for f in ['MD', 'RN', 'Complaint']]
    IJK = reduce(lambda a, b: a * b, num_values)
    
    # (IJK - 1) - [(I - 1) + (J - 1) + (K - 1)] = IJK - I - J - K + 2
    return IJK - sum(num_values) + 2

dof = get_dof()
dof

4
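
Written out, the count in `get_dof` is the number of free cells minus the number of free marginal parameters. The symbols $I$, $J$, $K$ (levels of MD, RN, Complaint) follow the naming in the code; the derivation itself is a sketch added for clarity.

$$
\text{dof} = (IJK - 1) - \big[(I-1) + (J-1) + (K-1)\big] = IJK - I - J - K + 2 = 8 - 6 + 2 = 4
$$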

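There are many ways to compute the p-value. The approach below passes the observed and expected (`Predicted`) counts directly to SciPy's `chisquare` function. Note that `chisquare` treats `ddof` as an adjustment: the p-value is evaluated with `k - 1 - ddof` degrees of freedom, where `k` is the number of cells, so passing `ddof=4` with 8 cells yields 3 degrees of freedom rather than 4.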

In [5]:
from scipy.stats import chisquare

chisquare(df['Observed'], df['Predicted'], ddof=dof)

Power_divergenceResult(statistic=419.4679873479184, pvalue=1.342780486250113e-90)
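
To see which degrees of freedom the call above actually used, the reported p-value can be reproduced with `chi2.sf`. This is a small sketch under the assumption that the statistic and cell count are copied from the cells above; the variable names are illustrative.

```python
from scipy.stats import chi2

stat = 419.4679873479184  # chi-square statistic reported above
k = 8                     # number of cells in the table
ddof = 4                  # value passed to chisquare above

# chisquare evaluates the p-value with k - 1 - ddof degrees of freedom.
p_as_called = chi2.sf(stat, k - 1 - ddof)  # 3 degrees of freedom

# To evaluate with 4 degrees of freedom, ddof would need to be k - 1 - 4 = 3.
p_intended = chi2.sf(stat, 4)

print(p_as_called, p_intended)
```

Either way the p-value is vanishingly small, so the conclusion about complete independence does not change.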

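Since we already have the $\chi^2$ value and the degrees of freedom, we can also use the `chi2.cdf` function. The result prints as `0.0` because the p-value is far smaller than floating-point precision near 1, so `1 - cdf` rounds to zero.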

In [6]:
from scipy.stats import chi2

1 - chi2.cdf(chi_sq, dof)

0.0
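
SciPy's survival function `chi2.sf` evaluates the upper tail directly and keeps the tiny p-value that `1 - chi2.cdf` loses. The sketch below hard-codes the statistic and degrees of freedom from the cells above; the closed-form expression is included only as an independent cross-check for 4 degrees of freedom.

```python
from math import exp
from scipy.stats import chi2

chi_sq = 419.4679873479184  # statistic from the cell above
dof = 4                     # degrees of freedom from the cell above

p_naive = 1 - chi2.cdf(chi_sq, dof)  # cdf rounds to 1.0, so this is 0.0
p_sf = chi2.sf(chi_sq, dof)          # survival function keeps the tail

# Closed form of the chi-square survival function for 4 degrees of freedom.
p_closed = exp(-chi_sq / 2) * (1 + chi_sq / 2)

print(p_naive, p_sf, p_closed)
```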

## Question 1.2

### Question 1.2.1 three joint independence models

- I = MD
- J = RN
- K = Complaint
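
For reference, the joint independence model treats the pair $(I, J)$ as one combined factor that is independent of $K$. The notation below ($Y_{ij+}$ for a two-way total, $Y_{++k}$ for a one-way total) is an assumption mirroring the variable names `Y_ij_` and `Y___k` in the code; the worked value uses the George/Jim total $477$ and the Yes total $68$.

$$
\hat{E}_{ijk} = \frac{Y_{ij+}\,Y_{++k}}{n},
\qquad
\hat{E}_{\text{George},\,\text{Jim},\,\text{Yes}} = \frac{477 \times 68}{686} \approx 47.28,
\qquad
\text{dof} = (IJ - 1)(K - 1) = 3
$$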

In [7]:
def get_expected(r, i, j, k):
    Y_ij_ = df[(df[i]==r[i]) & (df[j]==r[j])]['Observed'].sum()
    Y___k = df[df[k]==r[k]]['Observed'].sum()
    n = df['Observed'].sum()
    
    expected = Y_ij_ * Y___k / n
    return expected

df['Predicted'] = df.apply(lambda r: get_expected(r, 'MD', 'RN', 'Complaint'), axis=1)
df

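| | MD | RN | Complaint | Observed | Predicted |
|---|---|---|---|---|---|
| 0 | George | Jim | Yes | 53 | 47.282799 |
| 1 | George | Jim | No | 424 | 429.717201 |
| 2 | George | Jill | Yes | 11 | 4.758017 |
| 3 | George | Jill | No | 37 | 43.241983 |
| 4 | Smith | Jim | Yes | 0 | 1.784257 |
| 5 | Smith | Jim | No | 18 | 16.215743 |
| 6 | Smith | Jill | Yes | 4 | 14.174927 |
| 7 | Smith | Jill | No | 139 | 128.825073 |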


In [8]:
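def get_dof(i, j, k):
    I = len(df[i].unique())
    J = len(df[j].unique())
    K = len(df[k].unique())
    IJK = I * J * K
    
    # Note: this is the complete-independence count, which equals 4 here.
    # The joint model (ij, k) itself has (IJ - 1)(K - 1) = 3 degrees of freedom.
    dof = (IJK - 1) - ((I-1) + (J-1) + (K-1))
    
    return dof

# chisquare treats ddof as an adjustment and uses k - 1 - ddof = 8 - 1 - 4 = 3
# degrees of freedom, so the p-value below is evaluated with the intended 3.
chisquare(df['Observed'], df['Predicted'], ddof=get_dof('MD', 'RN', 'Complaint'))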

Power_divergenceResult(statistic=19.945072739172545, pvalue=0.0001742500389547796)

### Question 1.2.2 three joint independence models

- I = MD
- J = Complaint
- K = RN

In [9]:
df['Predicted'] = df.apply(lambda r: get_expected(r, 'MD', 'Complaint', 'RN'), axis=1)
df

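| | MD | RN | Complaint | Observed | Predicted |
|---|---|---|---|---|---|
| 0 | George | Jim | Yes | 53 | 46.180758 |
| 1 | George | Jim | No | 424 | 332.645773 |
| 2 | George | Jill | Yes | 11 | 17.819242 |
| 3 | George | Jill | No | 37 | 128.354227 |
| 4 | Smith | Jim | Yes | 0 | 2.886297 |
| 5 | Smith | Jim | No | 18 | 113.287172 |
| 6 | Smith | Jill | Yes | 4 | 1.113703 |
| 7 | Smith | Jill | No | 139 | 43.712828 |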


In [10]:
chisquare(df['Observed'], df['Predicted'], ddof=get_dof('MD', 'Complaint', 'RN'))

Power_divergenceResult(statistic=391.950049303092, pvalue=1.2268386569229332e-84)

### Question 1.2.3 three joint independence models

- I = RN
- J = Complaint
- K = MD

In [11]:
df['Predicted'] = df.apply(lambda r: get_expected(r, 'RN', 'Complaint', 'MD'), axis=1)
df

| | MD | RN | Complaint | Observed | Predicted |
|---|---|---|---|---|---|
| 0 | George | Jim | Yes | 53 | 40.561224 |
| 1 | George | Jim | No | 424 | 338.265306 |
| 2 | George | Jill | Yes | 11 | 11.479592 |
| 3 | George | Jill | No | 37 | 134.693878 |
| 4 | Smith | Jim | Yes | 0 | 12.438776 |
| 5 | Smith | Jim | No | 18 | 103.734694 |
| 6 | Smith | Jill | Yes | 4 | 3.520408 |
| 7 | Smith | Jill | No | 139 | 41.306122 |


In [12]:
chisquare(df['Observed'], df['Predicted'], ddof=get_dof('RN', 'Complaint', 'MD'))

Power_divergenceResult(statistic=410.84182247817546, pvalue=9.923443883578908e-89)
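
The three joint independence fits can be compared side by side in one loop. This is a minimal NumPy sketch rather than the notebook's approach; the array layout (axes MD, RN, Complaint), the `results` dictionary, and its labels are assumptions made for illustration, and each p-value uses $(IJ-1)(K-1)$ degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

# Observed counts with axes (MD, RN, Complaint), same row order as the data frame.
obs = np.array([53, 424, 11, 37, 0, 18, 4, 139], dtype=float).reshape(2, 2, 2)
n = obs.sum()
names = ['MD', 'RN', 'Complaint']

results = {}
for single in range(3):
    # The two remaining axes form the joint pair.
    joint = tuple(a for a in range(3) if a != single)
    # Two-way totals of the joint pair (summing over the single factor).
    joint_totals = obs.sum(axis=single, keepdims=True)
    # One-way totals of the single factor (summing over the pair).
    single_totals = obs.sum(axis=joint, keepdims=True)
    expected = joint_totals * single_totals / n
    stat = ((obs - expected) ** 2 / expected).sum()
    dof = (obs.shape[joint[0]] * obs.shape[joint[1]] - 1) * (obs.shape[single] - 1)
    label = f"({names[joint[0]]}, {names[joint[1]]}) vs {names[single]}"
    results[label] = (stat, dof, chi2.sf(stat, dof))

for label, (stat, dof, p) in results.items():
    print(label, stat, dof, p)
```

The model with the smallest statistic, (MD, RN) jointly independent of Complaint, fits far better than the other two, although its p-value still rejects it at conventional levels.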

## Question 1.3

In [13]:
df = pd.DataFrame({
    'MD': [0] * 4 + [1] * 4,
    'RN': [0, 0, 1, 1] * 2,
    'Complaint': [1, 0] * 4,
    'Observed': [53, 424, 11, 37, 0, 18, 4, 139]
})

df

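| | MD | RN | Complaint | Observed |
|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 53 |
| 1 | 0 | 0 | 0 | 424 |
| 2 | 0 | 1 | 1 | 11 |
| 3 | 0 | 1 | 0 | 37 |
| 4 | 1 | 0 | 1 | 0 |
| 5 | 1 | 0 | 0 | 18 |
| 6 | 1 | 1 | 1 | 4 |
| 7 | 1 | 1 | 0 | 139 |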


### Question 1.3.1 Observed ~ (RN + MD + Complaint)^2

In [14]:
from patsy import dmatrices

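# "- 1" drops patsy's intercept column; PoissonRegressor fits its own intercept.
formula = 'Observed ~ (RN + MD + Complaint)**2 - 1'
y, X = dmatrices(formula, df, return_type='dataframe')
# Flatten the single-column response into a 1-D array.
y = y.values.ravel()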

X

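| | RN | MD | Complaint | RN:MD | RN:Complaint | MD:Complaint |
|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 3 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 |
| 5 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 7 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |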


In [15]:
y

array([ 53., 424.,  11.,  37.,   0.,  18.,   4., 139.])

In [16]:
from sklearn.linear_model import PoissonRegressor

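# Note: PoissonRegressor defaults to alpha=1.0 (an L2 penalty), so these are
# penalized estimates rather than plain maximum-likelihood ones.
model = PoissonRegressor()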
model.fit(X, y)

pd.concat([
    pd.Series(model.intercept_, ['intercept']), 
    pd.Series(model.coef_, X.columns)
])

intercept       5.903406
RN             -1.637902
MD             -1.990005
Complaint      -1.778614
RN:MD           2.479976
RN:Complaint   -0.210420
MD:Complaint   -0.828974
dtype: float64
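
The model with all two-way interactions has no closed-form fitted counts, but its unpenalized maximum-likelihood fit can be obtained by iterative proportional fitting (IPF). The sketch below swaps in IPF in place of `PoissonRegressor` purely as an illustration; the array layout (axes MD, RN, Complaint), the iteration cap, and the tolerance are assumptions. Because `PoissonRegressor` applies an L2 penalty by default, its fitted counts should not be expected to match these exactly.

```python
import numpy as np

# Observed counts with axes (MD, RN, Complaint), same row order as the data frame.
obs = np.array([53, 424, 11, 37, 0, 18, 4, 139], dtype=float).reshape(2, 2, 2)

# Start from a table of ones and repeatedly rescale it so that each
# two-way margin of the fit matches the corresponding observed margin.
fitted = np.ones_like(obs)
for _ in range(1000):
    previous = fitted.copy()
    for axis in range(3):
        # Summing over one axis leaves the two-way margin of the other two.
        fitted *= obs.sum(axis=axis, keepdims=True) / fitted.sum(axis=axis, keepdims=True)
    if np.abs(fitted - previous).max() < 1e-10:
        break

print(fitted.round(3))
```

At convergence every two-way margin of `fitted` equals the observed margin, which is the defining property of the maximum-likelihood fit for this log-linear model.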

### Question 1.3.2 Observed ~ RN + MD + Complaint + RN:Complaint + MD:Complaint

In [17]:
formula = 'Observed ~ RN + MD + Complaint + RN:Complaint + MD:Complaint - 1'
y, X = dmatrices(formula, df, return_type='dataframe')
y = y.values.reshape(1, -1)[0]

In [18]:
model = PoissonRegressor()
model.fit(X, y)

pd.concat([
    pd.Series(model.intercept_, ['intercept']), 
    pd.Series(model.coef_, X.columns)
])

intercept       5.746485
RN             -0.859370
MD             -1.033980
Complaint      -1.733682
RN:Complaint   -0.395496
MD:Complaint   -0.850060
dtype: float64
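
A log-linear model with the terms RN:Complaint and MD:Complaint but no RN:MD term corresponds to RN and MD being conditionally independent given Complaint, and its unpenalized fitted counts have a closed form. The following is a sketch under that interpretation, not the notebook's method; the array layout (axes MD, RN, Complaint) and variable names are assumptions.

```python
import numpy as np

# Observed counts with axes (MD, RN, Complaint), same row order as the data frame.
obs = np.array([53, 424, 11, 37, 0, 18, 4, 139], dtype=float).reshape(2, 2, 2)

# Two-way totals that the model preserves, and the Complaint totals.
md_complaint = obs.sum(axis=1, keepdims=True)     # MD x Complaint
rn_complaint = obs.sum(axis=0, keepdims=True)     # RN x Complaint
complaint = obs.sum(axis=(0, 1), keepdims=True)   # Complaint only

# Conditional independence of MD and RN given Complaint:
# fitted[i, j, k] = Y[i, +, k] * Y[+, j, k] / Y[+, +, k]
fitted = md_complaint * rn_complaint / complaint

print(fitted.round(3))
```

The same pattern applies to the other two conditional independence models by changing which axis plays the conditioning role.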

### Question 1.3.3 Observed ~ RN + MD + Complaint + RN:MD + RN:Complaint

In [19]:
formula = 'Observed ~ RN + MD + Complaint + RN:MD + RN:Complaint - 1'
y, X = dmatrices(formula, df, return_type='dataframe')
y = y.values.reshape(1, -1)[0]

In [20]:
model = PoissonRegressor()
model.fit(X, y)

pd.concat([
    pd.Series(model.intercept_, ['intercept']), 
    pd.Series(model.coef_, X.columns)
])

intercept       5.914860
RN             -1.617797
MD             -2.062572
Complaint      -1.868571
RN:MD           2.483688
RN:Complaint   -0.436754
dtype: float64

### Question 1.3.4 Observed ~ RN + MD + Complaint + RN:MD + MD:Complaint

In [21]:
formula = 'Observed ~ RN + MD + Complaint + RN:MD + MD:Complaint - 1'
y, X = dmatrices(formula, df, return_type='dataframe')
y = y.values.reshape(1, -1)[0]

In [22]:
model = PoissonRegressor()
model.fit(X, y)

pd.concat([
    pd.Series(model.intercept_, ['intercept']), 
    pd.Series(model.coef_, X.columns)
])

intercept       5.907780
RN             -1.661851
MD             -1.987179
Complaint      -1.813348
RN:MD           2.489249
MD:Complaint   -0.889178
dtype: float64