# National Health and Nutrition Examination Survey (NHANES) Analysis of Interactions between Alcohol Use and Depression and Impact on Sleep Disorders

## Introduction

Documentation about alcohol use dataset: https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/ALQ_I.htm  
Documentation about depression screener dataset: https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DPQ_I.htm  
Documentation about sleep disorder dataset: https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/SLQ_I.htm  

The two datasets we will be investigating are the alcohol use dataset and the depression screener dataset, with the outcome dataset as the sleep disorder. Our goal is to discover if there are any interactions between the use of alcohol and depression, as they relate to sleep problems. 

### Alcohol Use Dataset

The alcohol use dataset is restricted to only 18 years or older and contains a number of questions (e.g. # alcoholic drinks/day - past 12 mos, and Ever have 4/5 or more drinks every day?). These questions give us an understanding of potential alcoholism. We will make some assumptions to make this causal analysis binary. 

### Depression Screener Dataset

Different from the alcohol use dataset, the depression screen employs the PHQ-9 questionnaire<sup>1</sup> that allows us to collect all the data at the end and calculate a score on depression severity. The levels are _none, mild, moderate, moderately severe,_ and _severe_. We will be looking at _severe_ or otherwise to make the dataset binary again. 

### Sleep Disorders

Finally, our overall analysis will use the presence of sleep disorders as out outcome variable. We are essentially trying to see if alcohol use and depression interact to cause sleep disorders. The relevant questions in the sleep disorder questionnaire that we will be using are "Ever told doctor had trouble sleeping?" and "How often feel overly sleepy during day?". We will make a combination of these results to test causality. 

We can see our causal graph for interactions below: 

<img src="../../../images/causal_graph.png" alt="Causal Graph" style="width: 650px;" />

## Setup

In [1]:
import xport

import numpy as np
import scipy as scp

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# sns.set(color_codes=True)
%matplotlib inline

In [2]:
# open sleep dataset
with open("../../../data/nhanes/2015_2016/questionaires/SLQ_I.XPT", 'rb') as f:
    sleep_df = xport.to_dataframe(f)

# open alcohol dataset
with open("../../../data/nhanes/2015_2016/questionaires/ALQ_I.XPT", 'rb') as f:
    alc_df = xport.to_dataframe(f)

# open depression dataset
with open("../../../data/nhanes/2015_2016/questionaires/DPQ_I.XPT", 'rb') as f:
    depression_df = xport.to_dataframe(f)

# see all the output for a little bit
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

print(sleep_df.shape)
print(alc_df.shape)
print(depression_df.shape)

sleep_df.head()
alc_df.head()
depression_df.head()

(6327, 8)
(5735, 10)
(5735, 11)


Unnamed: 0,SEQN,DPQ010,DPQ020,DPQ030,DPQ040,DPQ050,DPQ060,DPQ070,DPQ080,DPQ090,DPQ100
0,83732.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1,83733.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
2,83734.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0
3,83735.0,1.0,1.0,2.0,2.0,1.0,3.0,2.0,0.0,1.0,0.0
4,83736.0,1.0,1.0,1.0,1.0,3.0,0.0,1.0,0.0,0.0,0.0


We will first clean the data by removing patients that: 
    
1. don't know or refused to say if "Ever have 4/5 or more drinks every day?"
2. don't know or refused to answer any of the depression questions
3. don't know or refused to answer "How often feel overly sleepy during day?" or "Ever told doctor had trouble sleeping"

In [3]:
# now print last
InteractiveShell.ast_node_interactivity = "last_expr"


clean_alc_df = alc_df.loc[alc_df.ALQ151 != 7]
# print(clean_alc_df.shape)

# clean_sleep_df = sleep_df.loc[sleep_df.SLQ050 != 9]
clean_sleep_df = sleep_df.loc[sleep_df.SLQ120 != 9]
# print(clean_sleep_df.shape)
    
clean_depression_df = depression_df.loc[(depression_df.DPQ010 != 7) & (depression_df.DPQ010 != 9)
                                       & (depression_df.DPQ020 != 7) & (depression_df.DPQ020 != 9)
                                       & (depression_df.DPQ030 != 7) & (depression_df.DPQ030 != 9)
                                       & (depression_df.DPQ040 != 7) & (depression_df.DPQ040 != 9)
                                       & (depression_df.DPQ050 != 7) & (depression_df.DPQ050 != 9)
                                       & (depression_df.DPQ060 != 7) & (depression_df.DPQ060 != 9)
                                       & (depression_df.DPQ070 != 7) & (depression_df.DPQ070 != 9)                                                            & (depression_df.DPQ080 != 7) & (depression_df.DPQ080 != 9)
                                       & (depression_df.DPQ090 != 7) & (depression_df.DPQ090 != 9)
                                       & (depression_df.DPQ100 != 7) & (depression_df.DPQ100 != 9)]

# print(clean_depression_df.shape)

## Causal Risk Measures

We will calculate the causal risk ratio and risk difference to start exploration. 

In [4]:
# import math
# df = pd.DataFrame(np.random.randn(10, 2), columns=list('ab'))
# print(df)
# df.query('a > b')

combined_dataset = pd.merge(clean_alc_df, clean_sleep_df, on="SEQN")
combined_dataset = pd.merge(combined_dataset, clean_depression_df, on="SEQN")
print(combined_dataset.shape)

total_responders = combined_dataset.shape[0]

(5701, 27)


We will be using the sum of all the depression questionnaire questions, how often someone feels "SLQ120 - How often feel overly sleepy during day?", and "ALQ151 - Ever have 4/5 or more drinks every day?". We are going to use the depression questionnaire and make it binary, but taking >9 and < 9 as the levels. With the sleep disorders, if a study participant feels always sleepy during the day, or not. 

As we perform the following calculations, **we assume random assignment of treatment** to allow for the observed risks to be the counterfactual risks.

In [5]:
col_list= list(combined_dataset)
col_list.remove('SEQN')
# col_list
col_list = ['DPQ010', 'DPQ020', 'DPQ030', 'DPQ040', 'DPQ050', 'DPQ060', 'DPQ070', 'DPQ080', 'DPQ090', 'DPQ100']
combined_dataset['DPQ_SUM'] = combined_dataset[col_list].sum(axis=1, min_count=len(col_list))
# combined_dataset.head()

In [6]:
import math

a_1d_1 = combined_dataset.loc[(combined_dataset.ALQ151 == 1) & (combined_dataset.DPQ_SUM > 9) & (combined_dataset.SLQ120 == 4)]
a_1d_0 = combined_dataset.loc[(combined_dataset.ALQ151 == 1) & (combined_dataset.DPQ_SUM <= 9) & (combined_dataset.SLQ120 == 4)]
a_0d_1 = combined_dataset.loc[(combined_dataset.ALQ151 == 2) & (combined_dataset.DPQ_SUM > 9) & (combined_dataset.SLQ120 == 4)]
a_0d_0 = combined_dataset.loc[(combined_dataset.ALQ151 == 2) & (combined_dataset.DPQ_SUM <= 9) & (combined_dataset.SLQ120 == 4)]

risk_diff_left = (len(a_1d_1)/total_responders) - (len(a_0d_1)/total_responders) 
risk_diff_right = (len(a_1d_0)/total_responders) - (len(a_0d_0)/total_responders) 
print("Interaction on Additive Scale Left Side %f" % risk_diff_left)
print("Interaction on Additive Scale Right Side %f" % risk_diff_right)


risk_diff_left_part1 = math.sqrt(((len(a_1d_1)/total_responders))*(1-(len(a_1d_1)/total_responders))/total_responders)
risk_diff_left_part2 = math.sqrt(((len(a_0d_1)/total_responders) )*(1-(len(a_0d_1)/total_responders) )/total_responders)

risk_diff_right_part1 = math.sqrt(((len(a_1d_0)/total_responders))*(1-(len(a_1d_0)/total_responders))/total_responders)
risk_diff_right_part2 = math.sqrt(((len(a_0d_0)/total_responders))*(1-(len(a_0d_0)/total_responders))/total_responders)

print("Risk Diff Unc Left", math.sqrt(math.pow(risk_diff_left_part1,2) + math.pow(risk_diff_left_part2,2) ))
print("Risk Diff Unc Right", math.sqrt(math.pow(risk_diff_right_part1,2) + math.pow(risk_diff_right_part2,2)))

super_additive = risk_diff_left > risk_diff_right
sub_additive = risk_diff_left < risk_diff_right
print(super_additive)
# print(sub_additive)

Interaction on Additive Scale Left Side -0.010174
Interaction on Additive Scale Right Side -0.023329
Risk Diff Unc Left 0.0018283162790836988
Risk Diff Unc Right 0.0025844375669305097
True


We have a super-additive interaction, that is pretty low. We will check on the multiplicative scale.

In [7]:
risk_ratio_left = (len(a_1d_1)/total_responders)/(len(a_0d_0)/total_responders)
risk_ratio_right = ((len(a_1d_0)/total_responders)/(len(a_0d_0)/total_responders)) * ((len(a_0d_1)/total_responders)/(len(a_0d_0)/total_responders))

print("Interaction on Multiplicative Scale Left Side %f" % risk_ratio_left)
print("Interaction on Multiplicative Scale Right Side %f" % risk_ratio_right)


risk_ratio_left_part1 = math.sqrt(((len(a_1d_1)/total_responders))*(1-(len(a_1d_1)/total_responders))/total_responders)
risk_ratio_left_part1 = math.sqrt(((len(a_0d_0)/total_responders))*(1-(len(a_0d_0)/total_responders))/total_responders)

risk_ratio_right_part1 = math.sqrt(((len(a_1d_0)/total_responders))*(1-(len(a_1d_0)/total_responders))/total_responders)
risk_ratio_right_part2 = math.sqrt(((len(a_0d_0)/total_responders))*(1-(len(a_0d_0)/total_responders))/total_responders)
risk_ratio_right_part3 = math.sqrt(((len(a_0d_1)/total_responders))*(1-(len(a_0d_1)/total_responders))/total_responders)
risk_ratio_right_part4 = math.sqrt(((len(a_0d_0)/total_responders))*(1-(len(a_0d_0)/total_responders))/total_responders)


print("Risk Ratio Unc Left", math.sqrt(math.pow(risk_ratio_left_part1,2) + math.pow(risk_ratio_left_part1,2) ))
print("Risk Ratio Unc Right", math.sqrt(math.pow(risk_ratio_right_part1,2) + math.pow(risk_ratio_right_part2,2) + math.pow(risk_ratio_right_part3,2) + math.pow(risk_ratio_right_part4,2) ))



super_multiplicative = risk_ratio_left > risk_ratio_right
sub_multiplicative = risk_ratio_left < risk_ratio_right
print(super_multiplicative)
# print(sub_multiplicative)

Interaction on Multiplicative Scale Left Side 0.146067
Interaction on Multiplicative Scale Right Side 0.119303
Risk Ratio Unc Left 0.003257511511355956
Risk Ratio Unc Right 0.003812011462095985
True


We have a super-multiplicative interaction, that is pretty low. We will check on the multiplicative scale.

## Sufficient Cause Interactions

We can also answer the question of if particular types of individuals exist. Namely, if receiving both the treatments allows individuals to develop the outcome, but not either of them individually. We can use the following equation (from VanderWeele & Robins, 2007 & 2008) to do the calculation, again assuming random assignment of treatment. 

<img src="../../../images/sufficient_condition.png" alt="Sufficient Condition" style="width: 650px;" />

In [8]:
suff_cause_left = (len(a_1d_1)/total_responders) - (len(a_0d_1)/total_responders)
suff_cause_right = (len(a_1d_0)/total_responders)


suff_cause_left_unc_part1 = math.sqrt((1-(len(a_1d_1)/total_responders))*((len(a_1d_1)/total_responders))/total_responders)
suff_cause_left_unc_part2 = math.sqrt(((len(a_0d_1)/total_responders))*(1-(len(a_0d_1)/total_responders))/total_responders)
suff_cause_right_unc = math.sqrt(((len(a_1d_0)/total_responders))*(1-(len(a_1d_0)/total_responders))/total_responders)

total_unc_left = math.sqrt(math.pow(suff_cause_left_unc_part1,2) + math.pow(suff_cause_left_unc_part2,2))
total_unc_right = suff_cause_right_unc
print("Unc Left ", total_unc_left)
print("Unc Right ", total_unc_right)

print(suff_cause_left)
print(suff_cause_right)

print("Sufficient condition is satisfied: %s" % (suff_cause_left > suff_cause_right))

Unc Left  0.0018283162790836988
Unc Right  0.0011720183079002694
-0.010173653744957024
0.00789335204350114
Sufficient condition is satisfied: False


This sufficient condition is far too strict and can often miss types 7 and 8<sup>3</sup> types of individuals. Therefore, we can actually use **superadditive** interactions to say that there is synergism between the treatments. However, this is only possible if we assume _monotonicity_, which is the concept that "receiving treatments _A_ and _E_ cannot prevent any individual from developing the outcome." In our case that is true since the treatments of alcohol abuse or depression are separate from sleep disorders developing. 

We can then say that **alcohol abuse and depression both allows individuals to develop the outcome together, but not either of them individually.**

## References

<sup>1</sup>Kroenke K, Spitzer RL, William JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 2001; 16: 1606-13.  
<sup>2</sup>https://www.niaaa.nih.gov/alcohol-health/overview-alcohol-consumption/moderate-binge-drinking  
<sup>3</sup>Hern√°n MA, Robins JM (2018). Causal Inference. Boca Raton: Chapman & Hall/CRC, forthcoming. pg. 53-68.  