# Practice notebook for confidence intervals using NHANES data

This notebook will give you the opportunity to practice working with confidence intervals using the NHANES data.

You can enter your code into the cells that say "enter your code here", and you can type responses to the questions into the cells that say "Type Markdown and Latex".

Note that most of the code that you will need to write below is very similar to code that appears in the case study notebook.  You will need to edit code from that notebook in small ways to adapt it to the prompts below.

To get started, we will use the same module imports and read the data in the same way as we did in the case study:

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
import statsmodels.api as sm

da = pd.read_csv("nhanes_2015_2016.csv")

## Question 1

Restrict the sample to women between 35 and 50, then use the marital status variable [DMDMARTL](https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.htm#DMDMARTL) to partition this sample into two groups - women who are currently married, and women who are not currently married.  Within each of these groups, calculate the proportion of women who have completed college.  Calculate 95% confidence intervals for each of these proportions.

In [2]:
# Women (RIAGENDR == 2) aged 35 to 50 inclusive, with a fresh 0..n-1 index
da_female_35_50 = da[(da['RIAGENDR'] == 2) & (da['RIDAGEYR'] >= 35) & (da['RIDAGEYR'] <= 50)].reset_index(drop=True)

In [3]:
# DMDMARTL == 1 is "married"; codes 2-6 are the other answered categories.
# Codes 77 ("refused") and 99 ("don't know") fall into neither group.
da_female_35_50_married = da_female_35_50[da_female_35_50['DMDMARTL'] == 1].reset_index(drop=True)
da_female_35_50_not_married = da_female_35_50[da_female_35_50['DMDMARTL'].isin([2, 3, 4, 5, 6])].reset_index(drop=True)

In [4]:
# DMDEDUC2 == 5 is "college graduate or above"
proportion_married = sum(da_female_35_50_married['DMDEDUC2'] == 5) / len(da_female_35_50_married)

In [5]:
proportion_married

0.36080178173719374

In [6]:
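# Despite the name, this is p * (1 - p), the per-observation variance.
# It becomes a standard error only after dividing by n and taking the square root.
se_married = proportion_married * (1 - proportion_married)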

In [7]:
se_married

0.23062385603246013

In [8]:
n_married=len(da_female_35_50_married['DMDEDUC2'])

In [9]:
n_married

449

In [10]:
print('Lower Boundary: ', proportion_married-1.96*np.sqrt((se_married/n_married)))
print('Upper Boundary: ', proportion_married+1.96*np.sqrt((se_married/n_married)))

Lower Boundary:  0.3163811208674688
Upper Boundary:  0.4052224426069187
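The steps in the cells above can be collected into one small helper. This is a minimal sketch of the normal-approximation (Wald) interval, not code from the case study: the name `wald_ci` is hypothetical, and the count of 162 college graduates is the value implied by the proportion and sample size printed above (0.3608... × 449).

```python
import math

def wald_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    # Standard error of the sample proportion: sqrt(p * (1 - p) / n)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Assumed count: 162 of the 449 married women completed college
lower, upper = wald_ci(162, 449)
print('Lower Boundary: ', lower)
print('Upper Boundary: ', upper)
```

With these inputs the helper should agree with the boundaries printed above up to floating-point rounding.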


In [15]:
from statsmodels.stats.proportion import proportion_confint

In [17]:
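# The counts are entered by hand: 72 successes out of 338 observations.
# The notebook does not show where they come from; they are presumably
# the college graduates among the not-married women.
ci_low, ci_upp = proportion_confint(72, 338, alpha=0.05, method='normal')
ci_low, ci_upp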

(0.16936816767089768, 0.2566673352876822)
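`method='normal'` is the same Wald interval as the hand calculation, except that statsmodels uses the exact normal quantile (about 1.959964) rather than the rounded 1.96. The sketch below reproduces it with only the standard library, assuming the same hand-entered counts of 72 and 338; `wald_ci_exact` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def wald_ci_exact(count, nobs, alpha=0.05):
    """Wald interval using the exact two-sided normal quantile."""
    p_hat = count / nobs
    # Quantile for a two-sided interval: about 1.959964 when alpha = 0.05
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / nobs)
    return p_hat - half_width, p_hat + half_width

ci_low, ci_upp = wald_ci_exact(72, 338)
print(ci_low, ci_upp)
```

The small gap between 1.96 and the exact quantile changes the boundaries only around the sixth decimal place.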

In [19]:
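# Width of the confidence interval (twice the margin of error),
# not a difference between two proportions
ci_width = ci_upp - ci_low
ci_width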

0.08729916761678452

## Question 2

Construct a 95% confidence interval for the proportion of smokers who are female. Construct a 95% confidence interval for the proportion of smokers who are male. Construct a 95% confidence interval for the **difference** between those two gender proportions.
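One common way to approach the last part is to treat the two groups as independent samples and combine their standard errors as the square root of the sum of their squares. The sketch below shows that calculation under those assumptions. The function name `diff_ci` is hypothetical, and the counts are made up purely for illustration; they are not NHANES values.

```python
import math

def diff_ci(x1, n1, x2, n2, z=1.96):
    """Wald confidence interval for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    se1 = math.sqrt(p1 * (1 - p1) / n1)
    se2 = math.sqrt(p2 * (1 - p2) / n2)
    # Variances of independent estimates add, so the SEs combine in quadrature
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    d = p1 - p2
    return d - z * se_diff, d + z * se_diff

# Hypothetical counts: 300 of 1000 in group 1, 500 of 1000 in group 2
lo, hi = diff_ci(300, 1000, 500, 1000)
print(lo, hi)
```

If the resulting interval excludes zero, the data are consistent with a real difference between the two group proportions at the 95% level.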

In [None]:
### https://github.com/karimkmafifi/Inferential-Statistical-Analysis-with-Python---Coursera/blob/master/nhanes_confidence_intervals_practice.ipynb