# Notebook 15: Confidence Intervals
***

We'll need NumPy, Matplotlib, pandas, and `scipy.stats` for this notebook, so let's load them.

```python
import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
```

### Exercise 1 - Single Sample CI
*** 
Load `hubble.csv` into Python. A description of the variables can be obtained from page 73 of https://cran.r-project.org/web/packages/gamair/gamair.pdf.

```python
filepath = '../data/hubble.csv'

df = pd.read_csv(filepath)

df.head(10)
```

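| Unnamed: 0 | Galaxy | y | x |
|---|---|---|---|
| 0 | NGC0300 | 133 | 2.0 |
| 1 | NGC0925 | 664 | 9.16 |
| 2 | NGC1326A | 1794 | 16.14 |
| 3 | NGC1365 | 1594 | 17.95 |
| 4 | NGC1425 | 1473 | 21.88 |
| 5 | NGC2403 | 278 | 3.22 |
| 6 | NGC2541 | 714 | 11.22 |
| 7 | NGC2090 | 882 | 11.75 |
| 8 | NGC3031 | 80 | 3.63 |
| 9 | NGC3198 | 772 | 13.8 |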


#### (a) Calculate the 85% confidence interval for the mean of a galaxy's distance from Earth in megaparsecs (Mpc) by doing the computation explicitly in Python. Use the large-sample approximation even though n is only 24.
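For reference, the large-sample interval for a mean is the sample mean plus or minus a standard normal critical value times the estimated standard error $s/\sqrt{n}$. For an 85% interval, $\alpha = 0.15$, so the critical value is the 0.925 quantile of the standard normal, which is why the code uses `stats.norm.ppf(.925)`:

$$
\bar{x} \pm z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}, \qquad \alpha = 1 - 0.85 = 0.15, \quad z_{1-\alpha/2} = z_{0.925} \approx 1.4395
$$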

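```python
n = len(df)                   # sample size
xbar = np.mean(df['x'])       # sample mean distance (Mpc)
svar = np.var(df['x'])        # np.var defaults to ddof=0 (divides by n); ddof=1 gives the usual sample variance
sd = np.sqrt(svar)
critz = stats.norm.ppf(.925)  # 85% CI: alpha = 0.15, so use the 1 - alpha/2 = 0.925 quantile
se = sd / np.sqrt(n)          # standard error of the mean

print("CI: ", xbar - critz*se, xbar + critz*se)
```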


```
CI:  10.381963208317911 13.727203458348756
```
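Because n = 24 is small, the large-sample approximation uses a slightly smaller critical value than the Student t distribution with n - 1 = 23 degrees of freedom would. A quick sketch comparing the two critical values at 85% confidence (an illustration added here, not part of the original exercise):

```python
from scipy import stats

n = 24
confidence = 0.85
upper_q = 1 - (1 - confidence) / 2          # 0.925 for an 85% interval

z_crit = stats.norm.ppf(upper_q)            # large-sample (normal) critical value
t_crit = stats.t.ppf(upper_q, df=n - 1)     # small-sample t critical value, 23 degrees of freedom

print(f"z critical value: {z_crit:.4f}")
print(f"t critical value: {t_crit:.4f}")
print(f"With the same standard error, the t interval is {t_crit / z_crit:.3f}x as wide")
```

The t critical value is larger, so the normal approximation slightly understates the uncertainty at this sample size; `stats.t.interval(0.85, n - 1, loc=..., scale=...)` would give the corresponding t-based interval directly.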


#### (b) Can you find a built-in `scipy.stats` function that does this computation automatically?

```python
stats.norm.interval(.85, loc=xbar, scale=sd/np.sqrt(n))
```

```
(10.381963208317911, 13.727203458348756)
```
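`stats.norm.interval(confidence, loc, scale)` returns the central interval of a normal distribution with the given mean and scale, so passing the sample mean and the standard error reproduces the explicit formula. A sketch checking this on a synthetic sample (randomly generated with assumed mean 12 and standard deviation 6, not the Hubble data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=12.0, scale=6.0, size=24)    # synthetic sample, NOT hubble.csv

n = len(x)
se = np.std(x) / np.sqrt(n)                     # ddof=0, matching the explicit computation in (a)
crit = stats.norm.ppf(0.925)

explicit = (x.mean() - crit * se, x.mean() + crit * se)
builtin = stats.norm.interval(0.85, loc=x.mean(), scale=se)

print(explicit)
print(builtin)
print(np.allclose(explicit, builtin))           # both approaches give the same endpoints
```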

#### (c) Interpret the confidence interval.
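One way to see what the 85% refers to: the confidence level describes the procedure, not any single interval. If we repeatedly drew samples from a population with a known mean and built an 85% interval from each, about 85% of those intervals would contain the true mean. A simulation sketch under assumed values (a normal population with mean 12 and standard deviation 6, samples of size 24; these are illustrative choices, not estimates from `hubble.csv`):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mean, true_sd = 12.0, 6.0      # assumed population parameters (illustrative only)
n, reps, confidence = 24, 2000, 0.85

covered = 0
for _ in range(reps):
    sample = rng.normal(true_mean, true_sd, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)
    lo, hi = stats.norm.interval(confidence, loc=sample.mean(), scale=se)
    covered += int(lo <= true_mean <= hi)   # count intervals that capture the true mean

coverage = covered / reps
print(f"Empirical coverage: {coverage:.3f}")
```

In the long run, this z-based procedure covers the true mean slightly less than 85% of the time at n = 24, because the normal critical value is smaller than the corresponding t critical value.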

### Exercise 2 - Two Sample CI
*** 
Load `clean_titanic_data.csv` into Python.

```python
# Path to the data - select the path that works for you
file_path = '../data/clean_titanic_data.csv'

# Load the data into a DataFrame
df = pd.read_csv(file_path)
df.head(10)
```

| Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | S |
| 5 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | S |
| 6 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.075 | S |
| 7 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | S |
| 8 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | C |
| 9 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.7 | S |


#### a) Calculate a 98% CI for the survival rate of all passengers.

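```python
p_hat = df['Survived'].mean()  # sample survival rate (Survived is coded 0/1)
stats.norm.interval(0.98, loc=p_hat, scale=np.sqrt(p_hat * (1 - p_hat) / len(df)))  # large-sample (Wald) CI
```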

#### b) Calculate a 98% CI for the survival rate of men (all passenger classes).


#### c) Calculate a 98% CI for the survival rate of women (all passenger classes).
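For parts (b) and (c), each survival rate is a sample proportion $\hat p$, and the large-sample (Wald) interval is $\hat p \pm z_{0.99}\sqrt{\hat p(1-\hat p)/n}$ at 98% confidence. A sketch that applies it to each sex with a pandas `groupby`, using a small made-up DataFrame with the same `Sex` and `Survived` columns as the Titanic data (the helper name `wald_ci` and the toy rows are hypothetical):

```python
import numpy as np
import pandas as pd
from scipy import stats

def wald_ci(survived, confidence=0.98):
    """Hypothetical helper: normal-approximation (Wald) CI for the mean of 0/1 outcomes."""
    n = len(survived)
    p_hat = np.mean(survived)                  # sample survival rate
    se = np.sqrt(p_hat * (1 - p_hat) / n)      # standard error of a sample proportion
    return stats.norm.interval(confidence, loc=p_hat, scale=se)

# Toy stand-in for clean_titanic_data.csv: made-up rows, NOT the real passenger records
toy = pd.DataFrame({
    "Sex": ["male"] * 40 + ["female"] * 40,
    "Survived": [1] * 10 + [0] * 30 + [1] * 30 + [0] * 10,
})

# One interval per group, keyed by the value of Sex
cis = {sex: wald_ci(group["Survived"]) for sex, group in toy.groupby("Sex")}
for sex, (lo, hi) in cis.items():
    print(f"{sex}: 98% CI = ({lo:.3f}, {hi:.3f})")
```

With the real data, `wald_ci(df.loc[df['Sex'] == 'male', 'Survived'])` would give the interval for (b), and the same call with `'female'` the one for (c). The Wald interval can misbehave when $\hat p$ is near 0 or 1 or the group is small, so it is worth checking that the endpoints stay inside [0, 1].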


#### d) Calculate a 98% CI for the difference in survival rates between men and women.
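For two independent proportions, the large-sample interval for the difference $p_1 - p_2$ adds the two sampling variances: $(\hat p_1 - \hat p_2) \pm z_{0.99}\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2}$. A sketch, assuming the two groups are independent samples; the function name `diff_proportion_ci` and the counts in the usage line are hypothetical, not Titanic results:

```python
import numpy as np
from scipy import stats

def diff_proportion_ci(x1, n1, x2, n2, confidence=0.98):
    """Hypothetical helper: Wald CI for p1 - p2 from success counts x and group sizes n."""
    p1, p2 = x1 / n1, x2 / n2
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # SE of the difference
    return stats.norm.interval(confidence, loc=p1 - p2, scale=se)

# Made-up counts: 30 of 40 survived in group 1, 10 of 40 in group 2
lo, hi = diff_proportion_ci(30, 40, 10, 40)
print(f"98% CI for p1 - p2: ({lo:.3f}, {hi:.3f})")
```

If the whole interval lies on one side of zero, a difference of zero is not a plausible value at the 98% level. With the real data, the counts and group sizes come from tallying `Survived` within each value of `Sex`.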


#### e) What can you conclude?