## What is the true normal human body temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C or 98.2$^{\circ}$F. 

#### Exercise
In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.

Answer the following questions **in this notebook below and submit to your GitHub account**. 

1.  Is the distribution of body temperatures normal? 
    - Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply. 
2.  Is the true population mean really 98.6 degrees F?
    - Bring out the one-sample hypothesis test! In this situation, is it appropriate to apply a z-test or a t-test? How will the result be different?
3.  At what temperature should we consider someone's temperature to be "abnormal"?
    - Start by computing the margin of error and confidence interval.
4.  Is there a significant difference between males and females in normal temperature?
    - Set up and solve a two-sample hypothesis test.

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

In [48]:
from pylab import *
import pandas as pd
import numpy as np
import scipy.stats as st

In [18]:
df = pd.read_csv('data/human_body_temperature.csv')

In [19]:
df.head()

Unnamed: 0,temperature,gender,heart_rate
0,99.3,F,68
1,98.4,F,81
2,97.8,M,73
3,99.2,F,66
4,98.0,F,73


In [15]:
dftemp = df['temperature']
dftemp.head()

0    99.3
1    98.4
2    97.8
3    99.2
4    98.0
Name: temperature, dtype: float64

In [20]:
figure()

<matplotlib.figure.Figure at 0xbe7bf98>

In [53]:
hist(dftemp)

(array([  4.,   6.,  15.,  26.,  30.,  30.,  15.,   1.,   2.,   1.]),
 array([  96.3 ,   96.75,   97.2 ,   97.65,   98.1 ,   98.55,   99.  ,
          99.45,   99.9 ,  100.35,  100.8 ]),
 <a list of 10 Patch objects>)

In [54]:
show()

In [145]:
xpoint = np.mean(dftemp)
xpoint

98.24923076923078

In [146]:
np.median(dftemp)

98.299999999999997

In [147]:
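#Question 1
#The histogram is roughly bell-shaped and the mean (98.25) is close to the median (98.30),
#so the distribution is approximately normal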
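
The visual check above can be backed by a formal normality test. The following is a minimal sketch, assuming SciPy's `normaltest` (D'Agostino-Pearson) and `shapiro` are acceptable choices here; the helper name `normality_report` and the synthetic `demo` sample are hypothetical stand-ins, not the notebook's real data.

```python
import numpy as np
import scipy.stats as st

def normality_report(sample):
    """Return p-values of two common normality tests for a 1-D sample."""
    sample = np.asarray(sample, dtype=float)
    # D'Agostino-Pearson test: combines skewness and kurtosis
    _, p_dagostino = st.normaltest(sample)
    # Shapiro-Wilk test
    _, p_shapiro = st.shapiro(sample)
    return {"dagostino_p": float(p_dagostino), "shapiro_p": float(p_shapiro)}

# Synthetic stand-in for dftemp (hypothetical values, NOT the real measurements)
rng = np.random.default_rng(0)
demo = rng.normal(loc=98.25, scale=0.73, size=130)

report = normality_report(demo)
print(report)
```

A large p-value means the test found no evidence against normality; it does not prove the data are normal. In the notebook, the call would be `normality_report(dftemp)`.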

In [148]:
#Question 2
# Ho : mu = 98.6
# Ha : mu != 98.6 (two-sided)

In [149]:
n1 = len(dftemp)
n1

130

In [150]:
# np.std defaults to ddof=0 (population formula); pass ddof=1 for the sample standard deviation
np.std(dftemp)

0.7303577789050377

In [151]:
sqrtn1 = np.sqrt(n1)

In [152]:
SEz = np.std(dftemp)/np.sqrt(n1)
SEz

0.064056614695193359

In [153]:
mu = 98.6

In [154]:
z = (xpoint - mu)/SEz
z

-5.4759252020781162

In [155]:
# st.norm.ppf(.95) returns the z-score for a given cumulative probability
# st.norm.sf(abs(z)) is the upper-tail probability; doubling it gives the two-sided p-value
pvalue = 2 * st.norm.sf(abs(z))
pvalue


4.3523151658821886e-08

In [156]:
#Z distribution analysis
#The p-value (about 4.4e-08) is far below 0.05.
#Reject Ho in favor of Ha.
#Using the z distribution, we conclude that the population mean is not equal to 98.6

In [52]:
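# Analyze the t statistic
# The t statistic is the strictly appropriate choice whenever the population standard deviation is unknown and has to be estimated from the sample, as it is here
# With n1 = 130 the t distribution (129 degrees of freedom) is nearly identical to the normal distribution, so a t-test leads to the same conclusion as the z-test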
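
To make the z versus t comparison concrete, here is a minimal sketch that computes both p-values from the same test statistic and cross-checks the t result against `scipy.stats.ttest_1samp`. The helper name `one_sample_z_and_t` and the synthetic `demo` sample are hypothetical; the sketch uses the sample standard deviation (`ddof=1`), which is an assumption that differs slightly from the `np.std` default used in the cells above.

```python
import numpy as np
import scipy.stats as st

def one_sample_z_and_t(sample, mu0):
    """Return (statistic, two-sided z p-value, two-sided t p-value)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Standard error of the mean, using the sample standard deviation
    se = sample.std(ddof=1) / np.sqrt(n)
    stat = (sample.mean() - mu0) / se
    p_z = 2 * st.norm.sf(abs(stat))          # normal reference distribution
    p_t = 2 * st.t.sf(abs(stat), df=n - 1)   # t distribution with n-1 degrees of freedom
    return stat, p_z, p_t

# Synthetic stand-in for dftemp (hypothetical values, NOT the real measurements)
rng = np.random.default_rng(0)
demo = rng.normal(loc=98.25, scale=0.73, size=130)

stat, p_z, p_t = one_sample_z_and_t(demo, mu0=98.6)
print(stat, p_z, p_t)

# SciPy's built-in one-sample t-test should agree with the manual t result
print(st.ttest_1samp(demo, 98.6))
```

Because the t distribution has heavier tails, its p-value is never smaller than the z-based one; with a sample this large the two are close enough that the decision does not change.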

In [56]:
#3.At what temperature should we consider someone's temperature to be "abnormal"?
#Start by computing the margin of error and confidence interval.
#st.norm.ppf(.025)

In [60]:
upBound = xpoint + 1.96*SEz
lowBound = xpoint - 1.96*SEz
upBound

98.374781734033363

In [61]:
lowBound

98.123679804428193

In [62]:
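#95% confidence interval for the population mean:
# (98.12,98.37)
# This interval describes the uncertainty in the mean, not the spread of individual readings
# For a single person, a range such as mean +/- 1.96*std (about 96.8 to 99.7) is a more suitable cutoff for "abnormal"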
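
The margin of error, the confidence interval for the mean, and a range for individual readings can be computed side by side. This is a minimal sketch under the assumption that a normal approximation is adequate; the helper name `mean_ci_and_reference_range` and the synthetic `demo` sample are hypothetical.

```python
import numpy as np
import scipy.stats as st

def mean_ci_and_reference_range(sample, confidence=0.95):
    """Return (margin of error, CI for the mean, range for individual readings)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    mean = sample.mean()
    sd = sample.std(ddof=1)
    # Two-sided critical value, about 1.96 for 95% confidence
    z_crit = st.norm.ppf(0.5 + confidence / 2)
    margin = z_crit * sd / np.sqrt(n)
    ci = (mean - margin, mean + margin)
    # Range expected to contain about `confidence` of individual readings
    reference = (mean - z_crit * sd, mean + z_crit * sd)
    return margin, ci, reference

# Synthetic stand-in for dftemp (hypothetical values, NOT the real measurements)
rng = np.random.default_rng(0)
demo = rng.normal(loc=98.25, scale=0.73, size=130)

margin, ci, ref_range = mean_ci_and_reference_range(demo)
print("margin of error:", margin)
print("CI for the mean:", ci)
print("range for individual readings:", ref_range)
```

The confidence interval shrinks as the sample grows, while the range for individual readings does not, which is why the latter is the relevant one when judging a single person's temperature.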

In [110]:
#4.Is there a significant difference between males and females in normal temperature?
#Set up and solve for a two sample hypothesis testing
# Drop the unused column, then split the temperatures by gender
dft = df.drop('heart_rate', axis=1)
dfm = dft[dft.gender == 'M'].drop('gender', axis=1)
dff = dft[dft.gender == 'F'].drop('gender', axis=1)

In [141]:
# Reset the indices so the two groups align row by row
dfm = dfm.reset_index(drop=True)
dff = dff.reset_index(drop=True)
# Row-wise difference between the M and F temperature values
# Note: this pairing is arbitrary because the two groups are independent samples
df_diff = dfm - dff
n_diff = len(df_diff)
df_diff = df_diff['temperature']
df_diff.head()


0   -1.5
1    0.8
2   -1.2
3    0.8
4   -0.2
Name: temperature, dtype: float64

In [142]:
SE = np.std(df_diff)/np.sqrt(n_diff)
SE
xp_diff = np.mean(df_diff)
xp_diff

-0.28923076923077007

In [144]:
mu = 0  # Ho: the mean difference between males and females is zero
z = (xp_diff - mu)/SE
# Two-sided p-value from the z-score
pvalue = 2 * st.norm.sf(abs(z))
pvalue

0.02388576073384972

In [None]:
#Since the p-value (about 0.024) is below the 0.05 significance level, we reject Ho (the null hypothesis)
#The average temperatures of males and females differ
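
The cells above pair males and females row by row, although the two groups are independent samples. An alternative that avoids the arbitrary pairing is a two-sample test on the group means. The following is a minimal sketch, assuming Welch's unequal-variance form via `scipy.stats.ttest_ind`; the helper name `compare_groups` and the synthetic `males` and `females` arrays are hypothetical, not the notebook's data.

```python
import numpy as np
import scipy.stats as st

def compare_groups(a, b):
    """Two-sample comparison of means for independent groups a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a.mean() - b.mean()
    # Standard error of the difference between two independent means
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    z = diff / se
    p_z = 2 * st.norm.sf(abs(z))
    # Welch's t-test: same statistic, t reference distribution
    t_stat, p_t = st.ttest_ind(a, b, equal_var=False)
    return {"diff": diff, "z": z, "p_z": p_z, "t": float(t_stat), "p_t": float(p_t)}

# Synthetic stand-ins for the two groups (hypothetical values, NOT the real data)
rng = np.random.default_rng(0)
males = rng.normal(loc=98.1, scale=0.70, size=65)
females = rng.normal(loc=98.4, scale=0.74, size=65)

result = compare_groups(males, females)
print(result)
```

In the notebook, the call would be `compare_groups(dfm['temperature'], dff['temperature'])`. The z and Welch t statistics are the same number; only the reference distribution used for the p-value differs.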