## What is the true normal human body temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C or 98.2$^{\circ}$F. 

#### Exercise
In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.

Answer the following questions **in this notebook below and submit to your GitHub account**. 

1.  Is the distribution of body temperatures normal? 
    - Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply. 
2.  Is the true population mean really 98.6 degrees F?
    - Bring out the one sample hypothesis test! In this situation, is it appropriate to apply a z-test or a t-test? How will the result be different?
3.  At what temperature should we consider someone's temperature to be "abnormal"?
    - Start by computing the margin of error and confidence interval.
4.  Is there a significant difference between males and females in normal temperature?
    - Set up and carry out a two-sample hypothesis test.

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

In [8]:
import pandas as pd
import numpy as np
import scipy.stats as stats

In [9]:
df = pd.read_csv('data/human_body_temperature.csv')

In [10]:
x = df.iloc[:,0]

In [11]:
stats.normaltest(x)

(2.7038014333192031, 0.2587479863488254)

Answer 1: 
The normal test (stats.normaltest) returns a 2-tuple of the test statistic, which follows a chi-squared distribution under the null hypothesis, and the p-value. Here the p-value is approximately 0.2587. The null hypothesis is that x comes from a normal distribution. Since the p-value is greater than 0.15, the data give no evidence for the alternate hypothesis.
So we do not reject the null hypothesis, and we treat the temperatures as approximately normally distributed.
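
To make the test statistic concrete, here is a minimal sketch of what stats.normaltest computes: it combines the z-scores of the skewness test and the kurtosis test into one statistic, K2 = z_skew^2 + z_kurt^2, and compares it against a chi-squared distribution with 2 degrees of freedom. The sample below is synthetic, a hypothetical stand-in for the temperature column with assumed mean and spread, so its numbers say nothing about the real dataset.

```python
import numpy as np
import scipy.stats as stats

# Synthetic stand-in for the temperature column (assumed values, not the real data)
rng = np.random.RandomState(0)
sample = rng.normal(loc=98.2, scale=0.7, size=130)

# Omnibus normality test: returns (statistic, p-value)
k2, p_value = stats.normaltest(sample)

# Rebuild the statistic from its two components
z_skew, _ = stats.skewtest(sample)        # z-score of the sample skewness
z_kurt, _ = stats.kurtosistest(sample)    # z-score of the sample kurtosis
k2_manual = z_skew**2 + z_kurt**2
p_manual = stats.chi2.sf(k2_manual, df=2)  # upper tail of chi-squared with 2 degrees of freedom

print("normaltest statistic: {:.4f}, p-value: {:.4f}".format(k2, p_value))
print("rebuilt statistic:    {:.4f}, p-value: {:.4f}".format(k2_manual, p_manual))
```

A large p-value here only means the test found no evidence against normality; it does not prove the data are normal.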

Answer 2: 
Assume H0: the true population mean is 98.6F (mu = 98.6)
H1: the true population mean is not 98.6F (mu != 98.6)

In [13]:
n = len(x)
if n >= 30:
    print("Use z-test")
else:
    print("Use t-test")

# z-test
mu = 98.6
sample_mean = x.mean()                   # sample mean of temperature
sample_std = x.std()                     # sample standard deviation (pandas uses ddof=1)
sigma = sample_std / np.sqrt(n)          # estimated standard error of the mean
test_stat = (sample_mean - mu) / sigma   # z statistic
p_value = 2 * stats.norm.sf(abs(test_stat))  # two-sided p-value from the normal survival function

print("p-value is {:.3g}".format(p_value))
if p_value < 0.001:
    print("This is very strong evidence for the alternate hypothesis, so we reject the null hypothesis.")
    print("The true population mean is not 98.6F")
elif p_value < 0.05:
    print("This is strong evidence for the alternate hypothesis, so we reject the null hypothesis.")
    print("The true population mean is not 98.6F")
elif p_value < 0.15:
    print("This is only marginal evidence for the alternate hypothesis, so we do not reject the null hypothesis at the 5% level.")
    print("The data are inconclusive about whether the true population mean is 98.6F")
else:
    print("This is no strong evidence for the alternate hypothesis, so we do not reject the null hypothesis.")
    print("The data are consistent with a true population mean of 98.6F")

Use z-test
p-value is 4.9e-08
This is very strong evidence for the alternate hypothesis, so we reject the null hypothesis.
The true population mean is not 98.6F
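
On the z-test versus t-test question: the population standard deviation is unknown and is estimated from the sample, so the t-test is the textbook choice, while for a large sample the two tests give nearly the same answer. The t distribution has heavier tails, so for the same statistic its p-value is never smaller than the z-test p-value. The sketch below compares the two on a synthetic sample (assumed mean, spread, and size, not the real data) and checks the manual calculation against stats.ttest_1samp.

```python
import numpy as np
import scipy.stats as stats

# Synthetic stand-in for the temperature column (assumed values, not the real data)
rng = np.random.RandomState(1)
sample = rng.normal(loc=98.2, scale=0.7, size=130)
mu0 = 98.6                                    # hypothesized population mean

n = len(sample)
std_err = sample.std(ddof=1) / np.sqrt(n)     # estimated standard error of the mean
stat = (sample.mean() - mu0) / std_err        # same statistic for both tests

p_z = 2 * stats.norm.sf(abs(stat))                 # z-test: standard normal reference
p_t_manual = 2 * stats.t.sf(abs(stat), df=n - 1)   # t-test: t distribution with n-1 degrees of freedom
t_stat, p_t = stats.ttest_1samp(sample, mu0)       # library one-sample t-test

print("statistic: {:.4f}".format(stat))
print("z-test p-value: {:.3g}".format(p_z))
print("t-test p-value: {:.3g}".format(p_t))
```

The gap between the two p-values shrinks as n grows, which is why the n >= 30 rule of thumb is often used to justify the z-test.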


Answer 3: 
Assume a 95% confidence level.

In [15]:
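alpha = 0.05
z_stat = stats.norm.ppf(1 - alpha / 2)   # critical z value for a two-sided 95% interval
MoE = z_stat * sigma                     # margin of error, using the standard error of the mean
CI_lower = sample_mean - MoE             # lower bound of the confidence interval for the mean
CI_upper = sample_mean + MoE             # upper bound of the confidence interval for the mean
print("Someone's temperature would be abnormal if it is less than {:.2f} F or more than {:.2f} F".format(CI_lower, CI_upper))

Someone's temperature would be abnormal if it is less than 98.12 F or more than 98.38 F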
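
A confidence interval for the mean describes uncertainty about the average temperature, not the spread of individual readings. If "abnormal" is meant for a single person's temperature, one alternative reading (an assumption here, not the approach taken above) is the central 95% range of a normal distribution fitted to the sample: mean ± z * s instead of mean ± z * s / sqrt(n). The sketch below contrasts the two on synthetic data with assumed mean, spread, and size.

```python
import numpy as np
import scipy.stats as stats

# Synthetic stand-in for the temperature column (assumed values, not the real data)
rng = np.random.RandomState(2)
sample = rng.normal(loc=98.2, scale=0.7, size=130)

alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)   # two-sided critical value, about 1.96
mean = sample.mean()
std = sample.std(ddof=1)
n = len(sample)

# Interval for the population mean: shrinks as n grows
moe_mean = z_crit * std / np.sqrt(n)
ci_mean = (mean - moe_mean, mean + moe_mean)

# Approximate range for a single reading: does not shrink with n
moe_single = z_crit * std
range_single = (mean - moe_single, mean + moe_single)

print("95% CI for the mean:      {:.2f} to {:.2f}".format(*ci_mean))
print("95% range for one person: {:.2f} to {:.2f}".format(*range_single))
```

The second interval is wider by a factor of sqrt(n), so the choice between them changes which temperatures get flagged.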


Answer 4:
Assume
H0: there is no difference between males and females in mean normal temperature; mu_male - mu_female = 0
H1: there is a difference between males and females in mean normal temperature; mu_male - mu_female != 0

In [16]:
temp_male = df[df.gender == 'M'].iloc[:, 0]      # temperatures for the male group
temp_female = df[df.gender == 'F'].iloc[:, 0]    # temperatures for the female group

n_male = len(temp_male)                          # sample sizes
n_female = len(temp_female)

sample_mean_male = temp_male.mean()              # sample mean for male group
sample_mean_female = temp_female.mean()          # sample mean for female group

sample_std_male = temp_male.std()                # std dev for male group
sample_std_female = temp_female.std()            # std dev for female group

# standard error of the difference between the two sample means
sigma_diff = np.sqrt((sample_std_male**2 / n_male) + (sample_std_female**2 / n_female))
test_stat_mu = ((sample_mean_male - sample_mean_female) - 0) / sigma_diff
p_value = 2 * stats.norm.sf(abs(test_stat_mu))   # two-sided p-value for the two-sample statistic

if n_male >= 30 and n_female >= 30:
    print("Use z-test")
else:
    print("Use t-test")

# z-test
print("p-value is {:.3g}".format(p_value))
if p_value < 0.001:
    print("This is very strong evidence for the alternate hypothesis, so we reject the null hypothesis.")
    print("There is a significant difference between males and females in normal temperature")
elif p_value < 0.05:
    print("This is strong evidence for the alternate hypothesis, so we reject the null hypothesis.")
    print("There is a significant difference between males and females in normal temperature")
elif p_value < 0.15:
    print("This is only marginal evidence for the alternate hypothesis, so we do not reject the null hypothesis at the 5% level.")
    print("The data are inconclusive about a difference between males and females in normal temperature")
else:
    print("This is no strong evidence for the alternate hypothesis, so we do not reject the null hypothesis.")
    print("No significant difference between males and females in normal temperature")
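
As a cross-check on the manual two-sample calculation, the same statistic can be obtained from stats.ttest_ind with equal_var=False (Welch's t-test), which uses the same standard error but a t reference distribution. The sketch below uses two synthetic groups with hypothetical means, spreads, and sizes, so its output does not describe the real male and female samples.

```python
import numpy as np
import scipy.stats as stats

# Two synthetic groups (assumed values, not the real data)
rng = np.random.RandomState(3)
group_a = rng.normal(loc=98.1, scale=0.7, size=65)
group_b = rng.normal(loc=98.4, scale=0.7, size=65)

# Manual two-sample statistic with unpooled variances
se_diff = np.sqrt(group_a.var(ddof=1) / len(group_a) +
                  group_b.var(ddof=1) / len(group_b))
stat = (group_a.mean() - group_b.mean()) / se_diff
p_z = 2 * stats.norm.sf(abs(stat))       # two-sided p-value, normal reference

# Welch's t-test: same statistic, t reference distribution
t_stat, p_t = stats.ttest_ind(group_a, group_b, equal_var=False)

print("statistic: {:.4f}".format(stat))
print("z-test p-value: {:.4g}".format(p_z))
print("Welch t-test p-value: {:.4g}".format(p_t))
```

Computing the p-value from the statistic defined in the same cell, as done here, avoids accidentally reusing a statistic left over from an earlier test.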
