# Student's t-test
1. One-sample t-test
2. Two-sample t-test
    1. Unpaired (independent) t-test
    2. Paired (related/dependent samples) t-test

## One-sample t-test

Tests whether the mean of a single sample differs from a known (hypothesised) value.

**Assumptions**
- Observations in the sample are independent and identically distributed.
- Observations in the sample are normally distributed.
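The normality assumption can be checked before running the test. A minimal sketch, assuming the Shapiro-Wilk test (`scipy.stats.shapiro`) is an acceptable check: the arrays below are synthetic stand-ins for a real column, one normal and one right-skewed. Skew is relevant here because `fare` in this dataset has a mean (32.2) more than twice its median (14.45).

```python
import numpy as np
from scipy.stats import shapiro

# Synthetic stand-ins for a data column (assumption: in the notebook you
# would pass a real column such as df1['fare'] instead).
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=30.0, scale=10.0, size=200)
skewed_sample = rng.exponential(scale=30.0, size=200)  # right-skewed, like fares

# Shapiro-Wilk: H0 = the sample was drawn from a normal distribution.
w_norm, p_norm = shapiro(normal_sample)
w_skew, p_skew = shapiro(skewed_sample)

print('normal sample: W=%.3f, p=%.3f' % (w_norm, p_norm))
print('skewed sample: W=%.3f, p=%.3g' % (w_skew, p_skew))
```

A small p-value for the skewed sample suggests the normality assumption is doubtful; with large samples the t-test is fairly robust to this, but it is worth knowing.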

**Interpretation**
> **H0:** The mean of the sample is equal to the known value.\
> **H1:** The mean of the sample is not equal to the known value.
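Under these hypotheses the statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with $n - 1$ degrees of freedom, where $s$ is the sample standard deviation. A sketch that computes it by hand on synthetic data and compares it with `ttest_1samp`; the last lines plug in the `fare` summary statistics that `df1.describe()` reports in this notebook (mean 32.204208, std 49.693429, n = 891) against the value 45.

```python
import numpy as np
from scipy import stats

# Synthetic sample (assumption: any 1-D numeric array works here).
rng = np.random.default_rng(42)
sample = rng.normal(loc=32.0, scale=50.0, size=891)
mu0 = 45.0  # hypothesised mean

n = sample.size
s = sample.std(ddof=1)                        # sample standard deviation (n - 1 denominator)
t_manual = (sample.mean() - mu0) / (s / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)  # two-sided p-value

res = stats.ttest_1samp(sample, mu0)
print(t_manual, res.statistic)
print(p_manual, res.pvalue)

# The same formula applied to the fare summary statistics
t_from_summary = (32.204208 - 45) / (49.693429 / np.sqrt(891))
print(round(t_from_summary, 3))  # -7.686
```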

```python
# One-sample t-test

# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ttest_1samp

# load the dataset
df = sns.load_dataset("titanic")
df.head()
```

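|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |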


```python
# subset the data
df1 = df[["sex", "age", "fare"]]
df1.head()
```

|   | sex | age | fare |
|---|---|---|---|
| 0 | male | 22.0 | 7.25 |
| 1 | female | 38.0 | 71.2833 |
| 2 | female | 26.0 | 7.925 |
| 3 | female | 35.0 | 53.1 |
| 4 | male | 35.0 | 8.05 |


```python
df1.describe()
```

|   | age | fare |
|---|---|---|
| count | 714.0 | 891.0 |
| mean | 29.699118 | 32.204208 |
| std | 14.526497 | 49.693429 |
| min | 0.42 | 0.0 |
| 25% | 20.125 | 7.9104 |
| 50% | 28.0 | 14.4542 |
| 75% | 38.0 | 31.0 |
| max | 80.0 | 512.3292 |


```python
# compare the mean fare with a hypothesised value of 45
stat, p = ttest_1samp(df1['fare'], 45)
print('stat=%.3f, p=%.3f' % (stat, p))

# interpret the result at the 5% significance level
if p > 0.05:
    print('Probably the same mean (fail to reject H0)')
else:
    print('Probably different means (reject H0)')
```

```
stat=-7.686, p=0.000
Probably different means (reject H0)
```
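The `age` column has missing values (714 non-null out of 891 in the summary above). By default `ttest_1samp` propagates NaN, so testing `age` directly would return `nan`. A sketch using the first eight `age` values shown in this notebook's outputs (rows 0 to 7, one of which is NaN) and the `nan_policy='omit'` argument; the value 44 is only an illustrative hypothesised mean.

```python
import numpy as np
from scipy.stats import ttest_1samp

# First eight age values from the outputs in this notebook (row 5 is missing).
ages = np.array([22.0, 38.0, 26.0, 35.0, 35.0, np.nan, 54.0, 2.0])

res_propagate = ttest_1samp(ages, 44)                     # default: NaN in, NaN out
res_omit = ttest_1samp(ages, 44, nan_policy='omit')       # drop NaNs before testing
res_dropped = ttest_1samp(ages[~np.isnan(ages)], 44)      # same thing done by hand

print(res_propagate.statistic)
print(res_omit.statistic, res_dropped.statistic)
```

In the notebook the equivalent call would be `ttest_1samp(df1['age'], 44, nan_policy='omit')`, or dropping missing rows first with `df1['age'].dropna()`.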


## Two-sample t-test
**Independent Student's t-test** (one continuous variable compared between the two groups of a categorical variable)

**Assumptions**
- Observations in each sample are independent and identically distributed.
- Observations in each sample are normally distributed.
- Observations in each sample have the same variance.
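The equal-variance assumption can be checked, and relaxed if it fails. A minimal sketch on synthetic data with deliberately different spreads: Levene's test (`scipy.stats.levene`, H0: equal variances) flags the difference, and passing `equal_var=False` to `ttest_ind` switches from Student's test to Welch's t-test, which does not assume equal variances.

```python
import numpy as np
from scipy.stats import levene, ttest_ind

# Synthetic groups with very different standard deviations (assumption: illustrative only).
rng = np.random.default_rng(1)
group_a = rng.normal(loc=25.0, scale=5.0, size=300)
group_b = rng.normal(loc=30.0, scale=40.0, size=150)

# Levene's test: H0 = the groups have equal variances.
stat_levene, p_levene = levene(group_a, group_b)
print('Levene p = %.3g' % p_levene)

# Welch's t-test: does not pool the variances.
welch = ttest_ind(group_a, group_b, equal_var=False)
print('Welch t = %.3f, p = %.3g' % (welch.statistic, welch.pvalue))
```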
  
**Interpretation**

>**H0:** the means of the samples are equal. \
>**H1:** the means of the samples are unequal.
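Student's version of the test pools the two sample variances: $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ and $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$ with $n_1 + n_2 - 2$ degrees of freedom. A sketch that computes this by hand on synthetic data and checks it against `ttest_ind` with its default `equal_var=True`.

```python
import numpy as np
from scipy import stats

# Synthetic groups (assumption: any two 1-D numeric arrays work).
rng = np.random.default_rng(3)
x1 = rng.normal(loc=25.0, scale=40.0, size=60)
x2 = rng.normal(loc=44.0, scale=40.0, size=45)

n1, n2 = x1.size, x2.size
s1_sq, s2_sq = x1.var(ddof=1), x2.var(ddof=1)

# pooled variance: weighted average of the two sample variances
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
t_manual = (x1.mean() - x2.mean()) / np.sqrt(sp_sq * (1 / n1 + 1 / n2))
dof = n1 + n2 - 2
p_manual = 2 * stats.t.sf(abs(t_manual), dof)  # two-sided p-value

res = stats.ttest_ind(x1, x2)  # Student's test: equal_var=True by default
print(t_manual, res.statistic)
print(p_manual, res.pvalue)
```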

```python
# compare the fare paid by male and female passengers

# split the dataset by sex
df_male = df1.loc[df1['sex'] == 'male']
df_female = df1.loc[df1['sex'] == 'female']

# libraries
from scipy.stats import ttest_ind

# unpaired (independent) two-sample t-test; equal_var=True by default (Student's version)
stat, p_value = ttest_ind(df_male["fare"], df_female["fare"])
print("stat is = ", stat)
print("p_value is = ", p_value)

# interpret this test's p-value at the 5% significance level
if p_value > 0.05:
    print('Probably the same mean (fail to reject H0)')
else:
    print('Probably different means (reject H0)')
```

```
stat is =  -5.529140269385719
p_value is =  4.230867870042998e-08
Probably different means (reject H0)
```


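```python
# summary statistics for each group
# (a notebook cell only displays its last expression, so the table below is for female passengers)
df_male.describe()
df_female.describe()
```

|   | age | fare |
|---|---|---|
| count | 261.0 | 314.0 |
| mean | 27.915709 | 44.479818 |
| std | 14.110146 | 57.997698 |
| min | 0.75 | 6.75 |
| 25% | 18.0 | 12.071875 |
| 50% | 27.0 | 23.0 |
| 75% | 37.0 | 55.0 |
| max | 63.0 | 512.3292 |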
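To see both groups side by side in one output, a `groupby` followed by `describe` is one option. A sketch on a small hand-made frame built from the first five rows of `df1` shown earlier (assumption: in the notebook you would call this on `df1` itself).

```python
import pandas as pd

# First five rows of df1 as shown in this notebook (illustrative subset only).
df_demo = pd.DataFrame({
    "sex": ["male", "female", "female", "female", "male"],
    "age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# One row per group; columns are a MultiIndex of (variable, statistic).
summary = df_demo.groupby("sex")[["age", "fare"]].describe()
print(summary)
print(summary.loc["male", ("fare", "mean")])
```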



**Paired Student's t-test**

Tests whether the means of two paired samples are significantly different.
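A paired t-test is equivalent to a one-sample t-test of the within-pair differences against zero. A sketch on synthetic before/after measurements that checks this equivalence with `ttest_rel` and `ttest_1samp`.

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

# Synthetic paired data: the same 30 hypothetical subjects measured twice.
rng = np.random.default_rng(7)
before = rng.normal(loc=100.0, scale=15.0, size=30)
after = before + rng.normal(loc=2.0, scale=5.0, size=30)

paired = ttest_rel(after, before)              # paired t-test
on_diff = ttest_1samp(after - before, 0.0)     # one-sample test on differences

print(paired.statistic, on_diff.statistic)
print(paired.pvalue, on_diff.pvalue)
```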

**Assumptions**

- The pairs of observations are independent of one another.
- The differences between paired observations are approximately normally distributed.
- Observations across the two samples are paired, e.g. the same subject measured twice.

**Interpretation**

>**H0:** the means of the samples are equal. \
>**H1:** the means of the samples are unequal.

```python
# select only male passengers
df_male = df.loc[df['sex'] == 'male']
df_male.head()
```


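|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |
| 5 | 0 | 3 | male | NaN | 0 | 0 | 8.4583 | Q | Third | man | True | NaN | Queenstown | no | True |
| 6 | 0 | 1 | male | 54.0 | 0 | 0 | 51.8625 | S | First | man | True | E | Southampton | no | True |
| 7 | 0 | 3 | male | 2.0 | 3 | 1 | 21.075 | S | Third | child | False | NaN | Southampton | no | False |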


```python
# split male passengers by class
df_male_first = df_male.loc[df_male['class'] == 'First']
df_male_second = df_male.loc[df_male['class'] == 'Second']
df_male_third = df_male.loc[df_male['class'] == 'Third']
```

```python
# check our data
df_male_first.head()
```

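|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 0 | 1 | male | 54.0 | 0 | 0 | 51.8625 | S | First | man | True | E | Southampton | no | True |
| 23 | 1 | 1 | male | 28.0 | 0 | 0 | 35.5 | S | First | man | True | A | Southampton | yes | True |
| 27 | 0 | 1 | male | 19.0 | 3 | 2 | 263.0 | S | First | man | True | C | Southampton | no | False |
| 30 | 0 | 1 | male | 40.0 | 0 | 0 | 27.7208 | C | First | man | True | NaN | Cherbourg | no | True |
| 34 | 0 | 1 | male | 28.0 | 1 | 0 | 82.1708 | C | First | man | True | NaN | Cherbourg | no | False |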


```python
# import libraries
from scipy.stats import ttest_rel

# ttest_rel needs two samples of equal length, so draw 100 passengers from each class
# (pass random_state=<int> to sample() for a reproducible draw)
# NOTE: these passengers are not naturally paired; pairing them by position only
# demonstrates the function call, not a valid paired design
df_male_third = df_male_third.sample(100)
df_male_first = df_male_first.sample(100)

# paired t-test
stat, p_value = ttest_rel(df_male_first['fare'], df_male_third['fare'])
print("stat is = ", stat)
print("p_value is = ", p_value)

# interpret this test's p-value at the 5% significance level
if p_value > 0.05:
    print('Probably the same mean (fail to reject H0)')
else:
    print('Probably different means (reject H0)')
```

```
stat is =  6.92425812611532
p_value is =  4.4410404746043687e-10
Probably different means (reject H0)
```
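Pairing matters when each observation in one sample is genuinely linked to one in the other. A sketch with hypothetical before/after measurements on the same 20 subjects (not Titanic data): subjects differ a lot from each other, but each shifts by about 2 units. The paired test removes the between-subject spread and detects the shift, while an unpaired test on the same numbers does not.

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_ind

# Hypothetical, deterministic before/after data for 20 subjects.
before = np.linspace(50.0, 150.0, 20)                 # large spread between subjects
after = before + 2.0 + 0.1 * np.sin(np.arange(20))    # small, consistent within-subject shift

paired = ttest_rel(after, before)    # uses within-subject differences
unpaired = ttest_ind(after, before)  # ignores the pairing

print('paired:   t = %.2f, p = %.3g' % (paired.statistic, paired.pvalue))
print('unpaired: t = %.2f, p = %.3g' % (unpaired.statistic, unpaired.pvalue))
```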
