# Paired T-test
<br><br>
**Goal**: test whether the mean of the same population, measured at two different points in time, has changed.
<br><br>
## Example: The new diet

A group of participants followed a new diet, and each participant was weighed before and after it. Our goal is to decide between two hypotheses:
<br><br>
H1: The diet had an effect on weight ( mean_after_diet != mean_before_diet )<br>
H0: The diet had no effect on weight ( mean_after_diet == mean_before_diet )<br>
<br><br>
**Assumptions**
<br><br>
1 - The dependent variable should be measured on a continuous scale<br>
2 - The pairs (participants) should be independent of each other; the two measurements within a pair are dependent by design<br>
3 - The paired differences should contain no significant outliers<br>
4 - The paired differences should be approximately normally distributed<br>
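
For a paired test, the normality and outlier checks are usually run on the per-participant differences rather than on each column separately. The sketch below shows one way this might be done. It uses synthetic stand-in data (the arrays `before` and `after` are made up for illustration and are not the diet dataset), and the 1.5 * IQR rule is just one common convention for flagging outliers.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (NOT the diet dataset): 100 paired measurements.
rng = np.random.default_rng(0)
before = rng.normal(loc=95.0, scale=10.0, size=100)
after = before - rng.normal(loc=2.0, scale=1.5, size=100)

# The paired t-test operates on the per-participant differences.
diff = before - after

# Shapiro-Wilk test: a small p-value suggests the differences are not normal.
shapiro_stat, shapiro_p = stats.shapiro(diff)

# Flag potential outliers with the 1.5 * IQR rule.
q1, q3 = np.percentile(diff, [25, 75])
iqr = q3 - q1
outlier_mask = (diff < q1 - 1.5 * iqr) | (diff > q3 + 1.5 * iqr)

print("Shapiro-Wilk p-value:", shapiro_p)
print("Number of flagged outliers:", int(outlier_mask.sum()))
```

With the real data, `before` and `after` would be replaced by the two weight columns of the DataFrame.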

In [1]:
import pandas as pd
from scipy import stats

In [20]:
# Import dataset:

df = pd.read_csv("diet_dataset.csv", decimal=',')
df.head()

| | id_participante | peso_antes | peso_depois |
|---|---|---|---|
| 0 | id1 | 74.6 | 73.1 |
| 1 | id2 | 104.0 | 100.6 |
| 2 | id3 | 96.0 | 92.7 |
| 3 | id4 | 96.3 | 96.4 |
| 4 | id5 | 106.2 | 105.5 |


In [21]:
df.dtypes

id_participante     object
peso_antes         float64
peso_depois        float64
dtype: object

In [24]:
df.describe()

| | peso_antes | peso_depois |
|---|---|---|
| count | 100.0 | 100.0 |
| mean | 96.58 | 94.473 |
| std | 11.13757 | 11.132525 |
| min | 72.0 | 67.7 |
| 25% | 86.675 | 85.7 |
| 50% | 96.8 | 95.05 |
| 75% | 106.05 | 104.375 |
| max | 113.0 | 111.5 |


The mean weight after the diet (94.47) is a little more than 2 kilos lower than the mean before it (96.58). Let's check whether this difference is statistically significant:

In [26]:
# Weight before the diet (column "peso_antes"):
a = df.peso_antes

# Weight after the diet (column "peso_depois"):
b = df.peso_depois

# Paired-samples t-test (two-sided by default):
tStat, pValue = stats.ttest_rel(a, b)

print("P-Value:{0} T-Statistic:{1}".format(pValue,tStat))

P-Value:7.122999470178242e-27 T-Statistic:14.820165521870756
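
To make the statistic less of a black box, here is a minimal sketch of what a paired t-test computes: a one-sample t-test on the per-participant differences, with t = mean(d) / (std(d) / sqrt(n)) and n - 1 degrees of freedom. The data are synthetic stand-ins (not the diet dataset), and the names `before`, `after`, `t_manual` and `p_manual` are illustrative only.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (NOT the diet dataset): 50 paired measurements.
rng = np.random.default_rng(1)
before = rng.normal(loc=95.0, scale=10.0, size=50)
after = before - rng.normal(loc=2.0, scale=1.5, size=50)

# Per-participant differences (before minus after).
diff = before - after
n = diff.size

# t = mean(d) / (sample std(d) / sqrt(n)), using ddof=1 for the sample std.
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Reference values from SciPy for comparison.
t_scipy, p_scipy = stats.ttest_rel(before, after)

print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

The sign of the statistic follows the order of the arguments: a positive value means the first sample has the larger mean.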


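*Result: Our p-value is far smaller than 0.05, so we reject the null hypothesis.* The mean weight changed significantly, and because the t-statistic is positive (before minus after), the change is a decrease: the data support the claim that this diet helps to lose weight.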
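
A two-sided test only shows that the means differ. To back a directional claim (weight went down) and to quantify the size of the change, a one-sided test and a confidence interval for the mean difference can help. The sketch below assumes SciPy 1.6 or newer (for the `alternative` argument) and uses synthetic stand-in data, not the diet dataset. All variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (NOT the diet dataset): 100 paired measurements.
rng = np.random.default_rng(2)
before = rng.normal(loc=95.0, scale=10.0, size=100)
after = before - rng.normal(loc=2.0, scale=1.5, size=100)

# Two-sided test: H1 is "the means differ".
t_two_sided, p_two_sided = stats.ttest_rel(before, after)

# One-sided test: H1 is "mean before > mean after" (weight decreased).
t_one_sided, p_one_sided = stats.ttest_rel(before, after, alternative="greater")

# 95% confidence interval for the mean difference (before minus after).
diff = before - after
n = diff.size
ci_low, ci_high = stats.t.interval(
    0.95, df=n - 1, loc=diff.mean(), scale=stats.sem(diff)
)

print("two-sided p-value:", p_two_sided)
print("one-sided p-value:", p_one_sided)
print("95% CI for mean difference:", (ci_low, ci_high))
```

If the whole interval lies above zero, the data are consistent with a decrease in mean weight, and the interval gives a plausible range for its size.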
