### Lab | Inferential statistics - T-test & P-value

* #### One-tailed t-test - In a packing plant, a machine packs cartons with jars. A new machine is expected to pack faster on average than the machine currently in use. To test that hypothesis, the time each machine takes to pack ten cartons is recorded. The results, in seconds, are shown in the tables in the file files_for_lab/machine.txt. Assuming the conditions for a t-test are met, does the data provide sufficient evidence that one machine is better than the other?

In [1]:
import pandas as pd
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
data = pd.read_csv('files_for_lab/machine.txt',encoding='UTF-16',sep='\t')
data

Unnamed: 0,New machine,Old machine
0,42.1,42.7
1,41.0,43.6
2,41.3,43.8
3,41.8,43.3
4,42.4,42.5
5,42.8,43.5
6,43.2,43.1
7,42.3,41.7
8,41.8,44.0
9,42.7,44.1


In [3]:
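# A faster machine has a LOWER mean packing time, so "the new machine is faster" is the alternative.
# H0: mean packing time of new machine >= mean packing time of old machine
# H1: mean packing time of new machine <  mean packing time of old machine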

In [4]:
data['New machine'].mean()

42.14

In [5]:
data.columns # the second column name has leading whitespace

Index(['New machine', '    Old machine'], dtype='object')

In [6]:
data = data.rename(columns={'    Old machine': 'Old machine'}) # remove the leading spaces from the second column name

In [7]:
data.columns

Index(['New machine', 'Old machine'], dtype='object')
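The rename above depends on knowing the exact number of leading spaces. As a minimal sketch of a more general cleanup, the snippet below strips whitespace from every column label at once. The small `df` is a hypothetical stand-in for the lab data, built inline so the example runs without the file.

```python
import pandas as pd

# Hypothetical stand-in for the lab data; the padded header mimics the one seen above.
df = pd.DataFrame({'New machine': [42.1, 41.0], '    Old machine': [42.7, 43.6]})

# Strip leading/trailing whitespace from every column label in one step.
df.columns = df.columns.str.strip()

print(list(df.columns))
```

This avoids hard-coding the padded name, which helps if the file format changes slightly.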

In [8]:
data['Old machine'].mean()

43.230000000000004

In [9]:
st.ttest_ind(a=data['New machine'], b=data['Old machine'], equal_var=True, alternative='less')
# alternative='less' tests H1: mean of the first sample (new machine) < mean of the second sample (old machine),
# which matches the claim that the new machine packs faster (takes less time per carton).

Ttest_indResult(statistic=-3.3972307061176026, pvalue=0.0016055712503872579)

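We reject H0 because the p-value (0.0016) < 0.05: the data support the claim that the new machine packs faster on average than the old one.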
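To see where the statistic comes from, here is a minimal sketch that recomputes the pooled-variance (equal_var=True) t statistic by hand, using only the standard library. The two lists are copied from the table printed above; the variable names are illustrative and not part of the lab.

```python
import math
import statistics

# Packing times in seconds, copied from the table shown above.
new = [42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7]
old = [42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1]

n_new, n_old = len(new), len(old)

# Sample variances (statistics.variance divides by n - 1).
var_new = statistics.variance(new)
var_old = statistics.variance(old)

# Pooled variance: the equal-variance assumption behind equal_var=True.
pooled_var = ((n_new - 1) * var_new + (n_old - 1) * var_old) / (n_new + n_old - 2)

# Standard error of the difference between the two sample means.
se = math.sqrt(pooled_var * (1 / n_new + 1 / n_old))

t_stat = (statistics.mean(new) - statistics.mean(old)) / se
dof = n_new + n_old - 2  # degrees of freedom for the pooled test

print(round(t_stat, 4), dof)
```

The value of `t_stat` should agree with the `statistic` reported by `st.ttest_ind` above (about -3.397), with 18 degrees of freedom.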

* #### Matched Pairs Test - In this challenge we will compare dependent samples of data describing our Pokemon (file files_for_lab/pokemon.csv). Our goal is to see whether there is a significant difference between each Pokemon's defense and attack scores. Our null hypothesis is that the mean defense and attack scores are equal. Compare the two columns to see if there is a statistically significant difference between them and comment on your result.

In [10]:
pokemon=pd.read_csv('files_for_lab/pokemon.csv')
pokemon

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


In [11]:
# H0: mean of attack  = mean of defense (mean paired difference = 0)
# H1: mean of attack != mean of defense (mean paired difference != 0)

In [12]:
pokemon['difference'] = pokemon['Attack']-pokemon['Defense'] # paired difference per Pokemon (Attack - Defense)
pokemon.head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,difference
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,0
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,-1
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,-1
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,-23
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False,9


In [13]:
sample_diff_mean = pokemon['difference'].mean()
sample_diff_std = pokemon['difference'].std(ddof=1)

In [14]:
sample_diff_mean # mean of the paired differences

5.15875

In [15]:
sample_diff_std # sample standard deviation of the paired differences

33.7323418553516

In [17]:
t = sample_diff_mean / ( sample_diff_std / np.sqrt(pokemon.shape[0]) ) # paired t statistic: mean difference / standard error, n = number of pairs
t

4.325566393330483

In [18]:
tc = st.t.ppf(1-(0.05/2),df= pokemon.shape[0] - 1) # two-tailed critical value at alpha = 0.05, df = n - 1
tc

1.9629374611056056

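#### The statistic is 4.33, while the two-tailed critical value is 1.96. Since |4.33| > 1.96, we reject H0: attack scores are, on average, significantly higher than defense scores (mean difference of about 5.16 points).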
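As a cross-check on the manual calculation, the sketch below compares the hand-computed paired t statistic with `scipy.stats.ttest_rel` and derives a two-sided p-value. It uses only the five Pokemon visible in the preview above, as an illustration of the method, so its numbers do not reproduce the full-dataset result. The array names are illustrative.

```python
import numpy as np
import scipy.stats as st

# Attack and Defense for the five Pokemon shown in the preview (illustration only).
attack = np.array([49, 62, 82, 100, 52])
defense = np.array([49, 63, 83, 123, 43])

diff = attack - defense
n = diff.size

# Manual paired t statistic, same formula as in the cells above.
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * st.t.sf(abs(t_manual), df=n - 1)

# scipy's paired-samples test (two-sided by default) for comparison.
res = st.ttest_rel(attack, defense)

print(t_manual, p_manual)
print(res.statistic, res.pvalue)
```

On the full dataset, `st.ttest_rel(pokemon['Attack'], pokemon['Defense'])` should reproduce the statistic of about 4.33 computed by hand. It also returns a p-value directly, so no separate critical-value lookup is needed.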