# Challenge 1 - T-test

In statistics, a t-test is used to test whether the means of two data samples differ significantly. This challenge uses two types of t-test:

* **Student's t-test** (a.k.a. independent or uncorrelated t-test). This type of t-test compares samples from **two independent populations** (e.g. test scores of students in two different classes). `scipy` provides the [`ttest_ind`](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ttest_ind.html) function to conduct a Student's t-test.

* **Paired t-test** (a.k.a. dependent or correlated t-test). This type of t-test compares two sets of measurements taken on **the same population** (e.g. scores of different tests of students in the same class). `scipy` provides the [`ttest_rel`](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ttest_rel.html) function to conduct a paired t-test.

Both functions return a test statistic and a number called the **p-value**. If the p-value is below the chosen significance level (0.05 in this challenge), we reject the null hypothesis and call the difference significant. If the p-value is between 0.05 and 0.1, the evidence against the null hypothesis is weak: it would be rejected only at the looser 0.1 significance level. If the p-value is above 0.1, we do not reject the null hypothesis.
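
As a rough illustration of how the two functions are called and how the p-value is compared with a significance level, here is a minimal sketch. The scores in `class_a` and `class_b` are invented for this example and are not part of the Pokemon dataset.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

# Invented exam scores, for illustration only
class_a = np.array([72, 85, 90, 66, 78, 81, 94, 70])
class_b = np.array([68, 74, 88, 59, 71, 77, 85, 64])

# Independent test: treat the arrays as two different groups of students
ind_result = ttest_ind(class_a, class_b)

# Paired test: treat the arrays as two exams taken by the same eight students
rel_result = ttest_rel(class_a, class_b)

alpha = 0.05
for name, result in [("independent", ind_result), ("paired", rel_result)]:
    statistic, p_value = result  # each result unpacks as (statistic, p-value)
    print(f"{name}: t = {statistic:.3f}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```

The same numbers lead to different conclusions depending on which test is applied, which is why choosing the right test for the data matters.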

Read more about the t-test in [this article](http://b.link/test50) and [this Quora](http://b.link/unpaired97). Make sure you understand when to use which type of t-test. 

In [21]:
# Import libraries
import pandas as pd
import numpy as np
import scipy.stats as st
from scipy.stats import ttest_1samp, ttest_rel, ttest_ind

## Steps to follow:

1) Set the hypothesis

2) Choose significance / confidence level

3) Identify your sample (!= population)

4) Compute statistic

5) Get p-value

    - Compute it with a one-tailed test if H0 is an inequality (mu <= value or mu >= value)
    - Compute it with a two-tailed test if H0 is an equality (mu = value)

6) Decide

    - Reject H0
    - Not reject H0
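
Steps 4 to 6 could be sketched as follows for both a two-tailed and a one-tailed test. This is only an illustration: the samples are synthetic, and the `alternative` argument assumes SciPy 1.6 or newer.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic samples (assumption: normally distributed, for illustration only)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=105, scale=15, size=60)
group_b = rng.normal(loc=100, scale=15, size=60)

alpha = 0.05  # step 2: significance level

# Steps 4-5, two-tailed: H0 is mu_a = mu_b
two_tailed = ttest_ind(group_a, group_b, alternative="two-sided")

# Steps 4-5, one-tailed: H0 is mu_a <= mu_b, H1 is mu_a > mu_b
one_tailed = ttest_ind(group_a, group_b, alternative="greater")

# Step 6: decide
for label, res in [("two-tailed", two_tailed), ("one-tailed", one_tailed)]:
    decision = "reject H0" if res.pvalue < alpha else "do not reject H0"
    print(f"{label}: t = {res.statistic:.3f}, p = {res.pvalue:.4f} -> {decision}")
```

Both calls produce the same statistic; only the p-value changes, because the one-tailed test counts a single tail of the t distribution.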
    
 

#### Import dataset

In this challenge we will work on the Pokemon dataset. The goal is to test whether different groups of pokemon (e.g. Legendary vs Normal, Generation 1 vs 2, single-type vs dual-type) have different stats (e.g. HP, Attack, Defense, etc.).

In [22]:
# 1. Set the hypothesis

## H0: the two groups of Pokemon have equal mean stats (e.g., HP, Attack, Defense)
## H1: the two groups of Pokemon have different mean stats (e.g., HP, Attack, Defense)

## This is a two-tailed test

In [23]:
# 2. Significance level

alpha = 0.05

In [24]:
#3. Define sample

pokemon = pd.read_csv('Pokemon.csv')
pokemon.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


#### First we want to define a function with which we can test the means of a feature set of two samples. 

In the next cell you'll see an annotation inside the Python function that explains what the function does, its arguments, and its return value. This type of annotation is called a **docstring**, a convention used among Python developers. The docstring convention lets developers write consistent technical documentation for their code so that others can read it. It also allows some tools and websites to parse the docstrings automatically and display user-friendly documentation.

Follow the specifications of the docstring and complete the function.

In [25]:
# 4. Compute the statistics
# 5. Obtain the p_value

def t_test_features(s1, s2, features=['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Total']):
    """Test means of a feature set of two samples
    
    Args:
        s1 (dataframe): sample 1
        s2 (dataframe): sample 2
        features (list): an array of features to test
    
    Returns:
        dict: a dictionary mapping each feature name to its t-test result (statistic and p-value)
    """
    results = {}
    
    for feature in features:
        results[feature] = ttest_ind(s1[feature], s2[feature], equal_var=False)
    return results
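
The function above passes `equal_var=False`, which makes `ttest_ind` run Welch's t-test instead of the pooled-variance Student's t-test. The sketch below shows how the two variants can differ when the groups have unequal sizes and spreads. The samples are synthetic; only their sizes (65 and 735) echo the Legendary and non-Legendary groups used later.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic samples with unequal sizes and spreads (illustrative only)
rng = np.random.default_rng(42)
small_wide = rng.normal(loc=100, scale=30, size=65)
large_narrow = rng.normal(loc=70, scale=10, size=735)

# Student's t-test: assumes both groups share one (pooled) variance
student = ttest_ind(small_wide, large_narrow, equal_var=True)

# Welch's t-test: estimates a separate variance for each group
welch = ttest_ind(small_wide, large_narrow, equal_var=False)

print(f"Student: t = {student.statistic:.3f}, p = {student.pvalue:.3e}")
print(f"Welch:   t = {welch.statistic:.3f}, p = {welch.pvalue:.3e}")
```

When group sizes and variances differ this much, Welch's variant is usually the more cautious choice, since it does not rely on the equal-variance assumption.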

#### Using the `t_test_features` function, conduct a t-test for Legendary vs non-Legendary Pokemon.

*Hint: your output should look like below:*

```
{'HP': 1.0026911708035284e-13,
 'Attack': 2.520372449236646e-16,
 'Defense': 4.8269984949193316e-11,
 'Sp. Atk': 1.5514614112239812e-21,
 'Sp. Def': 2.2949327864052826e-15,
 'Speed': 1.049016311882451e-18,
 'Total': 9.357954335957446e-47}
 ```

In [26]:
# 3. Define your sample: we create two dataframes, legend and not_legend, on which we will test our hypothesis

legend = pokemon.loc[pokemon['Legendary'] == True]
legend.head(10)

len(legend)

65

In [27]:
not_legend = pokemon.loc[pokemon['Legendary'] == False]
not_legend.head(10)

len(not_legend)

735

In [28]:
stats_pvalue = t_test_features(legend, not_legend, features = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Total'])
stats_pvalue

{'HP': Ttest_indResult(statistic=8.981370483625046, pvalue=1.0026911708035284e-13),
 'Attack': Ttest_indResult(statistic=10.438133539322203, pvalue=2.520372449236646e-16),
 'Defense': Ttest_indResult(statistic=7.637078164784618, pvalue=4.8269984949193316e-11),
 'Sp. Atk': Ttest_indResult(statistic=13.417449984138461, pvalue=1.5514614112239812e-21),
 'Sp. Def': Ttest_indResult(statistic=10.015696613114878, pvalue=2.2949327864052826e-15),
 'Speed': Ttest_indResult(statistic=11.47504444631443, pvalue=1.049016311882451e-18),
 'Total': Ttest_indResult(statistic=25.8335743895517, pvalue=9.357954335957446e-47)}

In [29]:
print(type(stats_pvalue))

<class 'dict'>


In [34]:
# Print the statistics for each feature of Pokemon

stats = {}

for word in stats_pvalue:
    # Fill the empty dictionary 'stats' with element 0 of each result in
    # 'stats_pvalue', which is the t statistic of that feature
    stats[word] = stats_pvalue[word][0]

print(stats)

{'HP': 8.981370483625046, 'Attack': 10.438133539322203, 'Defense': 7.637078164784618, 'Sp. Atk': 13.417449984138461, 'Sp. Def': 10.015696613114878, 'Speed': 11.47504444631443, 'Total': 25.8335743895517}

In [35]:
# Print the p_values for each feature of Pokemon

p_values = {}

for word in stats_pvalue:
    # Fill the empty dictionary 'p_values' with element 1 of each result in
    # 'stats_pvalue', which is the p-value of that feature
    p_values[word] = stats_pvalue[word][1]

print(p_values)

{'HP': 1.0026911708035284e-13, 'Attack': 2.520372449236646e-16, 'Defense': 4.8269984949193316e-11, 'Sp. Atk': 1.5514614112239812e-21, 'Sp. Def': 2.2949327864052826e-15, 'Speed': 1.049016311882451e-18, 'Total': 9.357954335957446e-47}

#### From the test results above, what conclusion can you make? Do Legendary and non-Legendary pokemons have significantly different stats on each feature?

1. Every p-value is far smaller than alpha (the largest, for Defense, is about 4.8e-11).
2. Therefore we reject H0 for all seven features: Legendary and non-Legendary Pokemon have significantly different mean stats.
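
The decision step can also be applied to every feature at once. Below is a minimal sketch using the p-values from the hint output above; `decide` is a hypothetical helper name introduced only for this example.

```python
alpha = 0.05

# p-values copied from the expected output for Legendary vs non-Legendary
p_values = {
    'HP': 1.0026911708035284e-13,
    'Attack': 2.520372449236646e-16,
    'Defense': 4.8269984949193316e-11,
    'Sp. Atk': 1.5514614112239812e-21,
    'Sp. Def': 2.2949327864052826e-15,
    'Speed': 1.049016311882451e-18,
    'Total': 9.357954335957446e-47,
}


def decide(p_values, alpha=0.05):
    """Map each feature to True when H0 is rejected at level alpha."""
    return {feature: p < alpha for feature, p in p_values.items()}


decisions = decide(p_values, alpha)
rejected = [feature for feature, reject in decisions.items() if reject]
print(f"H0 rejected for {len(rejected)} of {len(decisions)} features: {rejected}")
```

The same helper could be reused on any dictionary of p-values produced later in the notebook.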

#### Next, conduct a t-test for Generation 1 and Generation 2 Pokemon.

In [36]:
# 1. Set the hypothesis

## H0: Generation 1 and Generation 2 Pokemon have equal mean stats (e.g., HP, Attack, Defense)
## H1: Generation 1 and Generation 2 Pokemon have different mean stats (e.g., HP, Attack, Defense)

## This is a two-tailed test


In [37]:
# 2. Significance level

alpha = 0.05

In [38]:
#3. Define sample

pokemon = pd.read_csv('Pokemon.csv')
pokemon.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


In [39]:
pokemon.shape

(800, 13)

In [40]:
len(pokemon)

800

In [41]:
pokemon['Generation'].unique()

array([1, 2, 3, 4, 5, 6])

In [42]:
# 3. We create two dataframes, one for Generation 1 and one for Generation 2, by filtering the
# original dataframe pokemon on the column 'Generation'

generation_1 = pokemon.loc[pokemon['Generation'] == 1]
generation_1.head(10)

len(generation_1)

166

In [43]:
generation_2 = pokemon.loc[pokemon['Generation'] == 2]
generation_2.head(10)

len(generation_2)

106

In [44]:
len(generation_1) + len(generation_2)

272

In [45]:
# 4. Compute the statistics
# 5. Obtain the p_value

# We define a variant of the function that extracts only the p-values, then apply it to generation_1 and generation_2


def t_test2_features(s1, s2, features=['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Total']):
    """Test means of a feature set of two samples
    
    Args:
        s1 (dataframe): sample 1
        s2 (dataframe): sample 2
        features (list): an array of features to test
    
    Returns:
        dict: a dictionary of p-values where the feature name is the key and the p-value is the value
    """
    results = {}
    
    for feature in features:
        results[feature] = ttest_ind(s1[feature], s2[feature], equal_var=False)[1]
    return results

In [46]:
p_values_gen = t_test2_features(generation_1, generation_2, features = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Total'])
p_values_gen

{'HP': 0.14551697834219623,
 'Attack': 0.24721958967217725,
 'Defense': 0.5677711011725426,
 'Sp. Atk': 0.12332165977104394,
 'Sp. Def': 0.18829872292645752,
 'Speed': 0.00239265937312135,
 'Total': 0.5631377907941676}

In [110]:
alpha = 0.05

# Check whether every feature's p-value is below alpha
all(p < alpha for p in p_values_gen.values())


False

#### What conclusions can you make?

1. All p-values except the one for Speed (checked below) are larger than alpha.
2. Therefore:
- We reject H0 for Speed: Generation 1 and Generation 2 Pokemon differ significantly in mean Speed.
- We do not reject H0 for the other features, since their p-values are larger than alpha.

In [106]:
0.002393 < alpha

True

#### Compare Pokemon that have a single type with those that have two types.

In [47]:
pokemon.head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


In [48]:
pokemon['Type 1'].unique()

array(['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison', 'Electric',
       'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock', 'Ghost', 'Ice',
       'Dragon', 'Dark', 'Steel', 'Flying'], dtype=object)

In [49]:
pokemon['Type 2'].unique()

array(['Poison', nan, 'Flying', 'Dragon', 'Ground', 'Fairy', 'Grass',
       'Fighting', 'Psychic', 'Steel', 'Ice', 'Rock', 'Dark', 'Water',
       'Electric', 'Fire', 'Ghost', 'Bug', 'Normal'], dtype=object)

#### First findings to capture the right sample:
1) Type 1 has no null values

2) Type 2 has null values (nan)

#### THEREFORE

3) The single-type sample consists of the rows where Type 2 is nan

4) The dual-type sample consists of the rows where Type 2 is not nan
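
The split described above can be sketched on a small example. The toy dataframe below reuses four rows from the preview shown earlier (names and types only), so it is an illustration rather than the full dataset.

```python
import numpy as np
import pandas as pd

# Four rows taken from the dataset preview, keeping only the type columns
toy = pd.DataFrame({
    'Name': ['Bulbasaur', 'Charmander', 'Charizard', 'Squirtle'],
    'Type 1': ['Grass', 'Fire', 'Fire', 'Water'],
    'Type 2': ['Poison', np.nan, 'Flying', np.nan],
})

# Single type: Type 2 is missing. Dual type: Type 2 is present.
single = toy[toy['Type 2'].isna()]
dual = toy[toy['Type 2'].notna()]

# The two masks are complementary, so every row lands in exactly one sample
assert len(single) + len(dual) == len(toy)

print(single['Name'].tolist())  # ['Charmander', 'Squirtle']
print(dual['Name'].tolist())    # ['Bulbasaur', 'Charizard']
```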

In [50]:
#4. Compute the stats 
#5. Get the p_values

single_type = pokemon.loc[pokemon['Type 2'].isna()]
double_type = pokemon.loc[pokemon['Type 2'].notna()]

p_values_type = t_test2_features(single_type, double_type, features = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Total'])
p_values_type

{'HP': 0.11314389855379421,
 'Attack': 0.00014932578145948305,
 'Defense': 2.7978540411514693e-08,
 'Sp. Atk': 0.00013876216585667842,
 'Sp. Def': 0.00010730610934512779,
 'Speed': 0.02421703281819094,
 'Total': 1.1157056505229961e-07}

#### What conclusions can you make?

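1. Only the p-value for HP (about 0.113) is larger than alpha; the other six are smaller.
2. Therefore:
- We reject H0 for Attack, Defense, Sp. Atk, Sp. Def, Speed and Total: single-type and dual-type Pokemon differ significantly in these stats.
- We do not reject H0 for HP, since its p-value is larger than alpha.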

#### Now, we want to compare whether there are significant differences of `Attack` vs `Defense`  and  `Sp. Atk` vs `Sp. Def` of all pokemons. Please write your code below.

*Hint: are you comparing different populations or the same population?*
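
One way to see why paired data calls for a paired test: `ttest_rel` on two columns gives the same result as a one-sample t-test on the row-by-row differences against a mean of zero. The sketch below checks this on the Attack and Defense values of the ten Pokemon from the preview shown earlier, so its numbers will not match the full-dataset result.

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

# Attack and Defense of the ten Pokemon shown in the dataset preview
attack = np.array([49, 62, 82, 100, 52, 64, 84, 130, 104, 48])
defense = np.array([49, 63, 83, 123, 43, 58, 78, 111, 78, 65])

# Paired test on the two columns
paired = ttest_rel(attack, defense)

# One-sample test on the per-Pokemon differences, with H0: mean difference = 0
one_sample = ttest_1samp(attack - defense, popmean=0)

print(f"paired:     t = {paired.statistic:.4f}, p = {paired.pvalue:.4f}")
print(f"one-sample: t = {one_sample.statistic:.4f}, p = {one_sample.pvalue:.4f}")
```

Because each Pokemon contributes both an Attack and a Defense value, the two columns are measurements on the same population and the pairing carries information that an independent test would discard.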

In [51]:
# 1. Set the Hypothesis

# H0: mu attack of a pokemon = mu defense of a pokemon
# H1: mu attack of a pokemon != mu defense of a pokemon

# We also know that it is a two tailed test

In [52]:
# 2. Define the significance level

alpha = 0.05

In [53]:
# 3. Determine the sample

In [54]:
pokemon.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


In [55]:
stat, p_value = ttest_rel(pokemon['Attack'], pokemon['Defense'])

print(stat)
print(p_value)

4.325566393330478
1.7140303479358558e-05


In [56]:
# 6. Decide whether to reject or not reject H0
p_value < alpha

True

#### Comments:

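We reject H0: the p-value (about 1.7e-05) is below alpha, so the mean Attack and the mean Defense of Pokemon differ significantly.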

In [57]:
# 1. Set the hypothesis

# H0: mu sp. attack of a pokemon = mu sp. defense of a pokemon
# H1: mu sp. attack of a pokemon != mu sp. defense of a pokemon

# We also know that it is a two tailed test

In [58]:
# 2. Define the significance level

alpha = 0.05

In [59]:
# 3. Determine the sample
# 4. Compute the statistics
# 5. Get the p_value

stat, p_value = ttest_rel(pokemon['Sp. Atk'], pokemon['Sp. Def'])

print(stat)
print(p_value)

0.853986188453353
0.3933685997548122


In [60]:
# 6. Decide whether to reject or not reject H0

p_value < alpha

False

#### Comments:

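We do not reject H0: the p-value (about 0.393) is above alpha, so there is no significant difference between the mean Sp. Atk and the mean Sp. Def of Pokemon.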