# Challenge 1 - T-test

In statistics, a t-test is used to test whether the means of two data samples differ significantly. This challenge uses two types of t-test:

* **Student's t-test** (a.k.a. independent or uncorrelated t-test). This test compares samples from **two independent populations** (e.g. test scores of students in two different classes). `scipy` provides the [`ttest_ind`](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ttest_ind.html) function to conduct Student's t-test.

* **Paired t-test** (a.k.a. dependent or correlated t-test). This test compares two sets of measurements taken on **the same population** (e.g. scores of different tests of students in the same class). `scipy` provides the [`ttest_rel`](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ttest_rel.html) function to conduct a paired t-test.
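
The difference between the two tests can be sketched on made-up data. The scores below are randomly generated stand-ins for the classroom examples in the list above, not real measurements, and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

rng = np.random.default_rng(0)

# Independent samples: scores from two different (made-up) classes.
# The two samples may have different sizes.
class_a = rng.normal(loc=70, scale=10, size=30)
class_b = rng.normal(loc=75, scale=10, size=35)
independent_result = ttest_ind(class_a, class_b)

# Paired samples: two tests taken by the same (made-up) students.
# Each position in test_1 and test_2 refers to the same student,
# so both arrays must have the same length.
test_1 = rng.normal(loc=70, scale=10, size=30)
test_2 = test_1 + rng.normal(loc=2, scale=3, size=30)
paired_result = ttest_rel(test_1, test_2)

print("independent p-value:", independent_result.pvalue)
print("paired p-value:", paired_result.pvalue)
```

The paired test relies on the row-by-row pairing, which is why it only makes sense when both measurements come from the same subjects.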

Both functions return a test statistic and a **p-value**. The null hypothesis is that the two means are equal. If the p-value is below 0.05, we reject the null hypothesis and declare the difference significant. If the p-value is between 0.05 and 0.1, we may still reject the null hypothesis, but with less confidence. If the p-value is above 0.1, we do not reject the null hypothesis.
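
The decision rule in the paragraph above can be written as a small helper. This is only a sketch: `interpret_p_value` is a hypothetical name, and how to treat a p-value of exactly 0.05 or 0.1 is an assumption, since the text does not say.

```python
def interpret_p_value(p_value, strong=0.05, weak=0.1):
    """Map a p-value to the three outcomes described in the text.

    Assumption: a value exactly equal to a threshold falls into the
    less confident of the two neighbouring outcomes.
    """
    if p_value < strong:
        return "reject the null hypothesis"
    if p_value < weak:
        return "reject the null hypothesis with low confidence"
    return "do not reject the null hypothesis"


for p in (0.01, 0.07, 0.5):
    print(p, "->", interpret_p_value(p))
```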

Read more about the t-test in [this article](http://b.link/test50) and [this Quora](http://b.link/unpaired97). Make sure you understand when to use which type of t-test. 

In [10]:
# Import libraries
import pandas as pd
import scipy.stats as st
from scipy.stats import ttest_ind
from scipy.stats import ttest_rel


#### Import dataset

In this challenge we will work on the Pokemon dataset. The goal is to test whether different groups of pokemon (e.g. Legendary vs Normal, Generation 1 vs 2, single-type vs dual-type) have different stats (e.g. HP, Attack, Defense, etc.).

In [5]:
# Your code here:
df = pd.read_csv('Pokemon.csv')

df.head()


Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


#### First we want to define a function with which we can test the means of a feature set of two samples. 

The next cell contains a function whose annotation explains what the function does, its arguments, and its return value. This type of annotation is called a **docstring**, a convention used among Python developers. Docstrings let developers document their code consistently so that others can read it, and they let tools parse the annotations automatically and display user-friendly documentation.

Follow the specifications of the docstring and complete the function.

In [13]:
def t_test_features(s1, s2, features=['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Total']):
    """Test means of a feature set of two samples
    
    Args:
        s1 (dataframe): sample 1
        s2 (dataframe): sample 2
        features (list): an array of features to test
    
    Returns:
        dict: a dictionary of t-test scores for each feature where the feature name is the key and the p-value is the value
    """
    results = {}

    # Your code here
    # Run an independent two-sample t-test per feature and keep only the p-value.
    for feature in features:
        _, p_value = ttest_ind(s1[feature], s2[feature])
        results[feature] = p_value

    return results

#### Using the `t_test_features` function, conduct a t-test for Legendary vs non-Legendary pokemons.

*Hint: your output should look like below:*

```
{'HP': 1.0026911708035284e-13,
 'Attack': 2.520372449236646e-16,
 'Defense': 4.8269984949193316e-11,
 'Sp. Atk': 1.5514614112239812e-21,
 'Sp. Def': 2.2949327864052826e-15,
 'Speed': 1.049016311882451e-18,
 'Total': 9.357954335957446e-47}
 ```

In [14]:
# Your code here
t_test_features(df[df["Legendary"]], df[~df["Legendary"]])

{'HP': 3.330647684846191e-15,
 'Attack': 7.827253003205333e-24,
 'Defense': 1.5842226094427259e-12,
 'Sp. Atk': 6.314915770427266e-41,
 'Sp. Def': 1.8439809580409594e-26,
 'Speed': 2.3540754436898437e-21,
 'Total': 3.0952457469652825e-52}
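
The p-values above do not match the hint exactly, although they lead to the same conclusion. One possible explanation, which the source does not confirm, is the variance assumption: `ttest_ind` pools the two sample variances by default (`equal_var=True`), while `equal_var=False` runs Welch's t-test, which does not assume equal variances. The sketch below uses synthetic data, and `t_test_features_welch` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind


def t_test_features_welch(s1, s2, features):
    """Hypothetical variant of t_test_features that uses Welch's t-test."""
    return {
        feature: ttest_ind(s1[feature], s2[feature], equal_var=False).pvalue
        for feature in features
    }


# Synthetic stand-in data: a small, high-variance group and a
# large, low-variance group. These are not the Pokemon values.
rng = np.random.default_rng(1)
small_group = pd.DataFrame({"HP": rng.normal(90, 30, size=60)})
large_group = pd.DataFrame({"HP": rng.normal(68, 10, size=700)})

pooled_p = ttest_ind(small_group["HP"], large_group["HP"]).pvalue
welch_p = t_test_features_welch(small_group, large_group, ["HP"])["HP"]
print(f"pooled variance: {pooled_p:.3g}, Welch: {welch_p:.3g}")
```

When group sizes and variances differ a lot, the two variants can give noticeably different p-values, so it is worth stating which one was used.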

#### From the test results above, what conclusion can you make? Do Legendary and non-Legendary pokemons have significantly different stats on each feature?

In [None]:
# Your comment here
# All seven p-values are far below 0.05 (the largest, for Defense, is about 1.6e-12).
# We therefore reject the null hypothesis of equal means for every feature: Legendary and non-Legendary Pokémon differ significantly on all tested stats.

#### Next, conduct a t-test for Generation 1 and Generation 2 pokemons.

In [18]:
# Your code here
gen_p_value = t_test_features(df[df["Generation"] == 1], df[df["Generation"] == 2])

gen_p_value

{'HP': 0.13791881412813622,
 'Attack': 0.24050968418101457,
 'Defense': 0.5407630349194362,
 'Sp. Atk': 0.14119788176331508,
 'Sp. Def': 0.16781226231606386,
 'Speed': 0.00283569548125787,
 'Total': 0.5599140649014442}

#### What conclusions can you make?

In [17]:
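# Your comment here
# A p-value below 0.05 is conventionally treated as statistically significant.
# It means that, if the two generations truly had equal means, a difference at least this large would be observed less than 5% of the time.
# Only Speed (p ≈ 0.0028) falls below 0.05; for every other feature we do not reject the null hypothesis.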



In [36]:
for k,v in gen_p_value.items():
    if v >= 0.05:
        print(f"{k},{v} : The p-value is greater than 0.05, indicating that there is no statistically significant difference in the mean {k} between Generation 1 and Generation 2 Pokémon.")
    else:
        print(f"{k}, {v} : The p-value is less than 0.05, indicating that there is a statistically significant difference in the mean {k} between Generation 1 and Generation 2 Pokémon.")

HP,0.13791881412813622 : The p-value is greater than 0.05, indicating that there is no statistically significant difference in the mean HP between Generation 1 and Generation 2 Pokémon.
Attack,0.24050968418101457 : The p-value is greater than 0.05, indicating that there is no statistically significant difference in the mean Attack between Generation 1 and Generation 2 Pokémon.
Defense,0.5407630349194362 : The p-value is greater than 0.05, indicating that there is no statistically significant difference in the mean Defense between Generation 1 and Generation 2 Pokémon.
Sp. Atk,0.14119788176331508 : The p-value is greater than 0.05, indicating that there is no statistically significant difference in the mean Sp. Atk between Generation 1 and Generation 2 Pokémon.
Sp. Def,0.16781226231606386 : The p-value is greater than 0.05, indicating that there is no statistically significant difference in the mean Sp. Def between Generation 1 and Generation 2 Pokémon.
Speed, 0.00283569548125787 : The 


In [28]:
df['Type 1'].unique()

array(['Grass', 'Fire', 'Water', 'Bug', 'Normal', 'Poison', 'Electric',
       'Ground', 'Fairy', 'Fighting', 'Psychic', 'Rock', 'Ghost', 'Ice',
       'Dragon', 'Dark', 'Steel', 'Flying'], dtype=object)

In [29]:
df.info()
# Check columns for null entries; Type 2 has nulls

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non-null    int64 
 1   Name        800 non-null    object
 2   Type 1      800 non-null    object
 3   Type 2      414 non-null    object
 4   Total       800 non-null    int64 
 5   HP          800 non-null    int64 
 6   Attack      800 non-null    int64 
 7   Defense     800 non-null    int64 
 8   Sp. Atk     800 non-null    int64 
 9   Sp. Def     800 non-null    int64 
 10  Speed       800 non-null    int64 
 11  Generation  800 non-null    int64 
 12  Legendary   800 non-null    bool  
dtypes: bool(1), int64(9), object(3)
memory usage: 75.9+ KB


In [30]:
# Check the unique values of Type 2
df['Type 2'].unique()

array(['Poison', nan, 'Flying', 'Dragon', 'Ground', 'Fairy', 'Grass',
       'Fighting', 'Psychic', 'Steel', 'Ice', 'Rock', 'Dark', 'Water',
       'Electric', 'Fire', 'Ghost', 'Bug', 'Normal'], dtype=object)

#### Compare pokemons that have a single type with those that have two types.

In [34]:
# Your code here
one_type = df[df['Type 2'].isna()]
two_types = df[df['Type 2'].notna()]

types_p_value = t_test_features(one_type, two_types)

types_p_value


{'HP': 0.11060643144431842,
 'Attack': 0.00015741395666164396,
 'Defense': 3.250594205757004e-08,
 'Sp. Atk': 0.0001454917404035147,
 'Sp. Def': 0.00010893304795534396,
 'Speed': 0.024051410794037463,
 'Total': 1.1749035008828752e-07}

#### What conclusions can you make?

In [37]:
# Your comment here
for k, v in types_p_value.items():
    if v >= 0.05:
        print(f"{k}, {v} : The p-value is greater than 0.05, indicating that there is no statistically significant difference in the mean {k} between pokemons with 1 type and 2 types.")
    else:
        print(f"{k}, {v} : The p-value is less than 0.05, indicating that there is a statistically significant difference in the mean {k} between pokemons with 1 type and 2 types.")

HP, 0.11060643144431842 : The p-value is greater than 0.05, indicating that there is no statistically significant difference in the mean HP between pokemons with 1 type and 2 types.
Attack, 0.00015741395666164396 : The p-value is less than 0.05, indicating that there is a statistically significant difference in the mean Attack between pokemons with 1 type and 2 types.
Defense, 3.250594205757004e-08 : The p-value is less than 0.05, indicating that there is a statistically significant difference in the mean Defense between pokemons with 1 type and 2 types.
Sp. Atk, 0.0001454917404035147 : The p-value is less than 0.05, indicating that there is a statistically significant difference in the mean Sp. Atk between pokemons with 1 type and 2 types.
Sp. Def, 0.00010893304795534396 : The p-value is less than 0.05, indicating that there is a statistically significant difference in the mean Sp. Def between pokemons with 1 type and 2 types.
Speed, 0.024051410794037463 : The p-value is less than 0.05, indicating that there is a statistically significant difference in the mean Speed between pokemons with 1 type and 2 types.
Total, 1.1749035008828752e-07 : The p-value is less than 0.05, indicating that there is a statistically significant difference in the mean Total between pokemons with 1 type and 2 types.
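
Because the reporting loop is written out twice in this notebook, it is easy to feed it the wrong dictionary. One way to reduce that risk, sketched here with a hypothetical helper named `summarize_p_values`, is to pass the dictionary and the group description as arguments. The two example values are copied from the single-type vs two-type results printed earlier.

```python
def summarize_p_values(p_values, groups, alpha=0.05):
    """Return one sentence per feature describing its t-test result.

    p_values: dict mapping feature name to p-value
    groups:   text describing the two groups being compared
    alpha:    significance level (0.05 is the convention used in this lab)
    """
    lines = []
    for feature, p_value in p_values.items():
        verdict = "a" if p_value < alpha else "no"
        lines.append(
            f"{feature}: p = {p_value:.3g}, {verdict} statistically significant "
            f"difference in the mean {feature} between {groups}."
        )
    return lines


# Two values copied from the single-type vs two-type comparison.
example = {"HP": 0.11060643144431842, "Attack": 0.00015741395666164396}
summary = summarize_p_values(example, "pokemons with 1 type and 2 types")
for line in summary:
    print(line)
```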

#### Now, we want to compare whether there are significant differences of `Attack` vs `Defense`  and  `Sp. Atk` vs `Sp. Def` of all pokemons. Please write your code below.

*Hint: are you comparing different populations or the same population?*

In [40]:

# Your code here

print(st.ttest_rel(df['Attack'], df['Defense']))

print(st.ttest_rel(df['Sp. Atk'], df['Sp. Def']))


TtestResult(statistic=4.325566393330478, pvalue=1.7140303479358558e-05, df=799)
TtestResult(statistic=0.853986188453353, pvalue=0.3933685997548122, df=799)
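
The hint points to a paired test because both columns are measured on the same pokemons. A paired t-test is equivalent to a one-sample t-test on the per-row differences against a mean of zero. The sketch below checks this on synthetic data; the `attack` and `defense` arrays are made-up stand-ins, not the Pokemon columns.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

# Made-up paired measurements: each index is the same subject.
rng = np.random.default_rng(42)
attack = rng.normal(80, 30, size=200)
defense = attack + rng.normal(-5, 25, size=200)

# Paired t-test on the two columns.
paired = ttest_rel(attack, defense)

# One-sample t-test on the differences, testing whether their mean is zero.
one_sample = ttest_1samp(attack - defense, popmean=0.0)

print(paired.statistic, one_sample.statistic)
print(paired.pvalue, one_sample.pvalue)
```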


#### What conclusions can you make?

In [None]:
# Your comment here
Attack vs. Defense

T-Statistic: The t-statistic is 4.33, which means the mean paired difference (Attack minus Defense) lies 4.33 standard errors above zero, the value expected under the null hypothesis of no difference.

P-Value: The p-value is approximately 1.7e-05, far below the common significance level of 0.05, so the observed difference is statistically significant.

Degrees of Freedom (df): The degrees of freedom are 799, the number of paired observations (800) minus one.

Conclusion: Given the small p-value, we reject the null hypothesis. There is strong evidence that the mean Attack and the mean Defense differ in this dataset, and the positive t-statistic indicates that the mean Attack is the higher of the two.
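
To make the t-statistic and the degrees of freedom concrete, the paired test can be reproduced by hand: the statistic is the mean difference divided by its standard error, and the degrees of freedom are the number of pairs minus one. This is a sketch on synthetic data, with made-up arrays `x` and `y`, compared against `ttest_rel` as a reference.

```python
import numpy as np
from scipy.stats import t as t_dist
from scipy.stats import ttest_rel

# Made-up paired measurements.
rng = np.random.default_rng(7)
x = rng.normal(80, 30, size=50)
y = x + rng.normal(3, 20, size=50)

d = x - y                                    # per-pair differences
n = d.size
standard_error = d.std(ddof=1) / np.sqrt(n)  # sample std of d over sqrt(n)
t_manual = d.mean() / standard_error         # mean difference in standard errors
dof = n - 1                                  # degrees of freedom
p_manual = 2 * t_dist.sf(abs(t_manual), dof) # two-sided p-value

reference = ttest_rel(x, y)
print(t_manual, reference.statistic)
print(p_manual, reference.pvalue)
```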




Sp. Atk vs. Sp. Def

T-Statistic: The t-statistic is 0.85, which means the mean paired difference (Sp. Atk minus Sp. Def) lies only 0.85 standard errors from zero, the value expected under the null hypothesis.

P-Value: The p-value is approximately 0.393, which is greater than the significance level of 0.05, so the observed difference is not statistically significant.

Degrees of Freedom (df): The degrees of freedom are again 799, the number of paired observations (800) minus one.

Conclusion: Given the large p-value, we fail to reject the null hypothesis. There is no statistically significant evidence of a difference between the mean Sp. Atk and the mean Sp. Def.