# Statistical Tests for Continuous Datasets

## Flow & Scope

Normality & Equal Variance (Continuous Data):

1. Levene's Test: https://www.statology.org/levenes-test-python/.
   Determines whether two or more groups have equal variances, where H0: all group variances are equal.
   
2. Shapiro-Wilk Test: https://www.geeksforgeeks.org/how-to-perform-a-shapiro-wilk-test-in-python/.
   Determines whether a sample comes from a normal distribution, where H0: the sample is normally distributed. It is run on each group separately.
   
If neither test produces a statistically significant (statsig) result, proceed to the one-way ANOVA test.

3.a. One-Way ANOVA Test: https://www.geeksforgeeks.org/how-to-perform-a-one-way-anova-in-python/.
     Determines whether there is a statistically significant difference between the means of two or more independent groups, where H0: all group means are equal. One-way ANOVA is parametric. It assumes normally distributed data with equal variances across groups, so it requires that neither step 1 nor step 2 is statsig (read more here: https://libguides.library.kent.edu/spss/onewayanova).

     If step 3.a is statsig, i.e., at least one group mean differs from the others, proceed to pairwise t-tests.

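        3.a.1 Pairwise T-Test: https://www.geeksforgeeks.org/how-to-conduct-a-paired-samples-t-test-in-python/.
          Compares the means of TWO groups at a time. The linked guide covers the paired-samples t-test, and so does stats.ttest_rel in the cell below. That test checks whether the mean of the paired differences is zero. It therefore only fits matched observations, for example the same subjects measured under each condition. For independent groups, use the independent-samples t-test (stats.ttest_ind) instead. Because several pairs are tested, also adjust the p-values for multiple comparisons.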
          
     If step 3.a is not statsig, no pairwise t-tests are needed (there is no evidence of a difference in means).

If either test from step 1 or step 2 is statsig, proceed to the Kruskal-Wallis test.

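3.b. Kruskal-Wallis Test: https://www.geeksforgeeks.org/how-to-perform-a-kruskal-wallis-test-in-python/.
     A non-parametric alternative to one-way ANOVA, used here because step 1 or step 2 was statsig. It compares ranks and does not assume the data come from a particular distribution. Its H0 is that all groups come from the same distribution. A statsig result is usually read as a difference in medians, which is valid when the group distributions have a similar shape.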
     
     If step 3.b is statsig, i.e., at least one group median differs, proceed to the Conover test.
     
         3.b.1 Conover Test: https://scikit-posthocs.readthedocs.io/en/latest/generated/scikit_posthocs.posthoc_conover/. A post hoc pairwise test for multiple comparisons of mean rank sums (Conover's test). It can be used after a Kruskal-Wallis one-way analysis of variance by ranks to make pairwise comparisons.
     
     If step 3.b is not statsig, no pairwise Conover tests are needed (there is no evidence of a difference in medians).
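The decision flow above can be condensed into one helper. This is a minimal sketch that assumes α = 0.05 for every test and uses the same scipy.stats calls as the cells below. The function name `choose_omnibus_test` is hypothetical, and the post hoc step is left out to keep the sketch short.

```python
from scipy import stats

ALPHA = 0.05  # assumed significance level, matching the cells below


def choose_omnibus_test(groups, alpha=ALPHA):
    """Hypothetical helper: run steps 1-2, then step 3a or 3b.

    Returns a (test_name, p_value) tuple.
    """
    # Step 1: Levene's test centered at the median (H0: equal variances)
    levene_p = stats.levene(*groups, center="median").pvalue
    # Step 2: Shapiro-Wilk on each group (H0: the group is normally distributed)
    shapiro_ps = [stats.shapiro(g).pvalue for g in groups]

    # Compare the unrounded p-values against alpha
    assumptions_hold = levene_p >= alpha and all(p >= alpha for p in shapiro_ps)
    if assumptions_hold:
        # Step 3a: parametric comparison of means
        return "one-way ANOVA", stats.f_oneway(*groups).pvalue
    # Step 3b: non-parametric, rank-based comparison
    return "Kruskal-Wallis", stats.kruskal(*groups).pvalue


group1 = [7, 14, 14, 13, 12, 9, 6, 14, 12, 8]
group2 = [15, 17, 13, 15, 15, 13, 9, 12, 10, 8]
group3 = [6, 8, 8, 9, 5, 14, 13, 8, 10, 9]
test_name, p_value = choose_omnibus_test([group1, group2, group3])
print(test_name, round(p_value, 2))
```

With the example data, this should follow the ANOVA branch, consistent with the Levene and Shapiro outputs shown below.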

### Step 1-2: Normality & Equal Variance Tests

In [45]:
# example of the data
group1 = [7, 14, 14, 13, 12, 9, 6, 14, 12, 8]
group2 = [15, 17, 13, 15, 15, 13, 9, 12, 10, 8]
group3 = [6, 8, 8, 9, 5, 14, 13, 8, 10, 9]

# test for equal variances: Levene's test
import scipy.stats as stats
# center='median' (the Brown-Forsythe variant) is the choice SciPy recommends for skewed data
levene_pvalue = stats.levene(group1, group2, group3, center='median').pvalue

if levene_pvalue < 0.05:
    print("Levene test p-value at {}: Statsig, Variance not equal".format(round(levene_pvalue, 2)))
else:
    print("Levene test p-value at {}: Not Statsig, Variance is equal".format(round(levene_pvalue, 2)))

Levene test p-value at 0.84: Not Statsig, Variance is equal
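Levene's test with center='median' (the Brown-Forsythe variant) is a one-way ANOVA applied to each observation's absolute deviation from its group median. The sketch below reuses the example data and relies only on numpy and scipy.stats to check that equivalence numerically. It illustrates what the test measures and is not meant to replace stats.levene.

```python
import numpy as np
from scipy import stats

group1 = [7, 14, 14, 13, 12, 9, 6, 14, 12, 8]
group2 = [15, 17, 13, 15, 15, 13, 9, 12, 10, 8]
group3 = [6, 8, 8, 9, 5, 14, 13, 8, 10, 9]
groups = [np.asarray(g, dtype=float) for g in (group1, group2, group3)]

# Absolute deviation of every observation from its own group's median
abs_dev = [np.abs(g - np.median(g)) for g in groups]

# A one-way ANOVA on those deviations reproduces Levene's W statistic and p-value
anova_on_dev = stats.f_oneway(*abs_dev)
levene_result = stats.levene(*groups, center="median")

print(round(anova_on_dev.statistic, 4), round(levene_result.statistic, 4))
print(round(anova_on_dev.pvalue, 4), round(levene_result.pvalue, 4))
```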


In [46]:
# test for normality
from scipy.stats import shapiro

# conduct the Shapiro-Wilk test on each group
data_group = [group1, group2, group3]
pvalue_shapiro = []
for i, group in enumerate(data_group):
    pvalue = shapiro(group).pvalue
    pvalue_shapiro.append(pvalue)
    # compare the unrounded p-value; round only for display
    if pvalue < 0.05:
        print("Shapiro p-value for data_{} is {}: Statsig, Data Not Normal".format(i, round(pvalue, 2)))
    else:
        print("Shapiro p-value for data_{} is {}: Not Statsig, Data Normal".format(i, round(pvalue, 2)))

Shapiro p-value for data_0 is 0.09: Not Statsig, Data Normal
Shapiro p-value for data_1 is 0.58: Not Statsig, Data Normal
Shapiro p-value for data_2 is 0.42: Not Statsig, Data Normal
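Each per-group Shapiro-Wilk test above uses only 10 observations, so it has little power to detect non-normality. One-way ANOVA assumes normally distributed errors, so a common complement is to test the pooled residuals (each value minus its group mean). This sketch is offered as an optional extra check under that assumption; it is not part of the original flow.

```python
import numpy as np
from scipy import stats

group1 = [7, 14, 14, 13, 12, 9, 6, 14, 12, 8]
group2 = [15, 17, 13, 15, 15, 13, 9, 12, 10, 8]
group3 = [6, 8, 8, 9, 5, 14, 13, 8, 10, 9]
groups = [np.asarray(g, dtype=float) for g in (group1, group2, group3)]

# Residual = observation minus its own group mean
residuals = np.concatenate([g - g.mean() for g in groups])

# One Shapiro-Wilk test on all 30 residuals instead of three tests on 10 values each
residual_p = stats.shapiro(residuals).pvalue
if residual_p < 0.05:
    print("Shapiro p-value for pooled residuals is {}: Statsig, Residuals Not Normal".format(round(residual_p, 2)))
else:
    print("Shapiro p-value for pooled residuals is {}: Not Statsig, no evidence against normality".format(round(residual_p, 2)))
```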


### Step 3a: (If Neither Step 1 nor Step 2 Is Statsig) Mean Difference Test

In [140]:
# If neither the normality nor the variance test is statsig, proceed to one-way ANOVA
from scipy.stats import f_oneway

# data_group was defined in the Shapiro-Wilk cell above
oneway_pvalue = f_oneway(data_group[0], data_group[1], data_group[2]).pvalue
if oneway_pvalue < 0.05:
    print("One Way Anova p-value for grouped data is {}: Statsig, At Least one in Group differs in mean.".format(round(oneway_pvalue, 2)))
else:
    print("One Way Anova p-value for grouped data is {}: Not Statsig, None within Group differs in mean.".format(round(oneway_pvalue, 2)))

One Way Anova p-value for grouped data is 0.03: Statsig, At Least one in Group differs in mean.
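To show what f_oneway computes, here is a worked version of the one-way ANOVA F statistic built from sums of squares on the example data. F is the between-group mean square divided by the within-group mean square, and the p-value is the upper tail of the F distribution with (k - 1, N - k) degrees of freedom.

```python
import numpy as np
from scipy import stats

group1 = [7, 14, 14, 13, 12, 9, 6, 14, 12, 8]
group2 = [15, 17, 13, 15, 15, 13, 9, 12, 10, 8]
group3 = [6, 8, 8, 9, 5, 14, 13, 8, 10, 9]
groups = [np.asarray(g, dtype=float) for g in (group1, group2, group3)]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations N
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_stat = ms_between / ms_within
p_value = stats.f.sf(f_stat, k - 1, n_total - k)

print(round(f_stat, 3), round(p_value, 3))
```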


### Step 3a.1: (If Step 3a Is Statsig) Pairwise T-Tests (parametric, for the mean difference between 2 groups)

In [135]:
# Pairwise t-tests, run only if the one-way ANOVA result is statsig
# note: stats.ttest_rel is the PAIRED t-test; it assumes each value in one group is matched
# with the value at the same position in the other group (e.g. repeated measures)
pairs = [(0, 1), (1, 2), (0, 2)]  # group 1 vs 2, group 2 vs 3, group 1 vs 3

for a, b in pairs:
    pvalue_pairwise_t = stats.ttest_rel(data_group[a], data_group[b]).pvalue
    if pvalue_pairwise_t < 0.05:
        print("Pairwise t p-value group {} vs {} is {}: Statsig, Mean difference exists".format(a + 1, b + 1, round(pvalue_pairwise_t, 2)))
    else:
        print("Pairwise t p-value group {} vs {} is {}: Not Statsig, No Mean difference exists".format(a + 1, b + 1, round(pvalue_pairwise_t, 2)))

Pairwise t p-value group 1 vs 2 is 0.1: Not Statsig, No Mean difference exists
Pairwise t p-value group 2 vs 3 is 0.04: Statsig, Mean difference exists
Pairwise t p-value group 1 vs 3 is 0.25: Not Statsig, No Mean difference exists
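If the three groups are independent rather than matched, a minimal alternative sketch runs independent-samples t-tests (stats.ttest_ind). It keeps equal_var=True because Levene's test was not statsig, and it adjusts the three p-values for multiple comparisons. The holm_adjust helper is a hypothetical, hand-written Holm-Bonferroni correction. scipy.stats.tukey_hsd (SciPy 1.8+) is another common post hoc choice after ANOVA.

```python
from itertools import combinations

from scipy import stats


def holm_adjust(pvalues):
    """Holm-Bonferroni step-down adjustment (hypothetical helper)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # the smallest p is multiplied by m, the next by m - 1, ...; capped at 1
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # keep adjusted p-values non-decreasing along the sorted order
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


group1 = [7, 14, 14, 13, 12, 9, 6, 14, 12, 8]
group2 = [15, 17, 13, 15, 15, 13, 9, 12, 10, 8]
group3 = [6, 8, 8, 9, 5, 14, 13, 8, 10, 9]
data_group = [group1, group2, group3]

pairs = list(combinations(range(len(data_group)), 2))  # (0, 1), (0, 2), (1, 2)
raw_p = [stats.ttest_ind(data_group[a], data_group[b], equal_var=True).pvalue for a, b in pairs]
adj_p = holm_adjust(raw_p)

for (a, b), p_raw, p_adj in zip(pairs, raw_p, adj_p):
    verdict = "Statsig" if p_adj < 0.05 else "Not Statsig"
    print("Independent t group {} vs {}: raw p = {}, Holm p = {}: {}".format(
        a + 1, b + 1, round(p_raw, 3), round(p_adj, 3), verdict))
```

Welch's t-test (equal_var=False) would be the usual swap if the variances were unequal.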


### Step 3b: Kruskal-Wallis Test (for median differences across groups)

In [139]:
# Conduct the Kruskal-Wallis test

pvalue_kruskal = stats.kruskal(data_group[0], data_group[1], data_group[2]).pvalue
if pvalue_kruskal < 0.05:
    print("Kruskal-Wallis p-value for grouped data is {}: Statsig, at Least one in Group differs in median.".format(round(pvalue_kruskal, 2)))
else:
    print("Kruskal-Wallis p-value for grouped data is {}: Not Statsig, none within Group differs in median.".format(round(pvalue_kruskal, 2)))

Kruskal-Wallis p-value for grouped data is 0.04: Statsig, at Least one in Group differs in median.
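To show what stats.kruskal computes, the H statistic can be rebuilt from ranks. All observations are ranked together, with ties sharing their average rank. The standard formula H = 12 / (N(N + 1)) · Σ R_i² / n_i − 3(N + 1) is then applied, divided by the tie correction 1 − Σ(t³ − t) / (N³ − N), and compared with a chi-squared distribution with k − 1 degrees of freedom. This sketch uses only numpy and scipy.stats (rankdata, tiecorrect, chi2).

```python
import numpy as np
from scipy import stats

group1 = [7, 14, 14, 13, 12, 9, 6, 14, 12, 8]
group2 = [15, 17, 13, 15, 15, 13, 9, 12, 10, 8]
group3 = [6, 8, 8, 9, 5, 14, 13, 8, 10, 9]
groups = [group1, group2, group3]

values = np.concatenate(groups)
n_total = len(values)

# Rank all observations together; tied values share their average rank
ranks = stats.rankdata(values)
group_ranks = np.split(ranks, np.cumsum([len(g) for g in groups])[:-1])

# Uncorrected H from the rank sums R_i of each group
h = 12.0 / (n_total * (n_total + 1)) * sum(r.sum() ** 2 / len(r) for r in group_ranks) \
    - 3 * (n_total + 1)
# Divide by the tie correction factor
h_corrected = h / stats.tiecorrect(ranks)
# Under H0, H is approximately chi-squared with k - 1 degrees of freedom
p_value = stats.chi2.sf(h_corrected, len(groups) - 1)

print(round(h_corrected, 4), round(p_value, 4))
```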


### Step 3b.1: (If Step 3b Is Statsig) Pairwise Conover Test (non-parametric, for mean rank differences between 2 groups)

In [171]:
# Conduct pairwise Conover tests; p-values are adjusted for multiple comparisons
# with the Benjamini-Hochberg FDR procedure (p_adjust='fdr_bh')
import scikit_posthocs as sp
pvalues_conover = sp.posthoc_conover(data_group, p_adjust='fdr_bh')

# the result is a symmetric DataFrame whose row/column labels are the group numbers 1, 2, 3
for a, b in [(1, 2), (1, 3), (2, 3)]:
    pvalue = pvalues_conover.loc[a, b]
    if pvalue < 0.05:
        print("Pairwise Conover p-value group {} vs {} is {}: Statsig, Mean Rank difference exists.".format(a, b, round(pvalue, 2)))
    else:
        print("Pairwise Conover p-value group {} vs {} is {}: Not Statsig, No Mean Rank difference exists.".format(a, b, round(pvalue, 2)))

Pairwise Conover p-value group 1 vs 2 is 0.21: Not Statsig, No Mean Rank difference exists.
Pairwise Conover p-value group 1 vs 3 is 0.21: Not Statsig, No Mean Rank difference exists.
Pairwise Conover p-value group 2 vs 3 is 0.03: Statsig, Mean Rank difference exists.
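To make the Conover comparison less of a black box, here is a from-scratch sketch that uses only numpy and scipy.stats. It assumes the Conover-Iman formulation: for each pair, a t statistic on the difference in mean ranks, scaled by the tie-adjusted rank variance and the Kruskal-Wallis H, with N − k degrees of freedom. This is followed by a Benjamini-Hochberg adjustment to mirror p_adjust='fdr_bh'. The helper names conover_pairwise and bh_adjust are hypothetical, and the sketch is not the scikit_posthocs source.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def bh_adjust(pvalues):
    """Benjamini-Hochberg FDR adjustment (hypothetical helper)."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adjusted = np.empty(m)
    running_min = 1.0
    # walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m - 1, -1, -1):
        idx = order[rank]
        running_min = min(running_min, p[idx] * m / (rank + 1))
        adjusted[idx] = running_min
    return adjusted


def conover_pairwise(groups):
    """Unadjusted Conover-Iman p-values for every pair of groups (hypothetical helper)."""
    values = np.concatenate(groups)
    n_total, k = len(values), len(groups)
    sizes = [len(g) for g in groups]
    ranks = stats.rankdata(values)  # average ranks for ties
    group_ranks = np.split(ranks, np.cumsum(sizes)[:-1])
    h = stats.kruskal(*groups).statistic  # tie-corrected Kruskal-Wallis H
    # variance of the ranks, adjusted for ties
    s2 = (np.sum(ranks ** 2) - n_total * (n_total + 1) ** 2 / 4) / (n_total - 1)
    df = n_total - k
    pairs = list(combinations(range(k), 2))
    pvalues = []
    for a, b in pairs:
        diff = abs(group_ranks[a].mean() - group_ranks[b].mean())
        se = np.sqrt(s2 * (n_total - 1 - h) / df * (1 / sizes[a] + 1 / sizes[b]))
        pvalues.append(2 * stats.t.sf(diff / se, df))  # two-sided p-value
    return pairs, pvalues


group1 = [7, 14, 14, 13, 12, 9, 6, 14, 12, 8]
group2 = [15, 17, 13, 15, 15, 13, 9, 12, 10, 8]
group3 = [6, 8, 8, 9, 5, 14, 13, 8, 10, 9]
pairs, raw_p = conover_pairwise([group1, group2, group3])
adj_p = bh_adjust(raw_p)
for (a, b), p in zip(pairs, adj_p):
    print("Conover group {} vs {}: BH-adjusted p = {}".format(a + 1, b + 1, round(p, 2)))
```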
