## Correlation Tests

Statistical tests used to check whether two samples are related.

### Pearson's Correlation Coefficient

Assumptions:

-  Observations in each sample are independent and identically distributed
-  Observations in each sample are normally distributed
-  Observations in each sample have the same variance

Interpretation:

-  H0: The two samples are independent
-  Ha: There is a dependency between the samples

In [1]:
from scipy.stats import pearsonr

data1 = [173, 165, 173, 183, 178, 188, 180, 183, 163, 178]
data2 = [65, 57, 77, 89, 93, 73, 83, 86, 70, 83]

# pearsonr requires both samples to have the same length
assert len(data1) == len(data2)
print(':: PASS')

# Note: p-values are not entirely reliable, but are probably reasonable for datasets larger than 500 or so.

corr, p = pearsonr(data1, data2)
print(":: Correlation_Coefficient :", corr, "p-value =", p)

if corr > 0:
    print(":: Positive Correlation")
elif corr < 0:
    print(":: Negative Correlation")
else:
    print(":: No Correlation")

:: PASS
:: Correlation_Coefficient : 0.6282774159243043 p-value = 0.05173590160339759
:: Positive Correlation
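
To make the coefficient less of a black box, here is a minimal from-scratch sketch of Pearson's r: the sum of co-deviations from the means, divided by the square root of the product of the summed squared deviations. The helper name `pearson_r` is illustrative and not part of SciPy, and the sketch computes only the coefficient, not the p-value.

```python
import math

data1 = [173, 165, 173, 183, 178, 188, 180, 183, 163, 178]
data2 = [65, 57, 77, 89, 93, 73, 83, 86, 70, 83]


def pearson_r(x, y):
    """Pearson's r computed directly from deviations around the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(data1, data2)
print(":: Correlation_Coefficient :", r)
```

On the sample data this should agree with the coefficient reported by `pearsonr`, up to floating-point rounding.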


### Spearman's Rank Correlation

Tests whether two samples have a monotonic relationship.

In [2]:
from scipy.stats import spearmanr

corr, p = spearmanr(data1, data2)
print(":: Correlation_Coefficient :", corr, "p-value =", p)

if corr > 0:
    print(":: Positive Correlation")
elif corr < 0:
    print(":: Negative Correlation")
else:
    print(":: No Correlation")

:: Correlation_Coefficient : 0.5859005966748804 p-value = 0.07509685186617432
:: Positive Correlation
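
One way to see what the rank correlation measures: Spearman's coefficient is Pearson's r applied to the ranks of the observations, with tied values sharing the average of their rank positions. The sketch below assumes that definition; `average_ranks` and `spearman_rho` are illustrative helper names, not SciPy functions, and no p-value is computed.

```python
import math

data1 = [173, 165, 173, 183, 178, 188, 180, 183, 163, 178]
data2 = [65, 57, 77, 89, 93, 73, 83, 86, 70, 83]


def average_ranks(values):
    """Rank values from 1..n; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson's r computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


rho = spearman_rho(data1, data2)
print(":: Correlation_Coefficient :", rho)
```

Because only the ranks enter the calculation, any strictly increasing transformation of either sample leaves the coefficient unchanged.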


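### Kendall's Rank Correlation

Tests whether two samples have a monotonic relationship, based on counts of concordant and discordant pairs of observations.
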
In [3]:
from scipy.stats import kendalltau

corr, p = kendalltau(data1, data2)
print(":: Correlation_Coefficient :", corr, "p-value =", p)

if corr > 0:
    print(":: Positive Correlation")
elif corr < 0:
    print(":: Negative Correlation")
else:
    print(":: No Correlation")

:: Correlation_Coefficient : 0.4419799999392736 p-value = 0.0842041111104205
:: Positive Correlation
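
The coefficient can be reproduced by counting pairs. The following is a minimal sketch assuming the tau-b variant, which corrects for ties: the difference between concordant and discordant pairs, divided by the geometric mean of the number of pairs not tied in each sample. The function name `kendall_tau_b` is illustrative, and the O(n²) pair loop is meant for explanation rather than speed.

```python
import math
from itertools import combinations

data1 = [173, 165, 173, 183, 178, 188, 180, 183, 163, 178]
data2 = [65, 57, 77, 89, 93, 73, 83, 86, 70, 83]


def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant, discordant, and tied pair counts."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x2 - x1, y2 - y1
        if dx == 0:
            ties_x += 1  # pair tied in the first sample
        if dy == 0:
            ties_y += 1  # pair tied in the second sample
        if dx * dy > 0:
            concordant += 1  # both samples order the pair the same way
        elif dx * dy < 0:
            discordant += 1  # the samples order the pair in opposite ways
    n = len(x)
    total_pairs = n * (n - 1) // 2
    denom = math.sqrt((total_pairs - ties_x) * (total_pairs - ties_y))
    return (concordant - discordant) / denom


tau = kendall_tau_b(data1, data2)
print(":: Correlation_Coefficient :", tau)
```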


## Parametric Statistical Hypothesis Tests


### Student's t-test
Tests whether the means of two independent samples are significantly different.

In [5]:
from scipy.stats import ttest_ind
stat, p = ttest_ind(data1, data2)
print(stat, p)

22.529666137150024 1.221053246953295e-14
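
As a rough illustration of where the statistic comes from, the sketch below computes the equal-variance (pooled) t statistic by hand: the difference of the sample means divided by its standard error under a pooled variance estimate. It assumes equal population variances, which is the classic Student's form; `pooled_t_statistic` is an illustrative name, and the p-value is not computed.

```python
import math
from statistics import mean, variance

data1 = [173, 165, 173, 183, 178, 188, 180, 183, 163, 178]
data2 = [65, 57, 77, 89, 93, 73, 83, 86, 70, 83]


def pooled_t_statistic(a, b):
    """Student's t statistic for two independent samples, assuming equal variances."""
    n1, n2 = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error


t = pooled_t_statistic(data1, data2)
print(t)
```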


### Paired Student's t-test
Tests whether the means of two related (paired) samples are significantly different.

In [7]:
from scipy.stats import ttest_rel
t_stat, p = ttest_rel(data1, data2)

print(t_stat, p)

35.23581224648163 5.904446246610732e-11
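
The paired test can be read as a one-sample t-test on the per-pair differences. A minimal sketch under that reading: take the differences, then divide their mean by their standard error. The name `paired_t_statistic` is illustrative, and only the statistic is computed.

```python
import math
from statistics import mean, stdev

data1 = [173, 165, 173, 183, 178, 188, 180, 183, 163, 178]
data2 = [65, 57, 77, 89, 93, 73, 83, 86, 70, 83]


def paired_t_statistic(a, b):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    # stdev uses the sample (n - 1) denominator
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error


t_paired = paired_t_statistic(data1, data2)
print(t_paired)
```

The statistic here is larger than in the independent-samples case because pairing removes the between-subject variation from the standard error.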


## Analysis of Variance (ANOVA) Test
Performs a one-way ANOVA, which tests the null hypothesis that two or more groups have the same population mean.
The test is applied to samples from two or more groups, possibly of differing sizes.


In [8]:
from scipy.stats import f_oneway
# f_oneway returns the F statistic and its p-value
f_stat, p_value = f_oneway(data1, data2)
print(f_stat, p_value)

507.5858562514435 1.2210532469533174e-14
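
To show what the F statistic measures, here is a minimal hand-rolled sketch: the between-group mean square divided by the within-group mean square. The helper `one_way_f` is an illustrative name, not a SciPy function, and the p-value is left out.

```python
data1 = [173, 165, 173, 183, 178, 188, 180, 183, 163, 178]
data2 = [65, 57, 77, 89, 93, 73, 83, 86, 70, 83]


def one_way_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


f = one_way_f(data1, data2)
print(f)
```

With exactly two groups, F equals the square of the equal-variance t statistic, which is why the p-value here matches the one from the independent-samples t-test.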
