## Correlation Tests

Statistical tests that check whether two samples are related.

### Pearson's Correlation Coefficient

Assumptions:

-  Observations in each sample are independent and identically distributed (iid)
-  Observations in each sample are normally distributed
-  Observations in each sample have the same variance

Interpretation:

-  H0: The two samples are independent (no linear relationship)
-  Ha: There is a linear dependency between the samples
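
For reference, the coefficient can be computed directly from its definition: the sum of the cross-products of the deviations from each mean, divided by the square root of the product of the sums of squared deviations. The sketch below assumes NumPy is available; `pearson_r` is a hypothetical helper name used for illustration, not part of SciPy, and the input values are made up.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r from its definition (hypothetical helper, for illustration)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    # Sum of cross-products over the root of the product of sums of squares
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Illustrative values only: y is an exact linear function of x, so r is 1.0
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(":: r =", pearson_r(x, y))
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or none.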

In [12]:
from scipy.stats import pearsonr

data1 = [173, 165, 173, 183, 178, 188, 180, 183, 163, 178]
data2 = [65, 57, 77, 89, 93, 73, 83, 86, 70, 83]

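# pearsonr expects paired samples, so both lists must have the same length
assert len(data1) == len(data2), "samples must have the same length"
print(':: PASS')

# Note: p-values are not entirely reliable, but are probably reasonable for datasets larger than 500 or so.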

corr, p = pearsonr(data1, data2)
print(":: Correlation_Coefficient :", corr, "p-value =", p)

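# The sign of the coefficient only gives the direction of the relationship;
# whether it is statistically significant is judged from the p-value.
if corr > 0: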
    print(":: Positive Correlation")
elif corr < 0:
    print(":: Negative Correlation")
else:
    print(":: No Correlation")

:: PASS
:: Correlation_Coefficient : 0.6282774159243043 p-value = 0.05173590160339759
:: Positive Correlation
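
The hypotheses listed under Interpretation are decided from the p-value, not from the sign of the coefficient. A minimal sketch of that decision follows, assuming a conventional significance level of 0.05 (an assumption; the original does not state one). `decide` is a hypothetical helper introduced here for illustration.

```python
from scipy.stats import pearsonr

ALPHA = 0.05  # assumed significance level, not specified in the original

def decide(p_value, alpha=ALPHA):
    """Return the test decision for a given p-value (hypothetical helper)."""
    if p_value < alpha:
        return "reject H0: evidence of a dependency"
    return "fail to reject H0: no significant evidence of a dependency"

data1 = [173, 165, 173, 183, 178, 188, 180, 183, 163, 178]
data2 = [65, 57, 77, 89, 93, 73, 83, 86, 70, 83]

corr, p = pearsonr(data1, data2)
print(":: Decision :", decide(p))
```

With the p-value of about 0.052 reported above, this threshold would leave H0 standing even though the coefficient is positive.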


### Spearman's Rank Correlation

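Tests whether two samples have a monotonic relationship. It works on the ranks of the observations, so it does not require the samples to be normally distributed.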

In [13]:
from scipy.stats import spearmanr

# Reuses data1 and data2 from the Pearson example above
corr, p = spearmanr(data1, data2)
print(":: Correlation_Coefficient :", corr, "p-value =", p)

if corr > 0:
    print(":: Positive Correlation")
elif corr < 0:
    print(":: Negative Correlation")
else:
    print(":: No Correlation")

:: Correlation_Coefficient : 0.5859005966748804 p-value = 0.07509685186617432
:: Positive Correlation
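
To make the "rank" part concrete, the sketch below shows that Spearman's coefficient is Pearson's coefficient applied to the ranks of the data. It uses `scipy.stats.rankdata`, which by default assigns tied values their average rank; the variable names `ranks1`, `ranks2`, and `rho_from_ranks` are introduced here for illustration.

```python
from scipy.stats import pearsonr, rankdata, spearmanr

data1 = [173, 165, 173, 183, 178, 188, 180, 183, 163, 178]
data2 = [65, 57, 77, 89, 93, 73, 83, 86, 70, 83]

# Replace each observation by its rank; ties receive the average rank
ranks1 = rankdata(data1)
ranks2 = rankdata(data2)

# Pearson's coefficient on the ranks ...
rho_from_ranks, _ = pearsonr(ranks1, ranks2)
# ... should agree with Spearman's coefficient on the raw data
rho, _ = spearmanr(data1, data2)

print(":: Pearson on ranks :", rho_from_ranks)
print(":: spearmanr        :", rho)
```

Because only the ordering of the values matters, any strictly increasing transformation of either sample leaves the coefficient unchanged.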
