This notebook shows how to compute the correlation between two vectors in Python 2.x without using NumPy or pandas.

# Correlation

Correlated variables are those which contain information about each other. <br>
The stronger the correlation, the more one variable tells us about the other.

Pearson’s correlation coefficient is defined as “the covariance between two vectors, normalized by the product of their standard deviations”.
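
As a sketch in symbols, the definition above can be written as follows. The notation $r_{xy}$, $s_x$, $\bar{x}$ and $\bar{y}$ is introduced here for illustration, and the sample forms with $n-1$ are assumed:

$$
r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x \, s_y},
\qquad
\operatorname{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}),
\qquad
s_x = \sqrt{\operatorname{cov}(x, x)}
$$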

### Imports

In [2]:
import math
import statistics  # in the standard library from Python 3.4; on Python 2 it must be installed separately

### Auxiliary Functions

In [3]:
def mean(x):
    # float() avoids integer division when x holds only ints (Python 2)
    return float(sum(x)) / len(x)

In [4]:
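def covariance(x,y):
    # Compute each mean once rather than on every loop iteration
    mean_x = mean(x)
    mean_y = mean(y)
    calc = []
    for i in range(len(x)):
        xi = x[i] - mean_x
        yi = y[i] - mean_y
        calc.append(xi * yi)
    # Divide by n - 1 (as a float) to get the sample covariance
    return sum(calc) / float(len(x) - 1)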

Each term in the loop is the product of the two deviations from the mean for one pair `(x[i], y[i])`:

- If both values are above their means (or both below), the product is positive, because positive × positive and negative × negative are both positive.
- If the values are on opposite sides of their means, the product is negative, because positive × negative is negative.

Once this product has been calculated for every pair, we sum the products and divide by n-1, where n is the sample size. The result is the sample covariance.
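
The steps above can be traced on a small example. This is a minimal sketch using the same `a` and `b` values as the Examples section; the names `products`, `same_side`, `opposite` and `sample_cov` are illustrative only and are not used elsewhere in the notebook.

```python
# Worked example of the sample covariance, one step at a time.
a = [1, 2, 3, 4, 5]
b = [10, 9, 2.5, 6, 4]

n = len(a)
mean_a = float(sum(a)) / n   # 3.0
mean_b = float(sum(b)) / n   # 6.3

# Product of the two deviations from the mean for each pair
products = [(ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)]

# Pairs on the same side of their means give positive products,
# pairs on opposite sides give negative ones
same_side = sum(1 for p in products if p > 0)
opposite = sum(1 for p in products if p < 0)

# Sum the products and divide by n - 1
sample_cov = sum(products) / (n - 1)

print(same_side)   # 0
print(opposite)    # 4
print(sample_cov)  # approximately -3.75
```

Here no pair lies on the same side of both means and four lie on opposite sides (the middle pair contributes zero because `a[2]` equals its mean), which is why the covariance comes out negative.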

In [5]:
def correlation(x,y):
    cov = covariance(x,y)
    # Compute each standard deviation once and reuse it
    std_x = statistics.stdev(x)
    std_y = statistics.stdev(y)
    if std_x == 0 and std_y == 0:
        return 1.0 # Convention used here: two constant vectors are treated as correlated
    elif std_x == 0 or std_y == 0:
        return None # Undefined (0/0): a constant vector has zero covariance with any other vector
    else:
        return cov / (std_x * std_y)

The covariance of a vector with itself is equal to its variance. <br>
More generally, the absolute value of the covariance between two vectors can never exceed the product of their standard deviations (this follows from the Cauchy-Schwarz inequality), and it reaches that product only when the vectors are perfectly linearly related. This is what bounds the correlation coefficient between -1 and +1.
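
The bound can be checked numerically. The following is a minimal, self-contained sketch: `sample_cov` is a hypothetical stand-alone helper defined only so the block runs on its own, and the vectors are the ones used in the Examples section.

```python
import math

def sample_cov(x, y):
    """Sample covariance (divides by n - 1); illustrative helper only."""
    n = len(x)
    mean_x = float(sum(x)) / n
    mean_y = float(sum(y)) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [10, 9, 2.5, 6, 4]

var_x = sample_cov(x, x)             # covariance of x with itself is its variance
std_x = math.sqrt(var_x)
std_y = math.sqrt(sample_cov(y, y))

bound = std_x * std_y                # largest magnitude the covariance can reach
r = sample_cov(x, y) / bound

print(var_x)                             # 2.5
print(abs(sample_cov(x, y)) <= bound)    # True
print(-1.0 <= r <= 1.0)                  # True
```

Dividing the covariance by `bound` is exactly the normalization in the definition of the coefficient, so the result cannot leave the interval from -1 to +1 (up to floating-point rounding).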

### Examples

In [6]:
x = [1, 2, 3, 4, 5]
corr = [2, 4, 6, 8, 10]
uncorr = [5, 6, 5, 6, 5]

print correlation(x,corr)
print correlation(x,uncorr)

1.0
0.0


In [7]:
a = [1, 2, 3, 4, 5]
b = [10, 9, 2.5, 6, 4]

print correlation(a,b)

-0.742610657233
