## Autocorrelation Implementation with Pandas and Numpy

Numpy and Pandas, two widely used Python libraries, offer several different ways to compute autocorrelation. These fall broadly into "circular" and "linear" autocorrelation, and even within each kind the functions differ slightly. This Notebook aims to clarify what each implementation computes and to conclude which one will be used in my final year project.

In [1]:
import numpy as np
import pandas as pd

In [None]:
np.correlate([1, 2, 3], [1, 2, 3], "full")

## Numpy correlate
Numpy correlate computes only a sum of products at each lag. The output is not normalised, so it does not give a good basis for comparing different timeseries. In the equation below, N is the length of each series and any term whose index n+k falls outside the series is treated as zero.

\begin{equation*}
r(k, X, Y) = \sum_{n=0}^{N-1} x_n \, y_{n+k} \quad \text{where } -(N-1) \leq k \leq (N-1)
\end{equation*}
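As a quick check of the sum-of-products definition above, the sketch below evaluates it by hand for the series `[1, 2, 3]` and compares it with `np.correlate` in `"full"` mode. The helper name `sum_of_products` is made up for this illustration, and only the autocorrelation case (both inputs identical) is shown.

```python
import numpy as np


def sum_of_products(x, k):
    """Sum x[n] * x[n + k] over the indices where both terms exist."""
    n_points = len(x)
    return sum(x[n] * x[n + k] for n in range(n_points) if 0 <= n + k < n_points)


x = [1, 2, 3]
lags = range(-(len(x) - 1), len(x))  # -(N-1) .. (N-1)
by_hand = [sum_of_products(x, k) for k in lags]
from_numpy = np.correlate(x, x, "full")

print(by_hand)     # [3, 8, 14, 8, 3]
print(from_numpy)  # [ 3  8 14  8  3]
```

The zero-lag value, 14, is just the sum of squares of the series, so it grows with both the scale and the length of the data. This is why the raw output is hard to compare between timeseries.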

In [3]:
k = pd.Series([1,4,2,1,7])
a = [k.autocorr(n) for n in range(len(k))]

## Pandas Series autocorr
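Pandas Series autocorr computes Pearson's correlation between the series and a copy of itself shifted by the lag. Only the overlapping points are used (there is no circular wrap-around), so the mean and the variance used in r(k) change with the lag. The output is normalised, but this edge effect makes it difficult to analyse, especially at large lags: with only two overlapping points the result is forced to 1 or -1, and with a single overlapping point it is undefined (NaN).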
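The lag-dependent normalisation and the edge effect can be observed directly. The sketch below assumes the documented behaviour that `Series.autocorr(lag)` is the Pearson correlation between the series and `Series.shift(lag)`, and cross-checks it against `np.corrcoef` on the overlapping points only. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

k = pd.Series([1, 4, 2, 1, 7])
values = k.to_numpy()

lag = 2
via_autocorr = k.autocorr(lag)
via_shift = k.corr(k.shift(lag))  # NaN rows from the shift are dropped

# Pearson correlation on the overlapping points only: the mean and variance
# are recomputed from these shorter slices, so they change with the lag.
via_overlap = np.corrcoef(values[lag:], values[:-lag])[0, 1]

# With only two overlapping points left, the correlation is forced to +/-1.
two_point = k.autocorr(len(k) - 2)

print(via_autocorr, via_shift, via_overlap)
print(two_point)
```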

## Pandas DataFrame Corr
Pandas DataFrame corr computes Pearson's correlation between each pair of columns. To obtain an autocorrelation from it, a shifted copy of the series has to be added as a second column, which simply reproduces the Pandas Series autocorr result.
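A minimal sketch of that equivalence, assuming a lagged copy of the series is stored as a second column (the column names `x` and `x_lagged` are made up for this example):

```python
import numpy as np
import pandas as pd

k = pd.Series([1, 4, 2, 1, 7])
lag = 1

# Put the series next to a copy of itself shifted by `lag`.
frame = pd.DataFrame({"x": k, "x_lagged": k.shift(lag)})

# DataFrame.corr returns the full pairwise Pearson correlation matrix;
# the off-diagonal entry is the lag-1 autocorrelation.
corr_matrix = frame.corr()
from_frame = corr_matrix.loc["x", "x_lagged"]
from_series = k.autocorr(lag)

print(from_frame, from_series)
```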

In [5]:
def linear_autocorrelation(timeseries):
    '''Performs linear autocorrelation on timeseries.

    The mean and variance of the whole series are used for every lag.
    '''
    timeseries = np.asarray(timeseries, dtype=float)
    n = len(timeseries)
    variance = np.var(timeseries)
    mean = np.mean(timeseries)
    answer = []

    for lag in range(n):
        # Sum the products of deviations over the n - lag overlapping points.
        summing = 0.0
        for y in range(n - lag):
            summing += (timeseries[y + lag] - mean) * (timeseries[y] - mean)

        # Normalise by the variance and length of the full series.
        answer.append((summing / variance) / n)

    return answer

## Manual Implementation of Linear Autocorrelation
Linear autocorrelation in which the mean and the variance are held constant, equal to those of the entire timeseries, throughout the calculation of r(k) for all values of k.
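The same quantity can also be sketched without explicit loops. The version below is an alternative formulation, not the implementation used above: it removes the full-series mean, takes the non-negative lags of `np.correlate` in `"full"` mode, and divides by the series length times the full-series variance. The function name `linear_autocorrelation_vectorised` is hypothetical.

```python
import numpy as np


def linear_autocorrelation_vectorised(timeseries):
    """Linear autocorrelation with a fixed, full-series mean and variance."""
    values = np.asarray(timeseries, dtype=float)
    n = len(values)
    deviations = values - values.mean()

    # "full" mode returns lags -(n-1) .. (n-1); index n-1 is lag 0.
    raw = np.correlate(deviations, deviations, "full")[n - 1:]

    # np.var is the population variance, so r(0) comes out as exactly 1.
    return raw / (n * np.var(values))


r = linear_autocorrelation_vectorised([1, 4, 2, 1, 7])
print(r)
```

Because the normalisation does not change with the lag, the values shrink towards zero at large lags instead of being forced to 1 or -1.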