# Sample Size Adjustment for Spatially Autocorrelated Data

<h3>Background</h3>

Our goal is to correct the significance test of a bivariate correlation for spatial autocorrelation among the data points. To understand the problem, we will use a simple 1D example. Suppose that we have two datasets $X(s)$ and $Y(s)$, where $s$ is the location of a data point in a one-dimensional space: $s \in \mathbb{R}^{1}$. We restrict attention to a study region $D \subset \mathbb{R}^{1}$. Our two sets of random data are thus defined as:

$$\{ X(s), Y(s) : s \in D \}\tag{1}$$

We want to know the spatial relationship between $X$ and $Y$. The most straightforward way to assess this is to compute the sample correlation coefficient, $r$, between the two variables:

$$r = \frac{cov(X,Y)}{\sigma_{X}\sigma_{Y}}\tag{2}$$

where $cov(X,Y)$ is the sample covariance and $\sigma_{X}$ and $\sigma_{Y}$ are the sample standard deviations. We can then assess the significance of $r$ with the following test statistic:

$$ t = r \sqrt{\frac{n-2}{1-r^{2}}} \tag{3}$$ 

with $n$ equal to the number of data points. Under the null hypothesis of zero correlation, $t$ follows a Student's $t$ distribution with $n-2$ degrees of freedom. This test assumes that all $n$ data points are statistically independent, which is not the case for spatially autocorrelated data. Autocorrelation makes the effective sample size, $n'$, smaller than $n$, and for strongly autocorrelated data it is possible that $n' \ll n$. Thus, to apply Eq. 3 correctly, we need to determine $n'$.
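As a minimal sketch of Eqs. 2 and 3, the snippet below computes $r$ and its $t$ statistic with NumPy. The helper names `pearson_r` and `t_statistic` are hypothetical, and the example data are an assumption made only for illustration: two *independent* white-noise series, each smoothed with a moving average so that neighbouring values are autocorrelated. The effective sample size of 20 is likewise made up; it simply shows that the same $r$ yields a much smaller $|t|$ once $n$ is replaced by a smaller $n'$.

```python
import numpy as np


def pearson_r(x, y):
    """Sample correlation r = cov(X, Y) / (sigma_X * sigma_Y), as in Eq. 2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2))


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), as in Eq. 3 (n - 2 degrees of freedom)."""
    return r * np.sqrt((n - 2) / (1.0 - r**2))


rng = np.random.default_rng(0)
n = 200
window = 15
# Hypothetical example data: two INDEPENDENT white-noise series, each smoothed
# with a moving average so that neighbouring values are autocorrelated.
kernel = np.ones(window) / window
x = np.convolve(rng.standard_normal(n + window - 1), kernel, mode="valid")
y = np.convolve(rng.standard_normal(n + window - 1), kernel, mode="valid")

r = pearson_r(x, y)
t_naive = t_statistic(r, n)      # treats all n points as independent
t_adjusted = t_statistic(r, 20)  # same r, hypothetical effective size n' = 20
print(f"r = {r:.3f}, t(n = {n}) = {t_naive:.2f}, t(n' = 20) = {t_adjusted:.2f}")
```

Because `mode="valid"` drops the partially overlapping ends, each smoothed series has exactly `n` points.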

To do this, we use the approach described by Dale and Fortin, 2009, which was originally proposed by Clifford et al., 1989. The effective sample size is computed as:

$$ n' = 1 + \frac{n^{2}}{trace(R_{X}R_{Y})} \tag{4}$$

where $R_{X}$ and $R_{Y}$ are the $n \times n$ spatial correlation matrices for $X$ and $Y$, respectively. Eq. 4 shows that we need a method to determine $R_{X}$ and $R_{Y}$.
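Once the correlation matrices are available, Eq. 4 is a one-liner. The sketch below assumes $R_{X}$ and $R_{Y}$ are already known; to have something to plug in, it builds them from a hypothetical exponential correlation function $e^{-\lvert s_i - s_j\rvert/a}$ on a regular grid, with made-up values of $a$. For independent data ($R_{X} = R_{Y} = I$) the trace equals $n$, so the formula returns $n + 1$, essentially $n$; autocorrelation inflates the trace and pulls $n'$ down.

```python
import numpy as np


def effective_sample_size(R_x, R_y):
    """n' = 1 + n^2 / trace(R_X R_Y), as in Eq. 4."""
    n = R_x.shape[0]
    return 1.0 + n**2 / np.trace(R_x @ R_y)


def exp_correlation_matrix(s, a):
    """Hypothetical helper: R_ij = exp(-|s_i - s_j| / a) for 1D locations s."""
    dist = np.abs(s[:, None] - s[None, :])
    return np.exp(-dist / a)


s = np.arange(100, dtype=float)  # 100 equally spaced locations in D
n_indep = effective_sample_size(np.eye(100), np.eye(100))
n_eff = effective_sample_size(exp_correlation_matrix(s, 5.0),
                              exp_correlation_matrix(s, 3.0))
print(n_indep)  # 101.0: with no autocorrelation the trace is n
print(n_eff)    # well below 100 once both variables are autocorrelated
```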

<h3>Variograms</h3>

Before discussing how to compute the spatial correlation matrices, we must first introduce the concept of the **variogram**. The variogram is a tool for quantifying the degree of spatial dependence of a random variable (in our case $X$ and $Y$). The classical estimator of the sample variogram is:

$$ 2\hat{\gamma}(h) = \frac{1}{\lvert N(h)\rvert}\sum_{N(h)}{(X(s_{i})-X(s_{j}))^{2}},\quad h\in\mathbb{R}^{1} \tag{5}$$

where $N(h)$ is defined as:

$$ N(h) = \{(s_{i},s_{j}):s_{i}-s_{j}=h\} $$

and $\lvert N(h)\rvert$ is the number of pairs in $N(h)$.

This might look complicated, but the variogram is simply the average squared difference between the values at pairs of points separated by a given distance $h$. In practice, we often work with the **semivariogram**, which is $\hat{\gamma}(h)$, i.e. half the variogram. Although many authors use the two terms interchangeably, the distinction matters. As Cressie notes:

> some authors have called $\hat{\gamma}(\cdot)$ a variogram. This is a dangerous practice; there is too much to lose from missing 2s. - Cressie, 1993
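A direct translation of Eq. 5 is sketched below for data on a regular 1D grid with unit spacing, an assumption that lets every lag $h$ be matched exactly by index offsets. The function name is hypothetical. Heeding Cressie's warning, it computes the variogram $2\hat{\gamma}(h)$ first and then divides by 2, so what it returns is the semivariogram $\hat{\gamma}(h)$.

```python
import numpy as np


def empirical_semivariogram(x, max_lag):
    """Classical estimator (Eq. 5) for values x on a unit-spaced 1D grid.

    Returns the lags h = 1..max_lag and the semivariogram gamma_hat(h).
    max_lag must be smaller than len(x) so that every lag has at least one pair.
    """
    x = np.asarray(x, dtype=float)
    lags = np.arange(1, max_lag + 1)
    gamma = np.empty(len(lags))
    for k, h in enumerate(lags):
        diffs = x[h:] - x[:-h]          # all pairs (s_i, s_j) with s_i - s_j = h
        variogram = np.mean(diffs**2)   # 2 * gamma_hat(h): mean squared difference
        gamma[k] = variogram / 2.0      # semivariogram: don't lose the 2!
    return lags, gamma


# A linear trend has differences equal to h at lag h, so gamma_hat(h) = h^2 / 2.
lags, gamma = empirical_semivariogram(np.arange(6.0), 3)
print(lags, gamma)  # [1 2 3] [0.5 2.  4.5]
```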

So what does a variogram look like? To answer that, let us generate a semivariogram from the simple exponential model, where $c_{0}$ is the nugget, $c_{e}$ the partial sill, and $a_{e}$ the range parameter:

$$ \hat{\gamma}(h) = c_{0} + c_{e}(1-e^{-\lVert h\rVert/a_{e}}) \tag{6}$$


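```python
# Load in the libraries we will need
import numpy as np
import matplotlib.pyplot as plt

# Make sure plots are shown inline (IPython/Jupyter magic; omit in a plain script)
%matplotlib inline

# Define the range of lag distances
h = np.linspace(0, 10, 1000)

# Define parameters to use for the semivariogram model (Eq. 6)
cZero = 2.0  # nugget, c_0
cE = 7.5     # partial sill, c_e
aE = 0.5     # range parameter, a_e

# Calculate the semivariogram given h and the parameters
gammaH = cZero + cE * (1 - np.exp(-np.abs(h) / aE))

# Make a quick plot
gammaFig, gammaPlot = plt.subplots()
gammaPlot.plot(h, gammaH)
gammaPlot.set_xlabel(r'Lag distance, $h$')
gammaPlot.set_ylabel(r'Semivariance, $\hat{\gamma}(h)$')
gammaPlot.set_title('Exponential semivariogram model')
plt.show()
```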
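One standard way to get from a semivariogram model back to the correlation matrices that Eq. 4 needs, assuming the process is second-order stationary, uses the identity $C(h) = C(0) - \gamma(h)$, where $C(0) = c_{0} + c_{e}$ is the sill. Dividing by $C(0)$ gives $\rho(h) = 1 - \gamma(h)/(c_{0} + c_{e})$ for $h \neq 0$, with $\rho(0) = 1$ because $\gamma(0) = 0$ by definition. The sketch below builds such a matrix with the same parameter values used above; it is one possible route rather than necessarily the one this analysis goes on to use, and the helper names are hypothetical.

```python
import numpy as np


def exponential_semivariogram(h, c0, ce, ae):
    """Eq. 6 at lag distance(s) h, with nugget c0, partial sill ce, range ae."""
    return c0 + ce * (1.0 - np.exp(-np.abs(h) / ae))


def correlation_matrix_from_semivariogram(s, c0, ce, ae):
    """Hypothetical helper: R_ij = 1 - gamma(|s_i - s_j|) / (c0 + ce), diagonal 1.

    Assumes second-order stationarity, so C(h) = C(0) - gamma(h), C(0) = c0 + ce.
    """
    dist = np.abs(s[:, None] - s[None, :])
    R = 1.0 - exponential_semivariogram(dist, c0, ce, ae) / (c0 + ce)
    np.fill_diagonal(R, 1.0)  # gamma(0) = 0 by definition, so rho(0) = 1
    return R


s = np.linspace(0, 10, 50)  # 50 locations in D
R = correlation_matrix_from_semivariogram(s, c0=2.0, ce=7.5, ae=0.5)
print(R[0, :3])
```

With a nonzero nugget, the off-diagonal correlations never exceed $c_{e}/(c_{0} + c_{e})$, so the matrix jumps from 1 on the diagonal to at most about 0.79 for these parameters.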