## Different Standard Deviation Calculation Techniques
This notebook compares a two-pass and a one-pass method for calculating the standard deviation. The relative error of each method is measured against numpy.std, whose result is taken as the 'actual' value.

In [1]:
from numpy import loadtxt, sqrt, std

A method to calculate standard deviation with two passes through the data:

\begin{gather*}
\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_{i} \\
\sigma \equiv \left[\frac{1}{n-1}\sum_{i=1}^n(x_{i}-\bar{x})^2\right]^{1/2}
\end{gather*}

In [2]:
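def two_pass(data):
    n = len(data)

    # First pass: compute the mean.
    mean = 0.
    for value in data:
        mean += value
    mean /= n

    # Second pass: sum the squared deviations from the mean.
    var = 0.
    for value in data:
        var += (value - mean)**2
    var /= n-1  # sample variance, n-1 in the denominator

    sigma = sqrt(var)
    return sigma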

A method to calculate standard deviation with one pass through the data, using an algebraically equivalent form of the sum of squared deviations:

\begin{gather*}
\sigma \equiv \left[\frac{1}{n-1}\left(\sum_{i=1}^nx_i^2-n\bar{x}^2\right)\right]^{1/2}
\end{gather*}
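
The one-pass form follows from expanding the square in the two-pass definition and using $\sum_{i=1}^{n}x_i = n\bar{x}$. A short derivation, written here as a reader's aid rather than taken from the original notebook:

\begin{align*}
\sum_{i=1}^{n}(x_i-\bar{x})^2
  &= \sum_{i=1}^{n}x_i^2 - 2\bar{x}\sum_{i=1}^{n}x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n}x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum_{i=1}^{n}x_i^2 - n\bar{x}^2
\end{align*}

The two forms are equal in exact arithmetic. In floating point they can differ, because the one-pass form subtracts two quantities that may be large and nearly equal.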

In [3]:
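def one_pass(data):
    n = len(data)

    # Accumulate the sum and the sum of squares in a single loop.
    mean = 0.
    tmp = 0.
    for value in data:
        mean += value
        tmp += value**2
    mean /= n

    # Rounding error can make n*mean**2 exceed the sum of squares,
    # which would put a negative number under the square root.
    if n*mean**2 > tmp:
        print("Can't take the square root of a negative number")
        return -1
    tmp -= n*mean**2

    sigma = sqrt(tmp/(n-1))
    return sigma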

In [4]:
def relative_error(actual, approx):
    return abs((actual-approx)/actual)

In [5]:
data = loadtxt('cdata.txt')
actual = std(data, ddof=1)
approx1 = one_pass(data)
approx2 = two_pass(data)
print("Relative Error:")
print("One Pass", relative_error(actual, approx1))
print("Two Pass", relative_error(actual, approx2))

Relative Error:
One Pass 3.74039760295e-09
Two Pass 3.51289497185e-16


Now let's generate two sequences with the same predetermined variance but very different means (0 and 1e7) to further investigate the differences between the one-pass and two-pass methods.

In [6]:
from numpy.random import normal

seq_a = normal(0., 1., 2000)
seq_b = normal(1e7, 1., 2000)

actual_a = std(seq_a, ddof=1)
sig1pa = one_pass(seq_a)
sig2pa = two_pass(seq_a)
rel1pa = relative_error(actual_a, sig1pa)
rel2pa = relative_error(actual_a, sig2pa)

actual_b = std(seq_b, ddof=1)
sig1pb = one_pass(seq_b)
sig2pb = two_pass(seq_b)
rel1pb = relative_error(actual_b, sig1pb)
rel2pb = relative_error(actual_b, sig2pb)

print('Relative Errors:')
print("seq_a, one_pass:", rel1pa)
print("seq_a, two_pass:", rel2pa)
print("seq_b, one_pass:", rel1pb)
print("seq_b, two_pass:", rel2pb)

Relative Errors:
seq_a, one_pass: 4.3770921991e-16
seq_a, two_pass: 2.18854609955e-16
seq_b, one_pass: 0.0261520352892
seq_b, two_pass: 2.15245620699e-16
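
The large one-pass error for seq_b is consistent with cancellation between the two terms in the one-pass formula. The following is a minimal sketch of that effect, not part of the original notebook. It assumes a seeded generator (`default_rng(0)`) in place of the unseeded `normal` call above, so the exact numbers will differ from the run shown, and it uses `np.sum` and `np.mean` instead of the explicit loop.

```python
import numpy as np

# Assumption: a seeded generator stands in for the notebook's unseeded normal() call.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(1e7, 1.0, n)

# The two terms that the one-pass formula subtracts from each other.
sum_sq = float(np.sum(x**2))            # roughly n * 1e14 = 2e17
n_mean_sq = n * float(np.mean(x))**2    # also roughly 2e17

# Gap between adjacent float64 values at this magnitude.
spacing = float(np.spacing(sum_sq))

print("sum of squares       :", sum_sq)
print("n * mean**2          :", n_mean_sq)
print("difference           :", sum_sq - n_mean_sq)
print("float64 spacing here :", spacing)
```

Near 2e17, adjacent float64 values are 32 apart, so both terms, and therefore their computed difference, can only be multiples of 32. The exact value of that difference is $(n-1)\sigma^2 \approx 2000$, so a handful of rounding steps is enough to produce percent-level errors like the one seen for seq_b. For seq_a the mean is close to zero, $n\bar{x}^2$ is small, and very little cancellation occurs, which is why the one-pass and two-pass results agree there.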
