## Q1.

Add a method `__iter__` to your project time series class to iterate over values, a method `itertimes` to iterate over times, a method `itervalues` to iterate over values, and a method `iteritems` to iterate over time-value pairs. (This is similar to the interface of Python dictionaries.) To test these, check both the types of the results and the answers you expect.

Answer: done as a group and submitted to the repo.
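
The group's implementation lives in the repo and is not reproduced here. Purely to illustrate the requested interface, a minimal sketch might look like the following; the class name `TimeSeries`, its constructor arguments, and the private attributes `_times` and `_values` are assumptions made for this example.

```python
class TimeSeries:
    """Hypothetical minimal time series; not the group's repo implementation."""

    def __init__(self, times, values):
        if len(times) != len(values):
            raise ValueError("times and values must have the same length")
        self._times = list(times)
        self._values = list(values)

    def __iter__(self):
        # Iterating over the series itself yields the values.
        return iter(self._values)

    def itertimes(self):
        return iter(self._times)

    def itervalues(self):
        return iter(self._values)

    def iteritems(self):
        # Pairs are (time, value) tuples, similar to dict.items().
        return zip(self._times, self._values)


ts = TimeSeries([0.0, 1.0, 2.0], [10, 20, 30])
print(list(ts))              # [10, 20, 30]
print(list(ts.iteritems()))  # [(0.0, 10), (1.0, 20), (2.0, 30)]
```

Each method returns an iterator rather than a list, so a test can check both that the result has a `__next__` method and that materializing it gives the expected values.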

## Q2.

An online mean and standard deviation algorithm.

Below is a function to generate a potentially infinite stream of 1-D data.

In [3]:
from random import normalvariate, random
from itertools import count
def make_data(m, stop=None):
    # Yields values near 1e9; with stop=None the stream never ends.
    for i in count():
        if stop and i > stop:
            break
        yield 1.0e09 + normalvariate(0, m*random())

Here is an implementation of an online mean algorithm; see http://www.johndcook.com/blog/standard_deviation/ and the post it links to, http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/. (Convince yourselves of the formulas.)
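
As a quick check of the mean update, write $\mu_n$ for the mean of the first $n$ values $x_1, \dots, x_n$. Then

$$
\mu_n = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{(n-1)\,\mu_{n-1} + x_n}{n} = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n},
$$

which corresponds to the step `mu = mu + delta/n` with `delta = value - mu` in the code that follows.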

In [1]:
def online_mean(iterator):
    n = 0
    mu = 0
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta/n
        yield mu

We use our generator functions to implement iterators:

In [4]:
g = make_data(5, 10)
list(g)

[999999999.8509412,
 999999993.1897373,
 1000000000.891846,
 999999997.5906372,
 1000000004.8087181,
 999999997.5352579,
 1000000000.3732481,
 999999999.3679739,
 999999997.9066861,
 1000000003.3359729,
 999999997.405954]
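
Since `make_data` only produces values on demand, `stop` can also be left as `None` and a finite prefix taken lazily. A minimal sketch, assuming `itertools.islice` is an acceptable way to cut the stream (the two generators are repeated so that the block runs on its own):

```python
from itertools import count, islice
from random import normalvariate, random


def make_data(m, stop=None):
    for i in count():
        if stop and i > stop:
            break
        yield 1.0e09 + normalvariate(0, m * random())


def online_mean(iterator):
    n = 0
    mu = 0
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta / n
        yield mu


# Unbounded stream: nothing is computed until values are requested.
stream = online_mean(make_data(5))
first_five = list(islice(stream, 5))
print(len(first_five))  # 5

# Deterministic check of the running mean on a small input.
print(list(online_mean([1, 2, 3, 4])))  # [1.0, 1.5, 2.0, 2.5]
```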

In [5]:
g = online_mean(make_data(5, 100))
print(type(g))
list(g)

<class 'generator'>


[999999995.229255,
 999999997.9925121,
 999999998.5620692,
 999999998.9312924,
 999999998.8732634,
 999999999.5785928,
 999999999.2463268,
 999999999.4201928,
 999999999.3208656,
 999999999.4380766,
 999999999.0996964,
 999999999.1656443,
 999999999.2076358,
 999999999.2295027,
 999999999.161528,
 999999999.323338,
 999999999.5162235,
 999999999.5477214,
 999999999.6177036,
 999999999.7507001,
 999999999.7822069,
 999999999.7717952,
 999999999.6949552,
 999999999.6804736,
 999999999.5851457,
 999999999.6292224,
 999999999.6268014,
 999999999.4498057,
 999999999.6313498,
 999999999.7805952,
 999999999.7365594,
 999999999.7344344,
 999999999.9137712,
 999999999.901228,
 999999999.8897069,
 999999999.8663139,
 999999999.7254987,
 999999999.7007502,
 999999999.7341574,
 999999999.8015827,
 999999999.8373306,
 999999999.9612532,
 999999999.9451023,
 999999999.9417627,
 999999999.9469988,
 999999999.9507598,
 999999999.8436308,
 999999999.7651947,
 999999999.7221284,
 999999999.6529275,
 ...]

### 2.1

Implement the standard deviation algorithm as a generator function of the form:

```python
def online_mean_dev(iterator):
    BLA BLA
    if n > 1:
        stddev = math.sqrt(dev_accum/(n-1))
        yield (n, value, mu, stddev)
```
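
One common choice for the accumulator, usually attributed to Welford, is the following recurrence. Writing $M_n$ for `dev_accum` after $n$ values, with $M_1 = 0$:

$$
M_n = M_{n-1} + (x_n - \mu_{n-1})(x_n - \mu_n), \qquad s_n = \sqrt{\frac{M_n}{n-1}} \quad (n > 1).
$$

Here $x_n - \mu_{n-1}$ is the `delta` computed before the mean is updated, and $x_n - \mu_n$ uses the updated mean.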

In [11]:
# your code here
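import math

def online_mean_dev(iterator):
    n = 0
    mu = 0
    dev_accum = 0  # running sum of squared deviations from the mean
    for value in iterator:
        n += 1
        delta = value - mu   # deviation from the previous mean
        mu = mu + delta/n    # updated mean
        dev_accum = dev_accum + delta*(value - mu)  # zero when n == 1
    # This block runs once, after the loop has finished, so a single
    # tuple holding the final statistics is yielded.
    if n > 1:
        stddev = math.sqrt(dev_accum/(n-1))
        yield (n, value, mu, stddev)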

Here we generate about 100,000 data points and run this iterator on them (imagine running this on a time series being slowly read from disk).

In [47]:
data_with_stats = online_mean_dev(make_data(5, 100000))

In [48]:
list(data_with_stats)

[(100001, 999999998.8868219, 999999999.9983464, 2.8793074239927328)]
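
The result above holds a single tuple because the `yield` in `online_mean_dev` is reached only after the loop has finished. If per-element statistics are wanted, for example so that each value can be checked against the running mean, one possible variant moves the `yield` inside the loop. This is a sketch rather than the submitted answer, and the name `online_mean_dev_stream` is made up for the example.

```python
import math
import statistics


def online_mean_dev_stream(iterator):
    """Yield (n, value, mu, stddev) for every value from the second onward."""
    n = 0
    mu = 0.0
    dev_accum = 0.0
    for value in iterator:
        n += 1
        delta = value - mu
        mu += delta / n
        dev_accum += delta * (value - mu)
        if n > 1:
            stddev = math.sqrt(dev_accum / (n - 1))
            yield (n, value, mu, stddev)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
results = list(online_mean_dev_stream(data))
print(len(results))  # 7: one tuple per value, starting at n == 2

# The last running estimate should agree with the batch sample
# standard deviation from the standard library.
n, value, mu, stddev = results[-1]
print(math.isclose(stddev, statistics.stdev(data)))  # True
```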

## Q3.

Let's do anomaly detection. Write a routine `is_ok`:

```python
def is_ok(level, t):
    ...
```

which takes a tuple like the one yielded by your code above and returns `True` if the value lies within `level` standard deviations ($\sigma$) of the mean.

In [34]:
#your code here
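def is_ok(level, t):
    # t is a tuple (n, value, mu, stddev) as yielded by online_mean_dev
    n, value, mu, stddev = t
    # True when value lies strictly within level*stddev of the mean
    return mu - level*stddev < value < mu + level*stddev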

We use this function to create a predicate that is passed to `itertools.filterfalse`, which in turn gives an iterator over the anomalies.

In [44]:
from itertools import filterfalse
pred = lambda t: is_ok(0.5, t) # changed level so I can see some anomalies
anomalies = filterfalse(pred, data_with_stats)

We materialize the anomalies...

In [46]:
list(anomalies)  # materialize

[(100001, 999999997.2713226, 999999999.9885623, 2.8855089810983277)]
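
With only one tuple in the stream, the filter can flag at most that one point. To illustrate how the same kind of predicate behaves on per-element statistics, here is a small self-contained sketch; `running_stats` is a hypothetical helper that yields a tuple for every value, and the data, the level of 2, and the spike at 50.0 are invented for the example.

```python
import math
from itertools import filterfalse


def running_stats(iterator):
    # Hypothetical per-element variant: yields (n, value, mu, stddev)
    # for every value from the second onward.
    n, mu, dev_accum = 0, 0.0, 0.0
    for value in iterator:
        n += 1
        delta = value - mu
        mu += delta / n
        dev_accum += delta * (value - mu)
        if n > 1:
            yield (n, value, mu, math.sqrt(dev_accum / (n - 1)))


def is_ok(level, t):
    n, value, mu, stddev = t
    return mu - level * stddev < value < mu + level * stddev


# Made-up data: small fluctuations around 10 with one spike.
data = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 50.0, 10.0, 9.9]
anomalies = list(filterfalse(lambda t: is_ok(2, t), running_stats(data)))
print([value for n, value, mu, stddev in anomalies])  # [50.0]
```

Note that each value is included in the running mean and standard deviation it is compared against, so a large spike also widens the band it is tested with.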

## To think of, but not hand in

What kinds of anomalies will this algorithm pick up? What kinds would a shorter "window" of anomaly detection, such as 100 points around the time in question, pick up? How might you create an algorithm that does window-based averaging? (Hint: the window size is small compared to the time series size.)

Finally, think a bit about how you might implement all of this in a production environment. Remember that data streaming in might get backed up while you handle an anomaly.

(Some inspiration might accrue if you look at the docs for `collections.deque`).
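
Following the hint, one way to approach window-based averaging is to keep the most recent points in a `collections.deque` with `maxlen` set and to update a running sum as values enter and leave. The sketch below is only one possible design; the function name `windowed_mean` and the default window of 100 are assumptions for illustration, and it uses a trailing window rather than one centred on the point in question.

```python
from collections import deque


def windowed_mean(iterator, window=100):
    """Yield (value, mean of the last `window` values seen so far)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    buf = deque(maxlen=window)
    total = 0.0
    for value in iterator:
        if len(buf) == window:
            # The oldest value is about to be evicted by append().
            total -= buf[0]
        buf.append(value)
        total += value
        yield value, total / len(buf)


print(list(windowed_mean([1, 2, 3, 4, 5], window=3)))
# [(1, 1.0), (2, 1.5), (3, 2.0), (4, 3.0), (5, 4.0)]
```

Memory use is bounded by the window size regardless of the stream length. A running sum can accumulate rounding error over long streams, so recomputing it from `buf` every so often may be worthwhile.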