## Q1.

Add methods `__iter__` to your project Time Series class to iterate over values, a method `itertimes` to iterate over times, a method `itervalues` to iterate over values, and a method `iteritems` to iterate over time-value pairs. (This is a similar interface to python dictionaries). To test these, check both the types of the results and the answers you expect.

In [16]:
#your code here
import numpy as np
from lazy import *

class TimeSeries():
    '''
    """
Help on package TimeSeries:

NAME
    TimeSeries

DESCRIPTION
    TimeSeries
    =====
    
    Provides
      1. An sequence or any iterable objects
    
    How to use the documentation
    ----------------------------
    Documentation is available in two forms: docstrings provided
    with the code, and a loose standing reference guide, available from
    `the TimeSeries homepage <https://github.com/cs207-project>`_.
    
    We recommend exploring the docstrings using
    `IPython <http://ipython.scipy.org>`_, an advanced Python shell with
    TAB-completion and introspection capabilities.  See below for further
    instructions.
    
    The docstring examples assume that `numpy` has been imported as `np`::  
      
    
    
     |  Methods inherited from builtins.RuntimeWarning:
     |  
     |  __init__(self, *args, **kwargs)
     |      Initialize self.  See help(type(self)) for accurate signature.
     |      Stors a TimeSeries in self.TimeSeries_
     |    
     |  __repr__(self, /)
     |      Return a printable sequence shown in python list format containing all values in [self].
     |  
     |  __str__(self, /)
     |      Return a printable abbreviated sequence of maximum first 100 entrees.
     |  
     |  __getitem__(self, index)
     |      Return self[index]
     |
     |  __setitem__(self, index, values)
     |      Set self[index] = values
     |
     |  __len__(self)
     |      Return len(self.TimeSeries_)
     '''
    def __init__(self, times, values):
        if (iter(times) and iter(values)):
            # reorder according to Time step
            idx = np.argsort(times)
            times = np.array(times)[idx]
            values = np.array(values)[idx]

            self._TimeSeries=np.vstack((times,values))
            self._vindex = 0
            self._values = self._TimeSeries[1]
            self._times = self._TimeSeries[0]
    
    @property
    @lazy
    def lazy(self):
        return self

    def itervalues(self):
        for v in self._values:
            yield v

    def itertimes(self):
        for t in self._times:
            yield t

    def iteritems(self):
        for t,v in zip(self._times,self._values):
            yield (t,v)
            
    def __len__(self):
        return len(self._TimeSeries[0])
    
    def __contains__(self, time):
        index = np.where(self._TimeSeries[0]==time)
        return index[0].size>0
            
    
    def __getitem__(self,time):
        if (time in self):
            index = np.where(self._TimeSeries[0]==time)
            return self._TimeSeries[1][index]
        else:
            print ("no time point at t={0}".format(time))

    def __setitem__(self,time,value):
        if (time in self):
            index = np.where(self._TimeSeries[0]==time)
            self._TimeSeries[1][index]=value
        else:
            print ("no time point at t={0}".format(time))
            
    def __iter__(self):
        return iter(self._TimeSeries[1])
    
    def __repr__(self):
        return "%r"%(self._TimeSeries)
    
    def __str__(self):
        className = type(self).__name__
        if len(self)>100:
            return "%s" %('['+(str(self._values[:99]))[1:-1]+'...'+']')
        else:
            return "%s" %(self._TimeSeries)
        
    def __eq__(self, other):
        return np.array_equal(self._TimeSeries, other._TimeSeries)
        
    def values(self):
        return self._values
    
    def times(self):
        return self._times
    
    def mean(self):        
        if(len(self._values) == 0):
            raise ValueError("cant calculate mean of length 0 list")
        return np.mean(self._values)
    
    def median(self):
        if(len(self._values) == 0):
            raise ValueError("cant calculate median of length 0 list")
        return np.median(self._values)
    
    def interpolate(self, times):
        new_values = []
        for time in times:
            if time > self._times[-1]: # over the rightest boundary
                new_values.append(self._values[-1])
            elif time < self._times[0]: # over the leftest boundary
                new_values.append(self._values[0])
            elif time in self._times:
                new_values.append(self.__getitem__(time))
            else : #within boundary
                for i in range(len(self._times)):
                    if self._times[i] > time:
                        left_value = self._values[i-1]
                        right_value = self._values[i]
                        left_time = self._times[i-1]
                        right_time = self._times[i]
                        #interpolate
                        new_values.append(left_value + (right_value - left_value)/(right_time - left_time)*(time - left_time))
                        break
        return TimeSeries(times, new_values)


## Q2.

An online mean and standard deviation algorithm.

Below is a function to generate a potentially infinite stream of 1-D data.

In [1]:
from random import normalvariate, random
from itertools import count
def make_data(m, stop=None):
    for _ in count():
        if stop and _ > stop:
            break
        yield 1.0e09 + normalvariate(0, m*random() )
        

Here is an implementation of an online mean algorithm..see http://www.johndcook.com/blog/standard_deviation/ and the link to http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/ in-between. (Convince yourselves of the formulas...)

In [2]:
def online_mean(iterator):
    n = 0
    mu = 0
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta/n
        yield mu

We use out generator functions to implement iterators:

In [3]:
g = make_data(5, 10)
list(g)

[1000000001.4774712,
 1000000001.2149162,
 1000000000.5447836,
 1000000000.4100586,
 1000000000.0837525,
 999999999.2347927,
 999999995.4062667,
 999999999.7298961,
 1000000001.8293751,
 999999994.3945838,
 999999993.9163105]

In [4]:
g = online_mean(make_data(5, 100))
print(type(g))
list(g)

<class 'generator'>


[999999998.6308818,
 999999999.4540175,
 999999999.9906163,
 999999999.985376,
 999999999.6555053,
 999999999.8821379,
 999999999.7306238,
 1000000000.527252,
 1000000001.1107097,
 1000000001.2395592,
 1000000000.9945687,
 1000000001.0801069,
 1000000001.0945207,
 1000000001.6069784,
 1000000001.4884291,
 1000000001.4962935,
 1000000001.6420864,
 1000000001.6398696,
 1000000001.5421716,
 1000000001.5738413,
 1000000001.4957771,
 1000000001.4660469,
 1000000001.348877,
 1000000001.2906228,
 1000000001.106491,
 1000000001.026614,
 1000000001.2433251,
 1000000001.1315995,
 1000000001.2508212,
 1000000001.2094802,
 1000000000.9677453,
 1000000000.8895792,
 1000000000.8542398,
 1000000000.8654195,
 1000000001.0498242,
 1000000001.0075009,
 1000000001.0154817,
 1000000000.9989749,
 1000000001.099108,
 1000000001.0933347,
 1000000001.0643882,
 1000000001.086211,
 1000000001.0598336,
 1000000001.0424213,
 1000000001.034812,
 1000000000.9788917,
 1000000000.9399306,
 1000000000.9295285,
 100000

### 2.1

Implement the standard deviation algorithm as a generator function as

```python
def online_mean_dev(iterator):
    BLA BLA
    if n > 1:
        stddev = math.sqrt(dev_accum/(n-1))
        yield (n, value, mu, stddev)
```

In [11]:
import math
def online_mean_dev(iterator):
    n = 0
    mu = 0
    dev_accum = 0
    for value in iterator:
        n+=1
        mu_0 = mu
        #generate new mu
        delta = value - mu
        mu = mu + delta/n
        if n > 1:
            dev_accum += (value - mu_0) * (value - mu)
            stddev = math.sqrt(dev_accum/(n-1))
            yield (n, value, mu, stddev)

Here we make 100000 element data, and run this iterator on it (imagine running this on a time-series being slowly read from disk

In [12]:
data_with_stats = online_mean_dev(make_data(5, 100000))

## Q3.

Let's do Anomaly detection. Write a routine `is_ok`:

```python
def is_ok(level, t)
```

which takes a tuple like the one yielded by your code above and returns True if the value is inbetween `level`-$\sigma$ of the mean.

In [13]:
#your code here
def is_ok(level, t):
    std = t[3]
    mu = t[2]
    value = t[1]
    dist = level - std
    if (value <= mu + dist) and (value >= mu - dist):
        return True
    else:
        return False

We use this function to create a predicate passed through to `itertools.filterfalse` which is then used to obtain an iterator on the anomalies.

In [14]:
from itertools import filterfalse
pred = lambda t: is_ok(5, t)
anomalies = filterfalse(pred, data_with_stats)

We materialize the anomalies...

In [15]:
list(anomalies)#materialize

[(12, 1000000003.7456372, 1000000000.2119246, 1.72335131398136),
 (13, 999999995.900513, 999999999.8802775, 2.0377221664561334),
 (14, 999999996.5474771, 999999999.6422204, 2.150883558043455),
 (17, 1000000003.1714565, 999999999.7903205, 2.1592540736135097),
 (19, 1000000003.3616731, 1000000000.0352895, 2.2033489881034245),
 (37, 999999996.33091, 999999999.7177418, 1.8382416255049452),
 (41, 999999996.1204467, 999999999.5302013, 1.8898400672371212),
 (53, 999999995.9941646, 999999999.5012776, 1.7893141587975376),
 (56, 1000000007.6195394, 999999999.5809944, 2.0837764164343304),
 (58, 999999993.6270521, 999999999.4502264, 2.2002125211920482),
 (65, 1000000003.4540386, 999999999.6017932, 2.1611767836203186),
 (70, 999999995.6874162, 999999999.5965568, 2.145708041096774),
 (71, 1000000005.4471998, 999999999.6789602, 2.2406257346661103),
 (73, 1000000002.8513453, 999999999.7549461, 2.2568090044907385),
 (74, 999999992.4313902, 999999999.6559792, 2.3975419410596723),
 (77, 999999996.6896394

## To think of, but not hand in

What kinds of anomalies will this algorithm pick up? What kinds would a shorter "window" of anomaly detection, like 100 points around the time in question pick? How might you create an algorithm which does window based averaging? (hint: the window size is small compared to the time series size). 

Finally think a bit of how you might implement all of this in a production environment..remember that data streaming in might get backed up when you handle an anomaly.

(Some inspiration might accrue if you look at the docs for `collections.deque`).