In [None]:
%matplotlib inline
import matplotlib
import seaborn as sns
matplotlib.rcParams['savefig.dpi'] = 144

In [None]:
import expectexception

# Iterators, Generators, and Coroutines

Python's composite data structures can be used in for loops and comprehensions.  Other objects can be used in these too, as long as they support the iterable and iterator interfaces.

One convenient way to do this is by creating a generator.  A generator acts like a function that pauses every time it reaches the `yield` keyword.  Coroutines are a related construction that allows the function to pause for input.  While a little abstract, these tools allow for more readable code overall by separating concerns.

## Iterables and Iterators

What's happening inside a for loop?

In [None]:
for i in [1, 2, 3]:
    print i

Python is not actually figuring out the length of the list and then indexing through it.  Instead, it is doing something akin to the following:

In [None]:
it = iter([1, 2, 3])
while True:
    try:
        i = next(it)
    except StopIteration:
        break
    print i

The list is an **iterable**.  That means we can call `iter()` on it and get an **iterator**.

In [None]:
type(it)

This iterator will go through the list *once* and then raise `StopIteration`.  After that, it cannot be reused.

In [None]:
try:
    next(it)
except StopIteration:
    print "Iterator exhausted"
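The practical difference between an iterable and an iterator shows up when we try to loop twice.  The following is a minimal sketch (the variable names are ours, and it avoids `print` so that it runs unchanged on Python 2 and 3): a list hands out a fresh iterator on every pass, while an iterator remembers its position and stays exhausted.

```python
numbers = [1, 2, 3]

# Each call to list() asks the iterable for a *new* iterator,
# so the list can be traversed as many times as we like.
first_pass = list(numbers)
second_pass = list(numbers)

# An iterator keeps its position.  Once exhausted, it stays exhausted.
one_shot = iter(numbers)
from_iterator_first = list(one_shot)
from_iterator_second = list(one_shot)

assert first_pass == second_pass == [1, 2, 3]
assert from_iterator_first == [1, 2, 3]
assert from_iterator_second == []
```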

The `iter()` function just calls the `__iter__()` method of its argument.  Therefore, implementing an iterable is as simple as giving a class an `__iter__()` method.

This is a good example of Python's "duck typing".  There's no iterable class or interface to inherit from.  Python just checks whether there is an `__iter__()` method.  If so, that's good enough to be an iterable.

In [None]:
class Iterable(object):
    
    def __init__(self, n):
        self.n = n
    
    def __iter__(self):
        return iter([self.n] * self.n)

In [None]:
my_iter = Iterable(5)
for i in my_iter:
    print i

Similarly, all it takes to be an iterator is a `next()` method.  (Why not `__next__()`?  This was a mistake, and it's fixed in Python 3.)

In [None]:
class Iterator(object):
    
    def __init__(self, n):
        self.curr = n + 1
    
    def next(self):
        self.curr -= 1
        if self.curr >= 0:
            return self.curr
        else:
            raise StopIteration

This iterator isn't an iterable, so we can't use it directly in a for loop.  We can demo it in the same fake for loop we used before.

In [None]:
iterator = Iterator(5)
while True:
    try:
        i = next(iterator)
    except StopIteration:
        break
    print i

To actually use it in a for loop, we'd need an iterable that returns it.  We could make a separate class, but it's easier to just make the iterator be iterable as well.

In [None]:
class IterableIterator(object):
    
    def __init__(self, n):
        self.curr = n + 1
    
    def __iter__(self):
        return self
    
    def next(self):
        self.curr -= 1
        if self.curr >= 0:
            return self.curr
        else:
            raise StopIteration

In [None]:
for i in IterableIterator(5):
    print i
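Since the method is spelled `next()` in Python 2 and `__next__()` in Python 3, a class can define both if it has to work under either version.  The sketch below is one way to do that; `PortableCountdown` is a hypothetical name, and the countdown logic is copied from `IterableIterator`.

```python
class PortableCountdown(object):
    """Counts down from n to 0, like IterableIterator above."""

    def __init__(self, n):
        self.curr = n + 1

    def __iter__(self):
        # An iterator that is also an iterable returns itself.
        return self

    def __next__(self):  # Python 3 spelling
        self.curr -= 1
        if self.curr >= 0:
            return self.curr
        raise StopIteration

    next = __next__  # Python 2 spelling; both names point at the same method


countdown = list(PortableCountdown(3))
```

Calling the built-in `next(obj)` instead of `obj.next()` picks the right method on either version.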

## Generators
Generators are a type of iterator.  Benefits:
1. They are more powerful than just using `map` and `filter` because they allow you to hold state in between processing entries.  They can do what `reduce` does, but are much easier to read and write.
1. They allow you to hold data in an "inner" context without needing to resort to creating a `class`.  This can be faster, since attribute lookups like `self.foo` are slower than local variable access in Python.

**Gotcha**: the body of a generator is not run until you first call `.next()`, which can be a bit counterintuitive ...

Here's a generator that does a countdown similar to the `IterableIterator` we defined above (it stops at 1 rather than 0).  Notice that, even with a print statement, it uses fewer lines of code.

In [None]:
def Countdown(n):
    print "Counting down ..."
    while n > 0:
        yield n
        n -= 1

c = Countdown(5)
print "Set up Countdown"
for i in c:
    print i

`Countdown` is a generator function.  When it is called, it immediately returns a generator object, but no code is executed.

In [None]:
c = Countdown(3)
c

Generators are iterators, so they have a `.next()` method that allows the execution to proceed to the next `yield` statement.

In [None]:
c.next()

In [None]:
print c.next()
print c.next()

When the generator returns during a `.next()` call, `StopIteration` is raised.  This is used by for loops and list comprehensions to signal the end of the iterator.

In [None]:
%%expect_exception StopIteration

print c.next()
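Outside of a for loop, there are two common ways to deal with an exhausted generator without letting `StopIteration` escape.  The sketch below uses its own small `countdown` generator (a stand-in for `Countdown` without the print statement) and the built-in `next()`, which accepts an optional default value.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1


gen = countdown(2)
values = [next(gen), next(gen)]

# The generator is now exhausted.  A bare next(gen) would raise StopIteration;
# passing a default makes next() return that default instead.
after_end = next(gen, None)

# A for loop (or comprehension) catches StopIteration internally and just ends.
looped = [i for i in countdown(3)]
```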

## Generator "pipelines"

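Generators can be chained together into a pipeline, with each stage consuming the output of the previous one.  In particular, we're going to create these generators

```
source_gen -> and_plus_one_gen -> sum_gen
```

and chain them together.  Note that for each input, a generator can yield zero, one, or multiple outputs.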

1. **Source:** pushes values using `yield`.
2. **Intermediate Step:** both requests previous values (`.next`) and pushes them using `yield`
3. **Sink:** iterates through previous values using `.next`.

**Question:** why is this better than dealing with a list?

In [None]:
def source_gen(n):
    for i in xrange(n):
        yield i

def and_plus_one_gen(gen):
    for i in gen:
        yield i
        yield i + 1

def sum_gen(gen):
    return sum(i for i in gen)

gen1 = source_gen(10)
gen2 = and_plus_one_gen(gen1)
result = sum_gen(gen2)

print result
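One possible answer to the question above: a generator pipeline only ever asks for one value at a time, so it works even when the source is too large (or too endless) to fit in a list.  The sketch below feeds the same "and plus one" step from `itertools.count()`, which never stops, and takes just the first six outputs.  The names `and_plus_one`, `endless` and `first_six` are ours.

```python
from itertools import count, islice


def and_plus_one(gen):
    # Same logic as and_plus_one_gen above: two outputs per input.
    for i in gen:
        yield i
        yield i + 1


# count() never ends, so materializing it as a list is impossible.
# The pipeline still works because each stage pulls values on demand.
endless = and_plus_one(count())
first_six = list(islice(endless, 6))
```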

## Generator comprehensions

Python supports generator comprehensions (also called generator expressions) in addition to list comprehensions.  They use parentheses `()` instead of brackets `[]`.  While concise, they can only do `map` and `filter`-like things.

In [None]:
(j for j in xrange(10))

Actually, the parentheses are needed only to group the expression together.  If the generator comprehension appears in another set of parentheses, a second set is not needed.

In [None]:
sum(j for j in xrange(10))

Comprehensions can be nested.  This produces a flattened list or generator.  Note that the syntax reads outwards from the middle: The outermost loop appears in the middle, the inner loop appears at the end, and the quantity calculated in that loop appears at the beginning.

In [None]:
[j for i in xrange(10) for j in (i, i+1)]

Using this, we can reproduce the previous generator pipeline.

In [None]:
sum(j for i in xrange(10) for j in (i, i+1))
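To make the reading order of a nested comprehension concrete, here is the same computation written as ordinary nested loops.  This is only an illustrative sketch: the `for` clauses appear in the comprehension in the same order as the loops do here, and the leading expression becomes the body of the innermost loop.  It uses `range` so that it runs on Python 3 as well.

```python
flattened = []
for i in range(10):          # first `for` clause of the comprehension
    for j in (i, i + 1):     # second `for` clause
        flattened.append(j)  # the expression at the start of the comprehension

nested = [j for i in range(10) for j in (i, i + 1)]
total = sum(j for i in range(10) for j in (i, i + 1))
```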

### Not all generators can be written as generator comprehensions

It might seem from the above example that all generators can be written as generator expressions.  This is not true.  Generator expressions cannot keep track of state in between processing elements; generator functions can.  In the following example, the `total` variable holds state between generator iterations.

In [None]:
def and_total_gen(gen):
    total = 0
    for i in gen:
        yield i
        total += i
        yield total

In [None]:
list(and_total_gen(xrange(10)))

## Time complexity

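Because they don't have to construct an entire list, iterators are often faster on large inputs, although the size of the gain depends on the workload.  Generator comprehensions can likewise be faster than list comprehensions when the result is only consumed once.  They are also much more memory efficient (typically `O(1)` rather than `O(n)` extra memory).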

In [None]:
%%timeit -n1

gen1 = xrange(int(1e7))
gen2 = (j for i in gen1 for j in (i, i+1))
sum(gen2)

In [None]:
%%timeit -n1

list1 = range(int(1e7))
list2 = [j for i in list1 for j in (i, i+1)]
sum(list2)
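The memory claim can be checked roughly with `sys.getsizeof`.  This is a sketch rather than a careful benchmark: `getsizeof` reports only the size of the container object itself (not the integers it refers to), and the exact byte counts vary between Python versions, so no specific numbers are shown.

```python
import sys

n = 100000
as_list = [i * 2 for i in range(n)]
as_gen = (i * 2 for i in range(n))

list_bytes = sys.getsizeof(as_list)  # grows with n
gen_bytes = sys.getsizeof(as_gen)    # small and independent of n

# Both produce the same values once consumed.
same_total = sum(as_gen) == sum(as_list)
```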

## Itertools in Python

Manipulating iterators requires a little more care than manipulating lists.  For example, `range`, `map`, and `filter` all have iterator equivalents: `xrange`, `imap`, and `ifilter`.

In [None]:
from itertools import count, islice, chain, tee, ifilter, takewhile, dropwhile, combinations

`count` is an iterator that is never exhausted.  We can use `islice` to take a slice of it.

In [None]:
list(islice(count(), 10))

In [None]:
list(chain(xrange(10), xrange(10)))

In [None]:
it = xrange(10)
it1, it2 = tee(it, 2)
list(it1)  # Why is this dangerous?

In [None]:
list(it2)

In [None]:
list(it1)
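One possible answer to "Why is this dangerous?", sketched below with our own variable names.  First, the copies returned by `tee` share the underlying iterator, so advancing the original directly makes both copies miss that value.  Second, the copies are themselves one-shot iterators.  A third hazard, not shown here, is memory: consuming one copy completely before touching the other forces `tee` to buffer every value.

```python
from itertools import tee

source = iter(range(5))
a, b = tee(source, 2)

# Hazard 1: advancing the original iterator steals values from the copies.
skipped = next(source)  # consumes 0 behind tee's back
from_a = list(a)
from_b = list(b)

# Hazard 2: the copies are iterators too, so a second pass yields nothing.
second_pass = list(a)
```

The usual advice is to stop using the original iterator once it has been passed to `tee`.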

In [None]:
list(ifilter(lambda x: x < 'C', 'ABCDABCD'))

In [None]:
list(takewhile(lambda x: x < 'C', 'ABCDABCD'))

In [None]:
list(dropwhile(lambda x: x < 'C', 'ABCDABCD'))

In [None]:
list(combinations(xrange(4), 2))

In [None]:
from itertools import izip

it = xrange(10)
it1, it2 = tee(it, 2)
it2.next()
list(izip(it1, it2))

### Exercises
1. We've seen how to group an iterator pairwise.  This is useful in a time series for monitoring the "derivative" with respect to time.  How do you do this for general triple-wise, quadruple-wise etc ...?
1. How do you find a powerset?  That is, given an iterator, return all possible subsets?
1. How do you inspect the i-th lookahead value?

## Coroutines

Coroutines are the "dual" of generators.  Generators return data when called with `.next`.  Coroutines take data sent to them via `.send`.

In [None]:
def grep(pattern):
    print "Looking for %s" % pattern
    while True:
        line = yield
        if pattern in line:
            print line

g = grep("Python")
g

But there's a **gotcha**: you need to call `.send(None)` to start the coroutine.  This allows execution to proceed to the first `yield` statement...

In [None]:
g.send(None)  # must be "primed"

...so that when it is called with a real value, Python knows to which variable it should be assigned.

In [None]:
g.send("Python is great!")

In [None]:
g.send("Java is OK")
g.send("particularly Python generators")

No one can remember to "prime" coroutines, so let's write a decorator that calls `.send(None)` for us.

In [None]:
def coroutine(func):
    def start(*args,**kwargs):
        cr = func(*args,**kwargs)
        cr.send(None)
        return cr
    return start

# syntactic sugar for grep = coroutine(grep)
@coroutine
def grep(pattern):
    print "Looking for %s" % pattern
    while True:
        line = yield
        if pattern in line:
            print line

g = grep("Python")
g.send("Python is great!")
g.send("particularly Python generators")
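A small refinement worth considering, shown here as a sketch under our own names (`primed_coroutine`, `collect_matches`): wrapping the inner function with `functools.wraps` keeps the decorated function's name and docstring, and the built-in `next()` primes the coroutine in a way that works on both Python 2 and 3.  The example collects matches in a list instead of printing them so that the result is easy to inspect.

```python
import functools


def primed_coroutine(func):
    @functools.wraps(func)  # keep func's __name__ and docstring
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)  # same effect as cr.send(None)
        return cr
    return start


@primed_coroutine
def collect_matches(pattern, matches):
    """Append every line containing `pattern` to `matches`."""
    while True:
        line = yield
        if pattern in line:
            matches.append(line)


found = []
matcher = collect_matches("Python", found)
matcher.send("Python is great!")
matcher.send("Java is OK")
matcher.send("particularly Python generators")
```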

Coroutines also have a `.close()` method, which ends their processing.  Coroutines can react to it by catching the `GeneratorExit` exception.

In [None]:
@coroutine
def print_cr():
    try:
        while True:
            x = yield
            print x
    except GeneratorExit:
        print "Done"

x = print_cr()
x.send(1)
x.send(2)
x.close()

Further attempts to send data to the coroutine will result in a `StopIteration` exception.

In [None]:
%%expect_exception StopIteration

x.send(3)
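The same life cycle can be traced without the `%%expect_exception` magic.  In this sketch, `logger` and `log_cr` are hypothetical names, and the coroutine records what happens in a list rather than printing it.

```python
def logger(log):
    try:
        while True:
            item = yield
            log.append(item)
    except GeneratorExit:
        # Clean-up code runs when .close() is called.
        log.append("closed")


log = []
log_cr = logger(log)
next(log_cr)    # prime the coroutine
log_cr.send(1)
log_cr.close()
log_cr.close()  # closing an already finished coroutine is harmless

try:
    log_cr.send(2)
    send_failed = False
except StopIteration:
    send_failed = True
```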

## Coroutine "pipelines"

This is the same pipeline as before, except that instead of "pulling" values from the previous generator via `.next`, it "pushes" values to the next generator via `.send`.

```
source -> and_plus_one_cr -> sum_cr
```

The 3 steps are:

1. **Source:** pushes values using `send`.
2. **Intermediate Step:** both receives values using `yield` and pushes them using `send`.
3. **Sink:** receives values using `yield` and consumes them (here, by summing them and printing the total).

In [None]:
def source_cr(n, cr):
    for i in xrange(n):
        cr.send(i)
    cr.close()

@coroutine
def and_plus_one_cr(cr):
    try:
        while True:
            i = yield
            cr.send(i)
            cr.send(i+1)
    except GeneratorExit:
        cr.close()

@coroutine
def sum_cr():
    total = 0
    try:
        while True:
            total += yield
    except GeneratorExit:
        print total

cr1 = sum_cr()
cr2 = and_plus_one_cr(cr1)
source_cr(10, cr2)

## Broadcasting

With coroutines, we can broadcast data to multiple destinations.  For example, let's say we want to print the numbers that are divisible by 5 and, separately, the numbers that are divisible by 2.  Let's write some simple coroutines to do this.  The architecture is as follows:

```
source -> broadcast() ---> divisible_cr(5) -> print_cr()
                      \
                        -> divisible_cr(2) -> print_cr()
```

In [None]:
def source(n, cr):
    for i in xrange(n):
        cr.send(i)

@coroutine
def broadcast(*crs):
    while True:
        i = yield
        for cr in crs:
            cr.send(i)

@coroutine
def divisible_cr(n, cr):
    while True:
        i = yield
        if (i % n) == 0:
            cr.send(i)

@coroutine
def print_cr():
    while True:
        print (yield)

source(10,
    broadcast(
        divisible_cr(5, print_cr()),
        divisible_cr(2, print_cr()),
    )
)
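Because both branches print to the same place, it can be hard to tell which branch produced which line.  The sketch below rebuilds the same architecture with a labelled sink that appends to a shared list, which makes the interleaving explicit.  All names (`primed`, `fan_out`, `only_multiples`, `collect`) are ours and stand in for the coroutines above.

```python
def primed(func):
    # Minimal priming decorator, equivalent to `coroutine` above.
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start


@primed
def fan_out(*targets):
    while True:
        item = yield
        for target in targets:  # every target sees every item
            target.send(item)


@primed
def only_multiples(n, target):
    while True:
        item = yield
        if item % n == 0:
            target.send(item)


@primed
def collect(label, out):
    while True:
        item = yield
        out.append((label, item))


received = []
pipeline = fan_out(
    only_multiples(5, collect("by 5", received)),
    only_multiples(2, collect("by 2", received)),
)
for i in range(10):
    pipeline.send(i)
```

Each value travels through the first branch before the second, so 0 (divisible by both) is recorded twice, first under "by 5" and then under "by 2".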

"Pushing" data using coroutines allows you to build more complex data pipelines than "pulling" data.

**Exercise:** How would you create this architecture?

```
source -> broadcast() ---> divisible_cr(5) --+--> print_cr()
                      \                     /
                        -> divisible_cr(2) -
```

## Coroutines as classes

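Generators and coroutines can often replace classes.  For example, a context manager written as a class (`Timer1` below) can be rewritten as a generator function (`Timer2`).  It's many fewer lines of code because the setup and teardown code (the `__enter__` and `__exit__` logic) is grouped together in one function.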

In [None]:
import datetime
import numpy as np

class Timer1:
    def __init__(self):
        pass

    def __enter__(self):
        self.t1 = datetime.datetime.now()

    def __exit__(self, exc_type, exc_value, traceback):
        # the arguments describe the exception, if an error occurred in the with block
        self.t2 = datetime.datetime.now()
        print "Seconds elapsed: {}\n".format((self.t2 - self.t1).total_seconds())

with Timer1():
    x = np.arange(1000)
    x + x

In [None]:
from contextlib import contextmanager
import datetime
import numpy as np

@contextmanager
def Timer2():
    t1 = datetime.datetime.now()
    yield
    t2 = datetime.datetime.now()
    print "Seconds elapsed: {}\n".format((t2 - t1).total_seconds())

with Timer2():
    x = np.arange(1000)
    x + x
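One difference between the two timers is worth noting.  `Timer1.__exit__` runs even if the body of the `with` block raises, but in `Timer2` the exception is re-raised at the `yield`, so the code after it is skipped.  A common remedy is to wrap the `yield` in `try`/`finally`.  The sketch below uses hypothetical names (`timer`, `elapsed`) and records the elapsed time in a list instead of printing it.

```python
from contextlib import contextmanager
import time


@contextmanager
def timer(results):
    start = time.time()
    try:
        yield  # the body of the with block runs here
    finally:
        # Runs whether the body finished normally or raised.
        results.append(time.time() - start)


elapsed = []
try:
    with timer(elapsed):
        raise ValueError("something went wrong in the body")
except ValueError:
    pass  # the exception still propagates out of the with block
```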

**Exercise:** Implement the decorator `contextmanager` using function decorators, a `class` that implements `__enter__` and `__exit__` and coroutines.

## Unifying generators and coroutines

As we've seen above, coroutines are implemented as generators in Python.  This means that they can also yield values back to the caller when they receive a `.send()` call.  A simple example:

In [None]:
def send_and_get():
    x = yield "OK"
    print "(In coroutine) Got a value; sending it back."
    y = yield x
    print "(In coroutine) Got another value (%s), but I'm done." % y

cr = send_and_get()
cr.send(None)  # Could also use cr.next() to prime

Execution is now paused at the `yield` statement on line 2.  It is waiting for the next call of `.send()`, the argument of which will be assigned to `x`.

In [None]:
cr.send(42)

The `yield` on line 4 has returned the value we passed in.  Now it waits for another value to be assigned to `y`.

In [None]:
%%expect_exception StopIteration

print cr.send(0)

Because execution ended without reaching another `yield` statement, `StopIteration` is raised.

We can use this to compute a running mean and standard deviation of values sent into a coroutine.

In [None]:
import math

@coroutine
def stats_cr():
    m0 = 0
    m1 = 0.
    m2 = 0.
    while True:
        if m0 > 0:
            x = yield (m1 / m0), math.sqrt(m2 / m0 - (m1 / m0) * (m1 / m0))
        else:
            x = yield None, None  # What is the purpose of this branch?
        m0 += 1
        m1 += x
        m2 += x * x

scr = stats_cr()
print scr.send(1)
print scr.send(2)
print scr.send(3)
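The coroutine above relies on the identity that the population variance equals the mean of the squares minus the square of the mean, which is why it only needs the running count, sum, and sum of squares.  As a sanity check, the sketch below re-implements the same idea under our own names (`running_stats` and friends) and compares the final result with a direct two-pass computation.  The `max(0.0, ...)` guard is our addition, to protect `math.sqrt` from tiny negative values caused by floating-point rounding.

```python
import math


def running_stats():
    count, total, total_sq = 0, 0.0, 0.0
    result = (None, None)  # nothing to report before the first value
    while True:
        x = yield result
        count += 1
        total += x
        total_sq += x * x
        mean = total / count
        # Guard against tiny negative values caused by rounding.
        variance = max(0.0, total_sq / count - mean * mean)
        result = (mean, math.sqrt(variance))


values = [1, 2, 3]
stats = running_stats()
next(stats)  # prime; this first yield produces (None, None)
for v in values:
    mean, std = stats.send(v)

# Direct (two-pass) population statistics for comparison.
direct_mean = sum(values) / float(len(values))
direct_std = math.sqrt(sum((v - direct_mean) ** 2 for v in values) / len(values))
```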

*Copyright &copy; 2015 The Data Incubator.  All rights reserved.*