# Generators and Iterators

![Iterables and Iterators](./data/img/Iterable.png)

## Building your own generators with `yield`

In [1]:
def counter(start, end):
    current = start
    while current < end:
        yield current
        current += 1

In [2]:
counter(1, 10)

<generator object counter at 0x1047b9a98>

In [3]:
x = counter(1,10)
next(x)

1

In [4]:
next(x)

2

In [5]:
next(x)

3

In [6]:
x = counter(1,10)
list(x)

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [7]:
for item in counter(1, 10):
    print(item, end=' ')

1 2 3 4 5 6 7 8 9 

`yield` can also be used as an expression, together with the generator's `send()` method: the value passed to `send()` becomes the result of the paused `yield` expression.

In [8]:
def accumulator(start=0):
    current = start
    while True:
        current += yield current

In [9]:
x = accumulator()
next(x)

0

In [10]:
x.send(1)

1

In [11]:
x.send(1)

2

In [12]:
x.send(10)

12

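One detail the cells above rely on is that the generator is "primed" with `next(x)` before the first `send()`. The sketch below repeats the `accumulator` definition so that it runs on its own. The names `acc`, `unprimed_error`, `first`, and `total` are only illustrative.

```python
def accumulator(start=0):
    """Same generator as above: yields the running total, receives increments."""
    current = start
    while True:
        current += yield current

acc = accumulator()

# Sending a non-None value before the generator has reached its first `yield`
# fails, because there is no paused `yield` expression to receive it.
try:
    acc.send(5)
except TypeError as exc:
    unprimed_error = str(exc)

# next(acc) is equivalent to acc.send(None) and runs up to the first `yield`.
first = next(acc)       # 0
total = acc.send(5)     # 5
total = acc.send(2.5)   # 7.5

# close() raises GeneratorExit inside the generator at the paused `yield`,
# which ends the otherwise infinite loop.
acc.close()
```

After `close()`, any further `next(acc)` raises `StopIteration`.
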
## The iterator protocol

What does `for x in sequence:` *really* do?

In [13]:
seq = range(4)
for x in seq: 
    print(x)

0
1
2
3


In [14]:
iter_seq = iter(seq)
print(iter_seq)

<range_iterator object at 0x104803540>


In [15]:
iter_seq = iter(seq)
try:
    while True:
        x = next(iter_seq)
        print(x)
except StopIteration:
    pass

0
1
2
3


In [16]:
lst = [1,2,3]
next(iter(lst))

1

In [17]:
li = iter([1,2,3])

In [21]:
# After three earlier calls to next(li) (not shown), the iterator is exhausted
next(li)

StopIteration: 

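The difference between an iterable (such as a list) and an iterator (what `iter()` returns) can be seen in a small sketch. The variable names here are only illustrative.

```python
lst = [1, 2, 3]

# A list is an iterable: every call to iter() returns a new, independent
# iterator, so the list can be traversed as many times as we like.
first_pass = list(lst)
second_pass = list(lst)

# An iterator is consumed as it is read and cannot be rewound.
it = iter(lst)
consumed = list(it)   # [1, 2, 3]
leftover = list(it)   # [] because `it` is already exhausted

# next() accepts a default that is returned instead of raising StopIteration.
sentinel = next(it, None)
```
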
Generators are their own iterators:

In [22]:
x = counter(0, 4)
print(x)
print(iter(x))
x is iter(x)

<generator object counter at 0x10487bb10>
<generator object counter at 0x10487bb10>


True

In [23]:
for item in counter(0, 4): 
    print(item)

0
1
2
3


In [24]:
x = counter(0, 4)
while True:
    next(x)

StopIteration: 

We can also define our own iterator classes (though generators are usually more readable):

In [25]:
class Counter(object):
    def __init__(self, start, end):
        self._start = start
        self._end = end
    def __iter__(self):
        '''This is often implemented as a generator function'''
        return CounterIterator(self._start, self._end)
    
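class CounterIterator(object):
    def __init__(self, start, end):
        self._cur = start
        self._end = end
    def __iter__(self):
        # Iterators should themselves be iterable and return self
        return self
    def __next__(self):
        # Stop without advancing once the end has been reached
        if self._cur >= self._end:
            raise StopIteration
        result = self._cur
        self._cur += 1
        return result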

ctr = Counter(0, 5)
print(list(ctr))

[0, 1, 2, 3, 4]

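As the docstring on `Counter.__iter__` hints, the separate iterator class can often be avoided by writing `__iter__` as a generator function. The following is a minimal sketch under that assumption. `GeneratorCounter` is a hypothetical name, not part of the original notebook.

```python
class GeneratorCounter:
    """Hypothetical variant of Counter whose __iter__ is a generator function."""

    def __init__(self, start, end):
        self._start = start
        self._end = end

    def __iter__(self):
        # Each call creates a fresh generator, so the object can be
        # iterated more than once.
        current = self._start
        while current < self._end:
            yield current
            current += 1

ctr = GeneratorCounter(0, 5)
first = list(ctr)
second = list(ctr)   # same values again: a new generator is created each time
```
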

## Set and dict comprehensions

In [28]:
{x for x in range(4)}

{0, 1, 2, 3}

In [29]:
{x:'y' for x in range(4)}

{0: 'y', 1: 'y', 2: 'y', 3: 'y'}

## Generator expressions

In [30]:
[ x for x in range(10) if x % 2 == 0 ]

[0, 2, 4, 6, 8]

In [31]:
( x for x in range(10) if x % 2 == 0 )

<generator object <genexpr> at 0x10487ba98>

In [32]:
gen = ( x for x in range(10) if x % 2 == 0 )

In [33]:
next(gen)

0

In [34]:
next(gen)

2

In [35]:
list(gen)

[4, 6, 8]

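Because a generator expression produces its items on demand, it can feed an aggregate such as `sum()` without first building the whole list in memory. The sketch below compares the two forms. The size `n` and the variable names are arbitrary choices for illustration, and the exact byte counts reported by `sys.getsizeof` vary between Python versions.

```python
import sys

n = 100_000

squares_list = [x * x for x in range(n)]   # all items are stored at once
squares_gen = (x * x for x in range(n))    # items are produced one at a time

# getsizeof reports only the container itself, but that is already enough to
# show that the generator object does not grow with n.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

# When a generator expression is the only argument to a function call,
# the extra pair of parentheses can be omitted.
total = sum(x * x for x in range(n))
```
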
## The `itertools` module

`itertools` provides a number of "higher-order iterators" that allow you to combine iterators in interesting ways.

In [36]:
from itertools import chain, count, groupby

In [37]:
# chain links multiple iterators end-to-end
xs = range(10)
ys = 'abcdef'
list(chain(xs, ys))


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 'a', 'b', 'c', 'd', 'e', 'f']

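To connect `chain` back to generators, here is a rough sketch of how similar behaviour could be written by hand. `chain_sketch` is a hypothetical name, and this is an approximation of the idea rather than the actual `itertools` implementation.

```python
from itertools import chain

def chain_sketch(*iterables):
    """Yield every item of each iterable in turn (illustrative only)."""
    for iterable in iterables:
        # `yield from` delegates to the inner iterable until it is exhausted
        yield from iterable

hand_written = list(chain_sketch(range(3), 'ab'))
built_in = list(chain(range(3), 'ab'))
```

Both versions are lazy: nothing is read from the inputs until the result is iterated.
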
In [38]:
# The Python 3 built-in zip() lazily pairs up items from multiple iterables, stopping at the shortest one.
#  Useful when building a giant dictionary:
import string
dict(zip(string.ascii_lowercase, string.ascii_uppercase[:10]))

{'a': 'A',
 'b': 'B',
 'c': 'C',
 'd': 'D',
 'e': 'E',
 'f': 'F',
 'g': 'G',
 'h': 'H',
 'i': 'I',
 'j': 'J'}

In [41]:
# count() gives us a simple iterator of consecutive values

for i, letter in zip(count(), string.ascii_letters[:10]):
    print(i, letter)

0 a
1 b
2 c
3 d
4 e
5 f
6 g
7 h
8 i
9 j


In [42]:
for i, letter in enumerate(string.ascii_letters[:10]):
    print(i, letter)

0 a
1 b
2 c
3 d
4 e
5 f
6 g
7 h
8 i
9 j


In [44]:
# Python anti-pattern: indexing with range(len(...)); prefer enumerate() as above
for i in range(len(string.ascii_letters[:10])):
    print(i, string.ascii_letters[i])

0 a
1 b
2 c
3 d
4 e
5 f
6 g
7 h
8 i
9 j


In [45]:
# also an anti-pattern: looking up d[key] while iterating over the keys; prefer d.items()
d = dict(zip(string.ascii_lowercase, string.ascii_uppercase[:10]))
for key in d.keys():
    print(key, d[key])

a A
b B
c C
d D
e E
f F
g G
h H
i I
j J


In [46]:
for key, value in d.items():
    print(key, value)

a A
b B
c C
d D
e E
f F
g G
h H
i I
j J


`groupby()` lets us lazily group *consecutive* items from an iterator that share the same key. For instance, we might have 
some datetime-based data that we wish to summarize by date:

In [47]:
from random import random
from datetime import datetime, timedelta

trades = []
dt = datetime(2016, 4, 24)
while dt < datetime(2016,4,27):
    trades.append((dt, random()))
    dt += timedelta(hours=1)
    
print(len(trades))

72


In [48]:
trades[:10]

[(datetime.datetime(2016, 4, 24, 0, 0), 0.6363691011690938),
 (datetime.datetime(2016, 4, 24, 1, 0), 0.941980976478694),
 (datetime.datetime(2016, 4, 24, 2, 0), 0.6816797774688172),
 (datetime.datetime(2016, 4, 24, 3, 0), 0.3347369245158667),
 (datetime.datetime(2016, 4, 24, 4, 0), 0.6706804431247804),
 (datetime.datetime(2016, 4, 24, 5, 0), 0.1088018258678417),
 (datetime.datetime(2016, 4, 24, 6, 0), 0.23118409602610024),
 (datetime.datetime(2016, 4, 24, 7, 0), 0.6398874603601146),
 (datetime.datetime(2016, 4, 24, 8, 0), 0.8154780111525243),
 (datetime.datetime(2016, 4, 24, 9, 0), 0.9474772640921865)]

In [49]:
def day_of_trade(val):
    dt, value = val
    return dt.date()

for date, date_trades in groupby(trades, key=day_of_trade):
    print(date, len(list(date_trades)))


2016-04-24 24
2016-04-25 24
2016-04-26 24


In [50]:
for date, date_trades in groupby(trades, key=day_of_trade):
    date_trades = list(date_trades)
    print(date, sum(v for dt, v in date_trades) / len(date_trades))


2016-04-24 0.5952232111595388
2016-04-25 0.5515716800929288
2016-04-26 0.5612704896979277


In [51]:
from random import shuffle  # avoids rebinding the name `random` imported earlier
shuffle(trades)

for date, date_trades in groupby(trades, key=day_of_trade):
    date_trades = list(date_trades)
    print(date, sum(v for dt, v in date_trades) / len(date_trades))


2016-04-26 0.8618632021800706
2016-04-24 0.03463216890611387
2016-04-26 0.5034362014200177
2016-04-24 0.8217457738231302
2016-04-26 0.6856487853387871
2016-04-24 0.6876206046062708
2016-04-25 0.7290178490309717
2016-04-26 0.6157217254812255
2016-04-25 0.11236940666162587
2016-04-26 0.8114640829487815
2016-04-24 0.13379546765039618
2016-04-26 0.14685095281773164
2016-04-24 0.4525109139307667
2016-04-25 0.6525766183456034
2016-04-26 0.7417756322333598
2016-04-25 0.4156709392751674
2016-04-24 0.9506701760615054
2016-04-26 0.2983440271598172
2016-04-24 0.9346406325356714
2016-04-26 0.3023205259493519
2016-04-25 0.8900980119836835
2016-04-26 0.7348038221661314
2016-04-25 0.5891213694177395
2016-04-26 0.9765010329642905
2016-04-25 0.4734825422922362
2016-04-26 0.9355769113218811
2016-04-25 0.622229169869006
2016-04-26 0.21830438893716025
2016-04-25 0.48414748672297875
2016-04-24 0.3095055882970113
2016-04-25 0.8077164421763458
2016-04-24 0.6454367493968538
2016-04-26 0.5054741916492135
...

### Note that your data *must* already be sorted in a "grouped" order if you use `groupby`. If you wish to group *unsorted* data, you should use a `defaultdict` instead.

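The two ways of handling unsorted data can be sketched as follows. The block rebuilds the hourly `trades` list so that it runs on its own. The seeded `random.Random(0)` instance and the names `by_day`, `counts`, and `sorted_counts` are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import groupby
import random

def day_of_trade(val):
    dt, value = val
    return dt.date()

rng = random.Random(0)  # seeded only so that the sketch is repeatable
trades = []
dt = datetime(2016, 4, 24)
while dt < datetime(2016, 4, 27):
    trades.append((dt, rng.random()))
    dt += timedelta(hours=1)
rng.shuffle(trades)  # the trades are now in no particular order

# Option 1: collect items into a defaultdict, which needs no sorting.
by_day = defaultdict(list)
for trade in trades:
    by_day[day_of_trade(trade)].append(trade)
counts = {day: len(day_trades) for day, day_trades in by_day.items()}

# Option 2: sort by the same key first, then use groupby as before.
sorted_trades = sorted(trades, key=day_of_trade)
sorted_counts = {
    day: len(list(day_trades))
    for day, day_trades in groupby(sorted_trades, key=day_of_trade)
}
```

The `defaultdict` version makes a single pass over the data, while the sorting version pays for the sort but keeps the lazy, one-group-at-a-time behaviour of `groupby`.
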
# Lab

Open [Generators and Iterators Lab][iteration-lab]

[iteration-lab]: ./iteration-lab.ipynb