# Iterables, Iterators, and Generators

- The `yield` keyword allows the construction of generators, which work as iterators.
- Every generator is an iterator: generators fully implement the iterator interface.

Every collection in Python is iterable, and iterators are used internally to support:
- for loops
- construction and extension of collection types
- looping over text files line by line
- list, dict, and set comprehensions
- tuple unpacking
- unpacking actual parameters with the * in function calls

## Sentence Take #1: A Sequence of Words


In [1]:
import re
import reprlib

RE_WORD = re.compile(r'\w+')  # raw string avoids an invalid-escape warning

class Sentence:
    
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)
        
    def __getitem__(self, index):
        return self.words[index]
    
    def __len__(self):
        return len(self.words)
    
    def __repr__(self) -> str:
        return 'Sentence(%s)' % reprlib.repr(self.text) 

1. `RE_WORD.findall` returns a list of all non-overlapping matches of the regular expression, as strings.
2. `self.words` holds the result of `.findall`, so we simply return the word at the given index.
3. To complete the sequence protocol, we implement `__len__` (it is not needed to make the object iterable).
4. `reprlib.repr` is a utility function that generates abbreviated string representations of data structures that can be very large.

In [2]:
s = Sentence('"The time has come," the Walrus said,')
s

Sentence('"The time ha... Walrus said,')

In [6]:
for word in s:
    print(word)

The
time
has
come
the
Walrus
said


In [7]:
list(s)

['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']

### Why Sequences Are Iterable: The iter Function

Whenever the interpreter needs to iterate over an object x, it automatically calls `iter(x)`.

The iter built-in function:
1. Checks whether the object implements `__iter__`, and calls that to obtain an iterator.
2. If `__iter__` is not implemented, but `__getitem__` is implemented, Python creates an iterator that attempts to fetch items in order, starting from index 0 (zero).
3. If that fails, Python raises TypeError, usually saying “C object is not iterable,” where C is the class of the target object
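
The three steps above can be observed directly. The sketch below uses two hypothetical classes, `Spam` and `Eggs`, which are not part of the original examples: `Spam` implements only `__getitem__`, and `Eggs` implements neither method.

```python
class Spam:
    """Hypothetical class with only __getitem__ (no __iter__)."""

    def __getitem__(self, index):
        # Indexing past the end raises IndexError, which ends the iteration.
        return ['a', 'b', 'c'][index]


class Eggs:
    """Hypothetical class with neither __iter__ nor __getitem__."""


# Step 2: iter() falls back to __getitem__, fetching items from index 0.
spam_items = list(iter(Spam()))
print(spam_items)

# Step 3: with neither method available, iter() raises TypeError.
try:
    iter(Eggs())
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

This fallback is why the first `Sentence` class is iterable even though it never defines `__iter__`.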

## Iterables Versus Iterators

- iterable
    - Any object from which the `iter` built-in function can obtain an iterator. Objects implementing an `__iter__` method that returns an iterator are iterable. Sequences are always iterable, as are objects implementing a `__getitem__` method that takes 0-based indexes.

In [8]:
s = 'ABC'
for char in s:
    print(char)

A
B
C


In [9]:
it = iter(s)
while True:
    try:
        print(next(it))
    except StopIteration:
        del it
        break

A
B
C


1. Build an iterator `it` from the iterable.
2. Repeatedly call `next` on the iterator to obtain the next item.
3. The iterator raises `StopIteration` when there are no further items.
4. Release the reference to `it`; the iterator object is discarded.
5. Exit the loop.

The standard interface for an iterator has two methods:

- `__next__`
  - Returns the next available item, raising `StopIteration` when there are no more items.
- `__iter__`
  - Returns `self`; this allows iterators to be used where an iterable is expected.
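
As a quick check of this interface, the sketch below uses the abstract base classes in `collections.abc` to compare a list with the iterator built from it. The variable names are illustrative only.

```python
from collections.abc import Iterable, Iterator

letters = ['a', 'b']
it = iter(letters)

# A list is iterable but is not itself an iterator.
list_is_iterable = isinstance(letters, Iterable)
list_is_iterator = isinstance(letters, Iterator)

# The list iterator implements both __next__ and __iter__.
it_is_iterator = isinstance(it, Iterator)

# __iter__ on an iterator returns the iterator itself.
iter_returns_self = iter(it) is it

print(list_is_iterable, list_is_iterator, it_is_iterator, iter_returns_self)
```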

In [10]:
s3 = Sentence('Pig and Pepper')
it = iter(s3)
it

<iterator at 0x2a47ac009a0>

In [11]:
next(it) 

'Pig'

In [13]:
list(it)

['and', 'Pepper']

In [14]:
list(iter(s3))

['Pig', 'and', 'Pepper']

1. Obtain an iterator from `s3`.
2. `next(it)` fetches the next word.
3. `list(it)` collects only the remaining words, because the iterator has already produced 'Pig'.
4. To go over the sentence again, a new iterator must be built.

- There is no way to check whether there are remaining items, other than to call `next()` and catch `StopIteration`.
- It is not possible to “reset” an iterator. If you need to start over, call `iter()` again on the iterable that built the iterator in the first place.
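
A minimal sketch of these two limitations, using a plain list of words rather than the `Sentence` class. The default argument to `next()` shown here is only a convenience around the same end-of-iteration signal.

```python
words = ['Pig', 'and', 'Pepper']
it = iter(words)

first = next(it)               # consumes 'Pig'
rest = list(it)                # consumes everything that is left
after_exhaustion = list(it)    # an exhausted iterator produces nothing more

# next() accepts a default, returned instead of raising StopIteration.
fallback = next(it, 'no more items')

# Starting over means asking the iterable for a fresh iterator.
fresh = list(iter(words))

print(first, rest, after_exhaustion, fallback, fresh)
```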



- `iterator`
  - Any object that implements a no-argument `__next__` method that returns the next item in a series or raises `StopIteration` when there are no more items. Python iterators also implement the `__iter__` method, so they are iterable as well.

## Sentence Take #2: A Classic Iterator

In [15]:
import re
import reprlib

RE_WORD = re.compile(r'\w+')  # raw string avoids an invalid-escape warning


class Sentence:
    
    def __init__(self, text) -> None:
        self.text = text
        self.words = RE_WORD.findall(text)
        
    def __repr__(self) -> str:
        return 'Sentence(%s)' % reprlib.repr(self.text)
    
    def __iter__(self):
        return SentenceIterator(self.words)
    
    
class SentenceIterator:
    
    def __init__(self, words) -> None:
        self.words = words
        self.index = 0
        
    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word
    
    def __iter__(self):
        return self

1. The `__iter__` method is the only addition to the previous Sentence implementation. This version has no `__getitem__`, to make it clear that the class is iterable because it implements `__iter__`.
2. `__iter__` fulfills the iterable protocol by instantiating and returning an iterator
3. `SentenceIterator` holds a reference to the list of words
4. `self.index` is used to determine the next word to fetch.
5. If there is no word at self.index, raise `StopIteration`
6. Increment `self.index`.
7. Return the word.
8. Implement `self.__iter__`.

### Making Sentence an Iterator: Bad Idea

- A common cause of errors in building iterables and iterators is to confuse the two
- Iterables have an `__iter__` method that instantiates a new iterator every time
- Iterators implement a `__next__` method that returns individual items, and an `__iter__` method that returns self

To “support multiple traversals” it must be possible to obtain multiple independent iterators from the same iterable instance, and each iterator must keep its own internal state, so a proper implementation of the pattern requires each call to `iter(my_iterable)` to create a new, independent iterator. That is why we need the `SentenceIterator` class in this example.
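
The difference shows up as soon as the same object is traversed in nested loops. The sketch below assumes two hypothetical classes, `Words` (a proper iterable) and `BadWords` (an iterable that is also its own iterator); neither appears in the original examples.

```python
class Words:
    """Hypothetical iterable: __iter__ builds a new iterator on every call."""

    def __init__(self, text):
        self.words = text.split()

    def __iter__(self):
        return iter(self.words)


class BadWords:
    """Hypothetical anti-pattern: the iterable is also its own iterator."""

    def __init__(self, text):
        self.words = text.split()
        self.index = 0

    def __iter__(self):
        return self  # every caller shares the same traversal state

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration() from None
        self.index += 1
        return word


good = Words('a b c')
good_pairs = [(x, y) for x in good for y in good]  # independent traversals

bad = BadWords('a b c')
bad_pairs = [(x, y) for x in bad for y in bad]  # inner loop drains the shared state

print(len(good_pairs))
print(bad_pairs)
```

With `Words`, the nested loops produce all nine pairs. With `BadWords`, the inner loop exhausts the single shared iterator, so the outer loop ends after its first item.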

## Sentence Take #3: A Generator Function

A Pythonic implementation of the same functionality uses a generator function to replace the `SentenceIterator` class.

In [16]:
import re
import reprlib

RE_WORD = re.compile(r'\w+')  # raw string avoids an invalid-escape warning


class Sentence:
    
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)
        
    def __repr__(self) -> str:
        return 'Sentence(%s)' % reprlib.repr(self.text)
    
    def __iter__(self):
        for word in self.words:
            yield word
        return

1. Iterate over `self.words`.
2. Yield the current word.
3. This `return` is not needed. A generator function doesn’t raise `StopIteration`: it simply exits when it’s done producing values.
4. No need for a separate iterator class!



### How a Generator Function Works

Any Python function that has the `yield` keyword in its body is a generator function: a function which, when called, returns a `generator object`.

- `Generators are iterators` that produce the values of the expressions passed to yield.
  
When we invoke next(…) on the generator object, execution advances to the next yield in the function body, and the next(…) call evaluates to the value yielded when the function body is suspended.
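
The suspend-and-resume behaviour can be traced with a small sketch. `gen_ab` and the `log` list are hypothetical names introduced only for this illustration.

```python
def gen_ab():
    """Hypothetical generator function that records how far its body has run."""
    log.append('start')
    yield 'A'
    log.append('continue')
    yield 'B'
    log.append('end')


log = []
g = gen_ab()            # builds a generator object; the body has not run yet
log_after_call = list(log)

first = next(g)         # runs the body up to the first yield
log_after_first = list(log)

second = next(g)        # resumes after the first yield, runs up to the second
remaining = list(g)     # resumes again; the body finishes, nothing more is produced

print(log_after_call, first, log_after_first, second, remaining, log)
```

Calling `gen_ab()` runs none of the body. Each `next()` call runs it only as far as the following `yield`.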

## Sentence Take #4: A Lazy Implementation


Our `Sentence` implementations so far have not been lazy because `__init__` eagerly builds a list of all words in the text, binding it to the `self.words` attribute. This entails processing the entire text, and the list may use as much memory as the text itself.

- The `re.finditer` function is a lazy version of `re.findall` which, instead of a list, returns an iterator producing match objects (`re.Match` instances) on demand.
- If there are many matches, re.finditer saves a lot of memory.
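
To make the contrast concrete, the following sketch runs both methods on the same short string. The sample text is arbitrary.

```python
import re

RE_WORD = re.compile(r'\w+')
text = 'To be, or not to be'

eager = RE_WORD.findall(text)    # builds the whole list at once
lazy = RE_WORD.finditer(text)    # builds an iterator; no match is produced yet

first_match = next(lazy)         # the first match is produced on demand
first_word = first_match.group()
first_span = first_match.span()

remaining_words = [m.group() for m in lazy]

print(eager)
print(first_word, first_span, remaining_words)
```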


In [18]:
import re
import reprlib

RE_WORD = re.compile(r'\w+')  # raw string avoids an invalid-escape warning


class Sentence:
    
    def __init__(self, text) -> None:
        self.text = text
        
    def __repr__(self) -> str:
        return 'Sentence(%s)' % reprlib.repr(self.text)
    
    def __iter__(self):
        for match in RE_WORD.finditer(self.text):
            yield match.group()

1. No need to keep a `words` list.
2. `finditer` builds an iterator over the matches of `RE_WORD` on `self.text`, yielding match objects (`re.Match` instances).
3. `match.group()` extracts the actual matched text from the match object.

## Sentence Take #5: A Generator Expression

- A generator expression can be understood as a lazy version of a list comprehension

In [19]:
import re
import reprlib

RE_WORD = re.compile(r'\w+')  # raw string avoids an invalid-escape warning


class Sentence:
    
    def __init__(self, text) -> None:
        self.text = text
        
    def __repr__(self) -> str:
        return 'Sentence(%s)' % reprlib.repr(self.text)
    
    def __iter__(self):
        return (match.group() for match in RE_WORD.finditer(self.text))

## Generator Expressions: When to Use Them

- For the simpler cases, a generator expression will do, and it’s easier to read at a glance
- if the generator expression spans more than a couple of lines, code a generator function for the sake of readability
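
As an illustration of that rule of thumb, the sketch below writes one simple case as a generator expression and one slightly richer case as a generator function. The helper `long_words` and its `min_len` parameter are hypothetical.

```python
import re

RE_WORD = re.compile(r'\w+')
text = 'The time has come, the Walrus said'

# Simple case: a one-line generator expression reads well.
short_form = (match.group().lower() for match in RE_WORD.finditer(text))


# Once filtering and transformation pile up, a generator function is easier to follow.
def long_words(text, min_len=4):
    """Hypothetical helper: lazily yield lowercase words of at least min_len letters."""
    for match in RE_WORD.finditer(text):
        word = match.group()
        if len(word) >= min_len:
            yield word.lower()


all_words = list(short_form)
selected = list(long_words(text))

print(all_words)
print(selected)
```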

## Another Example: Arithmetic Progression Generator

- The classic Iterator pattern is all about traversal: navigating some data structure.
- But a standard interface based on a method to fetch the next item in a series is also useful when the items are produced on the fly, instead of retrieved from a collection.

In [20]:
class ArithmeticProgression:
    
    def __init__(self, begin, step, end=None):
        self.begin = begin
        self.step = step
        self.end = end
        
    def __iter__(self):
        result = type(self.begin + self.step)(self.begin) # 2
        forever = self.end is None
        index = 0
        while forever or result < self.end:
            yield result
            index += 1
            result = self.begin + self.step * index # 5

In [21]:
ap = ArithmeticProgression(0, 1, 3)
list(ap)

[0, 1, 2]

1. `__init__` requires two arguments: `begin` and `step`. `end` is optional; if it’s `None`, the series will be unbounded.
2. This line produces a result value equal to self.begin, but coerced to the type of the subsequent additions.
3. For readability, the forever flag will be True if the self.end attribute is None, resulting in an unbounded series
4. This loop runs forever or until the result matches or exceeds self.end. When this loop exits, so does the function.
5. The next potential result is calculated. It may never be yielded, because the while loop may terminate.
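
The same logic also fits in a plain generator function, which makes the coercion in note 2 easy to try with different numeric types. This is a sketch mirroring the class above, and the name `aritprog_gen` is an assumption.

```python
from fractions import Fraction


def aritprog_gen(begin, step, end=None):
    """Generator-function sketch of the ArithmeticProgression logic."""
    result = type(begin + step)(begin)  # coerce begin to the type of begin + step
    forever = end is None
    index = 0
    while forever or result < end:
        yield result
        index += 1
        result = begin + step * index


ints = list(aritprog_gen(0, 1, 3))
floats = list(aritprog_gen(1, 0.5, 3))
fractions = list(aritprog_gen(0, Fraction(1, 3), 1))

print(ints)
print(floats)
print(fractions)
```

Because of the coercion, the first item already has the type of every later item: `1.0` rather than `1` in the float series, and `Fraction(0, 1)` rather than `0` in the fraction series.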

### Arithmetic Progression with itertools

The itertools module in Python 3.4 has 19 generator functions that can be combined in a variety of interesting ways.
- For example, the `itertools.count` function returns a generator that produces numbers.

In [22]:
import itertools

gen =  itertools.count(1, .5)

In [23]:
next(gen)

1

In [24]:
next(gen)

1.5

In [25]:
gen = itertools.takewhile(lambda n: n < 3, itertools.count(1, .5))
list(gen)

[1, 1.5, 2.0, 2.5]
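
Combining `count` and `takewhile` this way is enough to rebuild an arithmetic progression. The function below is a sketch under that assumption; `aritprog_itertools` is a hypothetical name, and it is a regular function that returns an iterator rather than a generator function.

```python
import itertools


def aritprog_itertools(begin, step, end=None):
    """Sketch: arithmetic progression assembled from itertools building blocks."""
    first = type(begin + step)(begin)        # same coercion as in the class version
    ap_gen = itertools.count(first, step)    # unbounded series
    if end is not None:
        # Stop as soon as an item reaches or passes end.
        ap_gen = itertools.takewhile(lambda n: n < end, ap_gen)
    return ap_gen


bounded = list(aritprog_itertools(1, .5, 3))
unbounded_head = list(itertools.islice(aritprog_itertools(0, 2), 3))

print(bounded)
print(unbounded_head)
```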

## Generator Functions in the Standard Library

The standard library provides many generators.

|Module| Function| Description|
|-|-|-|
|itertools|compress(it, selector_it)| Consumes two iterables in parallel; yields items from it whenever the corresponding item in selector_it is truthy|
|itertools|dropwhile(predicate, it)|Consumes it skipping items while predicate computes truthy, then yields every remaining item (no further checks are made)|
|(built-in)|filter(predicate, it)|Applies predicate to each item of it, yielding the item if predicate(item) is truthy; if predicate is None, only truthy items are yielded|
|itertools|filterfalse(predicate, it)|Same as filter, with the predicate logic negated: yields items whenever predicate computes falsy|
|itertools|islice(it, stop) or islice(it, start, stop, step=1)|Yields items from a slice of it, similar to s[:stop] or s[start:stop:step] except it can be any iterable, and the operation is lazy|
|itertools|takewhile(predicate, it)|Yields items while predicate computes truthy, then stops and no further checks are made|

In [26]:
def vowel(c):
    return c.lower() in 'aeiou'

In [27]:
list(filter(vowel, 'Aardvark'))

['A', 'a', 'a']

In [28]:
list(itertools.filterfalse(vowel, 'Aardvark'))

['r', 'd', 'v', 'r', 'k']

In [29]:
list(itertools.dropwhile(vowel, 'Aardvark'))

['r', 'd', 'v', 'a', 'r', 'k']

In [30]:
list(itertools.takewhile(vowel, 'Aardvark'))

['A', 'a']

In [31]:
list(itertools.compress('Aardvark', (1,0,1,1,0,1)))

['A', 'r', 'd', 'a']

In [32]:
list(itertools.islice('Aardvark', 4))

['A', 'a', 'r', 'd']

In [33]:
list(itertools.islice('Aardvark', 4, 7))

['v', 'a', 'r']

In [34]:
list(itertools.islice('Aardvark', 1, 7, 2))

['a', 'd', 'a']

|Module| Function| Description|
|-|-|-|
|itertools|accumulate(it, [func])|Yields accumulated sums; if func is provided, yields the result of applying it to the first pair of items, then to the first result and next item, etc.|
|(built-in)|enumerate(iterable, start=0)|Yields 2-tuples of the form (index, item), where index is counted from start, and item is taken from the iterable|
|(built-in)|map(func, it1, [it2, …, itN])|Applies func to each item of it1, yielding the result; if N iterables are given, func must take N arguments and the iterables will be consumed in parallel|
|itertools|starmap(func, it)|Applies func to each item of it, yielding the result; the input iterable should yield iterable items iit, and func is applied as func(*iit)|

In [35]:
sample = [5, 4, 2, 8, 7, 6, 3, 0, 9, 1]

In [36]:
list(itertools.accumulate(sample))

[5, 9, 11, 19, 26, 32, 35, 35, 44, 45]

In [37]:
list(itertools.accumulate(sample, min))

[5, 4, 2, 2, 2, 2, 2, 0, 0, 0]

In [38]:
list(itertools.accumulate(sample, max))

[5, 5, 5, 8, 8, 8, 8, 8, 9, 9]

In [40]:
import operator
list(itertools.accumulate(sample, operator.mul))

[5, 20, 40, 320, 2240, 13440, 40320, 0, 0, 0]

In [41]:
list(itertools.accumulate(range(1, 11), operator.mul))

[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]

In [42]:
list(enumerate('albatroz', 1))

[(1, 'a'),
 (2, 'l'),
 (3, 'b'),
 (4, 'a'),
 (5, 't'),
 (6, 'r'),
 (7, 'o'),
 (8, 'z')]

In [43]:
list(map(operator.mul, range(11), range(11)))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [44]:
list(map(operator.mul, range(11), [2, 4, 8]))

[0, 4, 16]

In [45]:
list(map(lambda a, b: (a, b), range(11), [2, 4, 8])) 

[(0, 2), (1, 4), (2, 8)]

In [46]:
list(itertools.starmap(operator.mul, enumerate('albatroz', 1)))

['a', 'll', 'bbb', 'aaaa', 'ttttt', 'rrrrrr', 'ooooooo', 'zzzzzzzz']

In [47]:
sample = [5, 4, 2, 8, 7, 6, 3, 0, 9, 1]
list(itertools.starmap(lambda a, b: b/a,
                       enumerate(itertools.accumulate(sample), 1)))

[5.0,
 4.5,
 3.6666666666666665,
 4.75,
 5.2,
 5.333333333333333,
 5.0,
 4.375,
 4.888888888888889,
 4.5]

|Module| Function| Description|
|-|-|-|
|itertools|chain(it1, …, itN)|Yield all items from it1, then from it2 etc., seamlessly|
|itertools|chain.from_iterable(it)|Yield all items from each iterable produced by it, one after the other, seamlessly; it should yield iterable items, for example, a list of iterables|
|itertools|product(it1, …, itN, repeat=1)|Cartesian product: yields N-tuples made by combining items from each input iterable like nested for loops could produce; repeat allows the input iterables to be consumed more than once|
|(built-in)|zip(it1, …, itN)|Yields N-tuples built from items taken from the iterables in parallel, silently stopping as soon as the shortest iterable is exhausted|
|itertools|zip_longest(it1, …, itN, fillvalue=None)|Yields N-tuples built from items taken from the iterables in parallel, stopping only when the last iterable is exhausted, filling the blanks with the fillvalue|

In [48]:
list(itertools.chain('ABC', range(2)))

['A', 'B', 'C', 0, 1]

In [49]:
list(itertools.chain(enumerate('ABC')))

[(0, 'A'), (1, 'B'), (2, 'C')]

In [50]:
list(itertools.chain.from_iterable(enumerate('ABC')))

[0, 'A', 1, 'B', 2, 'C']

In [51]:
list(zip('ABC', range(5)))

[('A', 0), ('B', 1), ('C', 2)]

In [52]:
list(zip('ABC', range(5), [10, 20, 30, 40]))

[('A', 0, 10), ('B', 1, 20), ('C', 2, 30)]

In [53]:
list(itertools.zip_longest('ABC', range(5)))

[('A', 0), ('B', 1), ('C', 2), (None, 3), (None, 4)]

In [54]:
list(itertools.zip_longest('ABC', range(5), fillvalue='?'))

[('A', 0), ('B', 1), ('C', 2), ('?', 3), ('?', 4)]

In [55]:
list(itertools.product('ABC', range(2)))

[('A', 0), ('A', 1), ('B', 0), ('B', 1), ('C', 0), ('C', 1)]

In [56]:
suits = 'spades hearts diamonds clubs'.split()
list(itertools.product('AK', suits)) 

[('A', 'spades'),
 ('A', 'hearts'),
 ('A', 'diamonds'),
 ('A', 'clubs'),
 ('K', 'spades'),
 ('K', 'hearts'),
 ('K', 'diamonds'),
 ('K', 'clubs')]

In [57]:
list(itertools.product('ABC'))

[('A',), ('B',), ('C',)]

In [58]:
list(itertools.product('ABC', repeat=2))

[('A', 'A'),
 ('A', 'B'),
 ('A', 'C'),
 ('B', 'A'),
 ('B', 'B'),
 ('B', 'C'),
 ('C', 'A'),
 ('C', 'B'),
 ('C', 'C')]

In [59]:
list(itertools.product(range(2), repeat=3))

[(0, 0, 0),
 (0, 0, 1),
 (0, 1, 0),
 (0, 1, 1),
 (1, 0, 0),
 (1, 0, 1),
 (1, 1, 0),
 (1, 1, 1)]

In [60]:
rows = itertools.product('AB', range(2), repeat=2)
for row in rows: print(row)

('A', 0, 'A', 0)
('A', 0, 'A', 1)
('A', 0, 'B', 0)
('A', 0, 'B', 1)
('A', 1, 'A', 0)
('A', 1, 'A', 1)
('A', 1, 'B', 0)
('A', 1, 'B', 1)
('B', 0, 'A', 0)
('B', 0, 'A', 1)
('B', 0, 'B', 0)
('B', 0, 'B', 1)
('B', 1, 'A', 0)
('B', 1, 'A', 1)
('B', 1, 'B', 0)
('B', 1, 'B', 1)


|Module| Function| Description|
|-|-|-|
|itertools|combinations(it, out_len)|Yield combinations of out_len items from the items yielded by it|
|itertools|combinations_with_replacement(it, out_len)|Yield combinations of out_len items from the items yielded by it, including combinations with repeated items|
|itertools|count(start=0, step=1)|Yields numbers starting at start, incremented by step, indefinitely|
|itertools|cycle(it)|Yields items from it storing a copy of each, then yields the entire sequence repeatedly, indefinitely|
|itertools|permutations(it, out_len=None)|Yield permutations of out_len items from the items yielded by it; by default, out_len is len(list(it))|
|itertools|repeat(item, [times])|Yield the given item repeatedly, indefinitely unless a number of times is given|

In [63]:
ct = itertools.count()
next(ct), next(ct), next(ct)

(0, 1, 2)

In [64]:
list(itertools.islice(itertools.count(1, .3), 3))

[1, 1.3, 1.6]

In [65]:
cy = itertools.cycle('ABC')
next(cy),  next(cy),  next(cy),  next(cy)

('A', 'B', 'C', 'A')

In [66]:
list(itertools.islice(cy, 7)) 

['B', 'C', 'A', 'B', 'C', 'A', 'B']

In [67]:
rp = itertools.repeat(7)
next(rp), next(rp)

(7, 7)

In [68]:
list(map(operator.mul, range(11), itertools.repeat(5)))

[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

In [69]:
list(itertools.combinations('ABC', 2))

[('A', 'B'), ('A', 'C'), ('B', 'C')]

In [70]:
list(itertools.combinations_with_replacement('ABC', 2))

[('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]

In [71]:
list(itertools.permutations('ABC', 2))

[('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

In [72]:
list(itertools.product('ABC', repeat=2)) 

[('A', 'A'),
 ('A', 'B'),
 ('A', 'C'),
 ('B', 'A'),
 ('B', 'B'),
 ('B', 'C'),
 ('C', 'A'),
 ('C', 'B'),
 ('C', 'C')]

|Module| Function| Description|
|-|-|-|
|itertools|groupby(it, key=None)|Yields 2-tuples of the form (key, group), where key is the grouping criterion and group is a generator yielding the items in the group|
|(built-in) |reversed(seq) |Yields items from seq in reverse order, from last to first; seq must be a sequence or implement the `__reversed__` special method|
|itertools|tee(it, n=2)|Returns a tuple of n iterators, each yielding the items of the input iterable independently|


In [73]:
list(itertools.groupby('LLLLAAGGG'))

[('L', <itertools._grouper at 0x2a47b7ec070>),
 ('A', <itertools._grouper at 0x2a47b7ec970>),
 ('G', <itertools._grouper at 0x2a47b7ec8e0>)]

In [74]:
for char, group in itertools.groupby('LLLLAAAGG'):
    print(char, '->', list(group))

L -> ['L', 'L', 'L', 'L']
A -> ['A', 'A', 'A']
G -> ['G', 'G']


In [75]:
animals = ['duck', 'eagle', 'rat', 'giraffe', 'bear',
           'bat', 'dolphin', 'shark', 'lion']
animals.sort(key=len)
animals

['rat', 'bat', 'duck', 'bear', 'lion', 'eagle', 'shark', 'giraffe', 'dolphin']

In [76]:
for length, group in itertools.groupby(animals, len):
    print(length, '->', list(group))

3 -> ['rat', 'bat']
4 -> ['duck', 'bear', 'lion']
5 -> ['eagle', 'shark']
7 -> ['giraffe', 'dolphin']


In [77]:
for length, group in itertools.groupby(reversed(animals), len):
    print(length, '->', list(group))

7 -> ['dolphin', 'giraffe']
5 -> ['shark', 'eagle']
4 -> ['lion', 'bear', 'duck']
3 -> ['bat', 'rat']


In [78]:
list(itertools.tee('ABC'))

[<itertools._tee at 0x2a47b8d1a00>, <itertools._tee at 0x2a47b8d2580>]

In [80]:
g1, g2 = itertools.tee('ABC')

In [81]:
list(zip(*itertools.tee('ABC')))

[('A', 'A'), ('B', 'B'), ('C', 'C')]

## New Syntax in Python 3.3: yield from

Nested for loops are the traditional solution when a generator function needs to yield values produced from another generator.

```py
def chain(*iterables):
    for it in iterables:
        for i in it:
            yield i
```

In [82]:
def chain(*iterables):
    for i in iterables:
        yield from i

In [83]:
s = 'ABC'
t = tuple(range(3))
list(chain(s, t))

['A', 'B', 'C', 0, 1, 2]
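
Since `yield from` delegates to any iterable, including another call to the same generator function, it also works for recursive traversal. The `flatten` helper below is a hypothetical example, not something from the original notes.

```python
def flatten(items):
    """Hypothetical helper: recursively flatten nested lists and tuples."""
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)   # delegate to the sub-generator
        else:
            yield item                 # strings and numbers are yielded as-is


nested = [1, [2, (3, 4)], 'AB', [[5]]]
flat = list(flatten(nested))

print(flat)
```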

## Iterable Reducing Functions


|Module|Function|Description|
|-|-|-|
|(built-in)|all(it)|Returns True if all items in it are truthy, otherwise False; all([]) returns True|
|(built-in)|any(it)|Returns True if any item in it is truthy, otherwise False; any([]) returns False|
|(built-in)|max(it, [key=,] [default=])|Returns the maximum value of the items in it; a key is an ordering function, as in sorted; default is returned if the iterable is empty|
|(built-in)|min(it, [key=,] [default=])|Returns the minimum value of the items in it. key is an ordering function, as in sorted; default is returned if the iterable is empty|
|functools|reduce(func, it, [initial])|Returns the result of applying func to the first pair of items, then to that result and the third item and so on; if given, initial forms the initial pair with the first item|
|(built-in)|sum(it, start=0)|The sum of all items in it, with the optional start value added (use math.fsum for better precision when adding floats)|

In [86]:
all([1, 2, 3]), all([1, 0, 3]),  all([])

(True, False, True)

In [88]:
any([1, 2, 3]), any([1, 0, 3]), any([0, 0.0]), any([])

(True, True, False, False)
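
The other functions in the table can be tried the same way. The sketch below reuses the `sample` list from earlier. It also shows that `any` stops consuming its argument as soon as the result is known, which is standard behaviour for `all` and `any` even though the table does not mention it.

```python
import functools
import operator

sample = [5, 4, 2, 8, 7, 6, 3, 0, 9, 1]

total = sum(sample)
product = functools.reduce(operator.mul, range(1, 6))    # 1 * 2 * 3 * 4 * 5
longest = max(['rat', 'giraffe', 'bear'], key=len)
fallback = min([], default=None)                          # default avoids ValueError
with_initial = functools.reduce(operator.add, [], 100)    # initial is returned as-is

# any() short-circuits: it stops at the first truthy item.
it = iter([0, 1, 2, 3])
found = any(it)
left_over = list(it)     # the items any() never looked at

print(total, product, longest, fallback, with_initial, found, left_over)
```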

## A Closer Look at the iter Function

`iter` can also be called with two arguments to create an iterator from a regular function or any callable object:
- The first argument must be a callable to be invoked repeatedly (with no arguments) to yield values.
- The second argument is a sentinel: a marker value which, when returned by the callable, causes the iterator to raise `StopIteration` instead of yielding the sentinel.

In [89]:
from random import randint

def d6():
    return randint(1, 6)


d6_iter = iter(d6, 1)
d6_iter

<callable_iterator at 0x2a47b7ec850>

In [90]:
for roll in d6_iter:
    print(roll)

4
6
5
4
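
Because the die roll is random, the output above changes from run to run. A deterministic sketch of the same two-argument form reads fixed-size blocks from an in-memory stream, where the empty bytes object returned at end of file acts as the sentinel. The stream contents and block size are arbitrary choices.

```python
import io
from functools import partial

stream = io.BytesIO(b'abcdefghij')

# stream.read(4) returns b'' at end of file, which serves as the sentinel.
read_block = partial(stream.read, 4)
blocks = list(iter(read_block, b''))

print(blocks)
```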


## Generators as Coroutines

- PEP 342 (Coroutines via Enhanced Generators) was implemented in Python 2.5. This proposal added extra methods and functionality to generator objects, most notably the `.send()` method.
- Like `.__next__()`, `.send()` causes the generator to advance to the next `yield`.
- It also allows the client using the generator to send data into it.
- Whatever argument is passed to `.send()` becomes the value of the corresponding `yield` expression inside the generator function body.
- `.send()` allows two-way data exchange between the client code and the generator.
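
A minimal sketch of this two-way exchange is a running-average coroutine. The `averager` function is a hypothetical example; the first `next()` call is needed to advance the generator to its first `yield` before anything can be sent.

```python
def averager():
    """Hypothetical coroutine: receives numbers via send(), yields the running average."""
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield average   # the argument of send() becomes the value of this yield
        total += term
        count += 1
        average = total / count


coro = averager()
primed = next(coro)      # advance to the first yield; it produces the initial None
avg1 = coro.send(10)
avg2 = coro.send(30)
avg3 = coro.send(5)
coro.close()             # stop the otherwise endless loop

print(primed, avg1, avg2, avg3)
```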

## Chapter Summary

- An overview of 24 general-purpose generator functions in the standard library.
- The `iter` built-in function: first, how it returns an iterator when called as `iter(o)`, and then how it builds an iterator from any function when called as `iter(func, sentinel)`.