# 14. Iterables, iterators and generators

Iteration Pattern : when scanning datasets that don't fit in memory, we need a way to fetch the items lazily, that is, one at a time and on demand.

The `yield` keyword allows the construction of generators, which work as iterators.

Every generator is an iterator: generators fully implement the iterator interface.

In Python 3, even the `range()` built-in returns a generator-like object instead of a full list. To build a `list` from a `range`, you have to be explicit: `list(range(100))`.
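A quick check of that laziness, as a minimal sketch. Strictly speaking, `range` returns a lazy `range` sequence object rather than a generator, so it also supports `len()` and indexing without ever materializing its items.

```python
# range() stores only start, stop and step, so a huge range costs almost no memory.
r = range(10**12)
size = len(r)            # computed from start/stop/step; no items are generated
last = r[-1]             # indexing works too: range is a lazy sequence
small = list(range(5))   # an explicit list() call is needed to get a real list
```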

Every collection in Python is iterable, and iterators are used internally to support:

- `for` loops;
- collection types construction and extension;
- looping over text files line by line;
- list, dict and set comprehensions;
- tuple unpacking;
- unpacking actual parameters with `*` in function calls.
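A minimal sketch exercising several of those contexts with a single iterable. `Letters` is a hypothetical class whose `__iter__` simply delegates to the wrapped string, and `io.StringIO` stands in for a text file.

```python
import io

class Letters:
    """Hypothetical iterable: each iter() call returns a fresh iterator."""
    def __init__(self, text):
        self.text = text

    def __iter__(self):
        return iter(self.text)

letters = Letters('abc')

looped = []
for ch in letters:                        # for loop
    looped.append(ch)

as_list = list(letters)                   # collection construction
as_set = {ch.upper() for ch in letters}   # set comprehension
a, b, c = letters                         # tuple unpacking

def join3(x, y, z):
    return x + y + z

spread = join3(*letters)                  # unpacking arguments with *

# Text files iterate line by line; StringIO behaves the same way.
lines = [line for line in io.StringIO('one\ntwo\n')]
```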
    
This chapter covers:

- How the `iter(...)` built-in function is used internally to handle iterable objects.
- How to implement the classic Iterator pattern in Python.
- How a generator function works in detail, with line-by-line descriptions.
- How the classic Iterator can be replaced by a generator function or generator expression.
- Leveraging the general-purpose generator functions in the standard library.
- Using the new `yield from` statement to combine generators.
- A case study: using generator functions in a database conversion utility designed to work with large data sets.
- Why generators and coroutines look alike but are actually very different and should not be mixed.
                         

## 1. Sentence take #1: a sequence of words

How the `iter(...)` function makes sequences iterable

The first version implements the sequence protocol; it's iterable because all sequences are iterable.

In [55]:
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    
    def __init__(self, text):
        self.text = text
        # re.findall returns a list with all non-overlapping matches of the regular expression, as strings.
        self.words = RE_WORD.findall(text)
        
    def __getitem__(self, index):
        # self.words holds the result of .findall, so given an index we simply return that word.
        return self.words[index]
    
    def __len__(self):
        # To complete the sequence protocol we implement __len__,
        # but it is not needed to make an iterable object.
        return len(self.words)
    
    def __repr__(self):
        # reprlib.repr is a utility function that generates an abbreviated
        # string representation of data structures that can be very large.
        return 'Sentence(%s)' % reprlib.repr(self.text)

In [56]:
RE_WORD.findall('"The time has come," the Walrus said,')

['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']

In [2]:
# A sentence is created from a string.
s = Sentence('"The time has come," the Walrus said,')
# The __repr__ output uses the '...' generated by reprlib.repr.
s

Sentence('"The time ha... Walrus said,')

In [3]:
type(s)

__main__.Sentence

In [7]:
# Sentence instances are iterable
for word in s:
    print(word)

The
time
has
come
the
Walrus
said


In [6]:
# Being iterable, Sentence objects can be used as input to build lists and other iterable types.
list(s)

['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']

In [8]:
s[0]

'The'

In [9]:
s[5]

'Walrus'

In [10]:
s[-1]

'said'

### Why sequences are iterable : the iter function

Whenever the interpreter needs to iterate over an object x, it automatically calls `iter(x)`.

The `iter` built-in function:

1. Checks whether the object implements `__iter__`, and calls that to obtain an iterator;
2. If `__iter__` is not implemented, but `__getitem__` is implemented, Python creates an iterator that attempts to fetch items in order, starting from index 0 (zero);
3. If that fails, Python raises `TypeError`, usually saying `'C' object is not iterable`, where `C` is the class of the target object.
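The three steps above can be sketched in plain Python. This is a rough approximation for illustration only, not CPython's actual implementation; `simple_iter`, `_getitem_iterator` and `OnlyGetitem` are hypothetical names.

```python
def simple_iter(obj):
    """Rough sketch of the steps iter(obj) takes; not CPython's actual code."""
    cls = type(obj)
    # 1. Prefer __iter__ (looked up on the class, like other special methods).
    if hasattr(cls, '__iter__'):
        return cls.__iter__(obj)
    # 2. Fall back to the sequence protocol via __getitem__.
    if hasattr(cls, '__getitem__'):
        return _getitem_iterator(obj)
    # 3. Otherwise the object is not iterable.
    raise TypeError(f"'{cls.__name__}' object is not iterable")

def _getitem_iterator(obj):
    """Yield obj[0], obj[1], ... until __getitem__ raises IndexError."""
    index = 0
    while True:
        try:
            item = obj[index]
        except IndexError:
            return
        yield item
        index += 1

class OnlyGetitem:
    """Hypothetical class that implements __getitem__ but not __iter__."""
    def __getitem__(self, index):
        return 'xyz'[index]
```

Both `list(simple_iter(OnlyGetitem()))` and the real `list(iter(OnlyGetitem()))` produce `['x', 'y', 'z']`.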


In fact, the standard sequences also implement `__iter__`, and yours should too, because handling of `__getitem__` exists for backward compatibility reasons and may be gone in the future.

This is an extreme form of duck typing: an object is considered iterable not only when it implements the special method `__iter__`, but also when it implements `__getitem__`, as long as `__getitem__` accepts `int` keys starting from 0.


In [11]:
class Foo:
    def __iter__(self):
        pass

In [12]:
from collections import abc
issubclass(Foo, abc.Iterable)

True

In [13]:
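f = Foo()
# issubclass expects a class, not an instance; use isinstance(f, abc.Iterable) to test an instance.
issubclass(f, abc.Iterable)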

AttributeError: 'Foo' object has no attribute '__mro__'

In [15]:
issubclass(Sentence, abc.Iterable)

False

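However, note that our initial `Sentence` class does not pass the `issubclass(Sentence, abc.Iterable)` test, even though it is iterable in practice: `abc.Iterable.__subclasshook__` only checks for `__iter__`, and this `Sentence` implements only `__getitem__` and `__len__`.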
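Because of that gap, a more reliable way to test iterability is simply to call `iter(x)` and handle the `TypeError`. A minimal sketch, where `is_iterable` and `OnlyGetitem` are hypothetical names:

```python
from collections import abc

class OnlyGetitem:
    """Hypothetical sequence-like class: __getitem__ but no __iter__."""
    def __getitem__(self, index):
        return 'xyz'[index]   # raises IndexError past the end

def is_iterable(obj):
    """Hypothetical helper: True if iter() can obtain an iterator from obj."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

obj = OnlyGetitem()
abc_says = isinstance(obj, abc.Iterable)  # False: only __iter__ is checked
iter_says = is_iterable(obj)              # True: iter() falls back to __getitem__
items = list(obj)
```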

In [57]:
class Struggle:
    def __len__(self): return 23
    

In [58]:
from collections import abc
isinstance(Struggle(), abc.Sized)

True

In [59]:
issubclass(Struggle, abc.Sized)

True

## 2. Iterables versus iterators

Iterable
- Any object from which the `iter` built-in function can obtain an iterator. **Objects implementing an `__iter__` method returning an `iterator` are iterable.** Sequences are always iterable; so are objects implementing a `__getitem__` method which takes 0-based indexes.

Python obtains iterators from iterables.

The str `'ABC'` is the iterable here; the `for` loop obtains an iterator from it behind the scenes.

In [16]:
s = 'ABC'
for char in s:
    print(char)

A
B
C


In [17]:
s = 'ABC'
# Build an iterator it from the iterable.
it = iter(s)
while True:
    try:
        # Repeatedly call next on the iterator to obtain the next item.
        print(next(it))
    # The iterator raises StopIteration when there are no further items.
    except StopIteration:
        # Release reference to it - the iterator object is discarded.
        del it
        break

A
B
C


The standard interface for an iterator has two methods:

- `__next__`: returns the next available item, raising `StopIteration` when there are no more items.
- `__iter__`: returns `self`; this allows iterators to be used where an iterable is expected, for example, in a `for` loop.

The `Iterator` ABC implements `__iter__` by doing `return self`.

This allows an iterator to be used wherever an iterable is required.

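The best way to check whether an object `x` is an iterator is to call `isinstance(x, abc.Iterator)`. Thanks to `Iterator.__subclasshook__`, this test works even if the class of `x` is not a real or virtual subclass of `Iterator`.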
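A minimal sketch of that check with hypothetical classes: `Countdown` implements both `__next__` and `__iter__` without inheriting from or registering with `abc.Iterator`, while `NextOnly` lacks `__iter__`.

```python
from collections import abc

class Countdown:
    """Hypothetical iterator: neither subclasses nor registers with abc.Iterator."""
    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    def __iter__(self):
        return self   # iterators return themselves

class NextOnly:
    """Hypothetical class with __next__ but no __iter__."""
    def __next__(self):
        raise StopIteration

is_iter = isinstance(Countdown(3), abc.Iterator)   # True via __subclasshook__
next_only = isinstance(NextOnly(), abc.Iterator)   # False: __iter__ is missing
values = list(Countdown(3))
```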

You can clearly see how the iterator is built by `iter(...)` and consumed by `next(...)` using the Python console

In [18]:
s3 = Sentence('Pig and Pepper')
# Obtain an iterator from s3.
it = iter(s3)
it

<iterator at 0x1d022835668>

In [19]:
next(it)

'Pig'

In [20]:
next(it)

'and'

In [21]:
next(it)

'Pepper'

In [22]:
next(it)

StopIteration: 

In [23]:
# Once exhausted, an iterator becomes useless.
list(it)

[]

In [24]:
# To go over the sentence again, a new iterator must be built.
list(iter(s3))

['Pig', 'and', 'Pepper']

Calling `iter(...)` on the iterator itself won't help, because, as mentioned, `Iterator.__iter__` is implemented by returning `self`, so this will not reset a depleted iterator.
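A short check of that behavior, using a plain string as the iterable:

```python
s = 'ABC'
it = iter(s)

same_object = iter(it) is it      # Iterator.__iter__ returns self
first_pass = list(it)             # consumes the iterator
second_pass = list(iter(it))      # still the same, now exhausted, iterator
fresh = list(iter(s))             # a new iterator from the iterable starts over
```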

Here is a definition for `iterator`:

Any object that implements the `__next__` no-argument method which returns the next item in a series or raises `StopIteration` when there are no more items. Python iterators also implement the `__iter__` method, so they are `iterable` as well.

## 3. Sentence take #2 : a classic iterator

Now we'll implement the standard iterable protocol.

This `Sentence` is iterable because it implements the `__iter__` special method, which builds and returns a `SentenceIterator`.



In [None]:
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)
        
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
    
    # __iter__ method is the only addition to the previous Sentence implementation.
    # This version has no __getitem__, to make it clear that the class
    # is iterable because it implements __iter__.
    def __iter__(self):
        # __iter__ fulfils the iterable protocol by instantiating and returning an iterator.
        return SentenceIterator(self.words)
    
class SentenceIterator:
    
    def __init__(self, words):
        # holds a reference to the list of the words
        self.words = words
        # self.index is used to determine the next word to fetch.
        self.index = 0
        
    def __next__(self):
        try:
            # Get the word at self.index
            word = self.words[self.index]
        except IndexError:
            # if there is no word at self.index, raise StopIteration.
            raise StopIteration()
        self.index += 1
        return word
    # implement self.__iter__
    def __iter__(self):
        return self

Note that implementing `__iter__` in `SentenceIterator` is not actually needed for this example to work, but it's the right thing to do: iterators are supposed to implement both `__next__` and `__iter__`, and doing so makes our iterator pass the `issubclass(SentenceIterator, abc.Iterator)` test.

### Making Sentence an iterator : bad idea

- Iterables have a `__iter__` method that instantiates a new iterator every time.

- Iterators implement a `__next__` method that returns individual items, and a `__iter__` method that returns `self`.

Therefore, iterators are also iterable, but iterables are not iterators.

To support multiple traversals, it must be possible to obtain multiple independent iterators from the same iterable instance.

A proper implementation of the pattern requires each call to `iter(my_iterable)` to create a new, independent, iterator.
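To see why, here is a minimal sketch of the anti-pattern. `SelfIterating` is a hypothetical class whose `__iter__` returns `self`, compared with `Fresh`, which builds a new iterator on every call. `zip(x, x)` asks for two iterators from the same object, which exposes the difference.

```python
class SelfIterating:
    """Anti-pattern sketch (hypothetical): the iterable is its own iterator."""
    def __init__(self, words):
        self.words = words
        self.index = 0

    def __iter__(self):
        return self          # every iter() call returns the same, shared iterator

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration
        self.index += 1
        return word

class Fresh:
    """Proper iterable (hypothetical): a new, independent iterator per iter() call."""
    def __init__(self, words):
        self.words = words

    def __iter__(self):
        return iter(self.words)

words = ['a', 'b', 'c', 'd']
bad = SelfIterating(words)
good = Fresh(words)

bad_pairs = list(zip(bad, bad))     # both "iterators" share one position
good_pairs = list(zip(good, good))  # two independent traversals
bad_again = list(bad)               # already exhausted: a second pass is empty
good_again = list(good)             # a second pass starts over
```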

## 4. Sentence take #3 : a generator function

Here `__iter__` is a generator function which, when called, builds a generator object that implements the iterator interface, so the `SentenceIterator` class is no longer needed.

In [1]:
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)
        
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
    
    def __iter__(self):
        for word in self.words:
            yield word
        return  # not needed: the generator also ends when the function body falls through

In [6]:
Sentence('abc').__iter__

<bound method Sentence.__iter__ of Sentence('abc')>

Now the iterator is in fact a generator object, built automatically when the `__iter__` method is called, because `__iter__` here is a generator function.

### How a generator function works

Any Python function that has the `yield` keyword in its body is a generator function: a function which, when called, returns a generator object.

In other words, a generator function is a generator factory.

In [3]:
# any function that contains the yield keyword is a generator function.
def gen_123():
    yield 1
    yield 2
    yield 3

In [4]:
# gen_123 is a function object
gen_123

<function __main__.gen_123()>

In [5]:
# generator object
gen_123()

<generator object gen_123 at 0x000001BAE7E98CA8>

Generators are iterators that produce the values of the expressions passed to yield.

In [6]:
# Generators are iterators that produce the values of the expressions passed to yield.
for i in gen_123():
    print(i)

1
2
3


In [7]:
# For closer inspection, we assign the generator object to g.
g = gen_123()

In [8]:
# g is an iterator, calling next(g) fetches the next item produced by yield.
next(g)

1

In [9]:
next(g)

2

In [10]:
next(g)

3

In [11]:
# When the body of the function completes, the generator object raises a StopIteration.
next(g)

StopIteration: 

A generator function builds a generator object which wraps the body of the function.
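One way to watch that wrapping, as a minimal sketch using `inspect.getgeneratorstate`: the body does not start running when the generator object is created, only when `next` is first called.

```python
import inspect

def gen_123():
    yield 1
    yield 2
    yield 3

g = gen_123()
created = inspect.getgeneratorstate(g)    # body has not started running yet
first = next(g)                           # runs the body up to the first yield
suspended = inspect.getgeneratorstate(g)  # paused at a yield
rest = list(g)                            # runs the body to completion
closed = inspect.getgeneratorstate(g)     # finished: further next() raises StopIteration
```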



In [2]:
def gen_AB():
    print('start')
    # The first implicit call to next() in the for loop below will print 'start' and stop
    # at the first yield, producing the value 'A'.
    yield 'A'
    print('continue')
    yield 'B'
    print('end.')
    

In [3]:
# To iterate, the for machinery does the equivalent of g = iter(gen_AB()) to get a generator object,
# and then next(g) at each iteration.
for c in gen_AB():
    # The loop block prints --> and the value returned by next(g). But this output will be seen
    # only after the output of the print calls inside the generator function.
    print('-->', c)

start
--> A
continue
--> B
end.


## 5. Sentence take #4 : a lazy implementation

A lazy implementation postpones producing values to the last possible moment. This saves memory and may avoid useless processing as well.


The `Iterator` interface is designed to be lazy: `next(my_iterator)` produces one item at a time.

The `re.finditer` function is a lazy version of `re.findall` which, instead of a list, returns an iterator producing `re.Match` instances on demand.
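A minimal sketch of that on-demand behavior, pulling matches one at a time from the iterator returned by `finditer`:

```python
import re

RE_WORD = re.compile(r'\w+')

matches = RE_WORD.finditer('Pig and Pepper')  # lazy: matches are found as requested
first = next(matches)                          # the first re.Match object
first_word = first.group()
rest = [m.group() for m in matches]            # consume the remaining matches
```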

In [4]:
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    
    def __init__(self, text):
        self.text = text
        
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
    
    def __iter__(self):
        # Return an iterator over all non-overlapping matches for the RE pattern in string
        for match in RE_WORD.finditer(self.text):
            # match.group() extracts the actual matched text from the Match instance.
            yield match.group()

## 6. Sentence take #5 : a generator expression

Generator functions are an awesome shortcut, but the code can be made even shorter with a generator expression.

A generator expression can be understood as a lazy version of a list comprehension : it does not eagerly build a list, but returns a generator that will lazily produce the items on demand.

In [5]:
def gen_AB():
    print('start')
    yield 'A'
    print('continue')
    yield 'B'
    print('end.')
    

The list comprehension eagerly iterates over the items yielded by the generator object produced by calling `gen_AB()` : `A` and `B`. 

In [6]:
res1 = [x*3 for x in gen_AB()]

start
continue
end.


In [8]:
res1

['AAA', 'BBB']

This `for` loop is iterating over the `res1` list produced by the list comprehension.

In [9]:
for i in res1:
    print('-->', i)

--> AAA
--> BBB


The generator expression returns a generator, bound here to `res2`.
The call to `gen_AB()` is made, but that call returns a generator which is not consumed here.

In [10]:
res2 = (x*3 for x in gen_AB())

`res2` is a generator object.

In [12]:
res2

<generator object <genexpr> at 0x000001F272BA8FC0>

Only when the for loop iterates over `res2`, the body of `gen_AB` actually executes.

Each iteration of the for loop implicitly calls `next(res2)`, advancing `gen_AB` to the next `yield`. Note how the output of `gen_AB` interleaves with the output of the `print` in the for loop.

In [13]:
for i in res2:
    print('-->', i)

start
--> AAA
continue
--> BBB
end.


The caller of `__iter__` gets a generator object.

In [None]:
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    
    def __init__(self, text):
        self.text = text
        
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
    # __iter__ method which here is not a generator function (it has no yield)
    # but uses a generator expression to build a generator and then returns it.
    def __iter__(self):
        return (match.group() for match in RE_WORD.finditer(self.text))

## 7. Generator expression : when to use them

Generator functions are much more flexible: you can code complex logic with multiple statements, and can even use them as `coroutines`.

If the generator expression spans more than a couple of lines, I prefer to code a generator function for the sake of readability.
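A minimal sketch of that trade-off, with hypothetical helpers: the generator expression is fine for a simple transformation, while the generator function leaves room for extra statements, such as a filter step with its own comment.

```python
import re

RE_WORD = re.compile(r'\w+')

def lower_words(text):
    # Simple enough for a generator expression.
    return (m.group().lower() for m in RE_WORD.finditer(text))

def long_lower_words(text, min_len=2):
    # Same idea plus a filter step: clearer as a generator function.
    for m in RE_WORD.finditer(text):
        word = m.group()
        if len(word) < min_len:
            continue   # skip short words such as 'a'
        yield word.lower()
```

Both return generators, so callers cannot tell the two styles apart; the choice is purely about readability.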

## 8. Another example : arithmetic progression generator

Generators can also be used to produce values independent of a data source.

But a standard interface based on a method to fetch the next item in a series is also useful when the items are produced on the fly, instead of retrieved from a collection.

In [14]:
class ArithmeticProgression:
    
    # __init__ requires two arguments: begin and step. end is optional; if it's None,
    # the series will be unbounded.
    def __init__(self, begin, step, end=None):
        self.begin = begin
        self.step = step
        self.end = end
        
    def __iter__(self):
        # This line produces a result value equal to self.begin, but coerced
        # to the type of the subsequent additions.
        result = type(self.begin + self.step)(self.begin)
        # For readability, the forever flag will be True if the self.end attribute is None,
        # resulting in an unbounded series.
        forever = self.end is None
        index = 0
        # This loop runs forever or until the result matches or exceeds self.end.
        # When this loop exits, so does the function.
        while forever or result < self.end:
            yield result
            index += 1
            # The next potential result is calculated. It may never be yielded,
            # because the while loop may terminate.
            result = self.begin + self.step * index
            

In [15]:
ap = ArithmeticProgression(0, 1, 3)

In [16]:
list(ap)

[0, 1, 2]

In [17]:
ap = ArithmeticProgression(1, .5, 3)

In [18]:
list(ap)

[1.0, 1.5, 2.0, 2.5]

In [19]:
ap = ArithmeticProgression(0, 1/3, 1)

In [20]:
list(ap)

[0.0, 0.3333333333333333, 0.6666666666666666]

In [21]:
from fractions import Fraction
ap = ArithmeticProgression(0, Fraction(1, 3), 1)

In [22]:
list(ap)

[Fraction(0, 1), Fraction(1, 3), Fraction(2, 3)]

In [23]:
# This class implements rational numbers.
Fraction(1,3)

Fraction(1, 3)

In [24]:
from decimal import Decimal
ap = ArithmeticProgression(0, Decimal('.1'), .3)

In [25]:
list(ap)

[Decimal('0'), Decimal('0.1'), Decimal('0.2')]

A generator function that does the same job as `ArithmeticProgression` with less code.

In [26]:
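def aritprog_gen(begin, step, end=None):
    # Coerce begin to the type of begin + step (e.g. float, Fraction, Decimal).
    result = type(begin + step)(begin)
    forever = end is None
    index = 0
    while forever or result < end:
        yield result
        index += 1
        # Multiply instead of adding repeatedly, to avoid accumulating floating-point error.
        result = begin + step * index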

There are plenty of ready-to-use generators in the standard library.

### Arithmetic progression with itertools

The `itertools.count` function returns a generator that produces numbers.

Without arguments, it produces a series of integers starting with 0; the optional `start` and `step` arguments set the first value and the increment.

In [28]:
import itertools

gen = itertools.count(1, .5)
next(gen)

1

In [29]:
next(gen)

1.5

In [30]:
next(gen)

2.0

However, `itertools.count` never stops, so if you call `list(count())`, Python will try to build a `list` larger than available memory.

`itertools.takewhile` produces a generator which consumes another generator and stops when a given predicate evaluates to `False`.

In [31]:
# takewhile(predicate, iterable) --> takewhile object
gen = itertools.takewhile(lambda n : n<3, itertools.count(1, .5))
list(gen)

[1, 1.5, 2.0, 2.5]

The version of `aritprog_gen` below is not a generator function: it has no `yield` in its body.

But it returns a generator, so it operates as a generator factory, just as generator function does.

In [32]:
import itertools

def aritprog_gen(begin, step, end=None):
    first = type(begin + step)(begin)
    
    ap_gen = itertools.count(first, step)
    if end is not None:
        ap_gen = itertools.takewhile(lambda n: n<end, ap_gen)
        
    return ap_gen
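A quick usage check of this `itertools`-based version, repeating the definition so the snippet is self-contained. One difference worth knowing: `itertools.count` advances by repeated addition, so with `float` steps the values may drift slightly from the `begin + step * index` computed by `ArithmeticProgression`; the checks below stick to `int`, `0.5` and `Fraction`, which are exact.

```python
import itertools
from fractions import Fraction

def aritprog_gen(begin, step, end=None):
    first = type(begin + step)(begin)
    ap_gen = itertools.count(first, step)
    if end is not None:
        # Stop as soon as a value reaches or passes end.
        ap_gen = itertools.takewhile(lambda n: n < end, ap_gen)
    return ap_gen

ints = list(aritprog_gen(0, 1, 3))
halves = list(aritprog_gen(1, .5, 3))
thirds = list(aritprog_gen(0, Fraction(1, 3), 1))
# Without end the generator is unbounded, so take only the first three items.
unbounded = list(itertools.islice(aritprog_gen(10, 5), 3))
```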

## 9. Generator functions in the standard library

**Filtering generator functions**: they yield a subset of items produced by the input iterable, without changing the items themselves.

In [33]:
def vowel(c):
    return c.lower() in 'aeiou'

In [34]:
list(filter(vowel, 'Aardvark'))

['A', 'a', 'a']

In [36]:
'Aardvark'.lower() in 'aeiou'

False

In [37]:
import itertools
list(itertools.filterfalse(vowel, 'Aardvark'))

['r', 'd', 'v', 'r', 'k']

`itertools.dropwhile(predicate, it)` consumes `it`, skipping items while `predicate` computes truthy, then yields every remaining item (no further checks are made).

In [38]:
list(itertools.dropwhile(vowel, 'Aardvark'))

['r', 'd', 'v', 'a', 'r', 'k']

In [39]:
# yields items while predicate computes truthy, then stops and no further checks are made
list(itertools.takewhile(vowel, 'Aardvark'))

['A', 'a']

In [40]:
# consumes two iterables in parallel; yields items from it whenever the corresponding
# item in selector_it is truthy.
list(itertools.compress('Aardvark', (1,0,1,1,0,1)))

['A', 'r', 'd', 'a']

`islice(iterable, stop)` --> islice object

`islice(iterable, start, stop[, step])` --> islice object

In [41]:
list(itertools.islice('Aardvark', 4))

['A', 'a', 'r', 'd']

In [42]:
list(itertools.islice('Aardvark', 4, 7))

['v', 'a', 'r']

In [43]:
list(itertools.islice('Aardvark', 1, 7,2))

['a', 'd', 'a']

**Mapping generators**: they yield items computed from each individual item in the input iterable, or iterables in the case of `map` and `starmap`.

These generators yield one result per item in the input iterables.

If the input comes from more than one iterable, the output stops as soon as the first input iterable is exhausted.

`accumulate(it, [func])`: yields accumulated sums; if `func` is provided, yields the result of applying it to the first pair of items, then to the first result and the next item, etc.

In [44]:
sample = [5,4,2,8,7,6,3,0,9,1]
import itertools
# list(iterable) -> new list initialized from iterable's items
list(itertools.accumulate(sample))

[5, 9, 11, 19, 26, 32, 35, 35, 44, 45]

In [45]:
itertools.accumulate(sample)

<itertools.accumulate at 0x1f273c78808>

In [46]:
list(itertools.accumulate(sample, min))

[5, 4, 2, 2, 2, 2, 2, 0, 0, 0]

In [47]:
list(itertools.accumulate(sample, max))

[5, 5, 5, 8, 8, 8, 8, 8, 9, 9]

In [48]:
import operator
list(itertools.accumulate(sample, operator.mul))

[5, 20, 40, 320, 2240, 13440, 40320, 0, 0, 0]

In [49]:
list(itertools.accumulate(range(1,11), operator.mul))

[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]

**enumerate(iterable, start=0)** : yields 2-tuples of the form `(index, item)`, where `index` is counted from `start`, and `item` is taken from the iterable

In [1]:
list(enumerate('albatroz', 1))

[(1, 'a'),
 (2, 'l'),
 (3, 'b'),
 (4, 'a'),
 (5, 't'),
 (6, 'r'),
 (7, 'o'),
 (8, 'z')]

**map(func, it1, [it2, .. itN])** : applies func to each item of it, yielding the result; if N iterables are given, func must take N arguments and the iterables will be consumed in parallel.

In [2]:
import operator
list(map(operator.mul, range(11), range(11)))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [3]:
list(map(operator.mul, range(11), [2,4,8]))

[0, 4, 16]

In [4]:
# This is what the zip built-in function does.
list(map(lambda a,b : (a,b), range(11), [2,4,8]))

[(0, 2), (1, 4), (2, 8)]

**itertools : starmap(func, it)** : applies `func` to each item of `it`, yielding the result; the input iterable should yield iterable items `iit`, and `func` is applied as `func(*iit)`

In [5]:
import itertools
list(itertools.starmap(operator.mul, enumerate('albatroz', 1)))

['a', 'll', 'bbb', 'aaaa', 'ttttt', 'rrrrrr', 'ooooooo', 'zzzzzzzz']

In [7]:
# Running average
sample = [5,4,2,8,7,6,3,0,9,1]
list(itertools.starmap(lambda a,b : b/a, enumerate(itertools.accumulate(sample), 1)))

[5.0,
 4.5,
 3.6666666666666665,
 4.75,
 5.2,
 5.333333333333333,
 5.0,
 4.375,
 4.888888888888889,
 4.5]

In [8]:
list(enumerate(itertools.accumulate(sample), 1))

[(1, 5),
 (2, 9),
 (3, 11),
 (4, 19),
 (5, 26),
 (6, 32),
 (7, 35),
 (8, 35),
 (9, 44),
 (10, 45)]

**merging generators** : yields items from multiple input iterables.

`chain` and `chain.from_iterable` consume the input iterables sequentially (one after the other), while `product`, `zip` and `zip_longest` consume the input iterables in parallel.

**itertools : chain(it1, ..., itN)** : yields all items from `it1`, then from `it2`, etc., seamlessly.

In [9]:
list(itertools.chain('ABC', range(2)))

['A', 'B', 'C', 0, 1]

In [10]:
list(itertools.chain(enumerate('ABC')))

[(0, 'A'), (1, 'B'), (2, 'C')]

**itertools : chain.from_iterable(it)** : yields all items from each iterable produced by `it`, one after the other; for example, `it` could be a list of iterables

In [11]:
list(itertools.chain.from_iterable(enumerate('ABC')))

[0, 'A', 1, 'B', 2, 'C']

**zip(it1, ..., itN)** : yields N-tuples built from items taken from the iterables in parallel, silently stopping when the first iterable is exhausted.

In [12]:
list(zip('ABC', range(5)))

[('A', 0), ('B', 1), ('C', 2)]

In [13]:
list(zip('ABC', range(5), [10,20,30,40]))

[('A', 0, 10), ('B', 1, 20), ('C', 2, 30)]

**itertools : zip_longest(it1, .., itN, fillvalue=None)** : yields N-tuples built from items taken from the iterables in parallel, stopping only when the last iterable is exhausted, filling the blanks with the `fillvalue`

In [14]:
list(itertools.zip_longest('ABC', range(5)))

[('A', 0), ('B', 1), ('C', 2), (None, 3), (None, 4)]

In [15]:
list(itertools.zip_longest('ABC', range(5), fillvalue='?'))

[('A', 0), ('B', 1), ('C', 2), ('?', 3), ('?', 4)]

**itertools.product(it1, ..., itN, repeat=1)**: this generator is a lazy way of computing Cartesian products, which we previously built using list comprehensions with more than one `for` clause

Generator expression with multiple `for` clauses can also be used to produce cartesian products lazily.

In [17]:
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
tshirts = [(color, size) for color in colors for size in sizes]
tshirts 

[('black', 'S'),
 ('black', 'M'),
 ('black', 'L'),
 ('white', 'S'),
 ('white', 'M'),
 ('white', 'L')]

In [18]:
list(itertools.product('ABC', range(2)))

[('A', 0), ('A', 1), ('B', 0), ('B', 1), ('C', 0), ('C', 1)]

In [19]:
suits = 'spades hearts diamonds clubs'.split()

In [20]:
list(itertools.product('AK', suits))

[('A', 'spades'),
 ('A', 'hearts'),
 ('A', 'diamonds'),
 ('A', 'clubs'),
 ('K', 'spades'),
 ('K', 'hearts'),
 ('K', 'diamonds'),
 ('K', 'clubs')]

In [21]:
list(itertools.product('ABC'))

[('A',), ('B',), ('C',)]

The `repeat=N` keyword argument tells product to consume each input iterable `N` times

In [22]:
list(itertools.product('ABC', repeat=2))

[('A', 'A'),
 ('A', 'B'),
 ('A', 'C'),
 ('B', 'A'),
 ('B', 'B'),
 ('B', 'C'),
 ('C', 'A'),
 ('C', 'B'),
 ('C', 'C')]

In [23]:
list(itertools.product(range(2), repeat=3))

[(0, 0, 0),
 (0, 0, 1),
 (0, 1, 0),
 (0, 1, 1),
 (1, 0, 0),
 (1, 0, 1),
 (1, 1, 0),
 (1, 1, 1)]

In [24]:
rows = itertools.product('AB', range(2), repeat=2)
for row in rows : print(row)

('A', 0, 'A', 0)
('A', 0, 'A', 1)
('A', 0, 'B', 0)
('A', 0, 'B', 1)
('A', 1, 'A', 0)
('A', 1, 'A', 1)
('A', 1, 'B', 0)
('A', 1, 'B', 1)
('B', 0, 'A', 0)
('B', 0, 'A', 1)
('B', 0, 'B', 0)
('B', 0, 'B', 1)
('B', 1, 'A', 0)
('B', 1, 'A', 1)
('B', 1, 'B', 0)
('B', 1, 'B', 1)


Some generator functions expand the input by yielding more than one value per input item.

The `count` and `repeat` functions from `itertools` return generators that conjure items out of nothing: neither of them takes an iterable as input.

`cycle` generator makes a backup of the input iterable and yields its items repeatedly.

In [25]:
# count(start=0, step=1) --> count object
# build a count generator ct.
ct = itertools.count()

In [26]:
type(ct)

itertools.count

In [27]:
# Retrieve the first item from ct.
next(ct)

0

In [28]:
# can't build a list from ct, because ct never stops, so I fetch the next three items.
next(ct), next(ct), next(ct)

(1, 2, 3)

In [29]:
# islice(iterable, stop) --> islice object
list(itertools.islice(itertools.count(1, .3), 3))

[1, 1.3, 1.6]

In [30]:
cy = itertools.cycle('ABC')

In [31]:
next(cy)

'A'

In [32]:
list(itertools.islice(cy, 7))

['B', 'C', 'A', 'B', 'C', 'A', 'B']

In [33]:
rp = itertools.repeat(7)
next(rp), next(rp)

(7, 7)

In [34]:
# repeat(object [,times]) -> create an iterator which returns the object
# for the specified number of times.  
list(itertools.repeat(8,4))

[8, 8, 8, 8]

In [35]:
list(map(operator.mul, range(11), itertools.repeat(5)))

[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

The `combinations`, `combinations_with_replacement` and `permutations` generator functions, together with `product`, are called the combinatoric generators.


**itertools.combinations(it, out_len)** : yields combinations of `out_len` items from the items yielded by `it`

In [40]:
# All combinations of len()==2 from the items in 'ABC'; item ordering in the generated tuples
# is irrelevant (they could be sets).
list(itertools.combinations('ABC', 2))

[('A', 'B'), ('A', 'C'), ('B', 'C')]

In [37]:
list(itertools.combinations_with_replacement('ABC', 2))

[('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]

In [38]:
list(itertools.permutations('ABC', 2))

[('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

In [39]:
list(itertools.product('ABC', repeat=2))

[('A', 'A'),
 ('A', 'B'),
 ('A', 'C'),
 ('B', 'A'),
 ('B', 'B'),
 ('B', 'C'),
 ('C', 'A'),
 ('C', 'B'),
 ('C', 'C')]

`itertools.groupby` and `itertools.tee`: yield all items in the input iterables, but rearranged in some way.

Note that `itertools.groupby` assumes that the input iterable is sorted by the grouping criterion, or at least that the items are clustered by that criterion - even if not sorted.
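A minimal sketch of what happens when that assumption is violated: `groupby` only merges adjacent equal keys, so an unsorted input yields the same key more than once.

```python
import itertools

data = 'AABAA'

# Unsorted input: the 'A' items are not contiguous, so 'A' shows up as two groups.
unsorted_groups = [(key, list(group)) for key, group in itertools.groupby(data)]

# Sorting first clusters equal items, giving one group per key.
sorted_groups = [(key, list(group)) for key, group in itertools.groupby(sorted(data))]
```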

The `reversed` built-in is the only one covered in this section that does not accept any iterable as input, but only sequences (or objects implementing `__reversed__`). It avoids the cost of making a reversed copy of the sequence by yielding each item as needed.
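A minimal sketch of that restriction: a list works with `reversed`, while a generator, which is not a sequence, is rejected with `TypeError`, so it would have to be materialized first.

```python
letters = ['a', 'b', 'c']
backwards = list(reversed(letters))   # list is a sequence: fine

gen = (c for c in letters)
try:
    reversed(gen)                     # generators have no __reversed__, __len__ or __getitem__
    raised = False
except TypeError:
    raised = True

# Materializing the generator into a list makes reversal possible, at the cost of a copy.
copied_backwards = list(reversed(list(gen)))
```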

In [41]:
# groupby yields tuples of (key, group_generator)
list(itertools.groupby('LLLLAAGGG'))

[('L', <itertools._grouper at 0x20bc649dcf8>),
 ('A', <itertools._grouper at 0x20bc649dfd0>),
 ('G', <itertools._grouper at 0x20bc649def0>)]

In [43]:
# Handling groupby generators involves nested iteration : in this case the outer for loop and
# the inner list constructor.
for char, group in itertools.groupby('LLLLAAAGG'):
    print(char, '->', list(group))

L -> ['L', 'L', 'L', 'L']
A -> ['A', 'A', 'A']
G -> ['G', 'G']


In [44]:
animals = ['duck', 'eagle', 'rat', 'giraffe', 'bear', 'bat', 'dolphin', 'shark', 'lion']
animals.sort(key=len)

In [45]:
animals

['rat', 'bat', 'duck', 'bear', 'lion', 'eagle', 'shark', 'giraffe', 'dolphin']

In [46]:
# groupby(iterable[, keyfunc]) -> create an iterator which returns
# (key, sub-iterator) grouped by each value of key(value)
for length, group in itertools.groupby(reversed(animals), len):
    print(length, '->', list(group))

7 -> ['dolphin', 'giraffe']
5 -> ['shark', 'eagle']
4 -> ['lion', 'bear', 'duck']
3 -> ['bat', 'rat']


`itertools.tee` yields multiple generators from a single input iterable, each yielding every item from the input. Those generators can be consumed independently.

In [47]:
# tee(iterable, n=2) --> tuple of n independent iterators.
list(itertools.tee('ABC'))

[<itertools._tee at 0x20bc638fd48>, <itertools._tee at 0x20bc638f108>]

In [48]:
g1, g2 = itertools.tee('ABC')

In [49]:
next(g1)

'A'

In [50]:
next(g2)

'A'

In [51]:
next(g2)

'B'

In [52]:
list(g1)

['B', 'C']

In [53]:
list(g2)

['C']

In [54]:
list(zip(*itertools.tee('ABC')))

[('A', 'A'), ('B', 'B'), ('C', 'C')]

## 10. New syntax in Python 3.3 : yield from

Nested for loops are the traditional solution when a generator function needs to yield values produced from another generator

In [50]:
def chain(*iterables):
    for it in iterables:
        for i in it:
            yield i
            

In [51]:
s = 'ABC'
t = tuple(range(3))
list(chain(s, t))

['A', 'B', 'C', 0, 1, 2]

In [52]:
def chain(*iterables):
    for i in iterables:
        yield from i
        

In [53]:
list(chain(s,t))

['A', 'B', 'C', 0, 1, 2]

`yield from i` replaces the inner `for` loop completely.

`yield from` creates a channel connecting the inner generator directly to the client of the outer generator.
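A minimal sketch of that delegation: `flatten` is a hypothetical recursive helper that uses `yield from` to pass along every item produced by the nested call, without an explicit inner loop.

```python
def flatten(items):
    """Hypothetical helper: yield the non-list leaves of arbitrarily nested lists."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)   # delegate: the inner generator's items flow straight out
        else:
            yield item

flat = list(flatten([1, [2, [3, 4]], 5, []]))
```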

## 11. Iterable reducing functions

"reducing" , "folding" or "accumulating" functions take an iterable and return a single result.

The `all` and `any` functions short-circuit, that is, they stop consuming the iterator as soon as the result is determined.

`all(it)` : returns `True` if all items in `it` are truthy, otherwise `False`; `all([])` returns `True`



In [54]:
all([1,2,3])

True

In [55]:
all([1,0,3])

False

In [56]:
all([])

True

`any(it)` : returns `True` if any item in `it` is truthy, otherwise `False` ; `any([])` returns `False`

In [57]:
any([1,2,3])

True

In [58]:
any([1,0,3])

True

In [59]:
any([0, 0.0])

False

In [60]:
any([])

False

In [61]:
g = (n for n in [0, 0.0, 7, 8])

In [64]:
type(g)

generator

In [62]:
any(g)

True

In [63]:
next(g)

8

Another built-in that takes an iterable and returns something else is `sorted`. 

Unlike `reversed`, which returns a lazy iterator, `sorted` builds and returns an actual list.

## 12. A closer look at the iter function

The `iter()` built-in has a little-known feature: it can be called with two arguments to create an iterator from a regular function or any callable object.

The first argument must be a callable to be invoked repeatedly (with no arguments) to yield values, and the second argument is a sentinel: a marker value which, when returned by the callable, causes the iterator to raise `StopIteration` instead of yielding the sentinel.

In [1]:
import random

def d6():
    return random.randint(1,6)

In [2]:
d6_iter = iter(d6, 1)

In [3]:
d6_iter

<callable_iterator at 0x27d19406898>

In [4]:
for roll in d6_iter:
    print(roll)

4
6
2
6
6
5
5
2
5


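A useful example: this snippet reads lines from a file until the end of file is reached. `fp.readline` returns `''` only at end of file; a blank line comes back as `'\n'`, so it does not stop this loop.

```python
with open('mydata.txt') as fp:
    for line in iter(fp.readline, ''):
        process_line(line)  # process_line stands for your own line-handling function
```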
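A runnable variation on that snippet, assuming `io.StringIO` as a stand-in for a real file (it has the same `readline` method). The first read stops only at end of file; to also stop at the first blank line, the sketch wraps the same iterator in `itertools.takewhile`.

```python
import io
import itertools

text = 'first\nsecond\n\nafter blank\n'

# iter(callable, sentinel): call fp.readline() until it returns '' (end of file).
fp = io.StringIO(text)
all_lines = list(iter(fp.readline, ''))

# Stop at end of file OR at the first blank line ('\n').
fp = io.StringIO(text)
before_blank = list(itertools.takewhile(lambda line: line != '\n',
                                        iter(fp.readline, '')))
```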

## 13. Case study : generators in a database conversion utility

## 14. Generators as coroutines

Like `.__next__()`, `.send()` causes the generator to advance to the next `yield`, but it also allows the client using the generator to send data into it.

`.send()` allows two-way data exchange between the client code and the generator, in contrast with `.__next__()`, which only lets the client receive data from the generator.
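A minimal sketch of `.send()` in action, modeled on the classic running-average coroutine (the name `averager` is illustrative): each value sent in becomes the result of the `yield` expression, and the coroutine yields back the updated average. It must be primed with `next()` first so it advances to the first `yield`.

```python
def averager():
    """Coroutine sketch: receives numbers via send() and yields the running average."""
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield average   # the value passed to send() arrives here
        total += term
        count += 1
        average = total / count

coro = averager()
next(coro)                     # prime: run up to the first yield (which yields None)
results = [coro.send(10), coro.send(30), coro.send(5)]
coro.close()
```

Note that the data flows into the coroutine; nothing here is iterating over it, which is why coroutines are best kept separate from iteration.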

- Generators produce data for iteration
- Coroutines are consumers of data
- To keep your brain from exploding, you don't mix the two concepts together
- Coroutines are not related to iteration
- Note: there is a use for having `yield` produce a value in a coroutine, but it's not tied to iteration.

## 15. Chapter summary

We then coded a generator of arithmetic progressions and showed how to leverage the `itertools` module to make it simpler. 

We looked at the `iter` built-in function: first to see how it returns an iterator when called as `iter(o)`, then to study how it builds an iterator from any function when called as `iter(func, sentinel)`.


In the database conversion case study, generator functions decouple the reading logic from the writing logic, enabling efficient handling of large data sets and making it easy to support more than one data input format.