![Py4Eng](img/logo.png)

# Iteration
## Yoav Ram

# Comprehensions

Comprehensions are a compact way to process all or part of the elements in a sequence and return a new container.

## List comprehensions

Say we have a bunch of measurements in a list called `data` and we want to calculate the mean and the standard deviation.

The usual way to do this is with `for` loops:

In [3]:
data = [1, 7, 3, 6, 9, 2, 7, 8]
mean = sum(data) / len(data)
print('Mean:', mean)

deviations = []
for x in data:
    deviations.append((x - mean)**2)
    
var = sum(deviations) / len(deviations)
stdev = var**0.5

print("Standard deviation:", stdev)

Mean: 5.375
Standard deviation: 2.7810744326608736


We can replace the `for` loop with a list comprehension:

In [17]:
deviations = [(x - mean)**2 for x in data]
var = sum(deviations) / len(deviations)
stdev = var**0.5
print("Standard deviation:", stdev)

Standard deviation: 2.7810744326608736


We can add a condition, too. Say you want to calculate the deviations of above-average data:

In [18]:
pos_devs = [(x - mean)**2 for x in data if x > mean]
pos_devs

[2.640625, 0.390625, 13.140625, 2.640625, 6.890625]

## Exercise

Remember that a leap year is a year that is divisible by 400, or divisible by 4 but not by 100.

Write a listcomp to build a list of all leap years until today. Print the length of the list.

### Cartesian product

We can insert several `for` statements in a single listcomp, producing all elements of a cartesian product.

For example, consider how we construct a full pack of cards:

In [8]:
suits = 'heart spade club diamond'
numbers = range(2,11)
royals = 'J Q K A'
cards = [(s, n) for s in suits.split() for n in list(numbers) + royals.split()]
print(len(cards))

52
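
The two `for` clauses behave like nested loops, with the leftmost clause as the outer loop. As a rough sketch (the names `ranks`, `cards_loop` and `cards_product` are introduced here only for illustration), the same deck can be built with explicit loops or with `itertools.product` from the standard library:

```python
from itertools import product

suits = 'heart spade club diamond'.split()
ranks = list(range(2, 11)) + 'J Q K A'.split()

# Listcomp with two `for` clauses: the first clause is the outer loop
cards = [(s, r) for s in suits for r in ranks]

# The same thing written as explicit nested loops
cards_loop = []
for s in suits:
    for r in ranks:
        cards_loop.append((s, r))

# itertools.product yields the same pairs in the same order
cards_product = list(product(suits, ranks))

print(cards == cards_loop == cards_product)
print(len(cards))
```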


## Dictionary comprehensions

We can also use comprehensions to create dictionaries:

In [19]:
data_devs = {x: (x - mean)**2 for x in data}
data_devs

{1: 19.140625,
 2: 11.390625,
 3: 5.640625,
 6: 0.390625,
 7: 2.640625,
 8: 6.890625,
 9: 13.140625}
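
Note that the dictionary has fewer entries than `data` has elements: dictionary keys are unique, so the repeated measurement `7` is stored only once. A small check, reusing the same `data` as above:

```python
data = [1, 7, 3, 6, 9, 2, 7, 8]
mean = sum(data) / len(data)

data_devs = {x: (x - mean)**2 for x in data}

# 8 measurements, but only 7 distinct keys because 7 appears twice
print(len(data), len(data_devs))
```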

## Set comprehensions

In [24]:
with open('../data/gulliver.txt') as f:
    txt = f.read().lower()
unique_words = {w for w in txt.split()}

print(len(unique_words))
print('love' in unique_words)
print('war' in unique_words)

9371
True
True


If the file is very big we could do it line by line and update the set (it's mutable):

In [1]:
unique_words = set()
with open('../data/gulliver.txt') as f:    
    for line in f:
        line_unique_words = {w for w in line.lower().split()}
        unique_words.update(line_unique_words)

print(len(unique_words))
print('love' in unique_words)
print('war' in unique_words)

9371
True
True


# Iterators

From [Python docs](https://docs.python.org/3.5/library/stdtypes.html#iterator-types):

> Python supports a concept of iteration over containers. This is implemented using two distinct methods; these are used to allow user-defined classes to support iteration. Sequences always support the iteration methods.


- `__iter__`: If implemented in a container, the method should return an iterator object; this makes the container an *iterable*. If implemented in an iterator, it should return the iterator itself. This is the method used by `for ... in ...`.
- `__next__`: Returns the next item of the iteration when the iterator is passed to `next(...)`. If there are no more items, this method should raise a `StopIteration` exception.

Note that an iterable container should not be its own iterator, because then two concurrent iterations over it would not be independent.
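
To make the protocol concrete, here is a minimal sketch with two hypothetical classes, `Countdown` (the iterable) and `CountdownIterator` (its iterator). Neither comes from the standard library; they only illustrate how the two methods divide the work:

```python
class Countdown:
    """Iterable container: each call to __iter__ returns a fresh iterator."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: keeps the state of a single pass over a Countdown."""
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # an iterator returns itself
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


c = Countdown(3)
it1 = iter(c)
it2 = iter(c)
print(next(it1), next(it1))  # 3 2
print(next(it2))             # 3
print(list(c))               # [3, 2, 1]
```

Because every call to `iter(c)` creates a new `CountdownIterator`, the two iterations advance independently.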

The easiest way to implement iterators is using generators rather than defining new objects, so we'll start doing that.

# Generator expressions

Generator expressions look similar to list comprehensions but really they return an iterator rather than a new collection.

For example, if we are not interested in the deviations but only in their sum, we don't have to build the actual `list`, like the comprehension does, but rather we can use a generator, which produces each value as we need it, using **lazy evaluation**:

In [4]:
var = sum((x - mean)**2 for x in data) / len(data)
print("Standard deviation:", var**0.5)

Standard deviation: 2.7810744326608736
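
One consequence of lazy evaluation is that a generator can be consumed only once. The following sketch (the names `squares_list` and `squares_gen` are chosen here for illustration) contrasts it with a list:

```python
data = [1, 7, 3, 6, 9, 2, 7, 8]
mean = sum(data) / len(data)

squares_list = [(x - mean)**2 for x in data]   # a list: can be reused
squares_gen = ((x - mean)**2 for x in data)    # a generator: single pass

first_sum = sum(squares_gen)
second_sum = sum(squares_gen)   # the generator is already exhausted

print(first_sum == sum(squares_list))
print(second_sum)
```

The second `sum` sees an empty iterator, so it returns `0`.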


When going over many elements, we often don't need the whole list in memory, just one element at a time. This is where generators shine. The built-in `range` behaves similarly: it is not a generator but a lazy sequence that computes each number on demand, and it can be converted into a list:

In [21]:
range(10)

range(0, 10)

In [22]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
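
Strictly speaking, `range` returns a lazy sequence object rather than a generator. The sketch below shows a few of the differences; the size threshold of 1000 bytes is an arbitrary bound picked for this illustration:

```python
import sys

rng = range(10**8)

# A range stores only start, stop and step, so it stays small
print(sys.getsizeof(rng) < 1000)

# Unlike a generator, it supports len, indexing and membership tests
print(len(rng), rng[-1], 10**7 in rng)

# ...and it can be iterated more than once
small = range(3)
print(list(small), list(small))

# It is an iterable, not an iterator: next() needs iter(small) first
it = iter(small)
print(next(it))
```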

If we were to go over all the numbers from 0 to $10^8$, creating that list before the iteration would be costly in space. A lazy object such as a generator or a `range` doesn't create the entire list in memory, but rather produces each element as it is needed.

We load [ipython_memory_usage](https://github.com/ianozsvald/ipython_memory_usage), a package that monitors memory usage inside the notebook.

Install it with `pip install git+https://github.com/ianozsvald/ipython_memory_usage.git`; on Windows you will also need to run `conda install psutil`.

In [26]:
import ipython_memory_usage.ipython_memory_usage as imu
imu.start_watching_memory()

In [26] used 0.1016 MiB RAM in 332.29s, peaked 3.72 MiB above current, total RAM usage 43.80 MiB


In [27]:
lst = list(range(10**8))

In [27] used 3870.5156 MiB RAM in 3.37s, peaked 0.00 MiB above current, total RAM usage 3914.32 MiB


In [28]:
rng = range(10**8)

In [28] used 0.0117 MiB RAM in 0.11s, peaked 0.02 MiB above current, total RAM usage 3914.33 MiB


In [29]:
del lst
imu.stop_watching_memory()

![Task manager](https://raw.github.com/yoavram/CS1001.py/master/list_vs_generator.png)

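Another example: building the list evaluates every element, whereas creating the generator evaluates none of them until it is consumed, which is why the second timing is so small: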

In [38]:
%timeit -n 3 [x for x in range(1, 10**6) if x % 2 == 0]

3 loops, best of 3: 183 ms per loop


In [39]:
%timeit -n 3 (x for x in range(1, 10**6) if x % 2 == 0)

3 loops, best of 3: 1.21 µs per loop


## Exercise


The code below defines a dictionary (named `code`) where the keys represent encrypted characters and the values are the corresponding decrypted characters. Use the dictionary to decrypt an encrypted message (named `secret`) and print out the resulting cleartext message.

* Use a generator expression to decrypt the secret message using the code dictionary. 
* Make sure you aren't creating any lists in intermediate steps. 
* Experiment with newlines inside `( )` to improve readability.

In [1]:
secret = """Wq osakk le eh ue usq qhp, mq osakk xzlsu zh Fcahgq,
mq osakk xzlsu eh usq oqao ahp egqaho,
mq osakk xzlsu mzus lcemzhl gehxzpqhgq ahp lcemzhl oucqhlus zh usq azc, mq osakk pqxqhp ebc Iokahp, msauqjqc usq geou dat rq,
mq osakk xzlsu eh usq rqagsqo,
mq osakk xzlsu eh usq kahpzhl lcebhpo,
mq osakk xzlsu zh usq xzqkpo ahp zh usq oucqquo,
mq osakk xzlsu zh usq szkko;
mq osakk hqjqc obccqhpqc, ahp qjqh zx, mszgs I pe heu xec a dedqhu rqkzqjq, uszo Iokahp ec a kaclq iacu ex zu mqcq obrfblauqp ahp ouacjzhl, usqh ebc Edizcq rqtehp usq oqao, acdqp ahp lbacpqp rt usq Bczuzos Fkqqu, mebkp gacct eh usq oucbllkq, bhuzk, zh Gep’o leep uzdq, usq Nqm Weckp, mzus akk zuo iemqc ahp dzlsu, ouqio xecus ue usq cqogbq ahp usq kzrqcauzeh ex usq ekp."""

code = {'u': 't', 'y': 'q', 's': 'h', 'f': 'j', 'n': 'k', 'p': 'd', 'x': 'f', 't': 'y', 'i': 'p', 'a': 'a', 'e': 'o', 'l': 'g', 'o': 's', 'q': 'e', 'j': 'v', 'k': 'l', 'g': 'c', 'r': 'b', 'h': 'n', 'b': 'u', 'c': 'r', 'w': 'x', 'z': 'i', 'm': 'w', 'v': 'z', 'd': 'm'}
decoded = (code.get(c, c) for c in secret)
joined = ''.join(decoded)
print(joined)

We shall go on to the end, we shall fight in France,
we shall fight on the seas and oceans,
we shall fight with growing confidence and growing strength in the air, we shall defend our Island, whatever the cost may be,
we shall fight on the beaches,
we shall fight on the landing grounds,
we shall fight in the fields and in the streets,
we shall fight in the hills;
we shall never surrender, and even if, which I do not for a moment believe, this Island or a large part of it were subjugated and starving, then our Empire beyond the seas, armed and guarded by the British Fleet, would carry on the struggle, until, in God’s good time, the New World, with all its power and might, steps forth to the rescue and the liberation of the old.


Note that `code.get` is only called when `join` starts pulling elements from `decoded`.
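
This laziness can be observed directly. In the sketch below, `lookup` is a hypothetical stand-in for `code.get` that records every call in the list `calls`, so we can see when the work actually happens:

```python
calls = []

def lookup(c):
    # hypothetical helper that records each call, for illustration only
    calls.append(c)
    return c.upper()

decoded = (lookup(c) for c in 'abc')
calls_before = list(calls)      # snapshot: nothing has been looked up yet

joined = ''.join(decoded)       # join pulls the elements one by one

print(calls_before)
print(calls)
print(joined)
```

Creating the generator expression calls nothing; the calls to `lookup` happen only while `join` consumes it.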

In the decryption code above, the generator expression creates a new generator object, which implements the iterator protocol and is therefore an instance of `collections.abc.Iterator`.

However, the best way to check if something is iterable is to try to iterate over it and catch the `TypeError` if it can't be iterated:

In [21]:
for x in range(10):
    pass

for x in (x**2 for x in range(10)):
    pass

for x in 5:
    pass

TypeError: 'int' object is not iterable
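
The "try it and catch the error" approach can be wrapped in a small helper. The function `is_iterable` below is a hypothetical name used for this sketch, not a built-in; it relies on the fact that `iter(obj)` raises `TypeError` for objects that cannot be iterated:

```python
def is_iterable(obj):
    """Return True if iter(obj) succeeds (hypothetical helper)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

print(is_iterable(range(10)))
print(is_iterable(x**2 for x in range(10)))
print(is_iterable(5))
```

Calling `iter` on a generator does not consume any of its elements, so the check is safe to run before the actual loop.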

# Generator functions

We can write more complex generators using functions in which the `yield` statement replaces the `return` statement. 
Calling such a function returns a generator object, which implements the iterator protocol and does not necessarily iterate over an existing container (similar to `range`). This allows us to create very flexible iterations, including potentially infinite ones.

Say we want to go over all the natural numbers to find a number that matches some condition.
We write a generator for the natural numbers, which is a kind of unbounded `range` (note that this can be done using `itertools.count` from the standard library):

In [20]:
from collections.abc import Iterator

def natural_numbers():
    n = 0
    while True:
        n += 1
        yield n

gen = natural_numbers()
print(gen)
hasattr(gen, '__iter__'), hasattr(gen, '__next__'), isinstance(gen, Iterator)

<generator object natural_numbers at 0x00000000046D6A40>


(True, True, True)

We can consume values from the generator one by one using the `next` function:

In [113]:
print(next(gen))
print(next(gen))
print(next(gen))
print(next(gen))

1
2
3
4


Alternatively, we can give it to a `for` loop (note that the above can be done using `itertools.takewhile`):

In [44]:
for n in natural_numbers():
    if n % 667 == 0 and n**2 % 766 == 0:
        break
print(n)

510922


Of course, this only works because the generator is lazy: we cannot create a list of all natural numbers.

Note that our `natural_numbers` generator could have been created using the `itertools.count` function that is sort of a more flexible version of `range`.
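
As a sketch of that alternative, `itertools.count(1)` yields 1, 2, 3, ... without end, and `itertools.islice` takes a finite slice of it. Combining `count` with `next` and a generator expression is one possible way to write the search from the loop above without an explicit `for` statement:

```python
from itertools import count, islice

# count(1) yields 1, 2, 3, ... without end, like natural_numbers()
first_five = list(islice(count(1), 5))
print(first_five)

# first natural number matching the condition, without an explicit loop
n = next(n for n in count(1) if n % 667 == 0 and n**2 % 766 == 0)
print(n)
```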

Let's do another small example to understand how a generator works:

In [25]:
def example_generator():
    print("Start.")
    for x in range(3):
        print("Next.")
        yield x
    print("Done.")
    
for x in example_generator():
    print(x)

Start.
Next.
0
Next.
1
Next.
2
Done.
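
The same generator can be stepped through by hand with `next`, which shows more clearly where execution pauses. This sketch repeats the definition so that it runs on its own:

```python
def example_generator():
    print("Start.")
    for x in range(3):
        print("Next.")
        yield x
    print("Done.")

gen = example_generator()   # nothing is printed yet
print(next(gen))            # prints Start., Next., then 0
print(next(gen))            # prints Next., then 1
print(next(gen))            # prints Next., then 2
try:
    next(gen)               # prints Done., then raises StopIteration
except StopIteration:
    print("Exhausted.")
```

The body of the function does not start running until the first `next` call, and each call runs only up to the following `yield`. A `for` loop does the same thing and handles the `StopIteration` for us.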


## Exercise

Write a generator that, given a file handle, iterates over the **non-empty** lines in the file, removing the newline from the end of the lines.

Run it on the file `data/winter_wind.txt` (download from [GitHub](https://raw.githubusercontent.com/yoavram/Py4Eng/master/data/winter_wind.txt)).

## Example

Let's use generators to solve Project Euler's [Problem 10](https://projecteuler.net/problem=10):

> The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.

> Find the sum of all the primes below two million.

Of course, there is no need to build a list of all primes below two million just to sum them, as it would take up space for nothing.
So we will use generators.

First, we need a primality test function. We won't go into the syntax here.

In [81]:
def is_prime(a):
    return not (a < 2 or any(a % x == 0 for x in range(2, int(a ** 0.5) + 1)))

We now define a generator of the primes below a certain number (this can also be done by combining `range` and `filter`):

In [82]:
def primes(top=10):
    n = 2
    while n < top:
        if is_prime(n):
            yield n
        n += 1
list(primes(30))

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

We now sum all primes below 2000000 (this can take a couple of minutes):

In [84]:
sum(primes(2000000))

142913828922
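
The `range` and `filter` combination mentioned earlier could look like the following sketch. The name `primes_filter` is hypothetical; `is_prime` is repeated so that the block runs on its own:

```python
def is_prime(a):
    return not (a < 2 or any(a % x == 0 for x in range(2, int(a ** 0.5) + 1)))

def primes_filter(top=10):
    # hypothetical alternative to primes(): filter is lazy, like a generator
    return filter(is_prime, range(2, top))

print(list(primes_filter(30)))
```

Since `filter` is lazy, `sum(primes_filter(2000000))` would also avoid building a list of primes.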

## Exercise

Create a generator that returns numbers from the Fibonacci series, defined by:

$$
a_0 = 0 \\
a_1 = 1 \\
a_n = a_{n-2} + a_{n-1}
$$

## Pass values into generators

`next` allows us to use generators outside of `for` loops. Calling `next` on a generator object returns the next result of the generator (it runs the generator up to the next `yield` statement):

In [85]:
primes_below_100 = primes(100)
next(primes_below_100), next(primes_below_100), next(primes_below_100), next(primes_below_100)

(2, 3, 5, 7)

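`send` allows us to **pass values into generators**: the value passed to `send` becomes the result of the `yield` expression at which the generator is paused. The first call must send `None` (or use `next`), because the generator has not reached a `yield` yet.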

In [5]:
def jumping_natural_numbers():
    n = 0
    while True:
        jump = yield n 
        if jump is None:
            jump = 1
        n += jump

In [6]:
gen = jumping_natural_numbers()
gen.send(None), gen.send(3), gen.send(1)

(0, 3, 4)
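
To see the mechanics, the sketch below steps through the same generator one call at a time, with comments noting what each call does. The `try` block demonstrates that sending a non-`None` value before the generator has started raises a `TypeError`:

```python
def jumping_natural_numbers():
    n = 0
    while True:
        jump = yield n      # pauses here; send(value) resumes with jump = value
        if jump is None:    # next(gen) is the same as gen.send(None)
            jump = 1
        n += jump

gen = jumping_natural_numbers()
try:
    gen.send(3)             # not started yet: no yield is waiting for a value
except TypeError as err:
    print("TypeError:", err)

print(next(gen))      # runs to the first yield
print(gen.send(3))    # jump = 3
print(next(gen))      # jump = None, so the default step of 1 is used
```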

## Example

A more complete example is given by the following generator that calculates a running average of a stream of numbers fed to it. It does so by keeping just three numbers: the sum, the count (number of numbers), and the average.

Each time a new number is sent, it yields the updated average and then waits for another number to be sent.

This is much more efficient in terms of space than actually keeping all the numbers; also, it's great for cases in which we actually have a stream of numbers.

In [8]:
def running_average():
    summ = 0
    count = 0
    avg = 0
    
    while True:
        n = yield avg
        count += 1
        summ += n
        avg = summ / count

In [14]:
import time
import random

avg = running_average()
avg.send(None) # initialize the generator by sending None or by next(avg)

# mimic a stream of random integers between 0 and 9, inclusive
random_stream = (random.randint(0, 9) for _ in range(10))
for n in random_stream:
    current_avg = avg.send(n)
    print('{:d}: {:.4f}'.format(n, current_avg))

9: 9.0000
6: 7.5000
6: 7.0000
2: 5.7500
0: 4.6000
6: 4.8333
1: 4.2857
6: 4.5000
0: 4.0000
9: 4.5000


See more at Jeff Knupp's [blog](http://www.jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/).

## Exercise

Write a generator `have_seen` that accepts a string via `send` and yields a boolean in response.

The generator will yield `True` if the input string was already sent to it, and `False` otherwise. It should be case insensitive.

Use it to find the first word that repeats itself in the following text (the first paragraph from [Gulliver's Travels](https://ia801404.us.archive.org/2/items/gulliverstravels17157gut/17157.txt)).

This exercise follows [Readable Python coroutines](http://takluyver.github.io/posts/readable-python-coroutines.html) by Thomas Kluyver.

In [169]:
def have_seen():
    pass

In [170]:
text = """My father had a small estate in Nottinghamshire; I was the third of five
sons. He sent me to Emmanuel College in Cambridge at fourteen years old,
where I resided three years, and applied myself close to my studies;
but the charge of maintaining me, although I had a very scanty
allowance, being too great for a narrow fortune, I was bound apprentice
to Mr. James Bates, an eminent surgeon in London, with whom I continued
four years; and my father now and then sending me small sums of money, I
laid them out in learning navigation, and other parts of the mathematics
useful to those who intend to travel, as I always believed it would be,
some time or other, my fortune to do. When I left Mr. Bates, I went down
to my father, where, by the assistance of him, and my uncle John and
some other relations, I got forty pounds, and a promise of thirty
pounds a year, to maintain me at Leyden. There I studied physic two
years and seven months, knowing it would be useful in long voyages."""
text = text.lower().split()



## `map`

`map` creates an iterator that applies a function on all elements of an iterable. Like generators, `map` also does lazy evaluation.

In [4]:
poets = [
    'Shel Silverstein', 
    'Pablo Neruda', 
    'Maya Angelou',
    'Edgar Allan Poe',
    'Robert Frost',
    'Emily Dickinson',
    'Walt Whitman'
]

def firstname(name):
    return name.partition(' ')[0]
first_names = map(firstname , poets)
first_names

<map at 0x48a9320>

In [5]:
for n in first_names:
    print(n)

Shel
Pablo
Maya
Edgar
Robert
Emily
Walt


Note that in general `partition` is better for this use than `split` because it only looks for the first occurrence of `sep` and returns 3 items, whereas `split` will look for all separators, so for a name like *Jeff van Gundy* it will take slightly more time to run.

In [135]:
name = 'Jeff van Gundy'
%timeit name.split()[0]
%timeit name.partition(' ')[0]

The slowest run took 8.06 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 599 ns per loop
The slowest run took 5.87 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 411 ns per loop


You could say that a generator expression does more or less the same thing as a `map`, and that's true. However, sometimes a `map` is more suitable in terms of readability. Another difference is that `map` applies a function that has its own scope, whereas the body of a generator expression is written inline and looks up any other names in the enclosing scope.
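
For comparison, here is a short sketch of the same transformation written three ways, on a shortened `poets` list. The variable names `with_map`, `with_genexp` and `inline_genexp` are made up for this illustration:

```python
poets = ['Shel Silverstein', 'Pablo Neruda', 'Maya Angelou']

def firstname(name):
    return name.partition(' ')[0]

with_map = map(firstname, poets)                             # map + named function
with_genexp = (firstname(name) for name in poets)            # genexp calling the function
inline_genexp = (name.partition(' ')[0] for name in poets)   # genexp with inline body

result_map = list(with_map)
result_genexp = list(with_genexp)
result_inline = list(inline_genexp)

print(result_map)
print(result_map == result_genexp == result_inline)
```

All three are lazy and produce the same values; which one reads best is largely a matter of taste and of whether the function already exists.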

## `filter`

`filter` creates an iterator that produces all the elements in an iterable that pass a certain test. The test is given using a boolean function.

In [6]:
def is_even(n):
    return n % 2 == 0

evens = filter(is_even, natural_numbers())
for n in evens:
    print(n, end=", ")
    if n >= 20: 
        break

2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 
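
As with `map`, a `filter` can usually be rewritten as a generator expression with an `if` clause. In the sketch below, `itertools.count(1)` stands in for `natural_numbers()` and `itertools.islice` replaces the `break`, taking only the first ten results from each infinite iterator:

```python
from itertools import count, islice

def is_even(n):
    return n % 2 == 0

evens_filter = filter(is_even, count(1))
evens_genexp = (n for n in count(1) if is_even(n))

first_filter = list(islice(evens_filter, 10))
first_genexp = list(islice(evens_genexp, 10))

print(first_filter)
print(first_filter == first_genexp)
```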

## `reduce`

`reduce` applies a function of two arguments ($f:R^2 \to R$) cumulatively to the items of a sequence,
from left to right, so as to reduce the sequence to a single value.

`reduce` is part of the `functools` module:

In [15]:
from functools import reduce
import operator

The easiest example is a replacement for `sum`:

In [31]:
reduce(operator.add, range(10))

45

What about coding a product equivalent of `sum`?

In [12]:
reduce(operator.mul, range(1, 10))

362880
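
To see what `reduce` does under the hood, here is a simplified sketch called `my_reduce` (a hypothetical name). Unlike `functools.reduce`, it always requires an initial value, which keeps the loop short:

```python
from functools import reduce
import operator

def my_reduce(function, iterable, initial):
    # simplified sketch: unlike functools.reduce, the initial value is required
    result = initial
    for item in iterable:
        # fold the next item into the running result
        result = function(result, item)
    return result

print(my_reduce(operator.add, range(10), 0))
print(my_reduce(operator.mul, range(1, 10), 1))
print(my_reduce(operator.add, range(10), 0) == reduce(operator.add, range(10)))
```

For the sum, this computes `((((0 + 0) + 1) + 2) + ...) + 9`, carrying the running result from left to right.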

Calculating the Fibonacci series up to the n-th element can also be done with `reduce`. Here we set the initial value to a seed list, `[0, 1]`, rather than the first element of the sequence, and we ignore the values from the input sequence by binding them to the unused argument `_`; the input sequence only determines the number of steps:

In [13]:
def fib_reducer(x, _):
    return x + [x[-1] + x[-2]]

def fib(n):
    return reduce(fib_reducer, range(n - 2), [0, 1])
fib(10)

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

## Exercise

Use `reduce` and something from the `operator` module to write the function `myall(iterable)` that returns `True` if all elements of `iterable` are true and `False` otherwise.

**Note** Python already has an `all` function, so you can compare your results to it.

In [16]:
seq1 = [True,  True, True]
seq2 = [True, False, True]

def myall(iterable):
    # Your code here
    pass

assert myall(seq1) == all(seq1)
assert myall(seq2) == all(seq2)

Now use a combination of `map` and `reduce` to write a function that checks if all the numbers in an iterable are larger than `n`:

In [32]:
def check_larger(iterable, n):
    # Your code here
    pass

In [25]:
assert check_larger([1, 3, 5, 7], 5) == False
assert check_larger([1, 3, 5, 7], 0) == True

**Note** that `all` (or `any`) is implemented with a short-circuit so that once it sees a `False` (or `True`) it stops the iteration.

This can be much faster if a `False` appears early on in the sequence:

In [20]:
data = [False] + [True] * 100000

%timeit -n 10 myall(data)
%timeit -n 10 all(data)

10 loops, best of 3: 6.21 ms per loop
10 loops, best of 3: 121 ns per loop
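
The short-circuit can also be observed without timing. In this sketch, `noisy` is a hypothetical generator that records every item it hands out, so the list `seen` shows how far `all` actually got:

```python
def noisy(values, seen):
    # hypothetical helper: records every item that is actually consumed
    for v in values:
        seen.append(v)
        yield v

seen = []
result = all(noisy([True, False, True, True], seen))
print(result, seen)
```

`all` stops at the first `False`, so the last two items are never requested from the generator.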


# References
- Some code and ideas were taken from [CS1001.py](https://github.com/yoavram/CS1001.py)
- Some code and ideas were taken from [Code like a Pythonista](http://python.net/~goodger/projects/pycon/2007/idiomatic/presentation.html)
- The [itertools](https://docs.python.org/3.5/library/itertools.html) module has some functions and tools for creating iterators.
- The [operator](https://docs.python.org/3.5/library/operator.html) module has more functions that emulate operators, to be used in `map`, `reduce`, etc.
- [Functional programming HOWTO](https://docs.python.org/3.5/howto/functional.html) has some more information, as the comprehensions, iterators, map-reduce and filter are the building blocks of functional programming.
- The [toolz](http://toolz.readthedocs.org/) package has many more tools to use in a functional programming style.
- Read Chapter 14 in the [Fluent Python](http://shop.oreilly.com/product/0636920032519.do) book by Luciano Ramalho for a deep dive into Iterators and everything related.

## Colophon
This notebook was written by [Yoav Ram](http://www.yoavram.com) and is part of the _Python for Engineers_ course.

The notebook was written using [Python](http://python.org/) 3.4.4, [IPython](http://ipython.org/) 4.0.3 and [Jupyter](http://jupyter.org) 4.0.6.

This work is licensed under a CC BY-NC-SA 4.0 International License.

![Python logo](https://www.python.org/static/community_logos/python-logo.png)