### List Comprehensions

We've used list comprehensions throughout this course quite a bit, so the concept should not be new, but let's recap quickly what we have seen so far with list comprehensions.

A list comprehension is a language construct that lets us easily build a list by transforming, and optionally filtering, another iterable.

For example, using a more traditional, Java-style approach, we might create a list of the squares of the first 100 positive integers in this way:

In [None]:
squares = []  # create an empty list
for i in range(1, 101):
    squares.append(i**2)

We now have a list containing the desired numbers:

In [None]:
squares[0:10]

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Using a list comprehension we can achieve the same results in a far more expressive way:

In [None]:
squares = [i**2 for i in range(1, 101)]

In [None]:
squares[0:10]

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

When building a list from another iterable we may sometimes want to skip certain values.

For example, we may want to build a list of squares for even positive integers only, up to 100.

The more traditional way would go like this:

In [None]:
squares = []
for i in range(1, 101):
    if i % 2 == 0:
        squares.append(i**2)

In [None]:
squares[0:10]

[4, 16, 36, 64, 100, 144, 196, 256, 324, 400]

We can also use a list comprehension to achieve the same thing:

In [None]:
squares = [i**2 for i in range(1, 101) if i % 2 == 0]

In [None]:
squares[0:10]

[4, 16, 36, 64, 100, 144, 196, 256, 324, 400]

Although I have been writing list comprehensions on a single line, we can write them over multiple lines if we prefer:

In [None]:
squares = [i**2
          for i in range(1, 101)
          if i % 2 == 0]

In [None]:
squares[0:10]

[4, 16, 36, 64, 100, 144, 196, 256, 324, 400]

#### Internal Mechanics of List Comprehensions

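We need to recognize that list comprehensions behave like temporary functions: in CPython up to version 3.11, Python creates a function for the comprehension, calls it, and returns the resulting list. (Since Python 3.12, PEP 709 inlines list comprehensions into the surrounding code, so the bytecode below looks different on newer versions, but the scoping behavior described in this section is preserved.)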

We can see this by compiling a comprehension, and then disassembling the compiled code to see what happened:

In [None]:
import dis

In [None]:
compiled_code = compile('[i**2 for i in (1, 2, 3)]', 
                        filename='', mode='eval')

In [None]:
dis.dis(compiled_code)

  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x000001F77210ED20, file "", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_CONST               5 ((1, 2, 3))
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE


As you can see, at offset 4 Python created a function (`MAKE_FUNCTION`), then called it with an iterator over `(1, 2, 3)` (`CALL_FUNCTION`), and finally returned the result (`RETURN_VALUE`).

So comprehensions behave like functions in terms of **scope**: they have their own local scope, and they can also access enclosing (nonlocal) and global scopes. Nested comprehensions likewise behave like nested functions and closures.
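Here is a small sketch of those scoping rules. The names `factor` and `scaled_squares` are hypothetical, chosen only for illustration: the comprehension reads a global, reads a local of the enclosing function, and its own loop variable `k` does not leak into the function.

```python
factor = 10  # a global variable, read inside the comprehension

def scaled_squares(n):
    offset = 1  # local to scaled_squares, nonlocal from the comprehension's point of view
    result = [factor * k**2 + offset for k in range(n)]
    # `k` belonged to the comprehension, so it is not bound in scaled_squares
    k_leaked = 'k' in locals()
    return result, k_leaked

values, leaked = scaled_squares(4)
print(values)  # [1, 11, 41, 91]
print(leaked)  # False
```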

#### Nested Comprehensions

Let's look at a simple example that uses nested comprehensions.

For example, suppose we want to generate a multiplication table:

The traditional way first:

In [None]:
table = []
for i in range(1, 11):
    row = []
    for j in range(1, 11):
        row.append(i*j)
    table.append(row)

In [None]:
table

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
 [3, 6, 9, 12, 15, 18, 21, 24, 27, 30],
 [4, 8, 12, 16, 20, 24, 28, 32, 36, 40],
 [5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
 [6, 12, 18, 24, 30, 36, 42, 48, 54, 60],
 [7, 14, 21, 28, 35, 42, 49, 56, 63, 70],
 [8, 16, 24, 32, 40, 48, 56, 64, 72, 80],
 [9, 18, 27, 36, 45, 54, 63, 72, 81, 90],
 [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]]

We can easily do the same thing using a list comprehension:

In [None]:
table2 = [ [i * j for j in range(1, 11)] 
          for i in range(1, 11)]

In [None]:
table2

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
 [3, 6, 9, 12, 15, 18, 21, 24, 27, 30],
 [4, 8, 12, 16, 20, 24, 28, 32, 36, 40],
 [5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
 [6, 12, 18, 24, 30, 36, 42, 48, 54, 60],
 [7, 14, 21, 28, 35, 42, 49, 56, 63, 70],
 [8, 16, 24, 32, 40, 48, 56, 64, 72, 80],
 [9, 18, 27, 36, 45, 54, 63, 72, 81, 90],
 [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]]

You'll notice here that we nested one list comprehension inside another.

You should also notice that the inner comprehension (the one that has `i*j`) is accessing its own local variable `j`, as well as a variable from the enclosing comprehension: the `i` variable. Just like a closure! And in fact, it is exactly that. We'll come back to that in a bit.
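To make the closure analogy concrete, here is a rough model of the nested comprehension written as nested functions. It is only a sketch of the scoping, not what CPython literally compiles, and `build_table` / `inner_row` are hypothetical names.

```python
def build_table(rows, cols):
    """Rough model of [[i * j for j in cols] for i in rows] using nested functions."""
    table = []
    for i in rows:
        def inner_row():
            # `i` is not assigned in here, so it is looked up in the
            # enclosing scope: inner_row is a closure over `i`
            return [i * j for j in cols]
        table.append(inner_row())
    return table

table3 = build_table(range(1, 11), range(1, 11))
print(table3 == [[r * c for c in range(1, 11)] for r in range(1, 11)])  # True
```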

Let's do another example: we'll construct Pascal's triangle, which is just a triangle of binomial coefficients:

```
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
```

We just need to know how to calculate combinations:
```
C(n, k) = n! / (k! (n-k)!)
```

* row 0, column 0: n=0, k=0: c(0, 0) = 0! / (0! 0!) = 1 / 1 = 1
* row 4, column 2: n=4, k=2: c(4, 2) = 4! / (2! 2!) = 24 / 4 = 6

In other words, we need to calculate the following list of lists:
```
c(0,0)
c(1,0) c(1,1)
c(2,0) c(2,1) c(2,2)
c(3,0) c(3,1) c(3,2) c(3,3)
...
```

We can use a nested comprehension for that!

In [None]:
from math import factorial

def combo(n, k):
    return factorial(n) // (factorial(k) * factorial(n-k))

size = 10  # global variable
pascal = [ [combo(n, k) for k in range(n+1)] for n in range(size+1) ]

In [None]:
pascal

[[1],
 [1, 1],
 [1, 2, 1],
 [1, 3, 3, 1],
 [1, 4, 6, 4, 1],
 [1, 5, 10, 10, 5, 1],
 [1, 6, 15, 20, 15, 6, 1],
 [1, 7, 21, 35, 35, 21, 7, 1],
 [1, 8, 28, 56, 70, 56, 28, 8, 1],
 [1, 9, 36, 84, 126, 126, 84, 36, 9, 1],
 [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]]

Again note how the outer comprehension accessed a global variable (`size`), created a local variable (`n`), and the inner comprehension created its own local variable (`k`) and also accessed the nonlocal variable `n`.
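As a sanity check on `combo`, here is a short sketch that compares it with `math.comb` (available from Python 3.8) and also verifies Pascal's rule: each interior entry is the sum of the two entries above it.

```python
from math import comb, factorial

def combo(n, k):
    return factorial(n) // (factorial(k) * factorial(n - k))

size = 10
pascal = [[combo(n, k) for k in range(n + 1)] for n in range(size + 1)]

# math.comb computes the binomial coefficient directly
print(all(combo(n, k) == comb(n, k)
          for n in range(size + 1) for k in range(n + 1)))  # True

# Pascal's rule: pascal[n][k] == pascal[n-1][k-1] + pascal[n-1][k]
print(all(pascal[n][k] == pascal[n - 1][k - 1] + pascal[n - 1][k]
          for n in range(1, size + 1) for k in range(1, n)))  # True
```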

#### Nested Loops

We can also create comprehensions that use nested loops (not nested comprehensions, just nested loops).

Let's start with a simple example.

Suppose we have two lists of characters, and we want to produce a new list containing every concatenation of a character from the first list with a character from the second.

For example, given:

`l1 = ['a', 'b', 'c']`

`l2 = ['x', 'y', 'z']`

we want to produce the result:

`['ax', 'ay', 'az', 'bx', 'by', 'bz', 'cx', 'cy', 'cz']`


The traditional way first:

In [None]:
l1 = ['a', 'b', 'c']
l2 = ['x', 'y', 'z']
result = []
for s1 in l1:
    for s2 in l2:
        result.append(s1+s2)


In [None]:
result

['ax', 'ay', 'az', 'bx', 'by', 'bz', 'cx', 'cy', 'cz']

We can do the same nested loop using a comprehension instead:

In [None]:
result = [s1 + s2 for s1 in l1 for s2 in l2]

In [None]:
result

['ax', 'ay', 'az', 'bx', 'by', 'bz', 'cx', 'cy', 'cz']
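The `for` clauses read left to right in the same order as the nested loops: the first clause is the outer loop. A quick way to confirm this is to compare against `itertools.product`, which yields the same pairs in the same order.

```python
from itertools import product

l1 = ['a', 'b', 'c']
l2 = ['x', 'y', 'z']

# First `for` clause = outer loop, second `for` clause = inner loop
nested = [s1 + s2 for s1 in l1 for s2 in l2]

# itertools.product(l1, l2) produces the pairs in the same order
via_product = [s1 + s2 for s1, s2 in product(l1, l2)]

print(nested == via_product)  # True
```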

We could expand this slightly by specifying that pairs consisting of the same letter twice should be omitted:

In [None]:
l1 = ['a', 'b', 'c']
l2 = ['b', 'c', 'd']

In [None]:
result = []
for s1 in l1:
    for s2 in l2:
        if s1 != s2:
            result.append(s1 + s2)

In [None]:
result

['ab', 'ac', 'ad', 'bc', 'bd', 'cb', 'cd']

And the comprehension equivalent:

In [None]:
result = [s1 + s2 for s1 in l1 for s2 in l2 if s1 != s2]

In [None]:
result

['ab', 'ac', 'ad', 'bc', 'bd', 'cb', 'cd']
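An `if` clause does not have to come last: it sits at the loop level where it appears. In this sketch, a condition that only depends on the outer variable is placed right after the outer `for`, so it is checked once per `s1` and rejected items skip the inner loop entirely; the result is the same as filtering at the end.

```python
l1 = ['a', 'b', 'c']
l2 = ['b', 'c', 'd']

# `if` right after the outer `for`: evaluated once per s1, before the inner loop
early = [s1 + s2 for s1 in l1 if s1 != 'a' for s2 in l2]

# same condition at the end: evaluated once per (s1, s2) pair
late = [s1 + s2 for s1 in l1 for s2 in l2 if s1 != 'a']

print(early)          # ['bb', 'bc', 'bd', 'cb', 'cc', 'cd']
print(early == late)  # True
```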

Building up the complexity, let's see how we might reproduce the `zip` function.

Remember what the `zip` function does:

In [None]:
l1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
l2 = ['a', 'b', 'c', 'd']
list(zip(l1, l2))

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

We can do the same thing using a traditional nested loop:

In [None]:
result = []
for index_1, item_1 in enumerate(l1):
    for index_2, item_2 in enumerate(l2):
        if index_1 == index_2:
            result.append((item_1, item_2))

In [None]:
result

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

But we can do this using a list comprehension as well:

In [None]:
result = [ (item_1, item_2)
         for index_1, item_1 in enumerate(l1)
         for index_2, item_2 in enumerate(l2)
         if index_1 == index_2]

In [None]:
result

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

Of course, using `zip` is way simpler!
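It is also much less work: the nested-loop version has to test every combination of indices, while `zip` simply pairs items up and stops at the shorter input. A quick count with the same lists:

```python
l1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
l2 = ['a', 'b', 'c', 'd']

# Number of (index_1, index_2) combinations the nested version has to test
pairs_examined = sum(1 for _ in l1 for _ in l2)

# zip stops at the shorter input and only builds the pairs it needs
zipped = list(zip(l1, l2))

print(pairs_examined)  # 36
print(len(zipped))     # 4
```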

List comprehensions can also be quite handy when used in conjunction with functions such as `sum`.

Suppose we have two n-dimensional vectors, represented as tuples of numbers, and we want to find the dot product of the two vectors:

```
v1 = (c1, c2, c3, ..., cn)
v2 = (d1, d2, d3, ..., dn)
```

Then, the dot product is:

```
c1 * d1 + c2 * d2 + ... + cn * dn
```

The trick here is that we want to step through both vectors at the same time (a simple nested loop would not work), so a Java-like approach might be:

In [None]:
v1 = (1, 2, 3, 4, 5, 6)
v2 = (10, 20, 30, 40, 50, 60)

In [None]:
dot = 0
for i in range(len(v1)):
    dot += (v1[i] * v2[i])
print(dot)

910


But using zip and a list comprehension we can do it this way:

In [None]:
dot = sum([i * j for i, j in zip(v1, v2)])
print(dot)

910


In fact, we don't even need the `[]` (we'll see why later, when we cover generator expressions):

In [None]:
dot = sum(i * j for i, j in zip(v1, v2))
print(dot)

910


#### Things to watch out for

There are a few things we have to be careful with, and they relate to the scope of variables used inside a comprehension.

Let's first make sure we don't have the `number` symbol in our global scope:

In [None]:
if 'number' in globals():
    del number

In [None]:
l = [number**2 for number in range(5)]
print(l)

[0, 1, 4, 9, 16]


What was the scope of `number`?

In [None]:
'number' in globals()

False

As you can see, `number` was local to the comprehension, not the enclosing (global in this case) scope.

But what if `number` already exists in our global scope?

In [None]:
number = 100

In [None]:
l = [number**2 for number in range(5)]

In [None]:
number

100

As you can see, `number` in the comprehension was still local to the comprehension, and our global `number` was not affected. 

This is the same rule that applies to local variables in functions.

Because `number` is the loop variable, it is *assigned* a value inside the comprehension, so it is considered local to the comprehension, even if that symbol exists in a global or nonlocal scope.
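Note that this isolation is specific to comprehensions. A plain `for` loop does not create a new scope, so at module level its loop variable rebinds the global name. A short comparison:

```python
number = 100

# The comprehension's `number` is its own local variable...
squares = [number**2 for number in range(5)]
after_comprehension = number
print(after_comprehension)  # 100

# ...whereas a plain for loop at module level rebinds the global name
for number in range(5):
    pass
print(number)  # 4
```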

On the other hand, consider this example:


In [None]:
number = 100
l = [number * i for i in range(5)]
print(l)

[0, 100, 200, 300, 400]


As you can see, the comprehension was able to reach out to `number` in the global scope, just as a function can.

Now let's look at an example we've seen before when we studied closures.

Suppose we want to generate a list of functions that calculate powers of their argument, i.e. we want to define a bunch of functions:

* `fn_1(arg) --> arg ** 1`
* `fn_2(arg) --> arg ** 2`
* `fn_3(arg) --> arg ** 3`
* and so on.

We could certainly define a bunch of functions one by one:

In [None]:
fn_0 = lambda x: x**0
fn_1 = lambda x: x**1
fn_2 = lambda x: x**2
fn_3 = lambda x: x**3
# etc

But this would be very tedious if we had to do it more than just a few times.

Instead, why don't we create those functions as lambdas and put them into a list, where the index in the list corresponds to the power we are looking for?

Something like this if we were doing it manually:

In [None]:
funcs = [lambda x: x**0, lambda x: x**1, lambda x: x**2, lambda x: x**3]

Now we can call these functions this way:

In [None]:
print(funcs[0](10))
print(funcs[1](10))
print(funcs[2](10))
print(funcs[3](10))

1
10
100
1000


Now all we need to do is to create these functions using a loop - the traditional way first:

First let's make sure `i` is not in our global symbol table:

In [None]:
if 'i' in globals():
    del i

In [None]:
funcs = []
for i in range(6):
    funcs.append(lambda x: x**i)

And let's use them as before:

In [None]:
print(funcs[0](10))
print(funcs[1](10))
print(funcs[2](10))
print(funcs[3](10))

100000
100000
100000
100000


What happened? It looks like every function is actually calculating `10**5`.

Let's break down what happened in the loop, but without using a loop.

First, notice that `i` is now in our global symbol table:

In [None]:
print(i)

5


You'll also note that it has a value of `5` (from the last iteration that ran).

Now let's walk through what happened manually:

In the first iteration, the symbol `i` was created, and assigned a value of `0`:

In [None]:
i = 0
def fn_0(x):
    return x ** i

The `i` in `fn_0` is actually the global variable `i`.

For the next 'iteration' we increment `i` by `1`:

In [None]:
i = 1
def fn_1(x):
    return x ** i

The `i` in `fn_1` is still the global variable `i`.

Now let's set `i` to something else:

In [None]:
i = 5

In [None]:
fn_0(10)

100000

In [None]:
fn_1(10)

100000

and if we change `i` again:

In [None]:
i = 10

In [None]:
fn_0(10)

10000000000

And this is **exactly** what happened in our loop based approach:

In [None]:
funcs = []
for i in range(6):
    funcs.append(lambda x: x**i)

When the loop ran, `i` was created in our **global** scope.

By the time the loop finished running, `i` was 5:

In [None]:
print(i)

5


So when we call the functions, they are referencing the global variable `i` which is now set to `5`.

Exactly the same thing happens if we use a comprehension instead.

Let's delete the global `i` symbol first:

In [None]:
del i

In [None]:
'i' in globals()

False

In [None]:
funcs = [lambda x: x**i for i in range(6)]

In [None]:
'i' in globals()

False

As we can see, `i` is not in our globals. Instead, `i` was a **local** variable of the list comprehension, and every function created in the comprehension references that same `i`: each lambda is a closure with (the same) free variable `i`. By the time the comprehension finished running, `i` had a value of 5:

In [None]:
funcs[0](10), funcs[1](10)

(100000, 100000)
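We can confirm this by inspecting the lambdas directly: each one lists `i` as a free variable, and they all hold a reference to the very same closure cell, whose final content is the last value of `i`. This sketch relies only on the standard `__code__.co_freevars`, `__closure__`, and `cell_contents` attributes.

```python
funcs = [lambda x: x**i for i in range(6)]

# Each lambda has exactly one free variable: `i`
print(funcs[0].__code__.co_freevars)                      # ('i',)

# All lambdas share one cell, which holds the last value assigned to i
print(funcs[0].__closure__[0] is funcs[5].__closure__[0])  # True
print(funcs[0].__closure__[0].cell_contents)               # 5
```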

Can we somehow fix this problem?

Yes, and the fix relies on default values and on when they are calculated. Recall that default values are evaluated and stored with the function object **when the function is created (i.e. when the `def` or `lambda` is executed)**, not when it is called. Right now we are running into a problem because the free variable `i` is evaluated inside each function's body at **call time**.

So, we can fix this by making the current value of `i` a parameter default of each lambda. That default is evaluated at the function's creation time, i.e. once per iteration of the comprehension:

In [None]:
funcs = [lambda x, power=i: x**power for i in range(6)]

In [None]:
funcs[0](10), funcs[1](10), funcs[2](10)

(1, 10, 100)

As you can see, that solved the problem. But it relies on a fairly detailed understanding of Python's behavior, and it is better to avoid such techniques: other people reading your code may find it confusing, and it makes the code harder to understand.
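A commonly used alternative that avoids the default-argument trick is a small factory function (`make_power` is a hypothetical name). Each call to the factory creates a fresh scope, so each returned lambda closes over its own `n` rather than a shared variable:

```python
def make_power(n):
    # `n` is local to this call of make_power, so every call creates a new cell
    return lambda x: x**n

funcs = [make_power(i) for i in range(6)]
print(funcs[0](10), funcs[1](10), funcs[2](10))  # 1 10 100
```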

We will come back to this comprehension syntax. We used it so far to create lists, but the same syntax will be used to create sets, dictionaries, and generators.

### Iterating Collections

We saw how sequence types support iteration by being able to access elements by index. We could even write our custom sequence types by implementing the `__getitem__` method.

But there are some limitations:

* items must be numerically indexable, with indexing starting at `0`
* cannot be used with unordered collections, such as sets

If we think about iterating over a collection, what we really need is a way to request the **next** item in the collection.

If we can do that, our collection does not require being indexable, nor does it need to be ordered (i.e. we don't need the notion of relative positions of elements in the container).

This is exactly what iteration in Python is built on: a way to ask for the "next" element in the collection. This approach works equally well with sequence-type collections and with unordered collection types such as sets.

Of course, the order in which **next** returns items from an unordered collection is not known in advance, and we see that when we iterate over a set, for example:

In [None]:
s = {'x', 'y', 'b', 'c', 'a'}
for item in s:
    print(item)

y
a
c
b
x


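As you can see, the order in which the elements of the set were returned did not match the order in which we wrote them in the set literal. (String hashing is randomized per interpreter session, so you may see a different order when you run this.)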

Furthermore, we cannot use indexing to access elements in a set:

In [None]:
s[0]

TypeError: 'set' object does not support indexing
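Even though a set cannot be indexed, it can still hand out its elements one at a time. As a preview (the built-in `iter` and `next` functions are covered properly below), this sketch pulls one element, collects the rest, and checks that every element came back exactly once, whatever the order.

```python
s = {'x', 'y', 'b', 'c', 'a'}

it = iter(s)      # ask the set for an object that produces its elements
first = next(it)  # one element; which one depends on the set's internal order
rest = list(it)   # every element not produced yet

print(first in s)               # True
print({first, *rest} == s)      # True
print(len(rest) == len(s) - 1)  # True
```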

### Rolling our own Next method

Let's go ahead and define a kind of iterable ourselves.

What we want is a container class that implements a `next` method instead of the `__getitem__` method.

Every time we call `next`, it should return the next element in the collection - so we'll have to keep track of where we are in the iteration somehow.

Since `next` is the name of a built-in function (which we'll look at in a bit), we'll call our method `next_` to avoid confusion.

In [None]:
class Squares:
    def __init__(self):
        self.i = 0
    
    def next_(self):
        result = self.i ** 2
        self.i += 1
        return result

In [None]:
sq = Squares()

In [None]:
sq.next_()

0

In [None]:
sq.next_()

1

In [None]:
sq.next_()

4

How do we re-start the iteration from the beginning?

We can't - we have to create a new instance of `Squares`:

In [None]:
sq = Squares()

In [None]:
for i in range(10):
    print(sq.next_())

0
1
4
9
16
25
36
49
64
81


We are even able to iterate over the squares.

But you'll notice that we essentially have an **infinite** number of items.

We can fix that easily enough by specifying a length when we create the collection, and raising an exception if `next_()` goes beyond the number of elements in the collection. We'll raise `StopIteration`, a built-in exception Python provides specifically for this kind of scenario.

We'll even implement a `__len__` method to support the `len()` function:

In [None]:
class Squares:
    def __init__(self, length):
        self.length = length
        self.i = 0
    
    def next_(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result           
        
    def __len__(self):
        return self.length

In [None]:
sq = Squares(3)

In [None]:
len(sq)

3

In [None]:
sq.next_()

0

In [None]:
sq.next_()

1

In [None]:
sq.next_()

4

In [None]:
sq.next_()

StopIteration: 

So now, we can essentially loop over the collection in a very similar way to how we did it with sequences and the `__getitem__` method:

In [None]:
sq = Squares(5)
while True:
    try:
        print(sq.next_())
    except StopIteration:
        # reached end of iteration
        # stop looping
        break       

0
1
4
9
16


There are two issues here.
The first is that the "iterable" `sq` has been exhausted - we can't just "re-start" the iteration:

In [None]:
sq.next_()

StopIteration: 

The second problem is that we can't use a `for` loop - Python does not know about our `next_()` method:

In [None]:
for i in Squares(10):
    print(i)

TypeError: 'Squares' object is not iterable

Of course, if we had a `__getitem__` method, everything would work again, but remember that `__getitem__` means we have a sequence type. Although our `Squares` class actually is a sequence, we want to look at a more general way of creating containers that are not necessarily sequences.

Much like Python's `len()` function and the `__len__()` method, Python has a built-in `next()` function - it calls the `__next__()` method in our class if there is one.

Let's see this:

In [None]:
class Squares:
    def __init__(self, length):
        self.length = length
        self.i = 0
    
    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result   
    
    def __len__(self):
        return self.length

In [None]:
sq = Squares(3)

In [None]:
next(sq)

0

In [None]:
next(sq)

1

In [None]:
next(sq)

4

In [None]:
next(sq)

StopIteration: 

That's nice, and it makes the code a little easier to type. Our earlier loop now looks like this:

In [None]:
sq = Squares(5)
while True:
    try:
        print(next(sq))
    except StopIteration:
        break  

0
1
4
9
16
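The built-in `next` also accepts an optional second argument: a default value that is returned instead of letting `StopIteration` propagate. A small sketch with a two-element `Squares` (same `__next__` as above) shows that an exhausted object keeps returning the default:

```python
class Squares:
    def __init__(self, length):
        self.length = length
        self.i = 0

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        result = self.i ** 2
        self.i += 1
        return result

sq = Squares(2)
# next(obj, default) catches StopIteration and returns the default instead
results = [next(sq, 'done') for _ in range(4)]
print(results)  # [0, 1, 'done', 'done']
```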


Does this mean Python can now iterate over an instance of Squares?

In [None]:
for i in Squares(10):
    print(i)

TypeError: 'Squares' object is not iterable

Nope, Python still does not recognize our class as an iterable collection.

We need to do a little bit more work to get there.

We also need to look at how to "reset" the iteration without having to create a whole new object.

You'll notice that technically our `Squares` class could be built as a sequence type - it was just a very simple example.

Instead, let's build another collection that is a container of random numbers, but in no particular order.

In [None]:
import random

In [None]:
class RandomNumbers:
    def __init__(self, length, *, range_min=0, range_max=10):
        self.length = length
        self.range_min = range_min
        self.range_max = range_max
        self.num_requested = 0
        
    def __len__(self):
        return self.length
    
    def __next__(self):
        if self.num_requested >= self.length:
            raise StopIteration
        else:
            self.num_requested += 1
            return random.randint(self.range_min, self.range_max)

We can now iterate over instances of this class using `next`:

In [None]:
numbers = RandomNumbers(10)

In [None]:
len(numbers)

10

In [None]:
while True:
    try:
        print(next(numbers))
    except StopIteration:
        break

8
9
3
10
10
9
0
10
10
1


We still cannot use a `for` loop, and if we want to 'restart' the iteration, we have to create a new object every time.

In [None]:
numbers = RandomNumbers(10)

In [None]:
for item in numbers:
    print(item)

TypeError: 'RandomNumbers' object is not iterable

### Iterators

In the last lecture we saw that we could approach iterating over a collection using this concept of `next`.

But there were some downsides that we did not resolve (yet!):
* we cannot use a `for` loop
* once we exhaust the iteration (by repeatedly calling `next`), we're essentially done with the object. The only way to iterate through it again is to create a new instance of the object.

First, we are going to look at making our `next` usable in a `for` loop.

This idea of using `__next__` and the `StopIteration` exception is exactly what Python does.

So, somehow we need to tell Python that the object we are dealing with can be used with `next`.

To do so, we create an `iterator` type object.

Iterators are objects that implement:
* a `__next__` method
* an `__iter__` method that simply returns the object itself

That's it - that's all there is to an iterator - two methods, `__iter__` and `__next__`.

Let's go back to our `Squares` example:

In [None]:
class Squares:
    def __init__(self, length):
        self.length = length
        self.i = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result

Now we can still call `next`:

In [None]:
sq = Squares(5)

In [None]:
print(next(sq))
print(next(sq))
print(next(sq))

0
1
4


Of course, we still cannot "reset" our iterator; we just have to create a new instance:

In [None]:
sq = Squares(5)

But now, we can also use a `for` loop:

In [None]:
for item in sq:
    print(item)

0
1
4
9
16


Now `sq` is **exhausted**, so if we try to loop through again:

In [None]:
for item in sq:
    print(item)

We get nothing...

All we need to do is create a new iterator:

In [None]:
sq = Squares(5)

In [None]:
for item in sq:
    print(item)

0
1
4
9
16


Just like Python's built-in `next` function calls our `__next__` method, Python has a built-in function `iter` which calls the `__iter__` method:

In [None]:
sq = Squares(5)

In [None]:
id(sq)

1965579635736

In [None]:
id(sq.__iter__())

1965579635736

In [None]:
id(iter(sq))

1965579635736
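The standard library captures this protocol in `collections.abc`: `isinstance(obj, Iterator)` is true for any object whose class defines both `__iter__` and `__next__`, even without inheriting from `Iterator`. In the sketch below, `OnlyNext` is a hypothetical class mirroring our earlier attempt that had `__next__` but no `__iter__`.

```python
from collections.abc import Iterable, Iterator

class Squares:
    def __init__(self, length):
        self.length = length
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        result = self.i ** 2
        self.i += 1
        return result

class OnlyNext:
    # like our earlier classes: __next__ but no __iter__
    def __next__(self):
        raise StopIteration

sq = Squares(5)
print(isinstance(sq, Iterator))          # True: defines __iter__ and __next__
print(isinstance(sq, Iterable))          # True: every iterator is also iterable
print(isinstance(OnlyNext(), Iterator))  # False: __iter__ is missing
```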

And of course we can also use a list comprehension on our iterator object:

In [None]:
sq = Squares(5)

In [None]:
[item for item in sq if item%2==0]

[0, 4, 16]

We can even use any function that requires an iterable as an argument (iterators are iterable):

In [None]:
sq = Squares(5)
list(enumerate(sq))

[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]

But of course we have to be careful: our iterator is now exhausted, so if we try that again:

In [None]:
list(enumerate(sq))

[]

we get an empty list - instead we have to create a new iterator first:

In [None]:
sq = Squares(5)
list(enumerate(sq))

[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]

We can even use the `sorted` function on it:

In [None]:
sq = Squares(5)
sorted(sq, reverse=True)

[16, 9, 4, 1, 0]

#### Python Iterators Summary

Iterators are objects that implement the `__iter__` and `__next__` methods.

The `__iter__` method of an iterator just returns itself.

Once we fully iterate over an iterator, the iterator is **exhausted** and we can no longer use it for iteration purposes.

The way Python applies a `for` loop to an iterator object is basically what we saw with the `while` loop and the `StopIteration` exception.

In [None]:
sq = Squares(5)
while True:
    try:
        print(next(sq))
    except StopIteration:
        break

0
1
4
9
16


In fact we can easily see this by tweaking our iterator a bit:

In [None]:
class Squares:
    def __init__(self, length):
        self.length = length
        self.i = 0
        
    def __iter__(self):
        print('calling __iter__')
        return self
    
    def __next__(self):
        print('calling __next__')
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result

In [None]:
sq = Squares(5)

In [None]:
for i in sq:
    print(i)

calling __iter__
calling __next__
0
calling __next__
1
calling __next__
4
calling __next__
9
calling __next__
16
calling __next__


As you can see Python calls `__next__` (and stops once a `StopIteration` exception is raised).

But you'll notice that it also called the `__iter__` method.

In fact we'll see this happening in other places too:

In [None]:
sq = Squares(5)
[item for item in sq if item%2==0]

calling __iter__
calling __next__
calling __next__
calling __next__
calling __next__
calling __next__
calling __next__


[0, 4, 16]

In [None]:
sq = Squares(5)
list(enumerate(sq))

calling __iter__
calling __next__
calling __next__
calling __next__
calling __next__
calling __next__
calling __next__


[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]

In [None]:
sq = Squares(5)
sorted(sq, reverse=True)

calling __iter__
calling __next__
calling __next__
calling __next__
calling __next__
calling __next__
calling __next__


[16, 9, 4, 1, 0]

Why is `__iter__` being called? After all, it just returns itself!

That's the topic of the next lecture!

But let's see how we can mimic what Python is doing:

In [None]:
sq = Squares(5)
sq_iterator = iter(sq)
print(id(sq), id(sq_iterator))
while True:
    try:
        item = next(sq_iterator)
        print(item)
    except StopIteration:
        break

calling __iter__
1965579704808 1965579704808
calling __next__
0
calling __next__
1
calling __next__
4
calling __next__
9
calling __next__
16
calling __next__


As you can see, we first request an iterator from `sq` using the `iter` function, and then we iterate using the returned iterator. In the case of an iterator, the `iter` function just gets the iterator itself back.
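The same pattern works for any iterable, not just our `Squares` iterator. Here is a minimal sketch (`manual_for` is a hypothetical name) that mimics `for item in iterable: action(item)`. Note that `action(item)` sits outside the `try`, so a `StopIteration` raised by the loop body itself is not mistaken for the end of the iteration, which matches how a real `for` loop behaves.

```python
def manual_for(iterable, action):
    """Mimic `for item in iterable: action(item)` using iter() and next()."""
    iterator = iter(iterable)      # what a for loop does first
    while True:
        try:
            item = next(iterator)  # then it keeps asking for the next item...
        except StopIteration:
            break                  # ...until the iterator signals it is done
        action(item)               # the loop body runs outside the try

collected = []
manual_for([1, 2, 3], collected.append)
manual_for('ab', collected.append)
print(collected)  # [1, 2, 3, 'a', 'b']
```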

### Iterators and Iterables

Previously we saw that we could create **iterator** objects by simply implementing:

* a `__next__` method that returns the next element in the container
* an `__iter__` method that just returns the object itself (the iterator object)

Doing that we could use a `for` loop, list comprehensions, and in fact use that iterator object anywhere an iterable was expected (like `enumerate`, `sorted`, and so on).

However, we had two outstanding issues/questions:
* when we looped over the iterator using a `for` loop (or a comprehension, or other functions that do some form of iteration), we saw that the `__iter__` method was always called first.
* the iterator gets exhausted after we have finished iterating it fully - which means we have to create a new iterator every time we want to use a new iteration over the collection - can we somehow avoid having to remember to do that every time?

The answers to both of these questions are related.

Let's start by looking at how we might avoid having to create a new instance of the collection every time we want to iterate over it.

After all, we don't need a new instance of the elements, just some way of *resetting* the *current* item.

Let's start with a simple example that has those issues:

In [None]:
class Cities:
    def __init__(self):
        self._cities = ['Paris', 'Berlin', 'Rome', 'Madrid', 'London']
        self._index = 0
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        else:
            item = self._cities[self._index]
            self._index += 1
            return item

Now, we have an **iterator** object, but we need to re-create it every time we want to start the iterations from the beginning:

In [None]:
cities = Cities()
list(enumerate(cities))

[(0, 'Paris'), (1, 'Berlin'), (2, 'Rome'), (3, 'Madrid'), (4, 'London')]

In [None]:
cities=Cities()
[item.upper() for item in cities]

['PARIS', 'BERLIN', 'ROME', 'MADRID', 'LONDON']

In [None]:
cities=Cities()
sorted(cities)

['Berlin', 'London', 'Madrid', 'Paris', 'Rome']

So, we basically have to "restart" an iterator by **creating a new one each time**.

But in this case, we are also re-creating the underlying data every time - seems wasteful!

Instead, maybe we can split the **iterator** part of our code from the **data** part of our code.

In [None]:
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'New Delhi', 'Newcastle']
        
    def __len__(self):
        return len(self._cities)

And let's create our iterator this way:

In [None]:
class CityIterator:
    def __init__(self, city_obj):
        # city_obj is an instance of Cities
        self._city_obj = city_obj
        self._index = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._index >= len(self._city_obj):
            raise StopIteration
        else:
            item = self._city_obj._cities[self._index]
            self._index += 1
            return item

So now we can create our `Cities` instance **once**:

In [None]:
cities = Cities()

and create as many iterators as we want, passing each one the same `Cities` instance every time:

In [None]:
iter_1 = CityIterator(cities)

In [None]:
for city in iter_1:
    print(city)

New York
Newark
New Delhi
Newcastle


In [None]:
iter_2 = CityIterator(cities)
[city.upper() for city in iter_2]

['NEW YORK', 'NEWARK', 'NEW DELHI', 'NEWCASTLE']

So, we're almost at a solution now. At least we can create the **iterator** objects without having to recreate the `Cities` object every time.

But we still have to remember to create a new iterator, **and** we can no longer iterate over the `cities` object itself!

In [None]:
for city in cities:
    print(city)

TypeError: 'Cities' object is not iterable

This is where the first question we asked comes into play. Whenever we iterated our iterator, the first thing Python did was call `__iter__`.

In fact, let's just check that again:

In [None]:
class CityIterator:
    def __init__(self, city_obj):
        # city_obj is an instance of Cities
        print('Calling CityIterator __init__')
        self._city_obj = city_obj
        self._index = 0
        
    def __iter__(self):
        print('Calling CityIterator instance __iter__')
        return self
    
    def __next__(self):
        print('Calling __next__')
        if self._index >= len(self._city_obj):
            raise StopIteration
        else:
            item = self._city_obj._cities[self._index]
            self._index += 1
            return item

In [None]:
iter_1 = CityIterator(cities)

Calling CityIterator __init__


In [None]:
for city in iter_1:
    print(city)

Calling CityIterator instance __iter__
Calling __next__
New York
Calling __next__
Newark
Calling __next__
New Delhi
Calling __next__
Newcastle
Calling __next__


#### Iterables

Now we finally come to how an **iterable** is defined in Python.

An **iterable** is an object that:
* implements the `__iter__` method
* and that method returns an **iterator** which can be used to iterate over the object
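One way to check this definition in code is with the abstract base classes in `collections.abc`: `isinstance(obj, Iterable)` returns `True` for any class that defines `__iter__`, while `Iterator` additionally requires `__next__`. This is a minimal sketch; `Bag` and `BagIterator` are hypothetical classes used only for illustration:

```python
from collections.abc import Iterable, Iterator

class BagIterator:
    """Hypothetical iterator: walks over a list by index."""
    def __init__(self, items):
        self._items = items
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._items):
            raise StopIteration
        item = self._items[self._index]
        self._index += 1
        return item

class Bag:
    """Hypothetical iterable: implements __iter__ but not __next__."""
    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return BagIterator(self._items)

bag = Bag([1, 2, 3])
print(isinstance(bag, Iterable))        # True: Bag defines __iter__
print(isinstance(bag, Iterator))        # False: Bag has no __next__
print(isinstance(iter(bag), Iterator))  # True: BagIterator defines both
```

Note that this `isinstance` check does not detect objects that are iterable only through `__getitem__`; calling `iter()` on the object is the more reliable test.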

What would happen if we put an `__iter__` method in the `Cities` object and then try to iterate?

When we try to iterate over the `Cities` instance, Python will first call `__iter__`. The `__iter__` method should then return an **iterator** which Python will use for the iteration.

We actually have everything we need to now make `Cities` an **iterable** since we already have the `CityIterator` created:

In [None]:
class CityIterator:
    def __init__(self, city_obj):
        # city_obj is an instance of Cities
        print('Calling CityIterator __init__')
        self._city_obj = city_obj
        self._index = 0
        
    def __iter__(self):
        print('Calling CityIterator instance __iter__')
        return self
    
    def __next__(self):
        print('Calling __next__')
        if self._index >= len(self._city_obj):
            raise StopIteration
        else:
            item = self._city_obj._cities[self._index]
            self._index += 1
            return item

In [None]:
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'New Delhi', 'Newcastle']
        
    def __len__(self):
        return len(self._cities)
    
    def __iter__(self):
        print('Calling Cities instance __iter__')
        return CityIterator(self)

In [None]:
cities = Cities()

In [None]:
for city in cities:
    print(city)

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
New York
Calling __next__
Newark
Calling __next__
New Delhi
Calling __next__
Newcastle
Calling __next__


And watch what happens if we try to run that loop again:

In [None]:
for city in cities:
    print(city)

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
New York
Calling __next__
Newark
Calling __next__
New Delhi
Calling __next__
Newcastle
Calling __next__


A new **iterator** was created when the `for` loop started.

In fact, the same happens for anything that iterates over our iterable: it first calls the iterable's `__iter__` method to get a **new** iterator, then calls `__next__` on that iterator.
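As a rough sketch (not CPython's actual implementation), a `for` loop behaves much like the manual loop below: it gets a fresh iterator once, then calls `next()` until `StopIteration` is raised. `manual_for` is a hypothetical helper, and a plain list stands in for any iterable:

```python
def manual_for(iterable, action):
    """Approximate what `for item in iterable: action(item)` does."""
    iterator = iter(iterable)       # calls iterable.__iter__() once
    while True:
        try:
            item = next(iterator)   # calls iterator.__next__()
        except StopIteration:
            break                   # the for loop ends quietly here
        action(item)

collected = []
manual_for(['New York', 'Newark', 'New Delhi', 'Newcastle'], collected.append)
print(collected)  # ['New York', 'Newark', 'New Delhi', 'Newcastle']
```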

In [None]:
list(enumerate(cities))

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
Calling __next__
Calling __next__
Calling __next__
Calling __next__


[(0, 'New York'), (1, 'Newark'), (2, 'New Delhi'), (3, 'Newcastle')]

In [None]:
sorted(cities, reverse=True)

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
Calling __next__
Calling __next__
Calling __next__
Calling __next__


['Newcastle', 'Newark', 'New York', 'New Delhi']

Now we can put the iterator class inside our `Cities` class to keep the code self-contained:

In [None]:
del CityIterator  # just to make sure CityIterator is not in our global scope

In [None]:
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'New Delhi', 'Newcastle']
        
    def __len__(self):
        return len(self._cities)
    
    def __iter__(self):
        print('Calling Cities instance __iter__')
        return self.CityIterator(self)
    
    class CityIterator:
        def __init__(self, city_obj):
            # city_obj is an instance of Cities
            print('Calling CityIterator __init__')
            self._city_obj = city_obj
            self._index = 0

        def __iter__(self):
            print('Calling CityIterator instance __iter__')
            return self

        def __next__(self):
            print('Calling __next__')
            if self._index >= len(self._city_obj):
                raise StopIteration
            else:
                item = self._city_obj._cities[self._index]
                self._index += 1
                return item

In [None]:
cities = Cities()

In [None]:
list(enumerate(cities))

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
Calling __next__
Calling __next__
Calling __next__
Calling __next__


[(0, 'New York'), (1, 'Newark'), (2, 'New Delhi'), (3, 'Newcastle')]

Technically we can even get an iterator instance ourselves directly, by calling `iter()` on the `cities` object:

In [None]:
iter_1 = iter(cities)
iter_2 = iter(cities)

Calling Cities instance __iter__
Calling CityIterator __init__
Calling Cities instance __iter__
Calling CityIterator __init__


As you can see, Python created and returned two different instances of the `CityIterator` object.

In [None]:
id(iter_1), id(iter_2)

(1741231353928, 1741231354320)

And now we should also understand why **iterators** implement the `__iter__` method (which just returns the iterator itself): it makes them **iterables** too!
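The practical difference shows up when we iterate twice. An iterable hands out a fresh iterator on every pass, but an iterator's `__iter__` returns the same, already advanced object, so a second pass finds it exhausted. A small sketch using a built-in list and its iterator:

```python
cities_list = ['New York', 'Newark', 'New Delhi', 'Newcastle']

# An iterable: each pass calls __iter__ and gets a brand new iterator
first_pass = list(cities_list)
second_pass = list(cities_list)

# An iterator: __iter__ returns the same object every time
it = iter(cities_list)
print(iter(it) is it)             # True
first_it_pass = list(it)          # consumes every element
second_it_pass = list(it)         # nothing left to consume

print(first_pass == second_pass)  # True
print(second_it_pass)             # []
```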

#### Mixing Iterables and Sequences

`Cities` is an iterable, but it is not a sequence type:

In [None]:
cities = Cities()

In [None]:
len(cities)

4

In [None]:
cities[1]

TypeError: 'Cities' object does not support indexing

Since our Cities **could** also be a sequence, we could also decide to implement the `__getitem__` method to make it into a sequence:

In [None]:
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'New Delhi', 'Newcastle']
        
    def __len__(self):
        return len(self._cities)
    
    def __getitem__(self, s):
        print('getting item...')
        return self._cities[s]
    
    def __iter__(self):
        print('Calling Cities instance __iter__')
        return self.CityIterator(self)
    
    class CityIterator:
        def __init__(self, city_obj):
            # city_obj is an instance of Cities
            print('Calling CityIterator __init__')
            self._city_obj = city_obj
            self._index = 0

        def __iter__(self):
            print('Calling CityIterator instance __iter__')
            return self

        def __next__(self):
            print('Calling __next__')
            if self._index >= len(self._city_obj):
                raise StopIteration
            else:
                item = self._city_obj._cities[self._index]
                self._index += 1
                return item

In [None]:
cities = Cities()

It's a sequence:

In [None]:
cities[0]

getting item...


'New York'

It's also an iterable:

In [None]:
next(iter(cities))

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__


'New York'

Now that Cities is both a sequence type (`__getitem__`) and an iterable (`__iter__`), when we loop over `cities`, is Python going to use `__getitem__` or `__iter__`?

In [None]:
cities = Cities()
for city in cities:
    print(city)

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
New York
Calling __next__
Newark
Calling __next__
New Delhi
Calling __next__
Newcastle
Calling __next__


It uses the iterator - so Python will use the iterator if there is one, otherwise it will fall back to using `__getitem__`. If neither is implemented, we'll get an exception.
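To see the fallback in isolation, here is a minimal sketch with a hypothetical `Letters` class that defines only `__getitem__`. Python calls it with the indexes 0, 1, 2, ... and stops iterating when an `IndexError` is raised:

```python
class Letters:
    """Hypothetical sequence-like class: no __iter__, only __getitem__."""
    def __init__(self, text):
        self._text = text

    def __getitem__(self, index):
        print(f'getting item {index}...')
        return self._text[index]   # raises IndexError past the end

letters = Letters('abc')
result = [ch for ch in letters]    # iteration falls back to __getitem__
print(result)                      # ['a', 'b', 'c']
```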

Of course, for selection by index or slice, the `__getitem__` method **must** be implemented.

We'll come back to this very topic in an upcoming video, because behind the scenes, even if we only implement the `__getitem__` method, Python will auto-generate an iterator for us!

### Python Built-In Iterables and Iterators

The way iterables and iterators work in our custom `Cities` example is exactly the way Python iterables work too.

In [None]:
l = [1, 2, 3]

Since lists are iterables, they implement the `__iter__` method and we can get an **iterator** for the list:

In [None]:
iter_l = iter(l)
# or, equivalently: iter_l = l.__iter__()

In [None]:
type(iter_l)

list_iterator

In [None]:
next(iter_l)

1

In [None]:
next(iter_l)

2

In [None]:
next(iter_l)

3

In [None]:
next(iter_l)

StopIteration: 

See? The same `StopIteration` exception is raised.
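If we would rather not handle the exception ourselves, the built-in `next()` also accepts a second argument: a default value returned instead of raising `StopIteration` once the iterator is exhausted. A minimal sketch; `consume` and `_SENTINEL` are hypothetical names, and a fresh `object()` is used as the default so it can never be confused with a real item (as `None` could be):

```python
_SENTINEL = object()  # unique marker that cannot collide with real items

def consume(iterable):
    """Collect all items by calling next() with a default value."""
    iterator = iter(iterable)
    items = []
    item = next(iterator, _SENTINEL)
    while item is not _SENTINEL:
        items.append(item)
        item = next(iterator, _SENTINEL)
    return items

print(consume([1, 2, 3]))  # [1, 2, 3]
```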

Since `iter_l` is an iterator, it also implements the `__iter__` method, which just returns the iterator itself:

In [None]:
id(iter_l), id(iter(iter_l))

(1741231347248, 1741231347248)

In [None]:
'__next__' in dir(iter_l)

True

In [None]:
'__iter__' in dir(iter_l)

True

Since the list `l` is an iterable it also implements the `__iter__` method:

In [None]:
'__iter__' in dir(l)

True

but does not implement a `__next__` method:

In [None]:
'__next__' in dir(l)

False

Of course, since lists are also sequence types, they also implement the `__getitem__` method:

In [None]:
'__getitem__' in dir(l)

True

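Sets and dictionaries, on the other hand, are not sequence types. Sets do not implement `__getitem__` at all (dictionaries do, but only for lookup by key, not by position):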

In [None]:
'__getitem__' in dir(set)

False

In [None]:
'__iter__' in dir(set)

True

In [None]:
s = {1, 2, 3}
'__next__' in dir(iter(s))

True

In [None]:
'__iter__' in dir(dict)

True

But what does the iterator for a dictionary actually return? What does it iterate over? You can probably already guess the answer to that one!

In [None]:
d = dict(a=1, b=2, c=3)

In [None]:
iter_d = iter(d)

In [None]:
next(iter_d)

'a'

Dictionary iterators will iterate over the **keys** of the dictionary.

To iterate over the values, we could use the `values()` method which returns an **iterable** over the values of the dictionary:

In [None]:
iter_vals = iter(d.values())

In [None]:
next(iter_vals)

1

And to iterate over both the keys and values, dictionaries provide an `items()` iterable:

In [None]:
iter_items = iter(d.items())

In [None]:
next(iter_items)

('a', 1)

Here we get an iterator over `(key, value)` tuples.

We'll examine the usefulness of being able to iterate using `next` instead of a `for` loop or comprehension in the next video.

### Consuming Iterators Manually

We've already seen how to do this:

* get an iterator from the iterable
* call next on the iterator (until the `StopIteration` exception is raised)

Let's quickly see how to do this again, using a string as the underlying iterable:

In [None]:
s = 'I sleep all night, and I work all day'

In [None]:
iter_s = iter(s)

In [None]:
print(next(iter_s))
print(next(iter_s))
print(next(iter_s))
print(next(iter_s))
print(next(iter_s))

I
 
s
l
e


This means we can get the next item in a collection without actually using a loop of any kind.

Why might this be useful?

#### Example 1

A fairly typical use case for this is reading data from a CSV file where you know the first few lines contain information about the data rather than the data itself.

Let's try this using a CSV file I have saved alongside the Jupyter notebook.

Let's first load the data and see what it looks like:

In [None]:
with open('cars.csv') as file:
    for line in file:
        print(line)    

Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin

STRING;DOUBLE;INT;DOUBLE;DOUBLE;DOUBLE;DOUBLE;INT;CAT

Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US

Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US

Plymouth Satellite;18.0;8;318.0;150.0;3436.;11.0;70;US

AMC Rebel SST;16.0;8;304.0;150.0;3433.;12.0;70;US

Ford Torino;17.0;8;302.0;140.0;3449.;10.5;70;US

Ford Galaxie 500;15.0;8;429.0;198.0;4341.;10.0;70;US

Chevrolet Impala;14.0;8;454.0;220.0;4354.;9.0;70;US

Plymouth Fury iii;14.0;8;440.0;215.0;4312.;8.5;70;US

Pontiac Catalina;14.0;8;455.0;225.0;4425.;10.0;70;US

AMC Ambassador DPL;15.0;8;390.0;190.0;3850.;8.5;70;US

Citroen DS-21 Pallas;0;4;133.0;115.0;3090.;17.5;70;Europe

Chevrolet Chevelle Concours (sw);0;8;350.0;165.0;4142.;11.5;70;US

Ford Torino (sw);0;8;351.0;153.0;4034.;11.0;70;US

Plymouth Satellite (sw);0;8;383.0;175.0;4166.;10.5;70;US

AMC Rebel SST (sw);0;8;360.0;175.0;3850.;11.0;70;US

Dodge Challenger SE;15.0;8;383.0;170.

As we can see, the values are delimited by `;` and the first two lines consist of the column names, and column types.

The blank lines between rows appear because each line read from the file already ends with a newline, and `print` adds another newline by default. So we'll have to strip those out.

Here's what we want to do: 
* read the first line to get the column headers and create a named tuple class
* read data types from second line and store this so we can cast the strings we are reading to the correct data type
* read the data rows and parse them into named tuples

We could do it this way:

In [None]:
with open('cars.csv') as file:
    row_index = 0
    for line in file:
        if row_index == 0:
            # header row
            headers = line.strip('\n').split(';')
            print(headers)
        elif row_index == 1:
            # data type row
            data_types = line.strip('\n').split(';')
            print(data_types)
        else:
            # data rows
            data = line.strip('\n').split(';')
            print(data)
        row_index += 1

['Car', 'MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight', 'Acceleration', 'Model', 'Origin']
['STRING', 'DOUBLE', 'INT', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'INT', 'CAT']
['Chevrolet Chevelle Malibu', '18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', 'US']
['Buick Skylark 320', '15.0', '8', '350.0', '165.0', '3693.', '11.5', '70', 'US']
['Plymouth Satellite', '18.0', '8', '318.0', '150.0', '3436.', '11.0', '70', 'US']
['AMC Rebel SST', '16.0', '8', '304.0', '150.0', '3433.', '12.0', '70', 'US']
['Ford Torino', '17.0', '8', '302.0', '140.0', '3449.', '10.5', '70', 'US']
['Ford Galaxie 500', '15.0', '8', '429.0', '198.0', '4341.', '10.0', '70', 'US']
['Chevrolet Impala', '14.0', '8', '454.0', '220.0', '4354.', '9.0', '70', 'US']
['Plymouth Fury iii', '14.0', '8', '440.0', '215.0', '4312.', '8.5', '70', 'US']
['Pontiac Catalina', '14.0', '8', '455.0', '225.0', '4425.', '10.0', '70', 'US']
['AMC Ambassador DPL', '15.0', '8', '390.0', '190.0', '3850.', '8.5', '70', 'US']
[

In [None]:
from collections import namedtuple
cars = []

with open('cars.csv') as file:
    row_index = 0
    for line in file:
        if row_index == 0:
            # header row
            headers = line.strip('\n').split(';')
            Car = namedtuple('Car', headers)
        elif row_index == 1:
            # data type row
            data_types = line.strip('\n').split(';')
            print(data_types)
        else:
            # data rows
            data = line.strip('\n').split(';')
            car = Car(*data)
            cars.append(car)
        row_index += 1

['STRING', 'DOUBLE', 'INT', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'INT', 'CAT']


In [None]:
print(cars[0])

Car(Car='Chevrolet Chevelle Malibu', MPG='18.0', Cylinders='8', Displacement='307.0', Horsepower='130.0', Weight='3504.', Acceleration='12.0', Model='70', Origin='US')


We still need to parse the data into strings, integers, floats...

Let's break this problem down into smaller chunks:

First we need a way to cast a value to the right data type based on the data type string:
* STRING --> `str`
* DOUBLE --> `float`
* INT --> `int`
* CAT --> `str`
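The mapping above could also be expressed as a dictionary lookup instead of an `if`/`elif` chain. This is a sketch of an alternative, not the approach used in this notebook; `CASTERS` and `cast_lookup` are hypothetical names, and any type string not in the dictionary (such as `STRING` or `CAT`) falls back to `str`:

```python
CASTERS = {'DOUBLE': float, 'INT': int}  # type string -> conversion function

def cast_lookup(data_type, value):
    """Convert value according to data_type, defaulting to str."""
    return CASTERS.get(data_type, str)(value)

print(cast_lookup('DOUBLE', '3504.'))  # 3504.0
print(cast_lookup('INT', '8'))         # 8
print(cast_lookup('CAT', 'US'))        # US
```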

In [None]:
def cast(data_type, value):
    if data_type == 'DOUBLE':
        return float(value)
    elif data_type == 'INT':
        return int(value)
    else:
        return str(value)

Next, we have to cast every item in a row based on its corresponding data type in the `data_types` list:

In [None]:
data_types = ['STRING', 'DOUBLE', 'INT', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'INT', 'CAT']

In [None]:
data_row = ['Chevrolet Chevelle Malibu', '18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', 'US']

For something like this, we can just zip up the two lists:

In [None]:
list(zip(data_types, data_row))

[('STRING', 'Chevrolet Chevelle Malibu'),
 ('DOUBLE', '18.0'),
 ('INT', '8'),
 ('DOUBLE', '307.0'),
 ('DOUBLE', '130.0'),
 ('DOUBLE', '3504.'),
 ('DOUBLE', '12.0'),
 ('INT', '70'),
 ('CAT', 'US')]

And we can either use a `map()` or a list comprehension to apply the cast function to each one:
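For comparison, here is a sketch of the `map()` version. `map` accepts several iterables and passes one item from each to the function, so `cast` receives the `(data_type, value)` pairs without an explicit `zip`. The `cast` function and sample row are repeated so the snippet runs on its own:

```python
def cast(data_type, value):
    if data_type == 'DOUBLE':
        return float(value)
    elif data_type == 'INT':
        return int(value)
    else:
        return str(value)

data_types = ['STRING', 'DOUBLE', 'INT', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'INT', 'CAT']
data_row = ['Chevrolet Chevelle Malibu', '18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', 'US']

# map() is lazy, so wrap it in list() to see the results
casted = list(map(cast, data_types, data_row))
print(casted)
# ['Chevrolet Chevelle Malibu', 18.0, 8, 307.0, 130.0, 3504.0, 12.0, 70, 'US']
```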

In [None]:
[cast(data_type, value) for data_type, value in zip(data_types, data_row)]

['Chevrolet Chevelle Malibu', 18.0, 8, 307.0, 130.0, 3504.0, 12.0, 70, 'US']

So now we can write this in a function:

In [None]:
def cast_row(data_types, data_row):
    return [cast(data_type, value) 
            for data_type, value in zip(data_types, data_row)]

Let's go back and fix up our original code now:

In [None]:
from collections import namedtuple
cars = []

with open('cars.csv') as file:
    row_index = 0
    for line in file:
        if row_index == 0:
            # header row
            headers = line.strip('\n').split(';')
            Car = namedtuple('Car', headers)
        elif row_index == 1:
            # data type row
            data_types = line.strip('\n').split(';')
        else:
            # data rows
            data = line.strip('\n').split(';')
            data = cast_row(data_types, data)
            car = Car(*data)
            cars.append(car)
        row_index += 1

In [None]:
cars[0]

Car(Car='Chevrolet Chevelle Malibu', MPG=18.0, Cylinders=8, Displacement=307.0, Horsepower=130.0, Weight=3504.0, Acceleration=12.0, Model=70, Origin='US')

Now let's see if we can clean up this code by using iterators directly:

In [None]:
from collections import namedtuple
cars = []

with open('cars.csv') as file:
    file_iter = iter(file)
    headers = next(file_iter).strip('\n').split(';')
    Car = namedtuple('Car', headers)
    data_types = next(file_iter).strip('\n').split(';')
    for line in file_iter:
        data = line.strip('\n').split(';')
        data = cast_row(data_types, data)
        car = Car(*data)
        cars.append(car)

In [None]:
cars[0]

Car(Car='Chevrolet Chevelle Malibu', MPG=18.0, Cylinders=8, Displacement=307.0, Horsepower=130.0, Weight=3504.0, Acceleration=12.0, Model=70, Origin='US')

That's already quite a bit cleaner... But why stop there!

In [None]:
from collections import namedtuple

with open('cars.csv') as file:
    file_iter = iter(file)
    headers = next(file_iter).strip('\n').split(';')
    Car = namedtuple('Car', headers)
    data_types = next(file_iter).strip('\n').split(';')
    cars_data = [cast_row(data_types, 
                          line.strip('\n').split(';'))
                   for line in file_iter]
    cars = [Car(*item) for item in cars_data]

In [None]:
cars_data[0]

['Chevrolet Chevelle Malibu', 18.0, 8, 307.0, 130.0, 3504.0, 12.0, 70, 'US']

In [None]:
cars[0]

Car(Car='Chevrolet Chevelle Malibu', MPG=18.0, Cylinders=8, Displacement=307.0, Horsepower=130.0, Weight=3504.0, Acceleration=12.0, Model=70, Origin='US')

I chose to split parsing `cars_data` and building the list of named tuples into two steps for readability, but we could combine them into a single step:

In [None]:
from collections import namedtuple

with open('cars.csv') as file:
    file_iter = iter(file)
    headers = next(file_iter).strip('\n').split(';')
    Car = namedtuple('Car', headers)
    data_types = next(file_iter).strip('\n').split(';')
    cars = [Car(*cast_row(data_types, 
                          line.strip('\n').split(';')))
            for line in file_iter]


In [None]:
cars[0]

Car(Car='Chevrolet Chevelle Malibu', MPG=18.0, Cylinders=8, Displacement=307.0, Horsepower=130.0, Weight=3504.0, Acceleration=12.0, Model=70, Origin='US')
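The standard library's `csv.reader` is also an iterator over rows, so the same `next()` trick works with it, and it handles the `;` splitting and newline stripping for us. This is a swapped-in technique, not the approach used above. In this minimal sketch, `io.StringIO` holds the first few lines of `cars.csv` (copied from the output earlier) so the code runs without the file. With the real file you would pass the open file object instead, opened with `newline=''` as the `csv` documentation recommends:

```python
import csv
import io
from collections import namedtuple

# In-memory stand-in for cars.csv (first rows only)
sample = io.StringIO(
    'Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin\n'
    'STRING;DOUBLE;INT;DOUBLE;DOUBLE;DOUBLE;DOUBLE;INT;CAT\n'
    'Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US\n'
    'Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US\n'
)

def cast(data_type, value):
    if data_type == 'DOUBLE':
        return float(value)
    elif data_type == 'INT':
        return int(value)
    return str(value)

reader = csv.reader(sample, delimiter=';')
headers = next(reader)       # first row: column names
data_types = next(reader)    # second row: column types
Car = namedtuple('Car', headers)

# The remaining rows are consumed lazily by the comprehension
cars = [Car(*(cast(t, v) for t, v in zip(data_types, row))) for row in reader]
print(cars[0])
```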

### Cyclic Iterators

Iterables do not have to be finite. In fact we can easily create an infinite cyclical iterator.

Here's an example - suppose we have a loop that iterates over some range of integers. As we loop through those integers we want to create a tuple containing the integer and a string that cycles over a finite set (smaller than the list of integers).

```
1, 2, 3, 4, 5, 6, 7, 8, 9, ...

N, S, W, E
```

and we want to generate

```
1N, 2S, 3W, 4E, 5N, 6S, 7W, 8E, 9N, ...
```


We could do it this way by creating a custom iterator for the list `['N', 'S', 'W', 'E']` that will cycle over that list indefinitely:

In [None]:
class CyclicIterator:
    def __init__(self, lst):
        self.lst = lst
        self.i = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        result = self.lst[self.i % len(self.lst)]
        self.i += 1
        return result

In [None]:
iter_cycl = CyclicIterator('NSWE')

In [None]:
for i in range(10):
    print(next(iter_cycl))

N
S
W
E
N
S
W
E
N
S


So, now we can tackle our original problem:

In [None]:
n = 10
iter_cycl = CyclicIterator('NSWE')
for i in range(1, n+1):
    direction = next(iter_cycl)
    print(f'{i}{direction}')

1N
2S
3W
4E
5N
6S
7W
8E
9N
10S


And re-working this into a list comprehension:

In [None]:
n = 10
iter_cycl = CyclicIterator('NSWE')
[f'{i}{next(iter_cycl)}' for i in range(1, n+1)]

['1N', '2S', '3W', '4E', '5N', '6S', '7W', '8E', '9N', '10S']

Of course, there's an easy alternative way to do this as well, using:
* repetition
* zip
* a list comprehension

We need to repeat the string `'NSWE'` at least as many times as there are integers in our range. We can even create far more than we need, because `zip` stops as soon as the shortest iterable is exhausted:

In [None]:
n = 10
list(zip(range(1, n+1), 'NSWE' * (n//4 + 1)))

[(1, 'N'),
 (2, 'S'),
 (3, 'W'),
 (4, 'E'),
 (5, 'N'),
 (6, 'S'),
 (7, 'W'),
 (8, 'E'),
 (9, 'N'),
 (10, 'S')]

In [None]:
[f'{i}{direction}'
 for i, direction in zip(range(1, n+1), 'NSWE' * (n//4 + 1))]

['1N', '2S', '3W', '4E', '5N', '6S', '7W', '8E', '9N', '10S']

There's an even easier way: rather than building our own `CyclicIterator`, we can simply use the one Python provides in the standard library, `itertools.cycle`!

In [None]:
import itertools

In [None]:
n = 10
iter_cycl = CyclicIterator('NSWE')
[f'{i}{next(iter_cycl)}' for i in range(1, n+1)]

['1N', '2S', '3W', '4E', '5N', '6S', '7W', '8E', '9N', '10S']

and using itertools:

In [None]:
n = 10
iter_cycl = itertools.cycle('NSWE')
[f'{i}{next(iter_cycl)}' for i in range(1, n+1)]

['1N', '2S', '3W', '4E', '5N', '6S', '7W', '8E', '9N', '10S']
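Since `itertools.cycle` returns an ordinary (infinite) iterator, we can also hand it straight to `zip`, which stops at the shortest input, so we never need to call `next()` ourselves. A minimal sketch of that variation:

```python
import itertools

n = 10
# zip stops when range() is exhausted, so the infinite cycle is safe here
labels = [f'{i}{direction}'
          for i, direction in zip(range(1, n + 1), itertools.cycle('NSWE'))]
print(labels)  # ['1N', '2S', '3W', '4E', '5N', '6S', '7W', '8E', '9N', '10S']
```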

### Lazy Iterables

An iterable is an object that can return an iterator (`__iter__`).

In turn an iterator is an object that can return itself (`__iter__`), and return the next value when asked (`__next__`).

Nothing in all this says that the iterable needs to be a finite collection, or that the elements in the iterable need to be materialized (pre-created) at the time the iterable / iterator is created.

Lazy evaluation is when evaluating a value is deferred until it is actually requested.

It is not specific to iterables however.

Simple examples of lazy evaluation are often seen in classes for calculated properties.

Let's look at an example of a lazy class property:

In [None]:
import math

class Circle:
    def __init__(self, r):
        self.radius = r
        
    @property
    def radius(self):
        return self._radius
    
    @radius.setter
    def radius(self, r):
        self._radius = r
        self.area = math.pi * r**2

As you can see, in this circle class, every time we set the radius, we re-calculate and store the area. When we request the area of the circle, we simply return the stored value.

In [None]:
c = Circle(1)

In [None]:
c.area

3.141592653589793

In [None]:
c.radius = 2

In [None]:
c.radius, c.area

(2, 12.566370614359172)

But instead of doing it this way, we could just calculate the area every time it is requested without actually storing the value:

In [None]:
class Circle:
    def __init__(self, r):
        self.radius = r
        
    @property
    def radius(self):
        return self._radius
    
    @radius.setter
    def radius(self, r):
        self._radius = r

    @property
    def area(self):
        return math.pi * self.radius ** 2

In [None]:
c = Circle(1)

In [None]:
c.area

3.141592653589793

In [None]:
c.radius = 2

In [None]:
c.area

12.566370614359172

But now the area is recalculated on every request. A hybrid approach stores the area so it doesn't need recomputing (except when the radius changes), but delays calculating it until it is first requested. That way, if the area is never requested, we don't waste CPU cycles calculating it or memory storing it.

In [None]:
class Circle:
    def __init__(self, r):
        self.radius = r
        
    @property
    def radius(self):
        return self._radius
    
    @radius.setter
    def radius(self, r):
        self._radius = r
        self._area = None

    @property
    def area(self):
        if self._area is None:
            print('Calculating area...')
            self._area = math.pi * self.radius ** 2
        return self._area

In [None]:
c = Circle(1)

In [None]:
c.area

Calculating area...


3.141592653589793

In [None]:
c.area

3.141592653589793

In [None]:
c.radius = 2

In [None]:
c.area

Calculating area...


12.566370614359172

This is an example of lazy evaluation: we don't calculate and store the attribute until it is actually needed.

We can sometimes do something similar with iterables - we don't actually have to store every item of the collection - we may be able to just calculate the item as needed.

In the following example we'll create an iterable of factorials of integers starting at `0`, i.e.

`0!, 1!, 2!, 3!, ..., n!`

In [None]:
class Factorials:
    def __init__(self, length):
        self.length = length
    
    def __iter__(self):
        return self.FactIter(self.length)
    
    class FactIter:
        def __init__(self, length):
            self.length = length
            self.i = 0
            
        def __iter__(self):
            return self
        
        def __next__(self):
            if self.i >= self.length:
                raise StopIteration
            else:
                result = math.factorial(self.i)
                self.i += 1
                return result
            

In [None]:
facts = Factorials(5)

In [None]:
list(facts)

[1, 1, 2, 6, 24]

So as you can see, we do not store the values of the iterable, instead we just calculate the items as needed.

In fact, now that we have this iterable, we don't even need it to be finite:

In [None]:
class Factorials:
    def __iter__(self):
        return self.FactIter()
    
    class FactIter:
        def __init__(self):
            self.i = 0
            
        def __iter__(self):
            return self
        
        def __next__(self):
            result = math.factorial(self.i)
            self.i += 1
            return result

In [None]:
factorials = Factorials()
fact_iter = iter(factorials)

for _ in range(10):
    print(next(fact_iter))

1
1
2
6
24
120
720
5040
40320
362880


You'll notice that the main part of the iterable code is in the iterator, and the iterable itself is nothing more than a thin shell that allows us to create and access the iterator. This is so common, that there is a better way of doing this that we'll see when we deal with generators.
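Because the infinite `Factorials` iterable never raises `StopIteration`, calling `list()` on it would never return. `itertools.islice` is one safe way to take just a prefix of an infinite iterable. In this sketch the class is repeated so it runs standalone:

```python
import itertools
import math

class Factorials:
    def __iter__(self):
        return self.FactIter()

    class FactIter:
        def __init__(self):
            self.i = 0

        def __iter__(self):
            return self

        def __next__(self):
            result = math.factorial(self.i)
            self.i += 1
            return result

# islice lazily pulls only the first 10 values from the infinite iterator
first_ten = list(itertools.islice(Factorials(), 10))
print(first_ten)  # [1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880]
```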

### Python's Built-In Iterables and Iterators

Python has a lot of built-in functions that return iterators or iterables.

Let's look at the simple `range` function first:

In [None]:
r_10 = range(10)

Now, `r_10` is an **iterable**:

In [None]:
'__iter__' in dir(r_10)

True

But it is not an **iterator**:

In [None]:
'__next__' in dir(r_10)

False

However, we can request an iterator by calling the `__iter__` method, or simply using the `iter()` function:

In [None]:
r_10_iter = iter(r_10)

And of course this is now an iterator:

In [None]:
'__iter__' in dir(r_10_iter)

True

In [None]:
'__next__' in dir(r_10_iter)

True

Many of Python's built-in functions, including `range`, use lazy evaluation: when we execute `range(10)`, Python does not pre-compute a "list" of all the elements in the range. Instead, the elements are computed and returned one at a time as they are requested.

This is why when we print a range object we do not actually see the contents of the range - they don't exist yet!
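One way to see that a `range` does not hold its elements is to compare memory footprints with `sys.getsizeof`. Exact byte counts vary by Python version and platform, so only the comparison is shown. A `range` also computes an element on request when indexed:

```python
import sys

big_range = range(10**6)
big_list = list(big_range)

# The range stores only start, stop and step, so it stays small;
# the list stores a reference to every element
print(sys.getsizeof(big_range) < sys.getsizeof(big_list))  # True

# Indexing computes the value from start and step instead of storing it
huge = range(0, 10**12, 2)
print(huge[10**9])   # 2000000000
print(len(huge))     # 500000000000
```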

Instead, we need to iterate through the iterator and put it into something like a list:

In [None]:
[num for num in range(10)]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The `zip` function on the other hand returns an iterator:

In [None]:
z = zip([1, 2, 3], 'abc')

In [None]:
z

<zip at 0x28b01b684c8>

It is an **iterator**:

In [None]:
print('__iter__' in dir(z))
print('__next__' in dir(z))

True
True


Just like `range()`, though, it uses lazy evaluation, so to see its contents we need to iterate through it, for example by building a list:

In [None]:
list(z)

[(1, 'a'), (2, 'b'), (3, 'c')]
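Because `z` is an iterator rather than an iterable that hands out fresh iterators, that `list(z)` call consumed it. A second pass finds nothing left, as this small sketch shows:

```python
z = zip([1, 2, 3], 'abc')
first = list(z)      # consumes the iterator
second = list(z)     # already exhausted
print(first)         # [(1, 'a'), (2, 'b'), (3, 'c')]
print(second)        # []
print(iter(z) is z)  # True: an iterator returns itself from __iter__
```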

Even reading a file line by line is done using lazy evaluation:

In [None]:
with open('cars.csv') as f:
    print(type(f))
    print('__iter__' in dir(f))
    print('__next__' in dir(f))

<class '_io.TextIOWrapper'>
True
True


As you can see, the `open()` function returns an **iterator** (of type `TextIOWrapper`), and we can read lines from the file one by one using the `next()` function, or calling the `__next__()` method. The class also implements a `readline()` method we can use to get the next row:

In [None]:
with open('cars.csv') as f:
    print(next(f))
    print(f.__next__())
    print(f.readline())

Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin

STRING;DOUBLE;INT;DOUBLE;DOUBLE;DOUBLE;DOUBLE;INT;CAT

Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US



Of course we can just iterate over all the lines using a `for` loop as well:

In [None]:
with open('cars.csv') as f:
    for row in f:
        print(row, end='')

Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin
STRING;DOUBLE;INT;DOUBLE;DOUBLE;DOUBLE;DOUBLE;INT;CAT
Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US
Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US
Plymouth Satellite;18.0;8;318.0;150.0;3436.;11.0;70;US
AMC Rebel SST;16.0;8;304.0;150.0;3433.;12.0;70;US
Ford Torino;17.0;8;302.0;140.0;3449.;10.5;70;US
Ford Galaxie 500;15.0;8;429.0;198.0;4341.;10.0;70;US
Chevrolet Impala;14.0;8;454.0;220.0;4354.;9.0;70;US
Plymouth Fury iii;14.0;8;440.0;215.0;4312.;8.5;70;US
Pontiac Catalina;14.0;8;455.0;225.0;4425.;10.0;70;US
AMC Ambassador DPL;15.0;8;390.0;190.0;3850.;8.5;70;US
Citroen DS-21 Pallas;0;4;133.0;115.0;3090.;17.5;70;Europe
Chevrolet Chevelle Concours (sw);0;8;350.0;165.0;4142.;11.5;70;US
Ford Torino (sw);0;8;351.0;153.0;4034.;11.0;70;US
Plymouth Satellite (sw);0;8;383.0;175.0;4166.;10.5;70;US
AMC Rebel SST (sw);0;8;360.0;175.0;3850.;11.0;70;US
Dodge Challenger SE;15.0;8;383.0;170.0;3563.;10.0;70;U

The `TextIOWrapper` class also provides a method `readlines()` that will read the entire file and return a list containing all the rows:

In [None]:
with open('cars.csv') as f:
    l = f.readlines()

In [None]:
l

['Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin\n',
 'STRING;DOUBLE;INT;DOUBLE;DOUBLE;DOUBLE;DOUBLE;INT;CAT\n',
 'Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US\n',
 'Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US\n',
 'Plymouth Satellite;18.0;8;318.0;150.0;3436.;11.0;70;US\n',
 'AMC Rebel SST;16.0;8;304.0;150.0;3433.;12.0;70;US\n',
 'Ford Torino;17.0;8;302.0;140.0;3449.;10.5;70;US\n',
 'Ford Galaxie 500;15.0;8;429.0;198.0;4341.;10.0;70;US\n',
 'Chevrolet Impala;14.0;8;454.0;220.0;4354.;9.0;70;US\n',
 'Plymouth Fury iii;14.0;8;440.0;215.0;4312.;8.5;70;US\n',
 'Pontiac Catalina;14.0;8;455.0;225.0;4425.;10.0;70;US\n',
 'AMC Ambassador DPL;15.0;8;390.0;190.0;3850.;8.5;70;US\n',
 'Citroen DS-21 Pallas;0;4;133.0;115.0;3090.;17.5;70;Europe\n',
 'Chevrolet Chevelle Concours (sw);0;8;350.0;165.0;4142.;11.5;70;US\n',
 'Ford Torino (sw);0;8;351.0;153.0;4034.;11.0;70;US\n',
 'Plymouth Satellite (sw);0;8;383.0;175.0;4166.;10.5;70;US\n',
 'AMC Rebe

So you might be wondering which approach to use: the `readlines()` method, or iterating over the file object directly?

Especially if you end up reading the entire file anyway, is one approach better than the other?

Consider this example, where we want to find out all the different origins in the file (last column of each row) - let's do this using both approaches.

In [None]:
origins = set()
with open('cars.csv') as f:
    rows = f.readlines()
for row in rows[2:]:  # skip the two header rows
    origin = row.strip('\n').split(';')[-1]
    origins.add(origin)
print(origins)

{'Japan', 'Europe', 'US'}


In [None]:
origins = set()
with open('cars.csv') as f:
    next(f), next(f)  # skip the two header rows
    for row in f:
        origin = row.strip('\n').split(';')[-1]
        origins.add(origin)
print(origins)

{'Japan', 'Europe', 'US'}


Now consider the first approach: we loaded the **entire** file into memory (a list), and then iterated through all the rows.

In the second approach we still iterated through all the rows, but we only needed to hold **one row** in memory at a time, so the overhead was far smaller.

Often we can process a file one row at a time, and loading the entire file up front, especially a huge one, is often undesirable.
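As a minimal sketch of the lazy approach, here is the origins calculation written as a function that reads one row at a time. Two assumptions for illustration: an `io.StringIO` object holding a few rows copied from `cars.csv` stands in for the real file, and `itertools.islice` skips the two header rows instead of calling `next()` twice.

```python
import io
from itertools import islice

# In-memory stand-in for cars.csv (assumption: same ';'-separated layout)
sample = io.StringIO(
    'Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin\n'
    'STRING;DOUBLE;INT;DOUBLE;DOUBLE;DOUBLE;DOUBLE;INT;CAT\n'
    'Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US\n'
    'Citroen DS-21 Pallas;0;4;133.0;115.0;3090.;17.5;70;Europe\n'
    'Ford Torino;17.0;8;302.0;140.0;3449.;10.5;70;US\n'
)

def unique_origins(f):
    """Collect the last column of every data row, reading one row at a time."""
    origins = set()
    # islice(f, 2, None) lazily skips the two header rows
    for row in islice(f, 2, None):
        origins.add(row.rstrip('\n').split(';')[-1])
    return origins

print(unique_origins(sample))
```

Only the current row is held in memory at any point; the `set` grows with the number of distinct origins, not with the size of the file.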

The `enumerate` function also returns a lazy iterator:

In [None]:
e = enumerate('Python rocks!')

In [None]:
print('__iter__' in dir(e))
print('__next__' in dir(e))

True
True


In [None]:
iter(e)

<enumerate at 0x1d75df12fc0>

In [None]:
e

<enumerate at 0x1d75df12fc0>

As we can see, the object and its iterator are the same object.

But `enumerate` is also lazy, so we need to iterate through it in order to recover all the elements:

In [None]:
list(e)

[(0, 'P'),
 (1, 'y'),
 (2, 't'),
 (3, 'h'),
 (4, 'o'),
 (5, 'n'),
 (6, ' '),
 (7, 'r'),
 (8, 'o'),
 (9, 'c'),
 (10, 'k'),
 (11, 's'),
 (12, '!')]

Of course, once we have exhausted the iterator, we cannot use it again:

In [None]:
list(e)

[]
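To make the single-use nature of `enumerate` concrete, the sketch below checks that an `enumerate` object is its own iterator, shows it being exhausted, and then simply creates a fresh one (using the optional `start` argument) when we need to iterate again:

```python
word = 'Python'

e = enumerate(word)
# enumerate objects are iterators: iter() hands back the very same object
print(iter(e) is e)      # True
print(list(e))           # all six (index, character) pairs
print(list(e))           # [] - already exhausted

# To iterate again, ask for a fresh enumerate object;
# the optional start argument shifts the counter
print(list(enumerate(word, start=1))[:2])   # [(1, 'P'), (2, 'y')]
```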

The dictionary object provides methods that return iterables for the keys, values or tuples of key/value pairs:

In [None]:
d = {'a': 1, 'b': 2}

In [None]:
keys = d.keys()

In [None]:
'__iter__' in dir(keys), '__next__' in dir(keys)

(True, False)

More simply, we can just test whether `iter(keys)` **is** the same object as `keys`: if it is not, we are dealing with an iterable that is not itself an iterator.

In [None]:
iter(keys) is keys

False

So we have an iterable.

Similarly for `.values()` and `.items()`:

In [None]:
values = d.values()
iter(values) is values

False

In [None]:
items = d.items()
iter(items) is items

False

There are many other such functions and methods in Python, and we'll cover more of them in some upcoming videos.

Just be careful and know whether you are dealing with an iterable or an iterator. You can iterate over an iterable again and again, but you can only iterate over an iterator once.
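The difference between a dictionary view (an iterable) and an iterator obtained from it shows up as soon as we iterate each one twice. A small sketch:

```python
d = {'a': 1, 'b': 2}

keys = d.keys()               # an iterable (a dict view), not an iterator
print(list(keys), list(keys))            # ['a', 'b'] ['a', 'b'] - reusable

keys_iter = iter(keys)        # an iterator over the view
print(list(keys_iter), list(keys_iter))  # ['a', 'b'] [] - single use only
```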

### Sorting Iterables

There's nothing really new here - we have seen the `sorted()` function before when we looked at sorting sequences.

The `sorted()` function will in fact work with any iterable, not just sequences.

Let's try this by creating a custom iterable and then sorting it.

For this example, we'll create an iterable of random numbers, and then sort it.

In [None]:
import random

In [None]:
random.seed(0)

In [None]:
for i in range(10):
    print(random.randint(1, 10))

10
4
9
3
5
3
2
10
5
9


In [None]:
import random

class RandomInts:
    def __init__(self, length, *, seed=0, lower=0, upper=10):
        self.length = length
        self.seed = seed
        self.lower = lower
        self.upper = upper
        
    def __len__(self):
        return self.length
    
    def __iter__(self):
        return self.RandomIterator(self.length,
                                   seed=self.seed,
                                   lower=self.lower,
                                   upper=self.upper)
    
    
    class RandomIterator:
        def __init__(self, length, *, seed, lower, upper):
            self.length = length
            self.lower = lower
            self.upper = upper
            self.num_requests = 0
            random.seed(seed)
            
        def __iter__(self):
            return self
        
        def __next__(self):
            if self.num_requests >= self.length:
                raise StopIteration
            else:
                result = random.randint(self.lower, self.upper)
                self.num_requests += 1
                return result

In [None]:
randoms = RandomInts(10)

In [None]:
for num in randoms:
    print(num)

6
6
0
4
8
7
6
4
7
5


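We can now sort our iterable using the `sorted()` function. Because each new iterator reseeds the random number generator with the same seed, every pass over `randoms` produces the same ten numbers, so the sorted results below contain exactly the values printed above: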

In [None]:
sorted(randoms)

[0, 4, 4, 5, 6, 6, 6, 7, 7, 8]

In [None]:
sorted(randoms, reverse=True)

[8, 7, 7, 6, 6, 6, 5, 4, 4, 0]
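Note that `RandomIterator` calls `random.seed()`, which resets the module-level generator shared by all code in the program. As an alternative sketch (our own variation, not the version above), each iterator can be handed a private `random.Random(seed)` instance, so repeated iterations still produce the same numbers without touching global state:

```python
import random

class RandomInts:
    """Iterable of `length` pseudo-random ints in [lower, upper] (inclusive)."""
    def __init__(self, length, *, seed=0, lower=0, upper=10):
        self.length = length
        self.seed = seed
        self.lower = lower
        self.upper = upper

    def __len__(self):
        return self.length

    def __iter__(self):
        # A fresh, independently seeded generator for every iteration:
        # each pass repeats the same sequence, and the global `random`
        # module state is left alone.
        return self.RandomIterator(self.length, random.Random(self.seed),
                                   self.lower, self.upper)

    class RandomIterator:
        def __init__(self, length, rng, lower, upper):
            self.length = length
            self.rng = rng
            self.lower = lower
            self.upper = upper
            self.num_requests = 0

        def __iter__(self):
            return self

        def __next__(self):
            if self.num_requests >= self.length:
                raise StopIteration
            self.num_requests += 1
            return self.rng.randint(self.lower, self.upper)

randoms = RandomInts(10)
print(sorted(randoms))
print(sorted(randoms) == sorted(list(randoms)))   # True
```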

### The `iter()` Function

As we have seen before, the `iter()` function is used to request an iterator object from an iterable.

For example:

In [None]:
l = [1, 2, 3, 4]

In [None]:
l_iter = iter(l)

In [None]:
type(l_iter)

list_iterator

And we can use that iterator to iterate the collection by calling `next()` until a `StopIteration` exception is raised.

In [None]:
next(l_iter)

1

In [None]:
next(l_iter)

2

We also saw that sequence types can be iterated even if they do not implement the iterable protocol: a custom sequence may have no `__iter__` method at all, only a `__getitem__` method.

Python has no problem iterating such a sequence object. Behind the scenes, Python builds an iterator that uses the `__getitem__` method:

In [None]:
class Squares:
    def __init__(self, n):
        self._n = n
    
    def __len__(self):
        return self._n
    
    def __getitem__(self, i):
        if i >= self._n:
            raise IndexError
        else:
            return i ** 2

In [None]:
sq = Squares(5)

In [None]:
for i in sq:
    print(i)

0
1
4
9
16


But, we can also do this:

In [None]:
sq_iter = iter(sq)

And we now have an iterator for `sq`!

In [None]:
type(sq_iter)

iterator

In [None]:
'__next__' in dir(sq_iter)

True

What happens is that Python will first try to get the iterator by invoking the `__iter__` method on our object.

If it does not have that method, it will look for `__getitem__` next - if it's there it will create an iterator for us that will leverage `__getitem__` and the fact that sequence indices should start at 0.

If neither `__iter__` nor `__getitem__` is found, then we get an exception such as this one:

In [None]:
for i in 10:
    print(i)

TypeError: 'int' object is not iterable

Here's how we might build an iterator using the `__getitem__` method ourselves - not that we have to do that since Python does it for us.

In [None]:
class Squares:
    def __init__(self, n):
        self._n = n
    
    def __len__(self):
        return self._n
    
    def __getitem__(self, i):
        if i >= self._n:
            raise IndexError
        else:
            return i ** 2

In [None]:
class SquaresIterator:
    def __init__(self, squares):
        self._squares = squares
        self._i = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._i >= len(self._squares):
            raise StopIteration
        else:
            result = self._squares[self._i]
            self._i += 1
            return result

In [None]:
sq = Squares(5)
sq_iterator = SquaresIterator(sq)

In [None]:
type(sq_iterator)

__main__.SquaresIterator

In [None]:
print(next(sq_iterator))
print(next(sq_iterator))
print(next(sq_iterator))
print(next(sq_iterator))
print(next(sq_iterator))

0
1
4
9
16


The iterator is now exhausted, so:

In [None]:
print(next(sq_iterator))

StopIteration: 

Technically, our sequence type does not need to implement the `__len__` method in order to be iterated. But since our iterator relies on `len()`, we need a different way of detecting the end: we can leverage the fact that the sequence raises an `IndexError` when the index is out of bounds:

In [None]:
class SquaresIterator:
    def __init__(self, squares):
        self._squares = squares
        self._i = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        try:
            result = self._squares[self._i]
            self._i += 1
            return result
        except IndexError:
            raise StopIteration()

And things will work as before:

In [None]:
sq_iterator = SquaresIterator(sq)

In [None]:
for i in sq_iterator:
    print(i)

0
1
4
9
16
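As a small sketch of Python's own `__getitem__` fallback, here is a hypothetical `Evens` class that implements neither `__iter__` nor `__len__`. Python still iterates it, calling `__getitem__` with 0, 1, 2, and so on until an `IndexError` is raised:

```python
class Evens:
    """Sequence-like object with __getitem__ only: no __iter__, no __len__."""
    def __init__(self, n):
        self._n = n

    def __getitem__(self, i):
        if i >= self._n:
            raise IndexError(i)   # tells Python's iterator the sequence is over
        return 2 * i

evens = Evens(4)
it = iter(evens)              # Python builds an index-based iterator for us
print(type(it).__name__)      # iterator
print(list(evens))            # [0, 2, 4, 6]
```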


#### How to test if an object is iterable

Basically an object is iterable if it:
* implements the **iterable** protocol (`__iter__` that returns an iterator)
* implements the **sequence** protocol (`__getitem__`, and `__len__`) - although `__len__` is not required for iteration


Given some object, how can we test to see if it is iterable or not?

The problem is that we would need to test for both `__iter__` (making sure it returns an iterator) and `__getitem__`. It is far easier to use a try/except.

For example, just testing that `__iter__` is defined is not sufficient:

In [None]:
class SimpleIter:
    def __init__(self):
        pass
    
    def __iter__(self):
        return 'Nope'

In [None]:
s = SimpleIter()

In [None]:
'__iter__' in dir(s)

True

However, if we call `iter()` on `SimpleIter`, look at what happens:

In [None]:
iter(s)

TypeError: iter() returned non-iterator of type 'str'

So the best way, if you have some need to detect if something is iterable or not, is the following:

In [None]:
def is_iterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:
        return False

In [None]:
is_iterable(SimpleIter())

False

In [None]:
is_iterable(Squares(5))

True
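You might be tempted to use `isinstance(obj, collections.abc.Iterable)` instead. As the Python documentation points out, that check only looks for an `__iter__` method: it does not detect classes that iterate via `__getitem__`, and it does not verify that `__iter__` actually returns an iterator. A sketch comparing it with `is_iterable` (the classes are redefined so the block runs on its own):

```python
from collections.abc import Iterable

class SimpleIter:
    def __iter__(self):
        return 'Nope'          # not an iterator

class Squares:
    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, i):
        if i >= self._n:
            raise IndexError
        return i ** 2

def is_iterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:
        return False

print(isinstance(SimpleIter(), Iterable), is_iterable(SimpleIter()))  # True False
print(isinstance(Squares(5), Iterable), is_iterable(Squares(5)))      # False True
```

Only calling `iter()` gives the right answer in both cases.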

That said (and we'll cover exception handling in Python later in this course), there is rarely a need to test whether something is iterable only to go ahead and iterate over it right afterwards.

Consider the following two alternatives:

In [None]:
obj = 100
if is_iterable(obj):
    for i in obj:
        print(i)
else:
    print('Error: obj is not iterable')

Error: obj is not iterable


vs

In [None]:
obj = 100
for i in obj:
    print(i)

TypeError: 'int' object is not iterable

As you can see, the error Python itself raises tells us the same thing, and even provides more information (it names the type of the offending object)!

Instead of guarding for potential errors as we did in the first example, try doing the action you really want to do, and let Python raise the exception for you.

If you want to handle the exception, wrap your action inside a try/except.

So instead of writing it this way (*look before you leap*):

In [None]:
obj = 100
if is_iterable(obj):
    for i in obj:
        print(i)
else:
    print('Error: obj is not iterable')
    print('Taking some action as a consequence of this error')

Error: obj is not iterable
Taking some action as a consequence of this error


prefer writing it this way (*ask for forgiveness later*):

In [None]:
obj = 100
try:
    for i in obj:
        print(i)
except TypeError:
    print('Error: obj is not iterable')
    print('Taking some action as a consequence of this error')

Error: obj is not iterable
Taking some action as a consequence of this error


We'll cover this approach to exception handling in a lot more detail later, but it boils down to a simple idea:

*"It's easier to ask forgiveness than it is to get permission"*

(commonly attributed to Grace Hopper)

### Iterating Callables

We can easily create iterators that are based on callables in general.

Let's look at an example:

##### Example 1

In this example we are going to create a counter function (using a closure) - it's a pretty simplistic function - `counter()` will return a closure that we can then call to increment an internal counter by `1` every time it is called:

In [None]:
def counter():
    i = 0
    
    def inc():
        nonlocal i
        i += 1
        return i
    return inc

This function allows us to create a simple counter, which we can use as follows:

In [None]:
cnt = counter()

In [None]:
cnt()

1

In [None]:
cnt()

2

Technically we can make an iterator to iterate over this counter:

In [None]:
class CounterIterator:
    def __init__(self, counter_callable):
        self.counter_callable = counter_callable
        
    def __iter__(self):
        return self
    
    def __next__(self):
        return self.counter_callable()

Do note that this is an **infinite** iterator!

In [None]:
cnt = counter()
cnt_iter = CounterIterator(cnt)
for _ in range(5):
    print(next(cnt_iter))

1
2
3
4
5


So basically we were able to create an **iterator** from some arbitrary callable.

But one issue is that we have an **infinite** iterator.

One way around this issue would be to specify a "stop" value at which the iterator should end the iteration.

Let's see how we would do this:

In [None]:
class CounterIterator:
    def __init__(self, counter_callable, sentinel):
        self.counter_callable = counter_callable
        self.sentinel = sentinel
        
    def __iter__(self):
        return self
    
    def __next__(self):
        result = self.counter_callable()
        if result == self.sentinel:
            raise StopIteration
        else:
            return result

Now we can essentially provide a value that if returned from the callable will result in a `StopIteration` exception, essentially terminating the iteration:

In [None]:
cnt = counter()
cnt_iter = CounterIterator(cnt, 5)
for c in cnt_iter:
    print(c)

1
2
3
4


Now there is technically an issue here: `cnt_iter` is still "alive". Our iterator raised a `StopIteration` exception, but if we call `next()` on it again, it will happily resume from where it left off (the `5` was consumed when it matched the sentinel, so we get `6`)!

In [None]:
next(cnt_iter)

6

We really should make sure the iterator has been consumed, so let's fix that:

In [None]:
class CounterIterator:
    def __init__(self, counter_callable, sentinel):
        self.counter_callable = counter_callable
        self.sentinel = sentinel
        self.is_consumed = False
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.is_consumed:
            raise StopIteration
        else:
            result = self.counter_callable()
            if result == self.sentinel:
                self.is_consumed = True
                raise StopIteration
            else:
                return result

Now it should behave as a normal iterator that cannot continue iterating once the first `StopIteration` exception has been raised:

In [None]:
cnt = counter()
cnt_iter = CounterIterator(cnt, 5)
for c in cnt_iter:
    print(c)

1
2
3
4


In [None]:
next(cnt_iter)

StopIteration: 

As we just saw, we can essentially make an iterator based on any callable, and our `CounterIterator` was actually quite generic, it only needed a callable and a sentinel value to work.

In fact, that's exactly what the second form of the `iter()` function allows us to do!

Let's see the help on `iter`:

In [None]:
help(iter)

Help on built-in function iter in module builtins:

iter(...)
    iter(iterable) -> iterator
    iter(callable, sentinel) -> iterator
    
    Get an iterator from an object.  In the first form, the argument must
    supply its own iterator, or be a sequence.
    In the second form, the callable is called until it returns the sentinel.



As we can see `iter` has a second form, that takes in a callable and a sentinel value.

And it will result in exactly what we have been doing, but without having to create the iterator class ourselves!

In [None]:
cnt = counter()
cnt_iter = iter(cnt, 5)
for c in cnt_iter:
    print(c)

1
2
3
4


In [None]:
next(cnt_iter)

StopIteration: 
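A common practical use of this second form is reading a file in fixed-size chunks until `read()` returns an empty string. `functools.partial` turns `f.read` into the zero-argument callable that `iter()` expects; in this sketch an `io.StringIO` object stands in for a real file:

```python
import io
from functools import partial

# Stand-in for a real file (assumption): any object with .read(size) works
f = io.StringIO('abcdefghij')

# partial(f.read, 4) is a zero-argument callable; iter() keeps calling it
# until it returns the sentinel '' (what read() returns at end of file)
chunks = list(iter(partial(f.read, 4), ''))
print(chunks)     # ['abcd', 'efgh', 'ij']
```

For a file opened in binary mode the sentinel would be `b''` instead of `''`.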

##### Example 2

Both of these approaches can be made to work with any callable.

For example, you may want to iterate through random numbers until a specific random number is generated:

In [None]:
import random

In [None]:
random.seed(0)
for i in range(10):
    print(i, random.randint(0, 10))

0 6
1 6
2 0
3 4
4 8
5 7
6 6
7 4
8 7
9 5


As you can see in this example (I set my seed to 0 to have repeatable results), the number `8` is reached at the `5`th iteration.

(I am just doing this to find an easy sentinel value so we can easily verify that our code is working properly)

In [None]:
random_iterator = iter(lambda : random.randint(0, 10), 8)

In [None]:
random.seed(0)

for num in random_iterator:
    print(num)

6
6
0
4


Neat!

##### Example 3

Let's try a countdown example like the one we discussed in the lecture.

We'll use a closure to get our countdown working:

In [None]:
def countdown(start=10):
    def run():
        nonlocal start
        start -= 1
        return start
    return run

In [None]:
takeoff = countdown(10)
for _ in range(15):
    print(takeoff())

9
8
7
6
5
4
3
2
1
0
-1
-2
-3
-4
-5


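So the countdown function works, but we would like to be able to iterate over it and stop the iteration right after it reaches `0`. Since the sentinel value itself is never returned by the iterator, we use `-1` as the sentinel: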

In [None]:
takeoff = countdown(10)
takeoff_iter = iter(takeoff, -1)

In [None]:
for val in takeoff_iter:
    print(val)

9
8
7
6
5
4
3
2
1
0


### Delegating Iterators

Often we write classes that store their data in some existing iterable. By default, such a class is not itself iterable: we would need to write an iterator for it and implement an `__iter__` method that returns new instances of that iterator.

But, if our underlying data structure for our class is already an iterable, there's a much quicker way of doing it - delegation.

We'll start with a really simple example first:

In [None]:
from collections import namedtuple

Person = namedtuple('Person', 'first last')

In [None]:
class PersonNames:
    def __init__(self, persons):
        try:
            self._persons = [person.first.capitalize()
                             + ' ' + person.last.capitalize()
                            for person in persons]
        except (TypeError, AttributeError):
            self._persons = []

In [None]:
persons = [Person('michaeL', 'paLin'), Person('eric', 'idLe'), 
           Person('john', 'cLeese')]

In [None]:
person_names = PersonNames(persons)

Technically we can see the underlying data by accessing the (pseudo) private variable `_persons`.

In [None]:
person_names._persons

['Michael Palin', 'Eric Idle', 'John Cleese']

But we really would prefer making our `PersonNames` instances iterable.

To do so we need to implement the `__iter__` method that returns an iterator that can be used for iterating over the `_persons` list.

But lists are iterables, so they can provide an iterator, and that's precisely what we'll do: we'll **delegate** our own iteration to the list's iterator:

In [None]:
class PersonNames:
    def __init__(self, persons):
        try:
            self._persons = [person.first.capitalize()
                             + ' ' + person.last.capitalize()
                            for person in persons]
        except (TypeError, AttributeError):
            self._persons = []
    
    def __iter__(self):
        return iter(self._persons)

And now, `PersonNames` is iterable!

In [None]:
persons = [Person('michaeL', 'paLin'), Person('eric', 'idLe'), 
           Person('john', 'cLeese')]
person_names = PersonNames(persons)

In [None]:
for p in person_names:
    print(p)

Michael Palin
Eric Idle
John Cleese


And of course we can sort, use list comprehensions, and so on, since `PersonNames` **is** an iterable.

Here we sort the names based on the full name, then split the names (on the space) and return a tuple of first name, last name:

In [None]:
[tuple(person_name.split()) for person_name in sorted(person_names)]

[('Eric', 'Idle'), ('John', 'Cleese'), ('Michael', 'Palin')]

Or, if we want to sort based on the last name:

In [None]:
sorted(person_names, key=lambda x: x.split()[1])

['John Cleese', 'Eric Idle', 'Michael Palin']
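Because `__iter__` delegates to `iter(...)` on the underlying list, every call returns a brand-new list iterator, so our object is a re-iterable iterable rather than a one-shot iterator. A trimmed-down sketch (the `__len__` method is our own addition, delegating in the same way):

```python
class PersonNames:
    def __init__(self, names):
        self._names = list(names)

    def __iter__(self):
        # Delegate: hand out a brand-new list iterator on every call
        return iter(self._names)

    def __len__(self):
        # Delegate the length as well
        return len(self._names)

names = PersonNames(['Michael Palin', 'Eric Idle'])
print(iter(names) is iter(names))   # False: two independent iterators
print(list(names), list(names))     # both passes see all the names
```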

### Reversed Iteration

Sometimes we may want to iterate through an iterable but in **reverse** order.

Of course, this means the collection being iterated must be finite.

Python has a built-in function called `reversed()` that does this, and it works with any type that implements the sequence protocol. For iterables in general, however, it's a little more complicated.

Let's first build a custom iterable.

For this example we are going to build a custom iterable that returns cards from a 52-card deck.

The deck will be in order of suits (Spades, Hearts, Diamonds and Clubs) and card values (from 2 (lowest) to Ace (highest)).

We are going to use lazy loading - i.e. we are not going to pre-build our card deck.

We just need to recognize that each suit contains `13` cards, so integer division of the card's index in the deck by `13` tells us which suit it belongs to. (Remember that we start indexing at 0.)

**Example**

If the requested card is the `6`th in the deck (i.e. index = `5`):

`5 // 13 = 0` ==> first suit (Spades)

If the requested card is the `13`th in the deck (i.e. index = `12`):

`12 // 13 = 0` ==> first suit (Spades)

If the requested card is the `14`th in the deck (i.e. index = `13`):

`13 // 13 = 1` ==> second suit (Hearts)

To determine which card in the suit we are interested in, we simply need to use the `%` operator, again recognizing that there are `13` cards in each suit:

**Example**

If the requested card is the `6`th in the deck (i.e. index = `5`):

`5 % 13 = 5` ==> `6`th card in the suit (index `5`)

If the requested card is the `13`th in the deck (i.e. index = `12`):

`12 % 13 = 12` ==> `13`th card in the suit (index `12`)

If the requested card is the `14`th in the deck (i.e. index = `13`):

`13 % 13 = 0` ==> `1`st card in the suit (index `0`)
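Both calculations can be done in one step with the built-in `divmod()`, which returns the quotient and the remainder together. A small sketch, using a hypothetical helper `card_at`:

```python
_SUITS = ('Spades', 'Hearts', 'Diamonds', 'Clubs')
_RANKS = tuple(range(2, 11)) + tuple('JQKA')

def card_at(index):
    """Return (rank, suit) for a 0-based position in the ordered deck."""
    # divmod gives the suit index (quotient) and rank index (remainder) at once
    suit_index, rank_index = divmod(index, len(_RANKS))
    return _RANKS[rank_index], _SUITS[suit_index]

print(card_at(5))    # (7, 'Spades')
print(card_at(12))   # ('A', 'Spades')
print(card_at(13))   # (2, 'Hearts')
print(card_at(51))   # ('A', 'Clubs')
```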

In [None]:
from collections import namedtuple

_SUITS = ('Spades', 'Hearts', 'Diamonds', 'Clubs')
_RANKS = tuple(range(2, 11)) + tuple('JQKA')

Card = namedtuple('Card', 'rank suit')

class CardDeck:
    def __init__(self):
        self.length = len(_SUITS) * len(_RANKS)

    def __len__(self):
        return self.length
    
    def __iter__(self):
        return self.CardDeckIterator(self.length)
        
    class CardDeckIterator:
        def __init__(self, length):
            self.length = length
            self.i = 0
            
        def __iter__(self):
            return self
        
        def __next__(self):
            if self.i >= self.length:
                raise StopIteration
            else:
                suit = _SUITS[self.i // len(_RANKS)]
                rank = _RANKS[self.i % len(_RANKS)]
                self.i += 1
                return Card(rank, suit)

We can now iterate over a deck of cards as follows:

In [None]:
deck = CardDeck()

In [None]:
for card in deck:
    print(card)

Card(rank=2, suit='Spades')
Card(rank=3, suit='Spades')
Card(rank=4, suit='Spades')
Card(rank=5, suit='Spades')
Card(rank=6, suit='Spades')
Card(rank=7, suit='Spades')
Card(rank=8, suit='Spades')
Card(rank=9, suit='Spades')
Card(rank=10, suit='Spades')
Card(rank='J', suit='Spades')
Card(rank='Q', suit='Spades')
Card(rank='K', suit='Spades')
Card(rank='A', suit='Spades')
Card(rank=2, suit='Hearts')
Card(rank=3, suit='Hearts')
Card(rank=4, suit='Hearts')
Card(rank=5, suit='Hearts')
Card(rank=6, suit='Hearts')
Card(rank=7, suit='Hearts')
Card(rank=8, suit='Hearts')
Card(rank=9, suit='Hearts')
Card(rank=10, suit='Hearts')
Card(rank='J', suit='Hearts')
Card(rank='Q', suit='Hearts')
Card(rank='K', suit='Hearts')
Card(rank='A', suit='Hearts')
Card(rank=2, suit='Diamonds')
Card(rank=3, suit='Diamonds')
Card(rank=4, suit='Diamonds')
Card(rank=5, suit='Diamonds')
Card(rank=6, suit='Diamonds')
Card(rank=7, suit='Diamonds')
Card(rank=8, suit='Diamonds')
Card(rank=9, suit='Diamonds')
Card(rank=10, 

Now that we have our deck, how would we obtain the last `7` cards in reverse order from the deck?

One option is to generate a list of all the cards in the deck, then use a slice.

What about iterating in reverse? Using the same technique we generate a list that contains all the cards, reverse the list, and then iterate over the reversed list.

In [None]:
deck = list(CardDeck())

In [None]:
deck[:-8:-1]

[Card(rank='A', suit='Clubs'),
 Card(rank='K', suit='Clubs'),
 Card(rank='Q', suit='Clubs'),
 Card(rank='J', suit='Clubs'),
 Card(rank=10, suit='Clubs'),
 Card(rank=9, suit='Clubs'),
 Card(rank=8, suit='Clubs')]

And to iterate backwards:

In [None]:
deck = list(CardDeck())
deck = deck[::-1]
for card in deck:
    print(card)

Card(rank='A', suit='Clubs')
Card(rank='K', suit='Clubs')
Card(rank='Q', suit='Clubs')
Card(rank='J', suit='Clubs')
Card(rank=10, suit='Clubs')
Card(rank=9, suit='Clubs')
Card(rank=8, suit='Clubs')
Card(rank=7, suit='Clubs')
Card(rank=6, suit='Clubs')
Card(rank=5, suit='Clubs')
Card(rank=4, suit='Clubs')
Card(rank=3, suit='Clubs')
Card(rank=2, suit='Clubs')
Card(rank='A', suit='Diamonds')
Card(rank='K', suit='Diamonds')
Card(rank='Q', suit='Diamonds')
Card(rank='J', suit='Diamonds')
Card(rank=10, suit='Diamonds')
Card(rank=9, suit='Diamonds')
Card(rank=8, suit='Diamonds')
Card(rank=7, suit='Diamonds')
Card(rank=6, suit='Diamonds')
Card(rank=5, suit='Diamonds')
Card(rank=4, suit='Diamonds')
Card(rank=3, suit='Diamonds')
Card(rank=2, suit='Diamonds')
Card(rank='A', suit='Hearts')
Card(rank='K', suit='Hearts')
Card(rank='Q', suit='Hearts')
Card(rank='J', suit='Hearts')
Card(rank=10, suit='Hearts')
Card(rank=9, suit='Hearts')
Card(rank=8, suit='Hearts')
Card(rank=7, suit='Hearts')
Card(ran

This is kind of inefficient since we had to generate the entire list of cards, to then reverse it, and then only pick the first 7 cards from that reversed list.

Maybe we can try Python's built-in `reversed` function instead:

In [None]:
deck = CardDeck()

In [None]:
deck = reversed(deck)

TypeError: 'CardDeck' object is not reversible

As we can see, Python's `reversed` function does not work with our iterable. (It would work automatically with a sequence type, but in this case we don't have a sequence type.)

What to do?

We need to somehow define a "reverse" iteration option for our iterable!

We do so by defining the `__reversed__` special method in our iterable and instructing our iterator to return elements in reverse order.

If the `__reversed__` method is defined on our iterable, Python will use it to get the iterator when we call the `reversed()` function:

Let's try that out:

In [None]:
from collections import namedtuple

_SUITS = ('Spades', 'Hearts', 'Diamonds', 'Clubs')
_RANKS = tuple(range(2, 11)) + ('J', 'Q', 'K', 'A')

Card = namedtuple('Card', 'rank suit')

class CardDeck:
    def __init__(self):
        self.length = len(_SUITS) * len(_RANKS)

    def __len__(self):
        return self.length
    
    def __iter__(self):
        return self.CardDeckIterator(self.length)
        
    def __reversed__(self):
        return self.CardDeckIterator(self.length, reverse=True)
    
    class CardDeckIterator:
        def __init__(self, length, *, reverse=False):
            self.length = length
            self.reverse = reverse
            self.i = 0
            
        def __iter__(self):
            return self
        
        def __next__(self):
            if self.i >= self.length:
                raise StopIteration
            else:
                if self.reverse:
                    index = self.length - 1 - self.i
                else:
                    index = self.i
                suit = _SUITS[index // len(_RANKS)]
                rank = _RANKS[index % len(_RANKS)]
                self.i += 1
                return Card(rank, suit)
            


In [None]:
deck = CardDeck()

In [None]:
for card in deck:
    print(card)

Card(rank=2, suit='Spades')
Card(rank=3, suit='Spades')
Card(rank=4, suit='Spades')
Card(rank=5, suit='Spades')
Card(rank=6, suit='Spades')
Card(rank=7, suit='Spades')
Card(rank=8, suit='Spades')
Card(rank=9, suit='Spades')
Card(rank=10, suit='Spades')
Card(rank='J', suit='Spades')
Card(rank='Q', suit='Spades')
Card(rank='K', suit='Spades')
Card(rank='A', suit='Spades')
Card(rank=2, suit='Hearts')
Card(rank=3, suit='Hearts')
Card(rank=4, suit='Hearts')
Card(rank=5, suit='Hearts')
Card(rank=6, suit='Hearts')
Card(rank=7, suit='Hearts')
Card(rank=8, suit='Hearts')
Card(rank=9, suit='Hearts')
Card(rank=10, suit='Hearts')
Card(rank='J', suit='Hearts')
Card(rank='Q', suit='Hearts')
Card(rank='K', suit='Hearts')
Card(rank='A', suit='Hearts')
Card(rank=2, suit='Diamonds')
Card(rank=3, suit='Diamonds')
Card(rank=4, suit='Diamonds')
Card(rank=5, suit='Diamonds')
Card(rank=6, suit='Diamonds')
Card(rank=7, suit='Diamonds')
Card(rank=8, suit='Diamonds')
Card(rank=9, suit='Diamonds')
Card(rank=10, 

In [None]:
deck = reversed(CardDeck())
for card in deck:
    print(card)

Card(rank='A', suit='Clubs')
Card(rank='K', suit='Clubs')
Card(rank='Q', suit='Clubs')
Card(rank='J', suit='Clubs')
Card(rank=10, suit='Clubs')
Card(rank=9, suit='Clubs')
Card(rank=8, suit='Clubs')
Card(rank=7, suit='Clubs')
Card(rank=6, suit='Clubs')
Card(rank=5, suit='Clubs')
Card(rank=4, suit='Clubs')
Card(rank=3, suit='Clubs')
Card(rank=2, suit='Clubs')
Card(rank='A', suit='Diamonds')
Card(rank='K', suit='Diamonds')
Card(rank='Q', suit='Diamonds')
Card(rank='J', suit='Diamonds')
Card(rank=10, suit='Diamonds')
Card(rank=9, suit='Diamonds')
Card(rank=8, suit='Diamonds')
Card(rank=7, suit='Diamonds')
Card(rank=6, suit='Diamonds')
Card(rank=5, suit='Diamonds')
Card(rank=4, suit='Diamonds')
Card(rank=3, suit='Diamonds')
Card(rank=2, suit='Diamonds')
Card(rank='A', suit='Hearts')
Card(rank='K', suit='Hearts')
Card(rank='Q', suit='Hearts')
Card(rank='J', suit='Hearts')
Card(rank=10, suit='Hearts')
Card(rank=9, suit='Hearts')
Card(rank=8, suit='Hearts')
Card(rank=7, suit='Hearts')
Card(ran
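With `__reversed__` in place, the earlier question of getting the last 7 cards in reverse order no longer requires building the whole deck: `itertools.islice` can stop after 7 cards. The sketch below is a compact variant of `CardDeck` (our own design, not the class above) that delegates to `map` over a `range` instead of using a nested iterator class; the cards it produces should match:

```python
from collections import namedtuple
from itertools import islice

_SUITS = ('Spades', 'Hearts', 'Diamonds', 'Clubs')
_RANKS = tuple(range(2, 11)) + tuple('JQKA')
Card = namedtuple('Card', 'rank suit')

class CardDeck:
    def __len__(self):
        return len(_SUITS) * len(_RANKS)

    @staticmethod
    def _card(index):
        suit_index, rank_index = divmod(index, len(_RANKS))
        return Card(_RANKS[rank_index], _SUITS[suit_index])

    def __iter__(self):
        # map() is lazy: each card is built only when it is requested
        return map(self._card, range(len(self)))

    def __reversed__(self):
        return map(self._card, reversed(range(len(self))))

# Only 7 cards are ever constructed here
last_seven = list(islice(reversed(CardDeck()), 7))
print(last_seven[0], last_seven[-1])
```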

#### Reversing Sequences

I just want to point out that if we have a custom **sequence** type we don't need to worry about this.

Let's see a quick example:

In [None]:
class Squares:
    def __init__(self, length):
        self.squares = [i **2 for i in range(length)]
        
    def __len__(self):
        return len(self.squares)
    
    def __getitem__(self, s):
        return self.squares[s]

In [None]:
sq = Squares(10)

In [None]:
for num in Squares(5):
    print(num)

0
1
4
9
16


In [None]:
for num in reversed(Squares(5)):
    print(num)

16
9
4
1
0


As you can see Python was able to automatically reverse the sequence for us.

Also worth noting is that the `__len__` method **must** be implemented for `reversed()` to work:

In [None]:
class Squares:
    def __init__(self, length):
        self.squares = [i **2 for i in range(length)]
        
#     def __len__(self):
#         return len(self.squares)
    
    def __getitem__(self, s):
        return self.squares[s]

In [None]:
for num in reversed(Squares(5)):
    print(num)

TypeError: object of type 'Squares' has no len()

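In addition, we can override what is returned when the `reversed()` function is called on our custom sequence type. Here, I'll return a list of the integers themselves instead of the squares, just to make this really stand out. (`reversed()` simply returns whatever `__reversed__` returns, so a list works here, although by convention `__reversed__` should return an iterator.)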

In [None]:
class Squares:
    def __init__(self, length):
        self.length = length
        self.squares = [i **2 for i in range(length)]
        
    def __len__(self):
        return len(self.squares)
    
    def __getitem__(self, s):
        return self.squares[s]
    
    def __reversed__(self):
        print('__reversed__ called')
        return [i for i in range(self.length-1, -1, -1)]

In [None]:
for num in Squares(5):
    print(num)

0
1
4
9
16


In [None]:
for num in reversed(Squares(5)):
    print(num)

__reversed__ called
4
3
2
1
0


### Caveat of Using Iterators as Function Arguments

When a function requires an iterable for one of its arguments, it will also work with any iterator (since iterators are themselves iterables).

But things can go wrong if you do that!

Let's say we have an iterator that returns a collection of random numbers, and for each such collection we want to find the minimum and maximum values:

In [None]:
import random

In [None]:
class Randoms:
    def __init__(self, n):
        self.n = n
        self.i = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        else:
            self.i += 1
            return random.randint(0, 100)

In [None]:
random.seed(0)
l = list(Randoms(10))
print(l)

[49, 97, 53, 5, 33, 65, 62, 51, 100, 38]


Now we can easily find the min and max values:

In [None]:
min(l), max(l)

(5, 100)

But watch what happens if we do this:

In [None]:
random.seed(0)
l = Randoms(10)

In [None]:
min(l)

5

In [None]:
max(l)

ValueError: max() arg is an empty sequence

That's because when `min` ran, it iterated over (and exhausted) the **iterator** `Randoms(10)` stored in `l`. When we then called `max` on the same iterator, there was nothing left to iterate over, i.e. the argument to `max` was now empty!

So, be really careful when using iterators!
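When we really do need several results from a one-shot iterator, one approach is to compute them all in a single pass. Here's a minimal sketch with a hypothetical `min_max` helper (not a built-in); the numbers are the ones printed above, wrapped in `iter()` so that they behave like a one-shot iterator:

```python
def min_max(iterable):
    """Hypothetical helper: compute min and max in a single pass, so it
    also works when `iterable` is a one-shot iterator."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        raise ValueError('min_max() arg is an empty iterable') from None
    smallest = largest = first
    for value in it:  # consumes the rest of the same iterator, only once
        if value < smallest:
            smallest = value
        elif value > largest:
            largest = value
    return smallest, largest


numbers = iter([49, 97, 53, 5, 33, 65, 62, 51, 100, 38])
print(min_max(numbers))  # (5, 100)
print(list(numbers))     # [] - the iterator is now exhausted
```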

Here's another more practical example.

Let's go back to our `cars.csv` data file and write some code that will return the car names and MPG, except that we also want to return a value expressing each car's MPG as a percentage of the MPG of the most fuel-efficient car in the list.

To do so we will need to iterate over the file twice: once to figure out the largest MPG value, and again to calculate `MPG / max_mpg * 100` for each car.
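As a quick check of the calculation, here it is for a single car, using the Chevrolet Chevelle Malibu's 18.0 MPG and 46.6 as the largest MPG value (the value we will compute from the file below):

```python
largest_mpg = 46.6  # largest MPG in cars.csv (computed further below)
malibu_mpg = 18.0   # Chevrolet Chevelle Malibu

mpg_perc = malibu_mpg / largest_mpg * 100
print(f'{mpg_perc:.2f}%')  # 38.63%
```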

Let's just quickly see what our file looks like:

In [None]:
f = open('cars.csv')
for row in f:
    print(row, end='')
f.close()    

Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin
STRING;DOUBLE;INT;DOUBLE;DOUBLE;DOUBLE;DOUBLE;INT;CAT
Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US
Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US
Plymouth Satellite;18.0;8;318.0;150.0;3436.;11.0;70;US
AMC Rebel SST;16.0;8;304.0;150.0;3433.;12.0;70;US
Ford Torino;17.0;8;302.0;140.0;3449.;10.5;70;US
Ford Galaxie 500;15.0;8;429.0;198.0;4341.;10.0;70;US
Chevrolet Impala;14.0;8;454.0;220.0;4354.;9.0;70;US
Plymouth Fury iii;14.0;8;440.0;215.0;4312.;8.5;70;US
Pontiac Catalina;14.0;8;455.0;225.0;4425.;10.0;70;US
AMC Ambassador DPL;15.0;8;390.0;190.0;3850.;8.5;70;US
Citroen DS-21 Pallas;0;4;133.0;115.0;3090.;17.5;70;Europe
Chevrolet Chevelle Concours (sw);0;8;350.0;165.0;4142.;11.5;70;US
Ford Torino (sw);0;8;351.0;153.0;4034.;11.0;70;US
Plymouth Satellite (sw);0;8;383.0;175.0;4166.;10.5;70;US
AMC Rebel SST (sw);0;8;360.0;175.0;3850.;11.0;70;US
Dodge Challenger SE;15.0;8;383.0;170.0;3563.;10.0;70;U

In [None]:
def parse_data_row(row):
    row = row.strip('\n').split(';')
    return row[0], float(row[1])

def max_mpg(data):
    # data can be any iterable of rows; the for loop calls iter(data) for us
    max_mpg = 0
    for row in data:
        _, mpg = parse_data_row(row)
        if mpg > max_mpg:
            max_mpg = mpg
    return max_mpg

In [None]:
f = open('cars.csv')
next(f)
next(f)
print(max_mpg(f))
f.close()

46.6


In [None]:
def list_data(data, mpg_max):
    for row in data:
        car, mpg = parse_data_row(row)
        mpg_perc = mpg / mpg_max * 100
        print(f'{car}: {mpg_perc:.2f}%')

In [None]:
f = open('cars.csv')
next(f), next(f)
list_data(f, 46.6)
f.close()

Chevrolet Chevelle Malibu: 38.63%
Buick Skylark 320: 32.19%
Plymouth Satellite: 38.63%
AMC Rebel SST: 34.33%
Ford Torino: 36.48%
Ford Galaxie 500: 32.19%
Chevrolet Impala: 30.04%
Plymouth Fury iii: 30.04%
Pontiac Catalina: 30.04%
AMC Ambassador DPL: 32.19%
Citroen DS-21 Pallas: 0.00%
Chevrolet Chevelle Concours (sw): 0.00%
Ford Torino (sw): 0.00%
Plymouth Satellite (sw): 0.00%
AMC Rebel SST (sw): 0.00%
Dodge Challenger SE: 32.19%
Plymouth 'Cuda 340: 30.04%
Ford Mustang Boss 302: 0.00%
Chevrolet Monte Carlo: 32.19%
Buick Estate Wagon (sw): 30.04%
Toyota Corolla Mark ii: 51.50%
Plymouth Duster: 47.21%
AMC Hornet: 38.63%
Ford Maverick: 45.06%
Datsun PL510: 57.94%
Volkswagen 1131 Deluxe Sedan: 55.79%
Peugeot 504: 53.65%
Audi 100 LS: 51.50%
Saab 99e: 53.65%
BMW 2002: 55.79%
AMC Gremlin: 45.06%
Ford F250: 21.46%
Chevy C20: 21.46%
Dodge D200: 23.61%
Hi 1200D: 19.31%
Datsun PL510: 57.94%
Chevrolet Vega 2300: 60.09%
Toyota Corolla: 53.65%
Ford Pinto: 53.65%
Volkswagen Super Beetle 117: 0.00%
AM

Now let's try to put these together:

In [None]:
with open('cars.csv') as f:
    next(f)
    next(f)
    max_ = max_mpg(f)
    print(f'max={max_}')
    list_data(f, max_)

max=46.6


No output from `list_data`!!

That's because when we called `list_data` we had already exhausted the data file in the call to `max_mpg`.
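To see the exhaustion without needing `cars.csv`, here's a small sketch that uses `io.StringIO` as a stand-in for the open file (it iterates line by line just like a text file object). The sample rows are copied from the file listing above:

```python
import io

# Two header rows plus three data rows from cars.csv
sample = io.StringIO(
    'Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin\n'
    'STRING;DOUBLE;INT;DOUBLE;DOUBLE;DOUBLE;DOUBLE;INT;CAT\n'
    'Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US\n'
    'Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US\n'
    'Plymouth Satellite;18.0;8;318.0;150.0;3436.;11.0;70;US\n'
)
next(sample), next(sample)  # skip the two header rows

first_pass = list(sample)   # reads every remaining line
second_pass = list(sample)  # the iterator is already exhausted

print(len(first_pass))   # 3
print(len(second_pass))  # 0
```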

One option is to create the iterator twice, i.e. open the file once for each pass:

In [None]:
with open('cars.csv') as f:
    next(f), next(f)
    max_ = max_mpg(f)
    
with open('cars.csv') as f:
    next(f), next(f)
    list_data(f, max_)

Chevrolet Chevelle Malibu: 38.63%
Buick Skylark 320: 32.19%
Plymouth Satellite: 38.63%
AMC Rebel SST: 34.33%
Ford Torino: 36.48%
Ford Galaxie 500: 32.19%
Chevrolet Impala: 30.04%
Plymouth Fury iii: 30.04%
Pontiac Catalina: 30.04%
AMC Ambassador DPL: 32.19%
Citroen DS-21 Pallas: 0.00%
Chevrolet Chevelle Concours (sw): 0.00%
Ford Torino (sw): 0.00%
Plymouth Satellite (sw): 0.00%
AMC Rebel SST (sw): 0.00%
Dodge Challenger SE: 32.19%
Plymouth 'Cuda 340: 30.04%
Ford Mustang Boss 302: 0.00%
Chevrolet Monte Carlo: 32.19%
Buick Estate Wagon (sw): 30.04%
Toyota Corolla Mark ii: 51.50%
Plymouth Duster: 47.21%
AMC Hornet: 38.63%
Ford Maverick: 45.06%
Datsun PL510: 57.94%
Volkswagen 1131 Deluxe Sedan: 55.79%
Peugeot 504: 53.65%
Audi 100 LS: 51.50%
Saab 99e: 53.65%
BMW 2002: 55.79%
AMC Gremlin: 45.06%
Ford F250: 21.46%
Chevy C20: 21.46%
Dodge D200: 23.61%
Hi 1200D: 19.31%
Datsun PL510: 57.94%
Chevrolet Vega 2300: 60.09%
Toyota Corolla: 53.65%
Ford Pinto: 53.65%
Volkswagen Super Beetle 117: 0.00%
AM

Alternatively, we could read the entire data set into a list first, but of course if the file is huge we risk running out of memory:

In [None]:
with open('cars.csv') as f:
    data = [row for row in f][2:]

or, more simply:

In [None]:
with open('cars.csv') as f:
    data = f.readlines()[2:]

In [None]:
max_ = max_mpg(data)
list_data(data, max_)

Chevrolet Chevelle Malibu: 38.63%
Buick Skylark 320: 32.19%
Plymouth Satellite: 38.63%
AMC Rebel SST: 34.33%
Ford Torino: 36.48%
Ford Galaxie 500: 32.19%
Chevrolet Impala: 30.04%
Plymouth Fury iii: 30.04%
Pontiac Catalina: 30.04%
AMC Ambassador DPL: 32.19%
Citroen DS-21 Pallas: 0.00%
Chevrolet Chevelle Concours (sw): 0.00%
Ford Torino (sw): 0.00%
Plymouth Satellite (sw): 0.00%
AMC Rebel SST (sw): 0.00%
Dodge Challenger SE: 32.19%
Plymouth 'Cuda 340: 30.04%
Ford Mustang Boss 302: 0.00%
Chevrolet Monte Carlo: 32.19%
Buick Estate Wagon (sw): 30.04%
Toyota Corolla Mark ii: 51.50%
Plymouth Duster: 47.21%
AMC Hornet: 38.63%
Ford Maverick: 45.06%
Datsun PL510: 57.94%
Volkswagen 1131 Deluxe Sedan: 55.79%
Peugeot 504: 53.65%
Audi 100 LS: 51.50%
Saab 99e: 53.65%
BMW 2002: 55.79%
AMC Gremlin: 45.06%
Ford F250: 21.46%
Chevy C20: 21.46%
Dodge D200: 23.61%
Hi 1200D: 19.31%
Datsun PL510: 57.94%
Chevrolet Vega 2300: 60.09%
Toyota Corolla: 53.65%
Ford Pinto: 53.65%
Volkswagen Super Beetle 117: 0.00%
AM

We may even write functions that need to iterate more than once over an iterable. For example:

In [None]:
def list_data(data):
    max_mpg = 0
    for row in data:
        _, mpg = parse_data_row(row)
        if mpg > max_mpg:
            max_mpg = mpg
    
    for row in data:
        car, mpg = parse_data_row(row)
        mpg_perc = mpg / max_mpg * 100
        print(f'{car}: {mpg_perc:.2f}%')

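But this will not work if we pass an iterator as the argument: the first loop exhausts the iterator, so the second loop has nothing left to iterate over and nothing gets printed:

In [None]:
with open('cars.csv') as f: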
    next(f)
    next(f)
    list_data(f)

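We might want to be more defensive about this in our function, either by raising an exception if the argument is an iterator, or by making an iterable (such as a list) from the iterator. Since an iterator's `__iter__` method returns the iterator itself, the test `iter(data) is data` tells us whether `data` is an iterator: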

In [None]:
def list_data(data):
    if iter(data) is data:
        raise ValueError('data cannot be an iterator.')
    max_mpg = 0
    for row in data:
        _, mpg = parse_data_row(row)
        if mpg > max_mpg:
            max_mpg = mpg
    
    for row in data:
        car, mpg = parse_data_row(row)
        mpg_perc = mpg / max_mpg * 100
        print(f'{car}: {mpg_perc:.2f}%')

In [None]:
with open('cars.csv') as f:
    next(f)
    next(f)
    list_data(f)

ValueError: data cannot be an iterator.

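Or we can make an iterable from the iterator by reading it into a list, which we can then iterate over as many times as we need: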

In [None]:
def list_data(data):
    if iter(data) is data:
        data = list(data)
    
    max_mpg = 0
    for row in data:
        _, mpg = parse_data_row(row)
        if mpg > max_mpg:
            max_mpg = mpg
    
    for row in data:
        car, mpg = parse_data_row(row)
        mpg_perc = mpg / max_mpg * 100
        print(f'{car}: {mpg_perc:.2f}%')

In [None]:
with open('cars.csv') as f:
    next(f)
    next(f)
    list_data(f)

Chevrolet Chevelle Malibu: 38.63%
Buick Skylark 320: 32.19%
Plymouth Satellite: 38.63%
AMC Rebel SST: 34.33%
Ford Torino: 36.48%
Ford Galaxie 500: 32.19%
Chevrolet Impala: 30.04%
Plymouth Fury iii: 30.04%
Pontiac Catalina: 30.04%
AMC Ambassador DPL: 32.19%
Citroen DS-21 Pallas: 0.00%
Chevrolet Chevelle Concours (sw): 0.00%
Ford Torino (sw): 0.00%
Plymouth Satellite (sw): 0.00%
AMC Rebel SST (sw): 0.00%
Dodge Challenger SE: 32.19%
Plymouth 'Cuda 340: 30.04%
Ford Mustang Boss 302: 0.00%
Chevrolet Monte Carlo: 32.19%
Buick Estate Wagon (sw): 30.04%
Toyota Corolla Mark ii: 51.50%
Plymouth Duster: 47.21%
AMC Hornet: 38.63%
Ford Maverick: 45.06%
Datsun PL510: 57.94%
Volkswagen 1131 Deluxe Sedan: 55.79%
Peugeot 504: 53.65%
Audi 100 LS: 51.50%
Saab 99e: 53.65%
BMW 2002: 55.79%
AMC Gremlin: 45.06%
Ford F250: 21.46%
Chevy C20: 21.46%
Dodge D200: 23.61%
Hi 1200D: 19.31%
Datsun PL510: 57.94%
Chevrolet Vega 2300: 60.09%
Toyota Corolla: 53.65%
Ford Pinto: 53.65%
Volkswagen Super Beetle 117: 0.00%
AM
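To see the `iter(data) is data` check used in both versions of `list_data` in action, here's a small sketch comparing a list, a file-like `io.StringIO` object, and an iterator over the list. It also shows `isinstance(obj, collections.abc.Iterator)`, which is another way to test for an iterator (it checks that the object has both `__iter__` and `__next__`):

```python
import io
from collections.abc import Iterator

rows = ['row 1\n', 'row 2\n']           # a list: iterable, but not an iterator
file_like = io.StringIO(''.join(rows))  # file objects are iterators
list_iter = iter(rows)                  # an iterator over the list

for obj in (rows, file_like, list_iter):
    print(type(obj).__name__, iter(obj) is obj, isinstance(obj, Iterator))
# list False False
# StringIO True True
# list_iterator True True
```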

### Project: Description

The starting point for this project is the `Polygon` class and the `Polygons` sequence type we created in the previous project.

The code for these classes, along with the unit tests for the `Polygon` class, is below if you want to use it as your starting point, but feel free to use whatever you came up with in the last project.

We have two goals:

##### Goal 1

Refactor the `Polygon` class so that all the calculated properties are lazy properties, i.e. they should still be calculated properties, but each one should be calculated at most once (which is safe since we made our `Polygon` class "immutable").
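One common way to make a calculated property lazy is to cache its value in an instance attribute the first time it is requested. Here's a minimal sketch of that pattern on a hypothetical `Circle` class (rather than `Polygon`, so as not to give the project away):

```python
import math


class Circle:
    """Hypothetical immutable class, used only to illustrate a lazy property."""
    def __init__(self, r):
        self._r = r
        self._area = None  # cache: None means "not calculated yet"

    @property
    def radius(self):
        # No setter, so the cached area can never become stale
        return self._r

    @property
    def area(self):
        if self._area is None:
            print('calculating area...')
            self._area = math.pi * self._r ** 2
        return self._area


c = Circle(1)
c.area  # prints 'calculating area...' the first time
c.area  # served from the cache, nothing printed
```

Python 3.8+ also provides `functools.cached_property`, which implements the same idea by storing the computed value in the instance's `__dict__`.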

##### Goal 2

Refactor the `Polygons` (sequence) type into an **iterable**. Also make sure that the elements in the iterator are computed lazily, i.e. you can no longer use a list as the underlying storage mechanism for your polygons.

You'll need to implement both an iterable, and an iterator.
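As a reminder of how the two pieces fit together, here's a minimal sketch using hypothetical `SquaresIterable` and `SquaresIterator` classes (not the project's `Polygons`): the iterable creates a brand new iterator each time `iter()` is called on it, and the iterator computes each value only when `__next__` asks for it:

```python
class SquaresIterator:
    """Hypothetical iterator: computes each square on demand."""
    def __init__(self, length):
        self._length = length
        self._i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._i >= self._length:
            raise StopIteration
        result = self._i ** 2  # computed lazily, nothing is stored in a list
        self._i += 1
        return result


class SquaresIterable:
    """Hypothetical iterable: hands out a fresh iterator on every iter() call."""
    def __init__(self, length):
        self._length = length

    def __iter__(self):
        return SquaresIterator(self._length)


sq_iterable = SquaresIterable(5)
print(list(sq_iterable))                   # [0, 1, 4, 9, 16]
print(min(sq_iterable), max(sq_iterable))  # 0 16
```

Because each call to `iter()` starts a fresh iterator, `min` and `max` can be called one after the other on the same iterable, which avoids the exhaustion problem described in the caveat section.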

##### Code from Previous Project

In [None]:
import math

class Polygon:
    def __init__(self, n, R):
        if n < 3:
            raise ValueError('Polygon must have at least 3 vertices.')
        self._n = n
        self._R = R
        
    def __repr__(self):
        return f'Polygon(n={self._n}, R={self._R})'
    
    @property
    def count_vertices(self):
        return self._n
    
    @property
    def count_edges(self):
        return self._n
    
    @property
    def circumradius(self):
        return self._R
    
    @property
    def interior_angle(self):
        return (self._n - 2) * 180 / self._n

    @property
    def side_length(self):
        return 2 * self._R * math.sin(math.pi / self._n)
    
    @property
    def apothem(self):
        return self._R * math.cos(math.pi / self._n)
    
    @property
    def area(self):
        return self._n / 2 * self.side_length * self.apothem
    
    @property
    def perimeter(self):
        return self._n * self.side_length
    
    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return (self.count_edges == other.count_edges 
                    and self.circumradius == other.circumradius)
        else:
            return NotImplemented
        
    def __gt__(self, other):
        if isinstance(other, self.__class__):
            return self.count_vertices > other.count_vertices
        else:
            return NotImplemented

In [None]:
def test_polygon():
    abs_tol = 0.001
    rel_tol = 0.001
    
    try:
        p = Polygon(2, 10)
        assert False, ('Creating a Polygon with 2 sides:'
                       ' Exception expected, not received')
    except ValueError:
        pass
                       
    n = 3
    R = 1
    p = Polygon(n, R)
    assert str(p) == 'Polygon(n=3, R=1)', f'actual: {str(p)}'
    assert p.count_vertices == n, (f'actual: {p.count_vertices},'
                                   f' expected: {n}')
    assert p.count_edges == n, f'actual: {p.count_edges}, expected: {n}'
    assert p.circumradius == R, f'actual: {p.circumradius}, expected: {R}'
    assert p.interior_angle == 60, (f'actual: {p.interior_angle},'
                                    ' expected: 60')
    n = 4
    R = 1
    p = Polygon(n, R)
    assert p.interior_angle == 90, (f'actual: {p.interior_angle},'
                                    ' expected: 90')
    assert math.isclose(p.area, 2, 
                        rel_tol=rel_tol, 
                        abs_tol=abs_tol), (f'actual: {p.area},'
                                           ' expected: 2.0')
    
    assert math.isclose(p.side_length, math.sqrt(2),
                       rel_tol=rel_tol,
                       abs_tol=abs_tol), (f'actual: {p.side_length},'
                                          f' expected: {math.sqrt(2)}')
    
    assert math.isclose(p.perimeter, 4 * math.sqrt(2),
                       rel_tol=rel_tol,
                       abs_tol=abs_tol), (f'actual: {p.perimeter},'
                                          f' expected: {4 * math.sqrt(2)}')
    
    assert math.isclose(p.apothem, 0.707,
                       rel_tol=rel_tol,
                       abs_tol=abs_tol), (f'actual: {p.apothem},'
                                          ' expected: 0.707')
    p = Polygon(6, 2)
    assert math.isclose(p.side_length, 2,
                        rel_tol=rel_tol, abs_tol=abs_tol)
    assert math.isclose(p.apothem, 1.73205,
                        rel_tol=rel_tol, abs_tol=abs_tol)
    assert math.isclose(p.area, 10.3923,
                        rel_tol=rel_tol, abs_tol=abs_tol)
    assert math.isclose(p.perimeter, 12,
                        rel_tol=rel_tol, abs_tol=abs_tol)
    assert math.isclose(p.interior_angle, 120,
                        rel_tol=rel_tol, abs_tol=abs_tol)
    
    p = Polygon(12, 3)
    assert math.isclose(p.side_length, 1.55291,
                        rel_tol=rel_tol, abs_tol=abs_tol)
    assert math.isclose(p.apothem, 2.89778,
                        rel_tol=rel_tol, abs_tol=abs_tol)
    assert math.isclose(p.area, 27,
                        rel_tol=rel_tol, abs_tol=abs_tol)
    assert math.isclose(p.perimeter, 18.635,
                        rel_tol=rel_tol, abs_tol=abs_tol)
    assert math.isclose(p.interior_angle, 150,
                        rel_tol=rel_tol, abs_tol=abs_tol)
    
    p1 = Polygon(3, 10)
    p2 = Polygon(10, 10)
    p3 = Polygon(15, 10)
    p4 = Polygon(15, 100)
    p5 = Polygon(15, 100)
    
    assert p2 > p1
    assert p2 < p3
    assert p3 != p4
    assert p1 != p4
    assert p4 == p5

In [None]:
class Polygons:
    def __init__(self, m, R):
        if m < 3:
            raise ValueError('m must be at least 3.')
        self._m = m
        self._R = R
        self._polygons = [Polygon(i, R) for i in range(3, m+1)]
        
    def __len__(self):
        return self._m - 2
    
    def __repr__(self):
        return f'Polygons(m={self._m}, R={self._R})'
    
    def __getitem__(self, s):
        return self._polygons[s]
    
    @property
    def max_efficiency_polygon(self):
        sorted_polygons = sorted(self._polygons,
                                 key=lambda p: p.area/p.perimeter,
                                 reverse=True)
        return sorted_polygons[0]