### List Comprehensions

We've used list comprehensions quite a bit throughout this course, so the concept should not be new, but let's quickly recap what we have seen so far.

A list comprehension is a language construct that lets us easily build a list by transforming, and optionally filtering, another iterable.

For example, using a more traditional, Java-style approach, we might create a list of the squares of the first 100 positive integers in this way:

In [None]:
squares = []  # create an empty list
for i in range(1, 101):
    squares.append(i**2)

In [None]:
squares[0:10]

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [None]:
squares = [i**2 for i in range(1, 101)]

In [None]:
squares[0:10]

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [None]:
squares = []
for i in range(1, 101):
    if i % 2 == 0:
        squares.append(i**2)

In [None]:
squares[0:10]

[4, 16, 36, 64, 100, 144, 196, 256, 324, 400]

In [None]:
squares = [i**2 for i in range(1, 101) if i % 2 == 0]
squares[0:10]

[4, 16, 36, 64, 100, 144, 196, 256, 324, 400]

In [None]:
squares = [i**2
          for i in range(1, 101)
          if i % 2 == 0]
squares[0:10]

[4, 16, 36, 64, 100, 144, 196, 256, 324, 400]

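We need to recognize that list comprehensions are essentially temporary functions: Python creates the function, executes it, and returns the resulting list.

We can see this by compiling a comprehension, and then disassembling the compiled code to see what happened. (The output below comes from an older CPython version; CPython 3.12 and later inline comprehensions, so the bytecode looks different there, but the loop variable still stays local to the comprehension.)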

In [None]:
import dis

In [None]:
compiled_code = compile('[i**2 for i in (1, 2, 3)]', 
                        filename='', mode='eval')

In [None]:
dis.dis(compiled_code)

  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f6435588030, file "", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_CONST               2 ((1, 2, 3))
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x7f6435588030, file "", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                12 (to 18)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LOAD_CONST               0 (2)
             12 BINARY_POWER
             14 LIST_APPEND              2
             16 JUMP_ABSOLUTE            4
        >>   18 RETURN_VALUE


As you can see, at offset 4, Python created a function (`MAKE_FUNCTION`), called it at offset 10 (`CALL_FUNCTION`), and then returned the result (`RETURN_VALUE`) in the last step.

So, comprehensions will behave like functions in terms of **scope**. They have local scope, and can access global and nonlocal scopes too. And nested comprehensions will also behave like nested functions and closures.

In [None]:
table = []
for i in range(1, 11):
    row = []
    for j in range(1, 11):
        row.append(i*j)
    table.append(row)

In [None]:
table

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
 [3, 6, 9, 12, 15, 18, 21, 24, 27, 30],
 [4, 8, 12, 16, 20, 24, 28, 32, 36, 40],
 [5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
 [6, 12, 18, 24, 30, 36, 42, 48, 54, 60],
 [7, 14, 21, 28, 35, 42, 49, 56, 63, 70],
 [8, 16, 24, 32, 40, 48, 56, 64, 72, 80],
 [9, 18, 27, 36, 45, 54, 63, 72, 81, 90],
 [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]]

In [None]:
table2 = [ [i * j for j in range(1, 11)] 
          for i in range(1, 11)]
table2

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
 [3, 6, 9, 12, 15, 18, 21, 24, 27, 30],
 [4, 8, 12, 16, 20, 24, 28, 32, 36, 40],
 [5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
 [6, 12, 18, 24, 30, 36, 42, 48, 54, 60],
 [7, 14, 21, 28, 35, 42, 49, 56, 63, 70],
 [8, 16, 24, 32, 40, 48, 56, 64, 72, 80],
 [9, 18, 27, 36, 45, 54, 63, 72, 81, 90],
 [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]]

You'll notice here that we nested one list comprehension inside another.

You should also notice that the inner comprehension (the one that has `i * j`) is accessing its own local variable `j`, as well as a variable from the enclosing comprehension - the `i` variable. Just like a closure! And in fact, it is exactly that. We'll come back to that in a bit.

Let's do another example - we'll construct Pascal's triangle - which is basically just a triangle of binomial coefficients:

```
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
```

We just need to know how to calculate combinations:
```
C(n, k) = n! / (k! (n-k)!)
```

* row 0, column 0: n=0, k=0: c(0, 0) = 0! / (0! 0!) = 1/1 = 1
* row 4, column 2: n=4, k=2: c(4, 2) = 4! / (2! 2!) = (4x3x2x1) / (2x2) = 24/4 = 6

In other words, we need to calculate the following list of lists:
```
c(0,0)
c(1,0) c(1,1)
c(2,0) c(2,1) c(2,2)
c(3,0) c(3,1) c(3,2) c(3,3)
...
```

In [None]:
from math import factorial

def combo(n, k):
    return factorial(n) // (factorial(k) * factorial(n-k))

size = 10  # global variable
pascal = [ [combo(n, k) for k in range(n+1)] for n in range(size+1) ]
pascal

[[1],
 [1, 1],
 [1, 2, 1],
 [1, 3, 3, 1],
 [1, 4, 6, 4, 1],
 [1, 5, 10, 10, 5, 1],
 [1, 6, 15, 20, 15, 6, 1],
 [1, 7, 21, 35, 35, 21, 7, 1],
 [1, 8, 28, 56, 70, 56, 28, 8, 1],
 [1, 9, 36, 84, 126, 126, 84, 36, 9, 1],
 [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]]

Again note how the outer comprehension accessed a global variable (`size`), created a local variable (`n`), and the inner comprehension created its own local variable (`k`) and also accessed the nonlocal variable `n`.
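As a quick sanity check, the same triangle can be built with `math.comb` (available in Python 3.8 and later) instead of the hand-written `combo` function. The sketch below also verifies the well-known property that every interior entry is the sum of the two entries directly above it; the name `pascal_check` is introduced here purely for illustration.

```python
from math import comb  # standard library, Python 3.8+

size = 10

# Same nested comprehension as above, with math.comb in place of combo.
pascal_check = [[comb(n, k) for k in range(n + 1)] for n in range(size + 1)]

# Every interior entry equals the sum of the two entries above it.
for n in range(2, size + 1):
    for k in range(1, n):
        assert pascal_check[n][k] == pascal_check[n - 1][k - 1] + pascal_check[n - 1][k]

print(pascal_check[4])  # [1, 4, 6, 4, 1]
```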

#### Nested Loops

We can also create comprehensions that use nested loops (not nested comprehensions, just nested loops).

Let's start with a simple example.

Suppose we have two lists of characters, and we want to produce a new list consisting of the pairwise concatenated characters.

e.g. 
`l1 = ['a', 'b', 'c']`

`l2 = ['x', 'y', 'z']`

and we want to produce the result:

`['ax', 'ay', 'az', 'bx', 'by', 'bz', 'cx', 'cy', 'cz']`


In [None]:
l1 = ['a', 'b', 'c']
l2 = ['x', 'y', 'z']
result = []
for s1 in l1:
    for s2 in l2:
        result.append(s1+s2)


In [None]:
result

['ax', 'ay', 'az', 'bx', 'by', 'bz', 'cx', 'cy', 'cz']

In [None]:
result = [s1 + s2 for s1 in l1 for s2 in l2]

In [None]:
result

['ax', 'ay', 'az', 'bx', 'by', 'bz', 'cx', 'cy', 'cz']

We could expand this slightly by specifying that pairs resulting in the same letter twice should be omitted:

In [None]:
l1 = ['a', 'b', 'c']
l2 = ['b', 'c', 'd']

result = []
for s1 in l1:
    for s2 in l2:
        if s1 != s2:
            result.append(s1 + s2)

result

['ab', 'ac', 'ad', 'bc', 'bd', 'cb', 'cd']

In [None]:
result = [s1 + s2 for s1 in l1 for s2 in l2 if s1 != s2]
result

['ab', 'ac', 'ad', 'bc', 'bd', 'cb', 'cd']
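For comparison, here is a minimal sketch of the same filtered pairing written with `itertools.product` from the standard library instead of two `for` clauses. This is an alternative to the approach used in this section, not a replacement for it; `product` yields the pairs in the same order as the nested loops. The name `result_product` is illustrative only.

```python
from itertools import product

l1 = ['a', 'b', 'c']
l2 = ['b', 'c', 'd']

# product(l1, l2) yields (s1, s2) pairs in nested-loop order.
result_product = [s1 + s2 for s1, s2 in product(l1, l2) if s1 != s2]
print(result_product)  # ['ab', 'ac', 'ad', 'bc', 'bd', 'cb', 'cd']
```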

In [None]:
l1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
l2 = ['a', 'b', 'c', 'd']
list(zip(l1, l2))

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

In [None]:
result = []
for index_1, item_1 in enumerate(l1):
    for index_2, item_2 in enumerate(l2):
        if index_1 == index_2:
            result.append((item_1, item_2))
result

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

In [None]:
result = [ (item_1, item_2)
         for index_1, item_1 in enumerate(l1)
         for index_2, item_2 in enumerate(l2)
         if index_1 == index_2]

result

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

List comprehensions can also be quite handy when used in conjunction with functions such as `sum`.

Suppose we have two n-dimensional vectors, represented as tuples of numbers, and we want to find the dot product of the two vectors:

`
v1 = (c1, c2, c3, ..., cn)
v2 = (d1, d2, d3, ..., dn)
`

Then, the dot product is:

`
c1 * d1 + c2 * d2 + ... + cn * dn
`

In [None]:
v1 = (1, 2, 3, 4, 5, 6)
v2 = (10, 20, 30, 40, 50, 60)

In [None]:
dot = 0
for i in range(len(v1)):
    dot += (v1[i] * v2[i])
print(dot)

910


In [None]:
dot = sum([i * j for i, j in zip(v1, v2)])
print(dot)

910


In [None]:
dot = sum(i * j for i, j in zip(v1, v2))
print(dot)

910


In [None]:
if 'number' in globals():
    del number

In [None]:
l = [number**2 for number in range(5)]
print(l)

[0, 1, 4, 9, 16]


In [None]:
'number' in globals()

False

In [None]:
number = 100

In [None]:
l = [number**2 for number in range(5)]
l

[0, 1, 4, 9, 16]

In [None]:
number

100

As you can see, `number` in the comprehension was still local to the comprehension, and our global `number` was not affected. 

This is similar to global and nonlocal variables in functions.

Because `number` is the loop item, it means that it gets *assigned* a value before being referenced, hence it is considered local - even if that symbol exists in a global or nonlocal scope.

On the other hand, consider this example:


In [None]:
number = 100
l = [number * i for i in range(5)]
print(l)

[0, 100, 200, 300, 400]


As you can see, the scope of the comprehension was able to reach out for `number` in the global scope. Same as functions.

Now let's look at an example we've seen before when we studied closures.

Suppose we want to generate a list of functions that will calculate powers of their argument, i.e. we want to define a bunch of functions

* `fn_1(arg) --> arg ** 1`
* `fn_2(arg) --> arg ** 2`
* `fn_3(arg) --> arg ** 3`
etc...

In [None]:
fn_0 = lambda x: x**0
fn_1 = lambda x: x**1
fn_2 = lambda x: x**2
fn_3 = lambda x: x**3
# etc

In [None]:
funcs = [lambda x: x**0, lambda x: x**1, lambda x: x**2, lambda x: x**3]

In [None]:
print(funcs[0](10))
print(funcs[1](10))
print(funcs[2](10))
print(funcs[3](10))

1
10
100
1000


In [None]:
if 'i' in globals():
    del i

In [None]:
funcs = []
for i in range(6):
    funcs.append(lambda x: x**i)

In [None]:
funcs

[<function __main__.<lambda>(x)>,
 <function __main__.<lambda>(x)>,
 <function __main__.<lambda>(x)>,
 <function __main__.<lambda>(x)>,
 <function __main__.<lambda>(x)>,
 <function __main__.<lambda>(x)>]

In [None]:
print(funcs[0](10))
print(funcs[1](10))
print(funcs[2](10))
print(funcs[3](10))

100000
100000
100000
100000


In [None]:
print(i)

5


In [None]:
# for (i = 0; i < 100; i++){
#     std::cout << i;
# }
# std::cout << i;

for am_i_global in range(10):
    pass

'am_i_global' in globals()

True

In [None]:
globals()

{'__name__': '__main__',
 '__doc__': 'Automatically created module for IPython interactive environment',
 '__package__': None,
 '__loader__': None,
 '__spec__': None,
 '__builtin__': <module 'builtins' (built-in)>,
 '__builtins__': <module 'builtins' (built-in)>,
 '_ih': ['',
  'squares = []  # create an empty list\nfor i in range(1, 101):\n    squares.append(i**2)',
  'squares[0:10]',
  'squares = [i**2 for i in range(1, 101)]',
  'squares[0:10]',
  'squares = []\nfor i in range(1, 101):\n    if i % 2 == 0:\n        squares.append(i**2)',
  'squares[0:10]',
  'squares = [i**2 for i in range(1, 101) if i % 2 == 0]\nsquares[0:10]',
  'squares = [i**2\n          for i in range(1, 101)\n          if i % 2 == 0]\nsquares[0:10]',
  'import dis',
  "compiled_code = compile('[i**2 for i in (1, 2, 3)]', \n                        filename='', mode='eval')",
  'dis.dis(compiled_code)',
  'table = []\nfor i in range(1, 11):\n    row = []\n    for j in range(1, 11):\n        row.append(i*j)\n    t

You'll also note that `i` is a global variable, and that it has a value of `5` (from the last iteration that ran).

Now let's walk through what happened manually:

In the first iteration, the symbol `i` was created, and assigned a value of `0`:

In [None]:
# funcs = []
# for i in range(6):
#     funcs.append(lambda x: x**i)

i = 0
def fn_0(x):
    return x ** i

The `i` in `fn_0` is actually the global variable `i`.

For the next 'iteration' we increment `i` by `1`:

In [None]:
i=1
def fn_1(x):
    return x ** i

In [None]:
i = 5

fn_0(10)

100000

Can we somehow fix this problem?

Yes, and it relies on default values, and on when they are calculated and stored with the function definition. Recall that default values are evaluated and stored with the function **when the function is created (i.e. when the `def` or `lambda` is executed)**, not when it is called. Right now we are running into a problem because the free variable `i` is being evaluated inside each function's body at **run time**, i.e. when the function is called.

So, we can fix this by making the current value of `i` a parameter default of each lambda - this will get evaluated at the function's creation time - i.e. at each loop iteration:

In [None]:
funcs = [lambda x, pow=i: x**pow for i in range(6)]

funcs[0](10), funcs[1](10), funcs[2](10)

(1, 10, 100)

In [None]:
funcs = [lambda x: x**i for i in range(6)]

funcs[0](10), funcs[1](10), funcs[2](10)

# i = 0
# def fn_0(x):
#     return x ** i

(100000, 100000, 100000)
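Another way to avoid the late-binding behaviour is to use a factory function, so that each lambda closes over a variable belonging to its own function call. This is only a sketch of that alternative; `make_power` and `funcs_fixed` are hypothetical names introduced for illustration.

```python
def make_power(exponent):
    # exponent is local to this particular call, so every returned
    # lambda has its own free variable instead of sharing one.
    return lambda x: x ** exponent

funcs_fixed = [make_power(i) for i in range(6)]

print(funcs_fixed[0](10), funcs_fixed[1](10), funcs_fixed[2](10))  # 1 10 100
```

Unlike the default-parameter approach, this does not add an extra parameter that a caller could accidentally override.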

### Iterating Collections

We saw how sequence types support iteration by being able to access elements by index. We could even write our custom sequence types by implementing the `__getitem__` method.

If we think about iterating over a collection, what we really need is a way to request the **next** item in the collection.

If we can do that, our collection does not require being indexable, nor does it need to be ordered (i.e. we don't need the notion of relative positions of elements in the container).

This is exactly what iterables are in general - they provide a method that returns the "next" element in the collection. This approach works equally well with sequence type collections, as well as unordered collection types such as sets.

Of course, the order in which **next** returns items from an unordered collection is not known in advance - and we see that when we iterate over a set for example:

In [None]:
s = {'x', 'y', 'b', 'c', 'a'}
for item in s:
    print(item)

x
c
b
a
y


In [None]:
s[0]

TypeError: 'set' object is not subscriptable

In [None]:
class Squares:
    def __init__(self):
        self.i = 0
    
    def next_(self):
        result = self.i ** 2
        self.i += 1
        return result

In [None]:
sq = Squares()

In [None]:
sq.next_()

0

> Student Question

What if, in the `__init__` function, I wanted to create a new variable that is 3 times `x`, i.e. `self.new_var = 3 * x`?

Do I write
`self.new_var = 3 * x` or
`self.new_var = 3 * self.x`?

self.x = x
self.new_var = 3 * self.x

In [None]:
sq.next_()

1

In [None]:
sq.next_()

4

In [None]:
for i in range(10):
    print(sq.next_())

9
16
25
36
49
64
81
100
121
144


In [None]:
class Squares:
    def __init__(self, length):
        self.length = length
        self.i = 0
    
    def next_(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result           
        
    def __len__(self):
        return self.length

In [None]:
sq = Squares(3)

In [None]:
len(sq)

3

In [None]:
sq.next_()

0

In [None]:
sq.next_()

1

In [None]:
sq.next_()

4

In [None]:
sq.next_()

StopIteration: 

So now, we can essentially loop over the collection in a very similar way to how we did it with sequences and the `__getitem__` method:

In [None]:
sq = Squares(5)
while True:
    try:
        print(sq.next_())
    except StopIteration:
        # reached end of iteration
        # stop looping
        break       

0
1
4
9
16


In [None]:
sq.next_()

StopIteration: 

In [None]:
for i in Squares(10):
    print(i)

TypeError: 'Squares' object is not iterable

In [None]:
class Squares:
    def __init__(self, length):
        self.length = length
        self.i = 0
    
    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result   
    
    def __len__(self):
        return self.length

In [None]:
sq = Squares(3)

In [None]:
next(sq)

0

In [None]:
next(sq), next(sq)

(1, 4)

In [None]:
next(sq)

StopIteration: 

In [None]:
sq = Squares(5)
while True:
    try:
        print(next(sq))
    except StopIteration:
        break  

0
1
4
9
16


In [None]:
for i in Squares(10):
    print(i)

TypeError: 'Squares' object is not iterable

In [None]:
import random

In [None]:
class RandomNumbers:
    def __init__(self, length, *, range_min=0, range_max=10):
        self.length = length
        self.range_min = range_min
        self.range_max = range_max
        self.num_requested = 0
        
    def __len__(self):
        return self.length
    
    def __next__(self):
        if self.num_requested >= self.length:
            raise StopIteration
        else:
            self.num_requested += 1
            return random.randint(self.range_min, self.range_max)

In [None]:
numbers = RandomNumbers(10)

In [None]:
len(numbers)

10

In [None]:
while True:
    try:
        print(next(numbers))
    except StopIteration:
        break

4
0
2
10
9
3
2
6
3
4


In [None]:
numbers = RandomNumbers(10)

for item in numbers:
    print(item)

TypeError: 'RandomNumbers' object is not iterable

### Iterators

This idea of using `__next__` and the `StopIteration` exception is exactly what Python does.

So, somehow we need to tell Python that the object we are dealing with is something it can iterate over using `next`.

To do so, we create an `iterator` type object.

Iterators are objects that implement:
* a `__next__` method
* an `__iter__` method that simply returns the object itself

In [None]:
# class Squares:
#     def __init__(self, length):
#         self.length = length
#         self.i = 0
    
#     def __next__(self):
#         if self.i >= self.length:
#             raise StopIteration
#         else:
#             result = self.i ** 2
#             self.i += 1
#             return result   
    
#     def __len__(self):
#         return self.length

class Squares:
    def __init__(self, length):
        self.length = length
        self.i = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result

In [None]:
sq = Squares(5)
print(next(sq))
print(next(sq))
print(next(sq))

0
1
4


In [None]:
sq = Squares(5)

for item in sq:
    print(item)

0
1
4
9
16


In [None]:
for item in sq:
    print(item)

In [None]:
sq = Squares(5)
for item in sq:
    print(item)

0
1
4
9
16


In [None]:
sq = Squares(5)

In [None]:
id(sq)

140067582398096

In [None]:
id(sq.__iter__())

140067582398096

In [None]:
id(iter(sq))

140067582398096

In [None]:
sq = Squares(5)

In [None]:
[item for item in sq if item%2==0]

[0, 4, 16]

In [None]:
sq = Squares(5)
list(enumerate(sq))

[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]

In [None]:
list(enumerate(sq))

[]

In [None]:
sq = Squares(5)
list(enumerate(sq))

[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]

In [None]:
sq = Squares(5)
sorted(sq, reverse=True)

[16, 9, 4, 1, 0]

#### Python Iterators Summary

Iterators are objects that implement the `__iter__` and `__next__` methods.

The `__iter__` method of an iterator just returns itself.

Once we fully iterate over an iterator, the iterator is **exhausted** and we can no longer use it for iteration purposes.

The way Python applies a `for` loop to an iterator object is basically what we saw with the `while` loop and the `StopIteration` exception.

In [None]:
sq = Squares(5)
while True:
    try:
        print(next(sq))
    except StopIteration:
        break

0
1
4
9
16


In [None]:
class Squares:
    def __init__(self, length):
        self.length = length
        self.i = 0
        
    def __iter__(self):
        print('calling __iter__')
        return self
    
    def __next__(self):
        print('calling __next__')
        if self.i >= self.length:
            print('About to raise StopIteration Exception')
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result

In [None]:
sq = Squares(5)

for i in sq:
    print(i)

calling __iter__
calling __next__
0
calling __next__
1
calling __next__
4
calling __next__
9
calling __next__
16
calling __next__
About to raise StopIteration Exception


In [None]:
sq = Squares(5)
[item for item in sq if item%2==0]

calling __iter__
calling __next__
calling __next__
calling __next__
calling __next__
calling __next__
calling __next__
About to raise StopIteration Exception


[0, 4, 16]

In [None]:
sq = Squares(5)
list(enumerate(sq))

calling __iter__
calling __next__
calling __next__
calling __next__
calling __next__
calling __next__
calling __next__
About to raise StopIteration Exception


[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]

In [None]:
sq = Squares(5)
sorted(sq, reverse=True)

calling __iter__
calling __next__
calling __next__
calling __next__
calling __next__
calling __next__
calling __next__
About to raise StopIteration Exception


[16, 9, 4, 1, 0]

Why is `__iter__` being called? After all, it just returns itself!

That's the topic of the next lecture!

But let's see how we can mimic what Python is doing:

In [None]:
sq = Squares(5)
sq_iterator = iter(sq)
print(id(sq), id(sq_iterator))
while True:
    try:
        item = next(sq_iterator)
        print(item)
    except StopIteration:
        break

calling __iter__
140067580573712 140067580573712
calling __next__
0
calling __next__
1
calling __next__
4
calling __next__
9
calling __next__
16
calling __next__
About to raise StopIteration Exception


### Iterators and Iterables

Previously we saw that we could create **iterator** objects by simply implementing:

* a `__next__` method that returns the next element in the container
* an `__iter__` method that just returns the object itself (the iterator object)

In [None]:
class Cities:
    def __init__(self):
        self._cities = ['Paris', 'Berlin', 'Rome', 'Madrid', 'London']
        self._index = 0
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        else:
            item = self._cities[self._index]
            self._index += 1
            return item

In [None]:
cities = Cities()
list(enumerate(cities))

[(0, 'Paris'), (1, 'Berlin'), (2, 'Rome'), (3, 'Madrid'), (4, 'London')]

In [None]:
cities=Cities()
[item.upper() for item in cities]

['PARIS', 'BERLIN', 'ROME', 'MADRID', 'LONDON']

In [None]:
cities=Cities()
sorted(cities)

['Berlin', 'London', 'Madrid', 'Paris', 'Rome']

In [None]:
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'New Delhi', 'Newcastle']
        
    def __len__(self):
        return len(self._cities)

In [None]:
class CityIterator:
    def __init__(self, city_obj):
        # city_obj is an instance of Cities
        self._city_obj = city_obj
        self._index = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._index >= len(self._city_obj):
            raise StopIteration
        else:
            item = self._city_obj._cities[self._index]
            self._index += 1
            return item

In [None]:
cities = Cities()
iter_1 = CityIterator(cities)


In [None]:
for city in iter_1:
    print(city)

New York
Newark
New Delhi
Newcastle


In [None]:
iter_2 = CityIterator(cities)
[city.upper() for city in iter_2]

['NEW YORK', 'NEWARK', 'NEW DELHI', 'NEWCASTLE']

In [None]:
for city in cities:
    print(city)

TypeError: 'Cities' object is not iterable

In [None]:
class CityIterator:
    def __init__(self, city_obj):
        # city_obj is an instance of Cities
        print('Calling CityIterator __init__')
        self._city_obj = city_obj
        self._index = 0
        
    def __iter__(self):
        print('Calling CitiyIterator instance __iter__')
        return self
    
    def __next__(self):
        print('Calling __next__')
        if self._index >= len(self._city_obj):
            raise StopIteration
        else:
            item = self._city_obj._cities[self._index]
            self._index += 1
            return item

In [None]:
iter_1 = CityIterator(cities)

Calling CityIterator __init__


In [None]:
for city in iter_1:
    print(city)

Calling CitiyIterator instance __iter__
Calling __next__
New York
Calling __next__
Newark
Calling __next__
New Delhi
Calling __next__
Newcastle
Calling __next__



An **iterable** is an object that:
* implements the `__iter__` method
* and that method returns an **iterator** which can be used to iterate over the object
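The distinction between the two definitions can be checked with the abstract base classes in `collections.abc`. The sketch below uses a built-in list rather than the `Cities` class, and the variable names are illustrative only: the list is an iterable but not an iterator, while the object returned by `iter()` is both.

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
numbers_iter = iter(numbers)

# A list implements __iter__ but not __next__.
print(isinstance(numbers, Iterable), isinstance(numbers, Iterator))            # True False

# The list iterator implements both __iter__ and __next__.
print(isinstance(numbers_iter, Iterable), isinstance(numbers_iter, Iterator))  # True True

# Each call to iter() on the iterable produces a fresh, independent iterator.
first, second = iter(numbers), iter(numbers)
print(next(first), next(first), next(second))  # 1 2 1
```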

In [None]:
class CityIterator:
    def __init__(self, city_obj):
        # city_obj is an instance of Cities
        print('Calling CityIterator __init__')
        self._city_obj = city_obj
        self._index = 0
        
    def __iter__(self):
        print('Calling CitiyIterator instance __iter__')
        return self
    
    def __next__(self):
        print('Calling __next__')
        if self._index >= len(self._city_obj):
            raise StopIteration
        else:
            item = self._city_obj._cities[self._index]
            self._index += 1
            return item

In [None]:
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'New Delhi', 'Newcastle']
        
    def __len__(self):
        return len(self._cities)
    
    def __iter__(self):
        print('Calling Cities instance __iter__')
        return CityIterator(self)

In [None]:
cities = Cities()

In [None]:
for city in cities:
    print(city)

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
New York
Calling __next__
Newark
Calling __next__
New Delhi
Calling __next__
Newcastle
Calling __next__


In [None]:
for city in cities:
    print(city)

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
New York
Calling __next__
Newark
Calling __next__
New Delhi
Calling __next__
Newcastle
Calling __next__


In [None]:
list(enumerate(cities))

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
Calling __next__
Calling __next__
Calling __next__
Calling __next__


[(0, 'New York'), (1, 'Newark'), (2, 'New Delhi'), (3, 'Newcastle')]

In [None]:
sorted(cities, reverse=True)

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
Calling __next__
Calling __next__
Calling __next__
Calling __next__


['Newcastle', 'Newark', 'New York', 'New Delhi']

In [None]:
del CityIterator

In [None]:
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'New Delhi', 'Newcastle']
        
    def __len__(self):
        return len(self._cities)
    
    def __iter__(self):
        print('Calling Cities instance __iter__')
        return self.CityIterator(self)
    
    class CityIterator:
        def __init__(self, city_obj):
            # city_obj is an instance of Cities
            print('Calling CityIterator __init__')
            self._city_obj = city_obj
            self._index = 0

        def __iter__(self):
            print('Calling CitiyIterator instance __iter__')
            return self

        def __next__(self):
            print('Calling __next__')
            if self._index >= len(self._city_obj):
                raise StopIteration
            else:
                item = self._city_obj._cities[self._index]
                self._index += 1
                return item

In [None]:
cities = Cities()

list(enumerate(cities))

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
Calling __next__
Calling __next__
Calling __next__
Calling __next__


[(0, 'New York'), (1, 'Newark'), (2, 'New Delhi'), (3, 'Newcastle')]

In [None]:
iter_1 = iter(cities)
iter_2 = iter(cities)

Calling Cities instance __iter__
Calling CityIterator __init__
Calling Cities instance __iter__
Calling CityIterator __init__


In [None]:
id(iter_1), id(iter_2)

(140067580045264, 140067579817040)

In [None]:
cities = Cities()

In [None]:
len(cities)

4

In [None]:
cities[1]

TypeError: 'Cities' object is not subscriptable

In [None]:
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'New Delhi', 'Newcastle']
        
    def __len__(self):
        return len(self._cities)
    
    def __getitem__(self, s):
        print('getting item...')
        return self._cities[s]
    
    def __iter__(self):
        print('Calling Cities instance __iter__')
        return self.CityIterator(self)
    
    class CityIterator:
        def __init__(self, city_obj):
            # city_obj is an instance of Cities
            print('Calling CityIterator __init__')
            self._city_obj = city_obj
            self._index = 0

        def __iter__(self):
            print('Calling CitiyIterator instance __iter__')
            return self

        def __next__(self):
            print('Calling __next__')
            if self._index >= len(self._city_obj):
                raise StopIteration
            else:
                item = self._city_obj._cities[self._index]
                self._index += 1
                return item

In [None]:
cities = Cities()

In [None]:
cities[0]

getting item...


'New York'

In [None]:
next(iter(cities))

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__


'New York'

In [None]:
next(iter(cities))

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__


'New York'

Now that Cities is both a sequence type (`__getitem__`) and an iterable (`__iter__`), when we loop over `cities`, is Python going to use `__getitem__` or `__iter__`?

In [None]:
cities = Cities()
for city in cities:
    print(city)

Calling Cities instance __iter__
Calling CityIterator __init__
Calling __next__
New York
Calling __next__
Newark
Calling __next__
New Delhi
Calling __next__
Newcastle
Calling __next__


It uses the iterator - so Python will use the iterator if there is one, otherwise it will fall back to using `__getitem__`. If neither is implemented, we'll get an exception.

Of course, for selection by index or slice, the `__getitem__` method **must** be implemented.

We'll come back to this very topic in an upcoming video, because behind the scenes, even if we only implement the `__getitem__` method, Python will auto-generate an iterator for us!
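As a small preview of that fallback, here is a sketch of a class that implements only `__getitem__` and can still be iterated. The `Letters` class is hypothetical and not part of the examples above; the iteration stops when `__getitem__` raises `IndexError`.

```python
class Letters:
    def __init__(self, text):
        self._text = text

    def __getitem__(self, index):
        # Indexing past the end of the string raises IndexError,
        # which is what ends the iteration.
        return self._text[index]


letters = Letters('abc')

# No __iter__ is defined, yet both iteration and iter() work.
print([ch for ch in letters])  # ['a', 'b', 'c']

letters_iter = iter(letters)
print(next(letters_iter), next(letters_iter))  # a b
```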

In [None]:
l = [1, 2, 3]

In [None]:
iter_l = iter(l)

In [None]:
type(iter_l)

list_iterator

In [None]:
next(iter_l)

1

In [None]:
next(iter_l)

2

In [None]:
next(iter_l)

3

In [None]:
next(iter_l)

StopIteration: 

In [None]:
id(iter_l), id(iter(iter_l))

(140067578844560, 140067578844560)

In [None]:
'__next__' in dir(iter_l)

True

In [None]:
'__iter__' in dir(iter_l)

True

In [None]:
'__iter__' in dir(l)

True

In [None]:
'__next__' in dir(l)

False

Of course, since lists are also sequence types, they also implement the `__getitem__` method:

In [None]:
'__getitem__' in dir(l)

True

In [None]:
'__getitem__' in dir(set)

False

In [None]:
'__iter__' in dir(set)

True

In [None]:
s = {1, 2, 3}
'__next__' in dir(iter(s))

True

In [None]:
'__iter__' in dir(dict)

True

In [None]:
d = dict(a=1, b=2, c=3)

In [None]:
iter_d = iter(d)

In [None]:
next(iter_d)

'a'

In [None]:
iter_vals = iter(d.values())

In [None]:
next(iter_vals)

1

In [None]:
iter_items = iter(d.items())

In [None]:
next(iter_items)

('a', 1)

### Consuming Iterators Manually

In [None]:
s = 'I sleep all night, and I work all day'

In [None]:
iter_s = iter(s)

In [None]:
print(next(iter_s))
print(next(iter_s))
print(next(iter_s))
print(next(iter_s))
print(next(iter_s))

I
 
s
l
e


In [None]:
# https://deepnote.com/project/e2e63abc-dd49-495f-9665-644c21dd9cdb#%2Fcars.csv
with open('cars.csv') as file:
    for line in file:
        print(line)   

Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin

STRING;DOUBLE;INT;DOUBLE;DOUBLE;DOUBLE;DOUBLE;INT;CAT

Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US

Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US

Plymouth Satellite;18.0;8;318.0;150.0;3436.;11.0;70;US

AMC Rebel SST;16.0;8;304.0;150.0;3433.;12.0;70;US

Ford Torino;17.0;8;302.0;140.0;3449.;10.5;70;US

Ford Galaxie 500;15.0;8;429.0;198.0;4341.;10.0;70;US

Chevrolet Impala;14.0;8;454.0;220.0;4354.;9.0;70;US

Plymouth Fury iii;14.0;8;440.0;215.0;4312.;8.5;70;US

Pontiac Catalina;14.0;8;455.0;225.0;4425.;10.0;70;US

AMC Ambassador DPL;15.0;8;390.0;190.0;3850.;8.5;70;US

Citroen DS-21 Pallas;0;4;133.0;115.0;3090.;17.5;70;Europe

Chevrolet Chevelle Concours (sw);0;8;350.0;165.0;4142.;11.5;70;US

Ford Torino (sw);0;8;351.0;153.0;4034.;11.0;70;US

Plymouth Satellite (sw);0;8;383.0;175.0;4166.;10.5;70;US

AMC Rebel SST (sw);0;8;360.0;175.0;3850.;11.0;70;US

Dodge Challenger SE;15.0;8;383.0;170.

Here's what we want to do:
* read the first line to get the column headers and create a named tuple class
* read the data types from the second line and store them, so we can cast the strings we read to the correct data types
* read the data rows and parse them into named tuples

In [None]:
with open('cars.csv') as file:
    row_index = 0
    for line in file:
        if row_index == 0:
            # header row
            headers = line.strip('\n').split(';')
            print(headers)
        elif row_index == 1:
            # data type row
            data_types = line.strip('\n').split(';')
            print(data_types)
        else:
            # data rows
            data = line.strip('\n').split(';')
            print(data)
        row_index += 1

['Car', 'MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight', 'Acceleration', 'Model', 'Origin']
['STRING', 'DOUBLE', 'INT', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'INT', 'CAT']
['Chevrolet Chevelle Malibu', '18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', 'US']
['Buick Skylark 320', '15.0', '8', '350.0', '165.0', '3693.', '11.5', '70', 'US']
['Plymouth Satellite', '18.0', '8', '318.0', '150.0', '3436.', '11.0', '70', 'US']
['AMC Rebel SST', '16.0', '8', '304.0', '150.0', '3433.', '12.0', '70', 'US']
['Ford Torino', '17.0', '8', '302.0', '140.0', '3449.', '10.5', '70', 'US']
['Ford Galaxie 500', '15.0', '8', '429.0', '198.0', '4341.', '10.0', '70', 'US']
['Chevrolet Impala', '14.0', '8', '454.0', '220.0', '4354.', '9.0', '70', 'US']
['Plymouth Fury iii', '14.0', '8', '440.0', '215.0', '4312.', '8.5', '70', 'US']
['Pontiac Catalina', '14.0', '8', '455.0', '225.0', '4425.', '10.0', '70', 'US']
['AMC Ambassador DPL', '15.0', '8', '390.0', '190.0', '3850.', '8.5', '70', 'US']
[

In [None]:
from collections import namedtuple
cars = []

with open('cars.csv') as file:
    row_index = 0
    for line in file:
        if row_index == 0:
            # header row
            headers = line.strip('\n').split(';')
            Car = namedtuple('Car', headers)
        elif row_index == 1:
            # data type row
            data_types = line.strip('\n').split(';')
            print(data_types)
        else:
            # data rows
            data = line.strip('\n').split(';')
            car = Car(*data)
            cars.append(car)
        row_index += 1

['STRING', 'DOUBLE', 'INT', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'INT', 'CAT']


In [None]:
print(cars[0])

Car(Car='Chevrolet Chevelle Malibu', MPG='18.0', Cylinders='8', Displacement='307.0', Horsepower='130.0', Weight='3504.', Acceleration='12.0', Model='70', Origin='US')


In [None]:
def cast(data_type, value):
    if data_type == 'DOUBLE':
        return float(value)
    elif data_type == 'INT':
        return int(value)
    else:
        return str(value)

In [None]:
data_types = ['STRING', 'DOUBLE', 'INT', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'INT', 'CAT']

In [None]:
data_row = ['Chevrolet Chevelle Malibu', '18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', 'US']

In [None]:
list(zip(data_types, data_row))

[('STRING', 'Chevrolet Chevelle Malibu'),
 ('DOUBLE', '18.0'),
 ('INT', '8'),
 ('DOUBLE', '307.0'),
 ('DOUBLE', '130.0'),
 ('DOUBLE', '3504.'),
 ('DOUBLE', '12.0'),
 ('INT', '70'),
 ('CAT', 'US')]

In [None]:
[cast(data_type, value) for data_type, value in zip(data_types, data_row)]

['Chevrolet Chevelle Malibu', 18.0, 8, 307.0, 130.0, 3504.0, 12.0, 70, 'US']

In [None]:
def cast_row(data_types, data_row):
    return [cast(data_type, value) 
            for data_type, value in zip(data_types, data_row)]
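
As a possible variation, the `if`/`elif` chain in `cast` can be replaced by a dictionary lookup. This is only a sketch: the names `CASTERS` and `cast_row_lookup` are hypothetical and are not used in the rest of this notebook.

```python
# Map each type name from the file's second row to a conversion callable.
# Anything not listed here (e.g. 'STRING', 'CAT') falls back to str.
CASTERS = {'DOUBLE': float, 'INT': int}


def cast_row_lookup(data_types, data_row):
    return [CASTERS.get(data_type, str)(value)
            for data_type, value in zip(data_types, data_row)]


data_types = ['STRING', 'DOUBLE', 'INT', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'DOUBLE', 'INT', 'CAT']
data_row = ['Chevrolet Chevelle Malibu', '18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', 'US']
print(cast_row_lookup(data_types, data_row))
```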

In [None]:
from collections import namedtuple
cars = []

with open('cars.csv') as file:
    row_index = 0
    for line in file:
        if row_index == 0:
            # header row
            headers = line.strip('\n').split(';')
            Car = namedtuple('Car', headers)
        elif row_index == 1:
            # data type row
            data_types = line.strip('\n').split(';')
        else:
            # data rows
            data = line.strip('\n').split(';')
            data = cast_row(data_types, data)
            car = Car(*data)
            cars.append(car)
        row_index += 1

In [None]:
cars[0]

Car(Car='Chevrolet Chevelle Malibu', MPG=18.0, Cylinders=8, Displacement=307.0, Horsepower=130.0, Weight=3504.0, Acceleration=12.0, Model=70, Origin='US')

In [None]:
from collections import namedtuple
cars = []

with open('cars.csv') as file:
    file_iter = iter(file)
    headers = next(file_iter).strip('\n').split(';')
    Car = namedtuple('Car', headers)
    data_types = next(file_iter).strip('\n').split(';')
    for line in file_iter:
        data = line.strip('\n').split(';')
        data = cast_row(data_types, data)
        car = Car(*data)
        cars.append(car)

In [None]:
cars[0]

Car(Car='Chevrolet Chevelle Malibu', MPG=18.0, Cylinders=8, Displacement=307.0, Horsepower=130.0, Weight=3504.0, Acceleration=12.0, Model=70, Origin='US')

In [None]:
from collections import namedtuple

with open('cars.csv') as file:
    file_iter = iter(file)
    headers = next(file_iter).strip('\n').split(';')
    Car = namedtuple('Car', headers)
    data_types = next(file_iter).strip('\n').split(';')
    cars_data = [cast_row(data_types,
                          line.strip('\n').split(';'))
                 for line in file_iter]
    cars = [Car(*item) for item in cars_data]

In [None]:
cars_data[0]

['Chevrolet Chevelle Malibu', 18.0, 8, 307.0, 130.0, 3504.0, 12.0, 70, 'US']

In [None]:
cars[0]

Car(Car='Chevrolet Chevelle Malibu', MPG=18.0, Cylinders=8, Displacement=307.0, Horsepower=130.0, Weight=3504.0, Acceleration=12.0, Model=70, Origin='US')

In [None]:
from collections import namedtuple

with open('cars.csv') as file:
    file_iter = iter(file)
    headers = next(file_iter).strip('\n').split(';')
    Car = namedtuple('Car', headers)
    data_types = next(file_iter).strip('\n').split(';')
    cars = [Car(*cast_row(data_types, 
                          line.strip('\n').split(';')))
            for line in file_iter]


In [None]:
cars[0]

Car(Car='Chevrolet Chevelle Malibu', MPG=18.0, Cylinders=8, Displacement=307.0, Horsepower=130.0, Weight=3504.0, Acceleration=12.0, Model=70, Origin='US')
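
The same pattern (pull the two special rows off the iterator with `next()`, then loop over the rest) also works with the standard library's `csv.reader`, which is itself an iterator. The sketch below is an alternative, not what the cells above do. To stay self-contained it reads the first few lines shown earlier from an in-memory `io.StringIO` object (an assumption made for the example) instead of opening `cars.csv`, and it redefines the `cast` helper compactly.

```python
import csv
import io
from collections import namedtuple

# First lines of the file as shown earlier, held in memory for the example
sample = io.StringIO(
    'Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin\n'
    'STRING;DOUBLE;INT;DOUBLE;DOUBLE;DOUBLE;DOUBLE;INT;CAT\n'
    'Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US\n'
    'Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US\n'
)


def cast(data_type, value):
    if data_type == 'DOUBLE':
        return float(value)
    if data_type == 'INT':
        return int(value)
    return str(value)


reader = csv.reader(sample, delimiter=';')   # yields one list of strings per row
Car = namedtuple('Car', next(reader))        # first row: column headers
data_types = next(reader)                    # second row: type names
cars = [Car(*(cast(t, v) for t, v in zip(data_types, row)))
        for row in reader]

print(cars[0])
```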

Here's an example: suppose we have a loop that iterates over some range of integers. As we loop through those integers, we want to pair each integer with a string taken from a finite set that is smaller than the range, starting over at the beginning of the set each time we reach its end.

```
1, 2, 3, 4, 5, 6, 7, 8, 9, ...

N, S, W, E
```

and we want to generate

```
1N, 2S, 3W, 4E, 5N, 6S, 7W, 8E, 9N, ...
```


In [None]:
class CyclicIterator:
    def __init__(self, lst):
        self.lst = lst
        self.i = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        result = self.lst[self.i % len(self.lst)]
        self.i += 1
        return result

In [None]:
iter_cycl = CyclicIterator('NSWE')

In [None]:
for i in range(10):
    print(next(iter_cycl))

N
S
W
E
N
S
W
E
N
S


In [None]:
n = 10
iter_cycl = CyclicIterator('NSWE')
for i in range(1, n+1):
    direction = next(iter_cycl)
    print(f'{i}{direction}')

1N
2S
3W
4E
5N
6S
7W
8E
9N
10S


In [None]:
n = 10
iter_cycl = CyclicIterator('NSWE')
[f'{i}{next(iter_cycl)}' for i in range(1, n+1)]

['1N', '2S', '3W', '4E', '5N', '6S', '7W', '8E', '9N', '10S']

In [None]:
n = 10
list(zip(range(1, n+1), 'NSWE' * (n//4 + 1)))

[(1, 'N'),
 (2, 'S'),
 (3, 'W'),
 (4, 'E'),
 (5, 'N'),
 (6, 'S'),
 (7, 'W'),
 (8, 'E'),
 (9, 'N'),
 (10, 'S')]

In [None]:
[f'{i}{direction}'
 for i, direction in zip(range(1, n+1), 'NSWE' * (n//4 + 1))]

['1N', '2S', '3W', '4E', '5N', '6S', '7W', '8E', '9N', '10S']

In [None]:
import itertools

In [None]:
n = 10
iter_cycl = CyclicIterator('NSWE')
[f'{i}{next(iter_cycl)}' for i in range(1, n+1)]

['1N', '2S', '3W', '4E', '5N', '6S', '7W', '8E', '9N', '10S']

In [None]:
n = 10
iter_cycl = itertools.cycle('NSWE')
[f'{i}{next(iter_cycl)}' for i in range(1, n+1)]

['1N', '2S', '3W', '4E', '5N', '6S', '7W', '8E', '9N', '10S']
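
Because `zip` stops as soon as its shortest argument is exhausted, an infinite iterator such as `itertools.cycle` can also be zipped directly against the range, which removes the need to repeat the string `n//4 + 1` times. A minimal sketch combining the two approaches above:

```python
import itertools

n = 10
# zip ends when range(1, n + 1) runs out, so the endless cycle is safe here
labels = [f'{i}{direction}'
          for i, direction in zip(range(1, n + 1), itertools.cycle('NSWE'))]
print(labels)
# ['1N', '2S', '3W', '4E', '5N', '6S', '7W', '8E', '9N', '10S']
```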

### Lazy Iterables

An iterable is an object that can return an iterator (`__iter__`).

In turn, an iterator is an object that can return itself (`__iter__`) and return the next value when asked (`__next__`).

Nothing in all this says that the iterable needs to be a finite collection, or that the elements in the iterable need to be materialized (pre-created) at the time the iterable / iterator is created.

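Lazy evaluation means that computing a value is deferred until that value is actually requested.

It is not specific to iterables, however.

Simple examples of lazy evaluation are often seen in classes with calculated properties. For contrast, we start with an eager version, where the area is recalculated every time the radius is set, whether or not anyone ever asks for it: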

In [None]:
import math

class Circle:
    def __init__(self, r):
        self.radius = r
        
    @property
    def radius(self):
        return self._radius
    
    @radius.setter
    def radius(self, r):
        self._radius = r
        self.area = math.pi * r**2

In [None]:
c = Circle(1)

In [None]:
c.area

3.141592653589793

In [None]:
c.radius = 2

In [None]:
c.radius, c.area

(2, 12.566370614359172)

In [None]:
class Circle:
    def __init__(self, r):
        self.radius = r
        
    @property
    def radius(self):
        return self._radius
    
    @radius.setter
    def radius(self, r):
        self._radius = r

    @property
    def area(self):
        print('executing')
        return math.pi * self.radius ** 2

In [None]:
c = Circle(1)

In [None]:
c.area

executing


3.141592653589793

In [None]:
c.radius = 2

In [None]:
c.area

executing


12.566370614359172

In [None]:
c.area

executing


12.566370614359172

In [None]:
class Circle:
    def __init__(self, r):
        self.radius = r
        
    @property
    def radius(self):
        return self._radius
    
    @radius.setter
    def radius(self, r):
        self._radius = r
        self._area = None

    @property
    def area(self):
        if self._area is None:
            print('Calculating area...')
            self._area = math.pi * self.radius ** 2
        print('called')
        return self._area

In [None]:
c = Circle(1)

In [None]:
c.area

Calculating area...
called


3.141592653589793

In [None]:
c.area

called


3.141592653589793

In [None]:
c.radius = 2

In [None]:
c.area

Calculating area...
called


12.566370614359172

In [None]:
c.area

called


12.566370614359172
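
The standard library offers a ready-made version of this caching pattern: `functools.cached_property` (available from Python 3.8). The sketch below is one possible way to combine it with a settable radius, not a rewrite of the class above. `cached_property` stores the computed value in the instance's `__dict__` under the property's name, so the setter invalidates the cache by removing that entry. The `calculations` counter is hypothetical and only there to make the caching visible.

```python
import math
from functools import cached_property


class Circle:
    def __init__(self, r):
        self.calculations = 0   # illustration only: counts real computations
        self.radius = r

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, r):
        self._radius = r
        # drop any cached area so it is recalculated on the next access
        self.__dict__.pop('area', None)

    @cached_property
    def area(self):
        self.calculations += 1
        return math.pi * self.radius ** 2


c = Circle(1)
c.area, c.area            # computed once, then served from the cache
print(c.calculations)     # 1
c.radius = 2              # invalidates the cached value
print(c.area)             # 12.566370614359172
print(c.calculations)     # 2
```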

In [None]:
class Factorials:
    def __iter__(self):
        return self.FactIter()
    
    class FactIter:
        def __init__(self):
            self.i = 0
            
        def __iter__(self):
            return self
        
        def __next__(self):
            result = math.factorial(self.i)
            self.i += 1
            return result

In [None]:
factorials = Factorials()
fact_iter = iter(factorials)

for _ in range(10):
    print(next(fact_iter))

1
1
2
6
24
120
720
5040
40320
362880
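
Since `Factorials` never raises `StopIteration`, anything that tries to consume it completely (for example `list(factorials)`) would never finish. One way to take a finite piece of such an iterable is `itertools.islice`. The sketch below is self-contained, so it defines its own compact variant of the class. `RunningFactorials` is a hypothetical name, and it keeps a running product rather than calling `math.factorial` on each step.

```python
import itertools


class RunningFactorials:
    """Infinite lazy iterable: 0!, 1!, 2!, ..."""

    def __iter__(self):
        return self.FactIter()

    class FactIter:
        def __init__(self):
            self.i = 0
            self.value = 1   # 0! == 1

        def __iter__(self):
            return self

        def __next__(self):
            result = self.value
            self.i += 1
            self.value *= self.i   # value is now i!, ready for the next call
            return result


first_ten = list(itertools.islice(RunningFactorials(), 10))
print(first_ten)
# [1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880]
```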
