# DS-GA 1019

# Lecture 3 - The itertools Module
## Feb. 8, 2021
see: http://alberto.bietti.me/python-itertools/

# Before we start

* Questions about previous topics?
* Questions about homework?
* Any other feedback?

# This class

- The concept of iterators
- Infinite Iterators
- Finite Iterators
- Combinatoric Generators

## Arrays/lists vs iterators

- When creating an array (or a list or tuple), we first have to allocate a block of system memory
to store these elements.  

- When we only need to iterate through the values, there may be no need to store them all at once.



---

The __itertools__ module implements a number of iterator building blocks that are fast and memory-efficient.


A generator behaves like a list of values, with some **important differences**:

- The values are generated on demand (sequence is not stored in memory)
- The values can only be accessed in sequence (not like an array)
- The values can only be accessed once



An iterator is an object that provides two methods:

- `__iter__`, which returns the iterator itself
- `__next__`, which returns the next value from the iterator (and raises `StopIteration` when no values are left)
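
To make the two methods concrete, here is a minimal sketch of a hand-written iterator. The class name `Countdown` and its behaviour are hypothetical, chosen only for illustration.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that there are no more values
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
first_pass = list(countdown)   # consumes the iterator
second_pass = list(countdown)  # the same object is already exhausted
print(first_pass, second_pass)
```

Because `__iter__` returns `self`, the object can be used directly in a `for` loop, but only for a single pass.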



In [2]:
it = iter('abcdef')
print(it.__next__())    # a
print(next(it))         # b
print([e for e in it])  # the rest: ['c','d','e','f']
print(it.__next__())    # raises StopIteration: the iterator is exhausted

a
b
['c', 'd', 'e', 'f']


StopIteration: 

In [1]:
it = iter('advanced_python')

# first 7 elements
print(next(it))
print(next(it))
print(next(it))
print(next(it))
print(next(it))
print(next(it))
print(next(it))

# the rest
lst = [e for e in it]
print("\nlst: %s" % lst)

# at this point the iterator is exhausted, so this raises StopIteration
print(it.__next__())

a
d
v
a
n
c
e

lst: ['d', '_', 'p', 'y', 't', 'h', 'o', 'n']


StopIteration: 

In [2]:
it = iter('advanced_python')
print(list(it))
    
# At this point the iterator is finished
print(list(it))

['a', 'd', 'v', 'a', 'n', 'c', 'e', 'd', '_', 'p', 'y', 't', 'h', 'o', 'n']
[]


In [12]:
# the built-in enumerate() adds a counter to an iterable
it = iter('advanced_python')

for i, elem in enumerate(it): # index and element
    print(i, elem)


0 a
1 d
2 v
3 a
4 n
5 c
6 e
7 d
8 _
9 p
10 y
11 t
12 h
13 o
14 n


In [13]:
# Additionally look at: 
for i in range(10): 
    print(i)
    # some_work(i)

0
1
2
3
4
5
6
7
8
9


---

### range() vs xrange()

#### In Python2: 

- range(): returns a list  
- xrange(): returns a lazy `xrange` object that produces values on demand (it is not a generator)
    
#### In Python3: 
- there is no xrange; range returns a lazy `range` object (an immutable sequence, not a generator)
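
A short sketch of how a Python 3 `range` object differs from a generator. The variable names are illustrative only.

```python
r = range(0, 10, 2)
g = (x for x in range(0, 10, 2))

range_len = len(r)                 # a range knows its length
range_item = r[3]                  # and supports indexing
range_twice = (list(r), list(r))   # and can be iterated repeatedly

gen_twice = (list(g), list(g))     # a generator is exhausted after one pass

try:
    len(g)
    gen_has_len = True
except TypeError:
    # generators do not support len()
    gen_has_len = False

print(range_len, range_item, range_twice, gen_twice, gen_has_len)
```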

In [14]:
# returns a list

def demo_range(start, stop, step=1):
    numbers = []
    while start < stop:
        numbers.append(start)
        start += step
    return numbers

---

- Note that the range implementation must create/allocate the list of all numbers within the range.

- We use memory for "all numbers".


In [6]:
# returns a generator

def demo_xrange(start, stop, step=1):
    while start < stop:
        yield start      # yield makes this a generator function: calling it returns a generator
        start += step
        
# We do not precreate the list of all numbers within the range.
# We do not use memory for "all numbers".


----

- The generator is able to 'return' many values. 

- Every time the code gets to the yield, the function emits its value

- When another value is requested, the function resumes running (maintaining its previous state) and emits the next value.
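
The pause-and-resume behaviour can be observed by stepping through a generator by hand. This is a small sketch; `demo_steps` is a hypothetical function written only for this illustration.

```python
def demo_steps():
    print("started")
    yield 1
    print("resumed after first yield")
    yield 2
    print("resumed after second yield")


gen = demo_steps()             # nothing in the body runs yet
first = next(gen)              # runs up to the first yield
second = next(gen)             # resumes and runs up to the second yield
finished = next(gen, "done")   # runs to the end; the default is returned
                               # instead of raising StopIteration
print(first, second, finished)
```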

In [21]:
# how many numbers in list_of_numbers are divisible by 3?

list_of_numbers = demo_range(0, 1000)
divisible_by_three = len([n for n in list_of_numbers if n % 3 == 0])

print("divisible_by_three:", divisible_by_three)

divisible_by_three: 334


In [20]:
# how many numbers in list_of_numbers are divisible by 3?

list_of_numbers = demo_xrange(0, 1000)
divisible_by_three = len([n for n in list_of_numbers if n % 3 == 0])

print("divisible_by_three:", divisible_by_three)

divisible_by_three: 334


## List vs generator comprehensions

List:

```[ <value> for <item> in <sequence> if <condition> ]```

Generator:

```( <value> for <item> in <sequence> if <condition> )```

In [31]:
(i for i in list(range(10)))

<generator object <genexpr> at 0x7fda880485f0>

In [32]:
generator = (1 for n in list_of_numbers if n % 3 == 0)
print("generator:", generator)
divisible_by_three = sum(generator)
print(divisible_by_three)

# Here, we have a generator that emits a value of 1 whenever it encounters a number divisible by 3,
# and nothing otherwise, so sum() counts the matches without building a list.

generator: <generator object <genexpr> at 0x7fdaa82aa890>
334


In [4]:
generator = (x for x in range(3))
for e in generator:
    print(e)

# at this point, the generator is exhausted, so sum() has nothing left to add
sum(generator)

0
1
2


0

In [14]:
import sys
x1 = (0,1,2)
print(sys.getsizeof(x1))
sys.getsizeof((x for x in range(3)))

64


tuple
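
The memory difference becomes clearer with a larger sequence. The sketch below compares a list and a generator over the same values; exact byte counts depend on the Python version and platform, so none are shown.

```python
import sys

n = 100_000
as_list = [x for x in range(n)]
as_gen = (x for x in range(n))

# getsizeof reports the container itself: for the list this is the array
# of references (not the integers it points to), which grows with n.
list_size = sys.getsizeof(as_list)

# The generator only stores its paused state, independent of n.
gen_size = sys.getsizeof(as_gen)

print(list_size, gen_size)
```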

## Infinite Iterators
The __itertools__ module comes with three iterators that can run indefinitely: `count`, `cycle`, and `repeat`.
- useful for generating numbers or cycling over iterables of unknown length
- infinite iterators need to be stopped explicitly
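
A sketch of three common ways to stop an infinite iterator, using `count` as the source:

```python
from itertools import count, islice, takewhile

# 1. Take a fixed number of values
first_five = list(islice(count(), 5))

# 2. Stop at the first value that fails a condition
below_ten = list(takewhile(lambda x: x < 10, count(0, 3)))

# 3. Break out of a loop explicitly
collected = []
for i in count(1):
    if i * i > 50:
        break
    collected.append(i)

print(first_five, below_ten, collected)
```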

#### itertools.count(start=0, step=1)  
https://docs.python.org/3/library/itertools.html#itertools.count

In [3]:
from itertools import count
# count(start, step)
for i in count(10, 3):  # infinite
    print(i)
    if i >= 30: break
        

10
13
16
19
22
25
28
31


#### itertools.islice(seq, [start,] stop [, step])   

https://docs.python.org/3/library/itertools.html#itertools.islice

Make an iterator that returns selected elements from the iterable.  
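
One point worth keeping in mind is that `islice` advances the underlying iterator rather than copying it. A minimal sketch:

```python
from itertools import islice

it = iter(range(10))
head = list(islice(it, 3))   # takes 0, 1, 2 from the underlying iterator
next_value = next(it)        # the iterator continues where islice stopped
rest = list(it)

print(head, next_value, rest)
```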


In [4]:
from itertools import islice  # slice the iterator
# islice(iterable, stop): take the first `stop` elements
print(list(islice(count(10, 3), 5)))     # first 5 elements 
print(list(islice(count(10, 3), 10)))    # first 10 elements 
print(list(islice(count(10, 3), 5, 10))) # next 5 elements: indices 5 through 9 (0-based)

[10, 13, 16, 19, 22]
[10, 13, 16, 19, 22, 25, 28, 31, 34, 37]
[25, 28, 31, 34, 37]


In [39]:
# itertools.islice(seq, [start,] stop [,step])

print(list(islice(count(10, 3), 10)))       # first 10 elements 
print(list(islice(count(10, 3), 3, 10, 2))) # indices 3 through 9 (0-based), every second element 

[10, 13, 16, 19, 22, 25, 28, 31, 34, 37]
[19, 25, 31, 37]


**None**: "If stop is None, then iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position."
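
A short sketch of `stop=None` on a finite iterable. With an infinite iterator such as `count()`, the same call would never finish when wrapped in `list`.

```python
from itertools import islice

tail = list(islice('ABCDEFG', 2, None))            # from index 2 to the end
every_other = list(islice('ABCDEFG', 0, None, 2))  # every second element

print(tail, every_other)
```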

In [5]:
# print(list(islice('ABCDEFG', 2, None)))
print(list(islice('ABCDEFG', 100))) # and len('ABCDEFG') = 7
# -> only 7 elements: islice stops early when the iterable is exhausted

['A', 'B', 'C', 'D', 'E', 'F', 'G']


#### itertools.cycle(seq)

https://docs.python.org/3/library/itertools.html#itertools.cycle

In [44]:
from itertools import cycle

print(list(islice(cycle('abc'), 12)))  # cycle through 'abc' and take the first 12 elements

['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c']


In [45]:
# iterators can be used in different ways
lst = ['advanced','python','for','data','science']
print(list(islice(cycle(lst), 10)))

['advanced', 'python', 'for', 'data', 'science', 'advanced', 'python', 'for', 'data', 'science']


#### itertools.repeat(elem [,times])

https://docs.python.org/3/library/itertools.html#itertools.repeat

In [46]:
from itertools import repeat

# repeat an object: e.g. string

print(list(repeat('abcde', 5)))

['abcde', 'abcde', 'abcde', 'abcde', 'abcde']


In [47]:
# repeat an object: e.g. list
print(list(repeat(['CDS', 'Courant', 'NYU'], 5)))


[['CDS', 'Courant', 'NYU'], ['CDS', 'Courant', 'NYU'], ['CDS', 'Courant', 'NYU'], ['CDS', 'Courant', 'NYU'], ['CDS', 'Courant', 'NYU']]


## Finite Iterators
itertools also has a number of iterators that terminate.

#### itertools.accumulate(seq [, func])

In [48]:
from itertools import accumulate
# running totals of the elements
print(list(accumulate(range(1, 11)))) # 1, 1 + 2, 1 + 2 + 3, 1 + 2 + 3 + 4,...

[1, 3, 6, 10, 15, 21, 28, 36, 45, 55]


In [49]:
import operator

print(list(accumulate(range(1, 11), operator.mul))) # 1, 1 * 2, 1 * 2 * 3, 1 * 2 * 3 * 4

[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]


In [51]:
# it can also handle non-numeric lists such as strings

print(list(accumulate('cds_nyu')))

print(list(accumulate(repeat('python ', 3))))

['c', 'cd', 'cds', 'cds_', 'cds_n', 'cds_ny', 'cds_nyu']
['python ', 'python python ', 'python python python ']


More examples here: https://docs.python.org/3/library/itertools.html#itertools.accumulate
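
Two further uses, sketched under the assumption of Python 3.8 or later (the `initial` keyword was added in 3.8): a running maximum, and a running sum that starts from a given value.

```python
from itertools import accumulate

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Any two-argument function can replace addition; max gives a running maximum
running_max = list(accumulate(data, max))

# `initial` is emitted first, so the output has one more element than the input
with_initial = list(accumulate([1, 2, 3], initial=100))

print(running_max, with_initial)
```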

#### itertools.chain(*iterables)  

In [54]:
my_list = ['foo', 'bar']
cmd = ['ls', '/some/dir']
numbers = list(range(5))
my_list.extend(cmd)
my_list.extend(numbers)

print(my_list)

['foo', 'bar', 'ls', '/some/dir', 0, 1, 2, 3, 4]


The chain iterator takes a series of iterables and flattens them down into one long iterable.

In [59]:
from itertools import chain

my_list = list(chain(['foo', 'bar'], cmd, numbers))

print(my_list)

['foo', 'bar', 'ls', '/some/dir', 0, 1, 2, 3, 4]


In [60]:
my_list = list(chain('foo', 'bar'))

print(my_list)

['f', 'o', 'o', 'b', 'a', 'r']
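
When the iterables are already collected in one container, `chain.from_iterable` avoids unpacking with `*`. The sketch below also shows that only one level of nesting is flattened.

```python
from itertools import chain

nested = [[1, 2], [3], [], [4, 5, 6]]

flat = list(chain.from_iterable(nested))
same = list(chain(*nested))      # equivalent, but unpacks `nested` eagerly

# Inner lists deeper than one level are left untouched
one_level = list(chain.from_iterable([[1, [2, 3]], [4]]))

print(flat, same, one_level)
```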


#### itertools.compress(seq, selectors)
Useful for filtering an iterable using a second boolean iterable (i.e. an indicator)

https://docs.python.org/3/library/itertools.html#itertools.compress
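
As a rough mental model, `compress` behaves like a comprehension over `zip(data, selectors)` that keeps the items whose selector is truthy. A small sketch with made-up data:

```python
from itertools import compress

data = 'ABCDEF'
selectors = [1, 0, 1, 0, 1, 1]   # any truthy/falsy values work, not only bools

via_compress = list(compress(data, selectors))
via_comprehension = [d for d, s in zip(data, selectors) if s]

print(via_compress, via_comprehension)
```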


In [62]:
from itertools import compress

letters = 'ABCDEFGHIJKLM'
bools = [True, True, False, True, False, True, True, True, False, True]
print(len(letters), len(bools))

# notice the sizes do not need to match: compress stops at the shorter of the two
print(list(compress(letters, bools)))    # keep the letters whose selector is True

13 10
['A', 'B', 'D', 'F', 'G', 'H', 'J']


In [63]:
def either_aeiou(e):
#     return (e == 'A') or (e == 'E') or (e == 'I') or (e == 'O') or (e == 'U')
    return e in 'AEIOU'

print(list(compress(letters, [either_aeiou(e) for e in letters])))

['A', 'E', 'I']


---

#### itertools.dropwhile(predicate, seq)  

- Drop the elements while the predicate is True; afterwards, return every remaining element (the predicate is not checked again).

#### itertools.takewhile(predicate, seq)  

- Take the elements while the predicate is True; stop at the first element for which it is False.
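
To make the difference explicit, here is a sketch of both functions written as plain generators. The names `my_takewhile` and `my_dropwhile` are hypothetical; they only approximate the behaviour of the real `itertools` functions.

```python
from itertools import dropwhile, takewhile


def my_takewhile(predicate, iterable):
    for x in iterable:
        if not predicate(x):
            break          # stop at the first failure
        yield x


def my_dropwhile(predicate, iterable):
    iterator = iter(iterable)
    for x in iterator:
        if not predicate(x):
            yield x        # first element that fails the predicate
            break
    yield from iterator    # everything after it, without further checks


values = [6, 7, 8, 1, 2, 9]
is_big = lambda x: x > 5

taken = list(my_takewhile(is_big, values))
dropped = list(my_dropwhile(is_big, values))
print(taken, dropped)
```

Note that `9` survives in `dropped` even though it satisfies the predicate, because dropping stops permanently at the first failure.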

In [8]:
from itertools import dropwhile, takewhile

print(list(dropwhile(lambda x: x > 5, [6, 7, 8, 9, 10, 1, 2, 3, 20]))) 
# drops elements while the predicate is True; from the first False onward, returns everything

print(list(takewhile(lambda x: x > 5, [6, 7, 8, 9, 10, 1, 2, 3, 20]))) 
# keeps elements while the predicate is True; stops at the first False

[1, 2, 3, 20]
[6, 7, 8, 9, 10]


In [66]:
lst = ['parrot', 'pelican', 'lion', 'cat', 'panther', 'dolphin', 'dog']

print("lst_take_while:", list(takewhile(lambda word: word[0] == 'p', lst)))

print("lst_drop_while:", list(dropwhile(lambda word: word[0] == 'p', lst)))


lst_take_while: ['parrot', 'pelican']
lst_drop_while: ['lion', 'cat', 'panther', 'dolphin', 'dog']


In [10]:
lst=[[1,5],[4,3],[2,2]]
lst.sort(key=lambda x:x[1])
# lst.sort()
lst

[[2, 2], [4, 3], [1, 5]]

#### itertools.filterfalse(predicate, seq) 

https://docs.python.org/3/library/itertools.html#itertools.filterfalse

Filter elements to keep only those for which the predicate is False (opposite behavior of `filter`).
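
Passing `None` as the predicate tests the truthiness of each element directly. A short sketch with illustrative values:

```python
from itertools import filterfalse

values = [0, 1, '', 'a', None, [], [0]]

falsy = list(filterfalse(None, values))   # elements that are falsy
truthy = list(filter(None, values))       # elements that are truthy

print(falsy, truthy)
```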

In [69]:
from itertools import filterfalse

# keep *all* elements for which the predicate is false

print(list(filterfalse(lambda x: x < 5, [6, 7, 8, 9, 10, 1, 2, 3, 20])))

[6, 7, 8, 9, 10, 20]


In [70]:
# contrast with the standard function filter
list(filter(lambda x: x < 5, [6, 7, 8, 9, 10, 1, 2, 3, 20]))

[1, 2, 3]

#### itertools.groupby(seq, key=None)  
https://docs.python.org/3/library/itertools.html#itertools.groupby  
Make an iterator that returns consecutive keys and groups from the iterable.  
`key` is a function computing a key value for each element.
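
Because `groupby` only groups consecutive elements, it is a natural fit for run-length encoding. A sketch, with `run_length_encode` as a hypothetical helper name:

```python
from itertools import groupby


def run_length_encode(text):
    # With no key function, elements are grouped by their own value
    return [(char, len(list(group))) for char, group in groupby(text)]


encoded = run_length_encode('AAAABBBCCDAA')
print(encoded)
```

The trailing `'AA'` forms its own group because it is not adjacent to the first run of `'A'`.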

In [82]:
from itertools import groupby
 
numbers = range(20)
# to group consecutive numbers from range(20) by the same **quotient** when divided by 5; 
# i.e. for x in range(20) group by x // 5

for (key, group) in groupby(numbers, lambda x: x // 5):
    print(key, list(group))

0 [0, 1, 2, 3, 4]
1 [5, 6, 7, 8, 9]
2 [10, 11, 12, 13, 14]
3 [15, 16, 17, 18, 19]


In [80]:
import random

from itertools import groupby
 
numbers = list(range(20))
random.shuffle(numbers)
# to group consecutive numbers from range(20) by the same **quotient** when divided by 5; 
# i.e. for x in range(20) group by x // 5

for (key, group) in groupby(numbers, lambda x: x // 5):
    print(key, list(group))

0 [0]
1 [7, 6]
3 [16]
0 [2]
1 [8, 5, 9]
3 [17, 19]
2 [12]
0 [1]
2 [11, 10]
3 [15]
0 [4]
2 [14]
0 [3]
2 [13]
3 [18]


In [83]:
# group consecutive elements by the same **remainder**
numbers = range(20)  # back to the ordered range, so each remainder group has length 1
for (key, group) in groupby(numbers, lambda x: x % 5):
    print(key, list(group))

0 [0]
1 [1]
2 [2]
3 [3]
4 [4]
0 [5]
1 [6]
2 [7]
3 [8]
4 [9]
0 [10]
1 [11]
2 [12]
3 [13]
4 [14]
0 [15]
1 [16]
2 [17]
3 [18]
4 [19]


In [84]:
# combining the groups with same key
import collections
groups = collections.defaultdict(list)

for (key, group) in groupby(numbers, lambda x: x % 5):
    groups[key] += group

for key, group in groups.items():
    print(key, group)

0 [0, 5, 10, 15]
1 [1, 6, 11, 16]
2 [2, 7, 12, 17]
3 [3, 8, 13, 18]
4 [4, 9, 14, 19]


In [87]:
keyfn = lambda x: x % 5

for (key, group) in groupby(sorted(numbers, key=keyfn), keyfn): # sort by the same key first so equal keys are adjacent
    print(key, list(group))

0 [0, 5, 10, 15]
1 [1, 6, 11, 16]
2 [2, 7, 12, 17]
3 [3, 8, 13, 18]
4 [4, 9, 14, 19]


#### itertools.starmap(function, seq)
https://docs.python.org/3/library/itertools.html#itertools.starmap

Iterator that computes the function using arguments obtained from the iterable. 
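
A sketch contrasting `map` and `starmap` on the same computation: `map` takes one iterable per argument, while `starmap` takes a single iterable of argument tuples.

```python
from itertools import starmap

bases = [2, 3, 10]
exponents = [5, 2, 3]

via_map = list(map(pow, bases, exponents))               # pow(2, 5), pow(3, 2), ...
via_starmap = list(starmap(pow, zip(bases, exponents)))  # pow(*(2, 5)), ...

print(via_map, via_starmap)
```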

In [88]:
from itertools import starmap

# here is the iterable: [(1,2), (3,4), (5,6)]

for item in starmap(lambda u, v: u + v, [(1, 2), (3, 4), (5, 6)]):
    print(item)
    
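# map vs. starmap: map(f, xs, ys) takes its arguments from parallel iterables,
# while starmap(f, seq) unpacks each tuple in seq as f(*args)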

3
7
11


In [89]:
def add(u, v, w):
    return u + v + w

for item in starmap(add, [(1, 2, 3), (3, 4, 5), (5, 6, 7)]):
    print(item)

6
12
18


#### itertools.tee(seq, n=2)
Creates n independent iterators from a single iterable
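
One typical use of `tee` is to walk the same data at two offsets. The sketch below builds overlapping pairs; `pairwise_sketch` is a hypothetical name (Python 3.10 and later ship a built-in `itertools.pairwise`).

```python
from itertools import tee


def pairwise_sketch(iterable):
    a, b = tee(iterable)   # two independent iterators over the same data
    next(b, None)          # advance the second one by a single step
    return zip(a, b)


pairs = list(pairwise_sketch([1, 4, 9, 16]))
diffs = [y - x for x, y in pairwise_sketch([1, 4, 9, 16])]
print(pairs, diffs)
```

After calling `tee`, the original iterator should not be used directly, since advancing it would cause the copies to miss values.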

In [98]:
from itertools import tee
data = 'ABCDE'
iters = tee(data, 5)  # create 5 independent iterators over "data"
for i in range(5):
    print(f'iterator:{i}')
    for item in iters[i]:
        print(item, end="")
    print("\n")

iterator:0
ABCDE

iterator:1
ABCDE

iterator:2
ABCDE

iterator:3
ABCDE

iterator:4
ABCDE



#### itertools.zip_longest(*seq, fillvalue=None)  
https://docs.python.org/3/library/itertools.html#itertools.zip_longest  
An iterator that aggregates elements from each of the iterables.  
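
A common application is splitting a sequence into fixed-size chunks, padding the last one. The sketch below follows the well-known "grouper" pattern; the function name and fill value are illustrative.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    # The same iterator object repeated n times: each tuple position
    # pulls the next element from the shared iterator.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


chunks = list(grouper('ABCDEFG', 3, fillvalue='x'))
print(chunks)
```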

In [99]:
# The standard zip stops after the SHORTEST list is finished
list(zip('AB', 'xyzw', range(5)))

[('A', 'x', 0), ('B', 'y', 1)]

In [100]:
from itertools import zip_longest

print(list(zip_longest('AB', 'xyzw', fillvalue='_')))

[('A', 'x'), ('B', 'y'), ('_', 'z'), ('_', 'w')]


In [101]:
print(list(zip_longest('AB', 'xyzw', range(5), fillvalue='_')))

[('A', 'x', 0), ('B', 'y', 1), ('_', 'z', 2), ('_', 'w', 3), ('_', '_', 4)]


In [102]:
# useful for building dictionaries with a default for missing values

vals = ['pqr', 'uvw', 'xyz']

dc = dict(zip_longest('1234567', vals, fillvalue='blank_value'))
print("dict:", dc)

dict: {'1': 'pqr', '2': 'uvw', '3': 'xyz', '4': 'blank_value', '5': 'blank_value', '6': 'blank_value', '7': 'blank_value'}


## Combinatoric Generators
Iterators that can be used for creating combinations and permutations of data

#### itertools.combinations(seq, r) 
#### itertools.combinations_with_replacement(seq, r)
#### itertools.permutations(iterable, r=None)

---

E.g. There is an urn with four balls: yellow, green, red, and blue.  

Q: In how many ways can one pick two (three) balls without replacement?  

Q: In how many ways can one pick two (three) balls with replacement?
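
The counts can be checked by enumeration. This sketch assumes that the order of the picked balls does not matter for the two questions above; the ordered count is included for comparison. `math.comb` requires Python 3.8 or later.

```python
from itertools import combinations, combinations_with_replacement, permutations
from math import comb

balls = ['yellow', 'green', 'red', 'blue']
n = len(balls)

counts = {}
for r in (2, 3):
    counts[r] = {
        'without_replacement': len(list(combinations(balls, r))),
        'with_replacement': len(list(combinations_with_replacement(balls, r))),
        'ordered_without_replacement': len(list(permutations(balls, r))),
    }

# Closed forms: C(n, r) and C(n + r - 1, r)
closed_form = {r: (comb(n, r), comb(n + r - 1, r)) for r in (2, 3)}
print(counts)
print(closed_form)
```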

In [105]:
from itertools import combinations

for item in combinations('RBGY', 2):
    print(''.join(item), end="; ")

RB; RG; RY; BG; BY; GY; 

In [19]:
from itertools import combinations, combinations_with_replacement, permutations
seq=["a","b","c"]
r=2
list(combinations(seq, r))
# list(combinations_with_replacement(seq, r))
# list(permutations(seq, r=None))

[('a', 'b'), ('a', 'c'), ('b', 'c')]

In [106]:
from itertools import combinations_with_replacement

for item in combinations_with_replacement('RBGY', 2):
    print(''.join(item), end="; ")

RR; RB; RG; RY; BB; BG; BY; GG; GY; YY; 

In [107]:
from itertools import permutations

for item in permutations('RBGY', 2):
    print(''.join(item), end="; ")

RB; RG; RY; BR; BG; BY; GR; GB; GY; YR; YB; YG; 

#### itertools.product(*seq, repeat=1)
Produces the **cartesian product** of sequences
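
The `repeat` keyword takes the product of a sequence with itself. A minimal sketch that enumerates all 3-bit strings:

```python
from itertools import product

bits = [''.join(p) for p in product('01', repeat=3)]

# repeat=3 is shorthand for passing the same iterable three times
same = list(product('01', '01', '01')) == list(product('01', repeat=3))

print(bits, same)
```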

In [112]:
from itertools import product

arrays = [('A', 'B'), ('a', 'b', 'c')]
cart_prod = list(product(*arrays))
print(cart_prod)

# size (cardinality) is 2 * 3  = 6


[('A', 'a'), ('A', 'b'), ('A', 'c'), ('B', 'a'), ('B', 'b'), ('B', 'c')]


In [118]:
# Calling a function with all possible choices of parameters

choices = [(-1,1), (-3,3), (-5,5)]
cp = list(product(*choices))
print(cp)

def f(a, b, c):
    print('(', a, b, c, ')')
    
for args in cp:
    f(*args)

[(-1, -3, -5), (-1, -3, 5), (-1, 3, -5), (-1, 3, 5), (1, -3, -5), (1, -3, 5), (1, 3, -5), (1, 3, 5)]
( -1 -3 -5 )
( -1 -3 5 )
( -1 3 -5 )
( -1 3 5 )
( 1 -3 -5 )
( 1 -3 5 )
( 1 3 -5 )
( 1 3 5 )


In [143]:
choices = {
    'a': [-1, 1],
    'b': [-3, 3],
    'c': [-5, 5],
}

def f(**kwargs):
    print(kwargs)

all_dicts = list(dict(zip(choices.keys(), vals)) for vals in product(*choices.values()))
for d in all_dicts:
    f(**d)

{'a': -1, 'b': -3, 'c': -5}
{'a': -1, 'b': -3, 'c': 5}
{'a': -1, 'b': 3, 'c': -5}
{'a': -1, 'b': 3, 'c': 5}
{'a': 1, 'b': -3, 'c': -5}
{'a': 1, 'b': -3, 'c': 5}
{'a': 1, 'b': 3, 'c': -5}
{'a': 1, 'b': 3, 'c': 5}


In [153]:
def f(**kwargs):
    print(kwargs)

def g(kwargs):
    print(kwargs)
    
def h(*args):
    print(args)
    
f(a=1, b=2, c=3)
f(**{'a' : 1, 'b' : 2, 'c' : 3})
# g(a=1, b=2, c=3)

h(*{'a' : 1, 'b' : 2, 'c' : 3})

{'a': 1, 'b': 2, 'c': 3}
{'a': 1, 'b': 2, 'c': 3}
('a', 'b', 'c')


In [155]:
import numpy as np

M = np.zeros((2, 2))
cids = product((0, 1), (0, 1))

print("cids:", cids)

for i, e in enumerate(cids):
    M[e] = i * 100
    print("i = %s, e = %s" % (i, e))

print("\nM=\n", M)

cids: <itertools.product object at 0x7fda88432e00>
i = 0, e = (0, 0)
i = 1, e = (0, 1)
i = 2, e = (1, 0)
i = 3, e = (1, 1)

M=
 [[  0. 100.]
 [200. 300.]]


In [156]:
import numpy as np

M = np.zeros((3, 3, 3))
cids = product((0,1,2), (0,1,2), (0,1,2))


print("cids: ", cids)

for i, e  in enumerate(cids):
    M[e] = i * 100
    print(f"i = {i:>2}, e = {e}")

print("\nM=\n", M)

cids:  <itertools.product object at 0x7fda88436ac0>
i =  0, e = (0, 0, 0)
i =  1, e = (0, 0, 1)
i =  2, e = (0, 0, 2)
i =  3, e = (0, 1, 0)
i =  4, e = (0, 1, 1)
i =  5, e = (0, 1, 2)
i =  6, e = (0, 2, 0)
i =  7, e = (0, 2, 1)
i =  8, e = (0, 2, 2)
i =  9, e = (1, 0, 0)
i = 10, e = (1, 0, 1)
i = 11, e = (1, 0, 2)
i = 12, e = (1, 1, 0)
i = 13, e = (1, 1, 1)
i = 14, e = (1, 1, 2)
i = 15, e = (1, 2, 0)
i = 16, e = (1, 2, 1)
i = 17, e = (1, 2, 2)
i = 18, e = (2, 0, 0)
i = 19, e = (2, 0, 1)
i = 20, e = (2, 0, 2)
i = 21, e = (2, 1, 0)
i = 22, e = (2, 1, 1)
i = 23, e = (2, 1, 2)
i = 24, e = (2, 2, 0)
i = 25, e = (2, 2, 1)
i = 26, e = (2, 2, 2)

M=
 [[[   0.  100.  200.]
  [ 300.  400.  500.]
  [ 600.  700.  800.]]

 [[ 900. 1000. 1100.]
  [1200. 1300. 1400.]
  [1500. 1600. 1700.]]

 [[1800. 1900. 2000.]
  [2100. 2200. 2300.]
  [2400. 2500. 2600.]]]
