# DS-GA 3001

# Lecture 3 - The itertools Module
## Feb. 18, 2021
see: https://nyu-cds.github.io/python-itertools/ (working link [here](http://alberto.bietti.me/python-itertools/))

# Before we start

* TA office hours: do current slots work?
* Questions about previous topics?
* Questions about homework?
* Any other feedback?

# This class

- The concept of iterators
- Infinite Iterators
- Finite Iterators
- Combinatoric Generators

## Arrays/lists vs iterators

- When creating an array (as well as a list or tuple), we first have to allocate a block of system memory
to store these elements.  

- When we only need to iterate through a list, there might be no need to store it



---

The __itertools__ module implements a number of iterator building blocks that provide fast, memory efficient tools.


An iterator/generator behaves like a list of values, with some **important differences**:

- The values are generated on demand (the sequence is not stored in memory)
- The values can only be accessed in sequence (not like an array)
- The values can only be accessed once
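
The three points above can be checked directly on a small generator. The variable names below are illustrative only.

```python
gen = (c.upper() for c in "abc")

# 1. Values are produced on demand: nothing is computed until we ask.
first = next(gen)

# 2. No random access: generators do not support indexing.
try:
    gen[1]
    indexable = True
except TypeError:
    indexable = False

# 3. Single pass: once consumed, the generator stays empty.
rest = list(gen)
again = list(gen)

print(first, indexable, rest, again)  # A False ['B', 'C'] []
```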



An iterator is an object that provides two methods:

- `__iter__` which returns the iterator itself
- `__next__` which returns the next value from the iterator
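
To make the protocol concrete, here is a minimal hand-written iterator. `Countdown` is a hypothetical class invented for this sketch; it is not part of the standard library.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Exhaustion is signalled by raising StopIteration.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
print(iter(countdown) is countdown)  # True
print(list(countdown))               # [3, 2, 1]
print(list(countdown))               # [] (already exhausted)
```

A `for` loop does the same thing behind the scenes: it calls `iter()` once, then `next()` repeatedly until `StopIteration` is raised.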



In [3]:
it = iter('advanced_python')
next(it)

'a'

In [None]:
it = iter('advanced_python')

# first 7 elements
print(it.__next__())
print(it.__next__())
print(it.__next__())
print(it.__next__())
print(it.__next__())
print(it.__next__())
print(next(it))

# the rest
lst = [e for e in it]
print("\nlst: %s" % lst)

# at this point the iterator is exhausted, so the next call raises StopIteration
print(it.__next__())

In [None]:
it = iter('advanced_python')
print(list(it))
    
# At this point the iterator is finished
print(list(it))

In [None]:
# the built-in enumerate() function adds a counter to an iterable
it = iter('advanced_python')

for i, elem in enumerate(it): 
    print(i, elem)

In [None]:
# Additionally look at: 
for i in range(10): 
    print(i)
    # some_work(i)

---

### range() vs xrange()

#### In Python2: 

- range(): returns a list  
- xrange(): returns a generator object
    
#### In Python3: 
- there is no xrange; range returns a lazy `range` object (an immutable sequence that computes its values on demand), not a list and not a generator
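
A quick check of how Python 3's `range` behaves in practice. The sketch contrasts it with a generator built from the same values; the names `r` and `g` are illustrative.

```python
r = range(0, 10, 3)

print(type(r).__name__)   # range
print(len(r), r[2])       # 4 6
print(list(r), list(r))   # can be iterated more than once

# A generator over the same values has no len() and is single-use.
g = (x for x in r)
print(list(g), list(g))   # [0, 3, 6, 9] []
```

Like the Python 2 `xrange`, the `range` object never allocates the full list of numbers.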

In [None]:
# returns a list

def demo_range(start, stop, step=1):
    numbers = []
    while start < stop:
        numbers.append(start)
        start += step
    return numbers

---

- Note that the range implementation must create/allocate the list of all numbers within the range.

- We use memory for "all numbers".


In [None]:
# returns a generator

def demo_xrange(start, stop, step=1):
    while start < stop:
        yield start
        start += step
        
# We do not precreate the list of all numbers within the range.
# We do not use memory for "all numbers".


----

- The generator is able to 'return' many values. 

- Every time execution reaches a `yield`, the function emits a value and pauses.

- When another value is requested, the function resumes running (maintaining its previous state) until it emits the next value.
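
The pause-and-resume behaviour can be traced by recording when each part of a generator body runs. `demo_steps` and `log` are hypothetical names used only for this sketch.

```python
log = []


def demo_steps():
    # Hypothetical generator used only to trace execution order.
    log.append("started")
    yield 1
    log.append("resumed after first yield")
    yield 2
    log.append("resumed after second yield")


gen = demo_steps()      # creating the generator runs none of the body
log.append("created")
first = next(gen)       # runs up to the first yield
second = next(gen)      # resumes, runs up to the second yield
leftover = list(gen)    # resumes once more, then the function ends

print(first, second, leftover)  # 1 2 []
for entry in log:
    print(entry)
```

Note that "created" is logged before "started": the body does not begin executing until the first value is requested.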

In [None]:
# how many numbers in list_of_numbers are divisible by 3?

list_of_numbers = demo_range(0, 1000)
divisible_by_three = len([n for n in list_of_numbers if n % 3 == 0])

print("divisible_by_three:", divisible_by_three)

## List vs generator comprehensions

List:

```[ <value> for <item> in <sequence> if <condition> ]```

Generator:

```( <value> for <item> in <sequence> if <condition> )```
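
A small side-by-side sketch of the two forms, using illustrative names `as_list` and `as_gen`. Both describe the same values; only the list version stores them.

```python
numbers = range(1000)

as_list = [n for n in numbers if n % 3 == 0]   # builds the full list now
as_gen = (n for n in numbers if n % 3 == 0)    # builds nothing yet

print(type(as_list).__name__, type(as_gen).__name__)  # list generator
print(len(as_list))              # 334

# A generator has no len(); count its items by consuming it instead.
gen_count = sum(1 for _ in as_gen)
print(gen_count)                 # 334
```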

In [None]:
generator = (1 for n in list_of_numbers if n % 3 == 0)
print("generator:", generator)
divisible_by_three = sum(generator)
print(divisible_by_three)

# Here, we have a generator that emits a value of 1 whenever it encounters a number divisible by 3,
# and nothing otherwise.

In [None]:
# at this point the generator is exhausted: the loop prints nothing and sum() returns 0

for e in generator:
    print(e)

sum(generator)

## Infinite Iterators
The __itertools__ module comes with three iterators that can iterate infinitely: `count`, `cycle`, and `repeat`.
- useful for generating numbers or cycling over iterables of unknown length
- infinite iterators need to be stopped
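
A few common ways to bound an infinite iterator, sketched with `islice` and `takewhile` (both covered later in these notes) and with the built-in `zip`. The variable names are illustrative.

```python
from itertools import count, cycle, islice, takewhile

# Stop after a fixed number of items.
first_five = list(islice(count(1), 5))

# Stop as soon as a condition fails.
below_20 = list(takewhile(lambda x: x < 20, count(10, 3)))

# zip stops at its shortest input, so a finite partner bounds the cycle.
labels = list(zip("abcde", cycle([0, 1])))

print(first_five)  # [1, 2, 3, 4, 5]
print(below_20)    # [10, 13, 16, 19]
print(labels)      # [('a', 0), ('b', 1), ('c', 0), ('d', 1), ('e', 0)]
```

Calling `list()` directly on `count()` or `cycle(...)` without one of these guards would never return.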

#### itertools.count(start,step)  
https://docs.python.org/3/library/itertools.html#itertools.count

In [None]:
from itertools import count

for i in count(10, 3):
    print(i)
    if i >= 30: break
        

#### itertools.islice(seq, [start,] stop [, step])   

https://docs.python.org/3/library/itertools.html#itertools.islice

Make an iterator that returns selected elements from the iterable.  


In [None]:
from itertools import islice

print(list(islice(count(10, 3), 5)))     # first 5 elements 
print(list(islice(count(10, 3), 10)))    # first 10 elements 
print(list(islice(count(10, 3), 5, 10))) # next 5 elements: indices 5 through 9 (0-based)

In [None]:
# itertools.islice(seq, [start,] stop [,step])

print(list(islice(count(10, 3), 10)))       # first 10 elements 
print(list(islice(count(10, 3), 3, 10, 2))) # indices 3, 5, 7, 9 (0-based): every second element from index 3, stopping before index 10

**None**: "If stop is None, then iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position."

In [None]:
print(list(islice('ABCDEFG', 2, None)))
print(list(islice('ABCDEFG', 100))) # and len('ABCDEFG') = 7

#### itertools.cycle(seq)

https://docs.python.org/2/library/itertools.html#itertools.cycle

In [None]:
from itertools import cycle

print(list(islice(cycle('abc'), 12)))

In [None]:
# iterators can be used in different ways
lst = ['advanced','python','for','data','science']
print(list(islice(cycle(lst), 10)))

#### itertools.repeat(elem [,times])

https://docs.python.org/2/library/itertools.html#itertools.repeat

In [None]:
from itertools import repeat

# repeat an object: e.g. string

print(list(repeat('abcde', 5)))

In [None]:
# repeat an object: e.g. list
print(list(repeat(['CDS', 'Courant', 'NYU'], 5)))


## Finite Iterators
itertools also has a number of iterators that terminate.

#### itertools.accumulate(seq [, func])

In [None]:
from itertools import accumulate

print(list(accumulate(range(1, 11)))) # 1, 1 + 2, 1 + 2 + 3, 1 + 2 + 3 + 4,...

In [None]:
import operator

print(list(accumulate(range(1, 11), operator.mul))) # 1, 1 * 2, 1 * 2 * 3, 1 * 2 * 3 * 4

In [None]:
# it can also handle non-numeric lists such as strings

print(list(accumulate('cds_nyu')))

print(list(accumulate(repeat('python ', 3))))

More examples here: https://docs.python.org/3/library/itertools.html#itertools.accumulate
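
As one further illustration, the `func` argument is not limited to arithmetic operators: any function of two arguments works. The sketch below uses the built-ins `max` and `min` to get a running maximum and minimum; the sample `data` list is made up.

```python
from itertools import accumulate

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Each output is func(previous_result, next_element).
running_max = list(accumulate(data, max))
running_min = list(accumulate(data, min))

print(running_max)  # [3, 3, 4, 4, 5, 9, 9, 9]
print(running_min)  # [3, 1, 1, 1, 1, 1, 1, 1]
```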

#### itertools.chain(*iterables)  

In [None]:
my_list = ['foo', 'bar']
cmd = ['ls', '/some/dir']
numbers = list(range(5))
my_list.extend(cmd)
my_list.extend(numbers)

print(my_list)

The chain iterator takes a series of iterables and flattens them down into one long iterable.

In [None]:
from itertools import chain

my_list = list(chain(['foo', 'bar'], cmd, numbers))

print(my_list)

#### itertools.compress(seq, selectors)
Useful for filtering an iterable using a second boolean iterable (i.e. an indicator)

https://docs.python.org/2/library/itertools.html#itertools.compress


In [None]:
from itertools import compress

letters = 'ABCDEFGHIJKLM'
bools = [True, True, False, True, False, True, True, True, False, True]
print(len(letters), len(bools))

# notice the sizes do not need to match: compress stops when the shorter input runs out
print(list(compress(letters, bools)))

In [None]:
def either_aeiou(e):
#     return (e == 'A') or (e == 'E') or (e == 'I') or (e == 'O') or (e == 'U')
    return e in 'AEIOU'

print(list(compress(letters, [either_aeiou(e) for e in letters])))

---

### itertools.dropwhile(predicate, seq)  

- Drops elements while the predicate is True; once the predicate is False for some element, returns that element and every element after it (without testing the predicate again).

### itertools.takewhile(predicate, seq)  

- Takes elements while the predicate is True and stops at the first element for which it is False.

In [None]:
from itertools import dropwhile, takewhile

print(list(dropwhile(lambda x: x > 5, [6, 7, 8, 9, 10, 1, 2, 3, 20])))

print(list(takewhile(lambda x: x > 5, [6, 7, 8, 9, 10, 1, 2, 3, 20])))

In [None]:
lst = ['parrot', 'pelican', 'lion', 'cat', 'panther', 'dolphin', 'dog']

print("lst_take_while:", list(takewhile(lambda word: word[0] == 'p', lst)))

print("lst_drop_while:", list(dropwhile(lambda word: word[0] == 'p', lst)))


#### itertools.filterfalse(predicate, seq) 

https://docs.python.org/3/library/itertools.html#itertools.filterfalse

Filter elements to keep only those for which the predicate is False (opposite behavior of `filter`).

In [None]:
from itertools import filterfalse

# keep *all* elements for which the predicate is false

print(list(filterfalse(lambda x: x < 5, [6, 7, 8, 9, 10, 1, 2, 3, 20])))

In [None]:
# contrast with the standard function filter
list(filter(lambda x: x < 5, [6, 7, 8, 9, 10, 1, 2, 3, 20]))

#### itertools.groupby(seq, key=None)  
https://docs.python.org/2/library/itertools.html#itertools.groupby  
Make an iterator that returns consecutive keys and groups from the iterable.  
`key` is a function computing a key value for each element.

In [None]:
from itertools import groupby
 
numbers = range(20)
# to group consecutive numbers from range(20) by the same **quotient** when divided by 5; 
# i.e. for x in range(20) group by x // 5

for (key, group) in groupby(numbers, lambda x: x // 5):
    print(key, list(group))

In [None]:
# group the list of elements by the same **remainder** 
for (key, group) in groupby(numbers, lambda x: x % 5):
    print(key, list(group))

In [None]:
# combining the groups with same key
import collections
groups = collections.defaultdict(list)

for (key, group) in groupby(numbers, lambda x: x % 5):
    groups[key] += group

for key, group in groups.items():
    print(key, group)

In [None]:
keyfn = lambda x: x % 5

for (key, group) in groupby(sorted(numbers, key=keyfn), keyfn):
    print(key, list(group))

#### itertools.starmap(function, seq)
https://docs.python.org/3/library/itertools.html#itertools.starmap

Iterator that computes the function using arguments obtained from the iterable. 
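
One way to read `starmap(function, seq)` is as "call `function(*args)` for each `args` tuple in `seq`". The sketch below compares it with an explicit unpacking loop and with `map` over separate argument sequences; the `pairs` data is made up.

```python
from itertools import starmap

pairs = [(2, 5), (3, 2), (10, 3)]

# starmap unpacks each tuple into positional arguments.
with_starmap = list(starmap(pow, pairs))

# Equivalent explicit unpacking.
with_unpacking = [pow(*args) for args in pairs]

# map expects the arguments as separate parallel sequences instead.
with_map = list(map(pow, [2, 3, 10], [5, 2, 3]))

print(with_starmap)  # [32, 9, 1000]
print(with_starmap == with_unpacking == with_map)  # True
```

`starmap` is the natural choice when the arguments are already grouped into tuples, for example the output of `zip` or `product`.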

In [None]:
from itertools import starmap

# here is the iterable: [(1,2), (3,4), (5,6)]

for item in starmap(lambda u, v: u + v, [(1, 2), (3, 4), (5, 6)]):
    print(item)

In [None]:
def add(u, v, w):
    return u + v + w

for item in starmap(add, [(1, 2, 3), (3, 4, 5), (5, 6, 7)]):
    print(item)

#### itertools.tee(seq, n=2)
Creates n independent iterators from the given iterable

In [None]:
from itertools import tee
data = 'ABCDE'
iters = tee(data, 5)

for i in range(5):
    print(f'iterator:{i}')
    for item in iters[i]:
        print(item, end="")
    print("\n")
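
A typical use of `tee` is to walk the same data at two different offsets. The helper below, `pairwise_sketch`, is a hypothetical name for a small sketch that yields overlapping pairs of neighbours.

```python
from itertools import tee


def pairwise_sketch(iterable):
    # Hypothetical helper: yields (s0, s1), (s1, s2), (s2, s3), ...
    a, b = tee(iterable)   # two independent iterators over the same data
    next(b, None)          # advance the second one by a single step
    return zip(a, b)


pairs = list(pairwise_sketch("ABCDE"))
print(pairs)  # [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E')]
```

Once `tee` has been called, the original iterator should not be used directly anymore, since advancing it would skip values in the copies.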

#### itertools.zip_longest(*seq, fillvalue=None)  
https://docs.python.org/3/library/itertools.html#itertools.zip_longest  
An iterator that aggregates elements from each of the iterables.  

In [None]:
# The standard zip stops after the shortest list is finished
list(zip('AB', 'xyzw', range(5)))

In [None]:
from itertools import zip_longest

print(list(zip_longest('AB', 'xyzw', fillvalue='_')))

In [None]:
print(list(zip_longest('AB', 'xyzw', range(5), fillvalue='_')))

In [None]:
# useful to create dictionaries

vals = ['pqr', 'uvw', 'xyz']

dc = dict(zip_longest('1234567', vals, fillvalue='blank_value'))
print("dict:", dc)

In [None]:
list(map(str, range(1, 8)))

## Combinatoric Generators
Iterators that can be used for creating combinations and permutations of data

#### itertools.combinations(seq, r) 
#### itertools.combinations_with_replacement(seq, r)
#### itertools.permutations(iterable, r=None)

---

E.g. There is an urn with four balls: yellow, green, red, and blue.  

Q: In how many ways can one pick two (three) balls without replacement?  

Q: In how many ways can one pick two (three) balls with replacement?
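
The counts can be obtained by simply enumerating the picks. This sketch assumes that the order of the draw does not matter for the two questions above, and adds the ordered case (permutations) for comparison; the `counts` dictionary and its keys are illustrative.

```python
from itertools import combinations, combinations_with_replacement, permutations

balls = "YGRB"  # yellow, green, red, blue

counts = {}
for r in (2, 3):
    counts[r] = {
        "without replacement": len(list(combinations(balls, r))),
        "with replacement": len(list(combinations_with_replacement(balls, r))),
        "ordered, without replacement": len(list(permutations(balls, r))),
    }
    print(r, counts[r])
```

For two balls this gives 6, 10 and 12; for three balls 4, 20 and 24. These match the closed forms C(4, r), C(4 + r - 1, r) and 4!/(4 - r)!.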

In [None]:
from itertools import combinations

for item in combinations('RBGY', 2):
    print(''.join(item), end="; ")

In [None]:
from itertools import combinations_with_replacement

for item in combinations_with_replacement('RBGY', 2):
    print(''.join(item), end="; ")

In [None]:
from itertools import permutations

for item in permutations('RBGY', 2):
    print(''.join(item), end="; ")

#### itertools.product(*seq, repeat=1)
Produces the **Cartesian product** of the input sequences

In [None]:
from itertools import product

arrays = [('A', 'B'), ('a', 'b', 'c')]
cart_prod = list(product(*arrays))
print(cart_prod)

# size (cardinality) is 2 * 3  = 6
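
The `repeat` keyword from the signature takes the product of a sequence with itself. A short sketch, with illustrative variable names:

```python
from itertools import product

# repeat=2 is shorthand for product('AB', 'AB').
pairs = list(product('AB', repeat=2))
print(pairs)  # [('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'B')]

# All 3-bit binary strings: 2 ** 3 = 8 of them.
bits = [''.join(t) for t in product('01', repeat=3)]
print(bits)
```

The rightmost position varies fastest, so the binary strings come out in increasing order from `'000'` to `'111'`.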


In [None]:
# Calling a function with all possible choices of parameters

choices = [(-1,1), (-3,3), (-5,5)]
cp = list(product(*choices))
print(cp)

def f(a, b, c):
    print('(', a, b, c, ')')
    
for args in cp:
    f(*args)

In [None]:
choices = {
    'a': [-1, 1],
    'b': [-3, 3],
    'c': [-5, 5],
}

def f(**kwargs):
    print(kwargs)

all_dicts = list(dict(zip(choices.keys(), vals)) for vals in product(*choices.values()))
for d in all_dicts:
    f(**d)

In [None]:
import numpy as np

M = np.zeros((2, 2))
cids = product((0, 1), (0,1))

print("cids:", cids)

for i, e  in enumerate(cids):
    M[e] = i * 100
    print("i = %s, e = %s" %(i, e))

print("\nM=\n", M)

In [None]:
import numpy as np

M = np.zeros((3, 3, 3))
cids = product((0,1,2), (0,1,2), (0,1,2))


print("cids: ", cids)

for i, e  in enumerate(cids):
    M[e] = i * 100
    print(f"i = {i:>2}, e = {e}")

print("\nM=\n", M)

In [None]:
# One-liner to do the same thing with numpy