## Itertools
This notebook is based on a [YouTube video](https://www.youtube.com/watch?v=Qu3dThVy6KQ)

* itertools is a collection of tools that allows us to work with iterators in a fast and memory-efficient way
* iterators are objects that produce a sequence of values one at a time, which we can iterate or loop over
* itertools module contains a number of commonly used iterators as well as functions to combine several iterators
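All the itertools functions build on the iterator protocol. The minimal sketch below uses an illustrative list `colors`. `iter()` turns the list into an iterator, `next()` pulls one value at a time, and an exhausted iterator raises `StopIteration`. A `for` loop handles that last step automatically.

```python
# a list is iterable; iter() returns an iterator over it
colors = ["red", "green"]
it = iter(colors)

first = next(it)   # 'red'
second = next(it)  # 'green'
print(first, second)

# the iterator is now exhausted, so another next() raises StopIteration
try:
    next(it)
except StopIteration:
    print("iterator exhausted")

# a for loop calls next() for us and stops quietly at StopIteration
for color in iter(colors):
    print(color)
```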

### count
* if no arguments are passed, it starts at 0, counts up by 1 on each iteration, and goes on forever
* we can pass `start` and `step` arguments to choose the starting value and the step (negative and float steps work too)
* used for:
  + assigning an index to each value in a list when we do not know how many items there are
* never loop over a count object directly (e.g. `for num in counter: print(num)`). That loop runs forever because the counter never stops. The cell below takes a few values with `next()` instead:

In [70]:
import itertools
counter = itertools.count()
print("start and step at default")
print(next(counter))
print(next(counter))
print(next(counter))

print("---------------------------")
print("start and step both at 5")
counter = itertools.count(start=5, step=5)
print(next(counter))
print(next(counter))
print(next(counter))

print("---------------------------")
print("start at 5, step at -2.5")
counter = itertools.count(start=5, step=-2.5)
print(next(counter))
print(next(counter))
print(next(counter))

start and step at default
0
1
2
---------------------------
start and step both at 5
5
10
15
---------------------------
start at 5, step at -2.5
5
2.5
0.0


In [71]:
print(next(counter))
print(next(counter))
print(next(counter))

-2.5
-5.0
-7.5
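If you do need a `for` loop over `count()`, the loop has to stop itself. Here is a minimal sketch that uses an explicit `break`. The start value 10, step 3 and cutoff 20 are arbitrary illustrative choices.

```python
import itertools

# looping over count() never ends on its own,
# so we stop it ourselves with an explicit break
collected = []
for num in itertools.count(start=10, step=3):
    if num > 20:
        break
    collected.append(num)

print(collected)  # [10, 13, 16, 19]
```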


#### Code example of count: assign an index to a list
* use the zip function, which combines two iterables and pairs their values together
  + it takes the 1st value of count(), which is 0, and pairs it with the 1st value of data, which is 100, and so on
  + zip returns an iterator, which we need to loop over to get all the values
    - we can also convert the iterator from zip to a list
  + count() never runs out and zip stops when data is exhausted, so this works for data of any size

In [72]:
# code example 1, assign index
data = [100, 200, 300, 400]
daily_data = list(zip(itertools.count(), data))
daily_data

[(0, 100), (1, 200), (2, 300), (3, 400)]

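### zip_longest
* zip stops at the end of the shortest input iterable, but zip_longest continues until the longest input iterable is exhausted
* by default, the missing items of the shorter iterables are filled with None (the `fillvalue` argument changes this)
* never pass count() to zip_longest. Because count() runs forever, `itertools.zip_longest(itertools.count(), data)` would never end. The example below uses range(10) instead: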

In [73]:
# compare zip and zip_longest
data = [100, 200, 300, 400]
daily_data = list(zip(range(10), data))
print("using zip")
print(daily_data)
print("-------------------------------------")
daily_data = list(itertools.zip_longest(range(10), data))
print("using zip_longest with default argument")
print(daily_data)

using zip
[(0, 100), (1, 200), (2, 300), (3, 400)]
-------------------------------------
using zip_longest with default argument
[(0, 100), (1, 200), (2, 300), (3, 400), (4, None), (5, None), (6, None), (7, None), (8, None), (9, None)]
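The padding value can be changed with the `fillvalue` keyword of `zip_longest`. In this small sketch, 0 is an arbitrary choice of padding.

```python
import itertools

data = [100, 200, 300, 400]

# fillvalue replaces the default None for the shorter iterable
padded = list(itertools.zip_longest(range(6), data, fillvalue=0))
print(padded)
# [(0, 100), (1, 200), (2, 300), (3, 400), (4, 0), (5, 0)]
```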


### cycle
* runs forever
* takes an iterable as an argument and cycles through those values over and over

In [74]:
import itertools
cycle = itertools.cycle([1, 2, 3])

print(next(cycle))
print(next(cycle))
print(next(cycle))
print(next(cycle))
print(next(cycle))
print(next(cycle))

1
2
3
1
2
3
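Because `cycle` never ends, a safe pattern is to pair it with a finite iterable. `zip` then stops once the finite side is exhausted. The "on"/"off" labels below are only an illustration of alternating states.

```python
import itertools

# zip stops when range(5) is exhausted, so the infinite cycle is safe here
states = list(zip(range(5), itertools.cycle(["on", "off"])))
print(states)
# [(0, 'on'), (1, 'off'), (2, 'on'), (3, 'off'), (4, 'on')]
```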


### repeat
* takes a value and repeats it indefinitely, unless the `times` argument limits the number of repetitions

In [75]:
import itertools
repeat = itertools.repeat(2)
print("repeat with default times goes infinitely")
print(next(repeat))
print(next(repeat))
print(next(repeat))
print(next(repeat))
print(next(repeat))
print(next(repeat))
      
print("-------------------------------------")
repeat = itertools.repeat(2, times=3)
print("repeat with repeat times defined as 3")
print(next(repeat))
print(next(repeat))
print(next(repeat))
# only 3 values exist, so a 4th next() raises StopIteration
try:
    print(next(repeat))
except StopIteration:
    print("StopIteration: repeat is exhausted after 3 values")

repeat with default times goes infinitely
2
2
2
2
2
2
-------------------------------------
repeat with repeat times defined as 3
2
2
2
StopIteration: repeat is exhausted after 3 values

#### code example of repeat: pass a constant argument to the map function
* in the example, elements of range(10) and itertools.repeat(2) are paired and passed to pow by map
* when the shorter iterable (range(10)) is exhausted, map stops
* map returns a lazy iterator that produces a value on each next() call, and we can convert it to a list
* repeat is often used to pass a stream of constant values to a function like map or zip

In [76]:
squares = map(pow, range(10), itertools.repeat(2))
list(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
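The same idea works with `zip`, and a `repeat` with `times` is finite, so it can go straight into `list()`. The labels and the constant 0 below are illustrative.

```python
import itertools

# repeat with times=3 is finite, so it can be turned into a list directly
three_x = list(itertools.repeat("x", times=3))
print(three_x)  # ['x', 'x', 'x']

# an unbounded repeat paired with a finite list: zip stops at the list's end
labels = ["a", "b", "c"]
pairs = list(zip(labels, itertools.repeat(0)))
print(pairs)  # [('a', 0), ('b', 0), ('c', 0)]
```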

### starmap
* similar to map
* instead of taking each argument from a separate iterable, it takes one iterable of tuples and unpacks each tuple as the function's arguments

In [77]:
squares = itertools.starmap(pow, [(0, 2), (1, 2), (2, 2)])
print(list(squares))

[0, 1, 4]
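starmap is handy when the argument tuples already exist, for example when `zip` produces them. The sketch below uses hypothetical `prices` and `quantities` lists. It also shows that `map` with two separate iterables gives the same result.

```python
import itertools
import operator

prices = [2.0, 3.5, 1.25]
quantities = [3, 2, 4]

# zip builds (price, quantity) tuples; starmap unpacks each one into operator.mul
totals = list(itertools.starmap(operator.mul, zip(prices, quantities)))
print(totals)  # [6.0, 7.0, 5.0]

# the equivalent map call takes the two iterables separately
via_map = list(map(operator.mul, prices, quantities))
print(via_map)  # [6.0, 7.0, 5.0]
```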


### functions to return combinations and permutations
* combinations and permutations use each input element at most once in each result
  + in combinations, order does not matter: ('a', 'b') and ('b', 'a') count as one result
  + in permutations, order matters, so both ('a', 'b') and ('b', 'a') appear
* to generate results that reuse values from the input iterable, use product
  + product gives every ordered arrangement (the Cartesian product), including repeated use of the elements
* to get combinations that reuse values, use combinations_with_replacement(iterable, r)
  + `combinations_with_replacement([0, 1, 2, 3], 4)`

In [78]:
# combinations
letters = ['a', 'b', 'c', 'd']
numbers = [0, 1, 2, 3]
names = ['Corey', 'Nicole']

result = itertools.combinations(letters, 2)
for item in result:
    print(item)

('a', 'b')
('a', 'c')
('a', 'd')
('b', 'c')
('b', 'd')
('c', 'd')


In [79]:
# permutations
result = itertools.permutations(letters, 2)
for item in result:
    print(item)

('a', 'b')
('a', 'c')
('a', 'd')
('b', 'a')
('b', 'c')
('b', 'd')
('c', 'a')
('c', 'b')
('c', 'd')
('d', 'a')
('d', 'b')
('d', 'c')


In [80]:
# products
result = itertools.product(numbers, repeat=4)
for item in result:
    print(item)

(0, 0, 0, 0)
(0, 0, 0, 1)
(0, 0, 0, 2)
(0, 0, 0, 3)
(0, 0, 1, 0)
(0, 0, 1, 1)
(0, 0, 1, 2)
(0, 0, 1, 3)
(0, 0, 2, 0)
(0, 0, 2, 1)
(0, 0, 2, 2)
(0, 0, 2, 3)
(0, 0, 3, 0)
(0, 0, 3, 1)
(0, 0, 3, 2)
(0, 0, 3, 3)
(0, 1, 0, 0)
(0, 1, 0, 1)
(0, 1, 0, 2)
(0, 1, 0, 3)
(0, 1, 1, 0)
(0, 1, 1, 1)
(0, 1, 1, 2)
(0, 1, 1, 3)
(0, 1, 2, 0)
(0, 1, 2, 1)
(0, 1, 2, 2)
(0, 1, 2, 3)
(0, 1, 3, 0)
(0, 1, 3, 1)
(0, 1, 3, 2)
(0, 1, 3, 3)
(0, 2, 0, 0)
(0, 2, 0, 1)
(0, 2, 0, 2)
(0, 2, 0, 3)
(0, 2, 1, 0)
(0, 2, 1, 1)
(0, 2, 1, 2)
(0, 2, 1, 3)
(0, 2, 2, 0)
(0, 2, 2, 1)
(0, 2, 2, 2)
(0, 2, 2, 3)
(0, 2, 3, 0)
(0, 2, 3, 1)
(0, 2, 3, 2)
(0, 2, 3, 3)
(0, 3, 0, 0)
(0, 3, 0, 1)
(0, 3, 0, 2)
(0, 3, 0, 3)
(0, 3, 1, 0)
(0, 3, 1, 1)
(0, 3, 1, 2)
(0, 3, 1, 3)
(0, 3, 2, 0)
(0, 3, 2, 1)
(0, 3, 2, 2)
(0, 3, 2, 3)
(0, 3, 3, 0)
(0, 3, 3, 1)
(0, 3, 3, 2)
(0, 3, 3, 3)
(1, 0, 0, 0)
(1, 0, 0, 1)
(1, 0, 0, 2)
(1, 0, 0, 3)
(1, 0, 1, 0)
(1, 0, 1, 1)
(1, 0, 1, 2)
(1, 0, 1, 3)
(1, 0, 2, 0)
(1, 0, 2, 1)
(1, 0, 2, 2)
(1, 0, 2, 3)
(1, 0, 3, 0)
... (output truncated; product yields 4 ** 4 = 256 tuples in total)

In [81]:
# combinations_with_replacement
result = itertools.combinations_with_replacement(numbers, 4)
for item in result:
    print(item)

(0, 0, 0, 0)
(0, 0, 0, 1)
(0, 0, 0, 2)
(0, 0, 0, 3)
(0, 0, 1, 1)
(0, 0, 1, 2)
(0, 0, 1, 3)
(0, 0, 2, 2)
(0, 0, 2, 3)
(0, 0, 3, 3)
(0, 1, 1, 1)
(0, 1, 1, 2)
(0, 1, 1, 3)
(0, 1, 2, 2)
(0, 1, 2, 3)
(0, 1, 3, 3)
(0, 2, 2, 2)
(0, 2, 2, 3)
(0, 2, 3, 3)
(0, 3, 3, 3)
(1, 1, 1, 1)
(1, 1, 1, 2)
(1, 1, 1, 3)
(1, 1, 2, 2)
(1, 1, 2, 3)
(1, 1, 3, 3)
(1, 2, 2, 2)
(1, 2, 2, 3)
(1, 2, 3, 3)
(1, 3, 3, 3)
(2, 2, 2, 2)
(2, 2, 2, 3)
(2, 2, 3, 3)
(2, 3, 3, 3)
(3, 3, 3, 3)
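The number of results from each function can be checked against the standard counting formulas for n items taken k at a time. Those formulas are C(n, k), n!/(n-k)!, n^k, and C(n+k-1, k). This sketch assumes Python 3.8+ for `math.comb` and `math.perm`.

```python
import itertools
import math

letters = ['a', 'b', 'c', 'd']
numbers = [0, 1, 2, 3]

n_comb = len(list(itertools.combinations(letters, 2)))
n_perm = len(list(itertools.permutations(letters, 2)))
n_prod = len(list(itertools.product(numbers, repeat=4)))
n_cwr = len(list(itertools.combinations_with_replacement(numbers, 4)))

# closed-form counts for n = 4 items
print(n_comb, math.comb(4, 2))          # 6 6
print(n_perm, math.perm(4, 2))          # 12 12
print(n_prod, 4 ** 4)                   # 256 256
print(n_cwr, math.comb(4 + 4 - 1, 4))   # 35 35
```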


### chain
* chain links several iterables together, so we go through all items of the first, then all items of the next, and so on
  + alternatively, we could concatenate all the iterables into one list and iterate over it
    - but that creates a big new list in memory
      + if the iterables contain a lot of items, building a new list is inefficient
      + if the iterables are generators, they cannot be concatenated with + at all

#### code example of chain
We have three iterables: letters, numbers and names. Instead of concatenating them with

`combined = letters + numbers + names`, we use

`combined = itertools.chain(letters, numbers, names)`

chain yields the items lazily without building a new list, which can save a lot of memory when the data is large.

In [82]:
# example code for chain
letters = ['a', 'b', 'c', 'd']
numbers = [0, 1, 2, 3]
names = ['Corey', 'Nicole']

combined = itertools.chain(letters, numbers, names)

for item in combined:
    print(item)

a
b
c
d
0
1
2
3
Corey
Nicole
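chain also accepts generators, which `+` cannot concatenate. `chain.from_iterable` takes a single iterable of iterables, which makes it a common way to flatten one level of nesting. The generator `evens` below is a hypothetical helper used only for illustration.

```python
import itertools

def evens(limit):
    # hypothetical generator: values are produced one at a time, never stored as a list
    for n in range(0, limit, 2):
        yield n

# generators do not support +, but chain can still link them lazily
combined = list(itertools.chain(evens(6), "xy", [10]))
print(combined)  # [0, 2, 4, 'x', 'y', 10]

# chain.from_iterable takes one iterable of iterables, e.g. a nested list
nested = [[1, 2], [3], [], [4, 5]]
flat = list(itertools.chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5]
```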


### islice
* allows us to take a slice of an iterator
  + like slicing a list, but it works on any iterator without converting it to a list first
* after the iterable, it takes one, two or three arguments, similar to range:
  + stop: go from the beginning of the iterator up to (but not including) the stopping point
    - to take range(10) up to index 5
      `result = itertools.islice(range(10), 5)` 
  + start and stop (with only one argument, that argument is the stopping point)
    - to take range(10) from index 1 up to index 5
      `result = itertools.islice(range(10), 1, 5)` 
  + start, stop and step
    - to take range(10) from index 1 up to index 5 with a step of 2
      `result = itertools.islice(range(10), 1, 5, 2)` 

In [83]:
# stop only: yields indexes 0 to 4 (stops before index 5)
result = itertools.islice(range(10), 5)
for item in result:
    print(item)

0
1
2
3
4


In [84]:
# start and stop: yields indexes 1 to 4 (stops before index 5)
result = itertools.islice(range(10), 1, 5)
for item in result:
    print(item)

1
2
3
4
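A few more islice behaviors are shown in this short sketch. It covers the three-argument form, taking a finite slice of an infinite `count()`, and the fact that islice consumes the underlying iterator. Unlike list slicing, islice does not accept negative values.

```python
import itertools

# start, stop and step: indexes 1 and 3 (stop 5 is excluded)
stepped = list(itertools.islice(range(10), 1, 5, 2))
print(stepped)  # [1, 3]

# islice makes it safe to take a finite slice of an infinite iterator
first_three = list(itertools.islice(itertools.count(100), 3))
print(first_three)  # [100, 101, 102]

# islice consumes the underlying iterator: the next slice continues where it stopped
it = iter(range(10))
head = list(itertools.islice(it, 3))
rest = list(itertools.islice(it, 3))
print(head, rest)  # [0, 1, 2] [3, 4, 5]
```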


#### When islice is useful 
* when we have an iterator that is too large to fit in memory if we cast it to a list just to take a slice
* e.g. a log file with thousands of lines, when we only want the first few header lines
  + a file object is itself an iterator: each next() returns one line
  + this is useful when looping over many large files and reading only a few lines from each
    - islice gets these lines without loading the entire contents of each file into memory

In [85]:
# Example code of islice for reading a log file
with open('test.log', 'r') as f:
    # here 3 is the only argument as the stopping point
    header = itertools.islice(f, 3)
    
    for line in header:
        print(line, end="")

Date: 2018-11-08
Author: Corey
Description: This is a sample log file
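The cell above needs a `test.log` file on disk. The sketch below is self-contained instead. It swaps in an in-memory `io.StringIO`, which yields one line per `next()` just like a file. The three header lines are taken from the output above, and the two body lines are hypothetical.

```python
import io
import itertools

# io.StringIO stands in for a file on disk; like a file, it yields one line per next()
fake_log = io.StringIO(
    "Date: 2018-11-08\n"
    "Author: Corey\n"
    "Description: This is a sample log file\n"
    "hypothetical log entry 1\n"
    "hypothetical log entry 2\n"
)

header = list(itertools.islice(fake_log, 3))
for line in header:
    print(line, end="")

# the body lines were never read by islice; iteration resumes after the header
remaining = list(fake_log)
```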


## functions that allow us to select elements from an iterable
### compress
* takes data and selectors, and keeps each element of data whose corresponding selector is True
  + in the example below, a True/False list lines up with the letters list
* this differs from the built-in filter function, where the True/False value is returned by a function
  + in compress, the True/False values are passed in as an iterable
* itertools also provides filterfalse, which returns the elements for which the function evaluates to False (the complement of filter)

Code example using compress

In [86]:
# Code example for compress
# letter list
letters = ['a', 'b', 'c', 'd']
# a parallel list of True/False selectors, one per letter
selectors = [True, True, False, True]

# obtain an iterable only containing elements with True at positions defined by selectors
result = itertools.compress(letters, selectors)
for item in result:
    print(item)

a
b
d
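The selectors do not have to be a literal list. Any iterable of truthy or falsy values works, so they can be computed from a parallel column of data. The `names` and `scores` below are hypothetical, and the pass mark of 50 is arbitrary.

```python
import itertools

# hypothetical data: names and a parallel list of scores
names = ["Ann", "Bob", "Cid", "Dee"]
scores = [72, 45, 90, 50]

# selectors can be any iterable of truthy/falsy values, e.g. a generator expression
passed = list(itertools.compress(names, (score >= 50 for score in scores)))
print(passed)  # ['Ann', 'Cid', 'Dee']
```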


Code example with filter

In [87]:
numbers = [0, 1, 2, 3]

# define a filter function
def lt_2(n):
    if n < 2:
        return True
    return False

result = filter(lt_2, numbers)

for item in result:
    print(item)

0
1


Code example with filterfalse in itertools

In [88]:
import itertools

numbers = [0, 1, 2, 3]

# define a filter function
def lt_2(n):
    if n < 2:
        return True
    return False

result = itertools.filterfalse(lt_2, numbers)

for item in result:
    print(item)

2
3
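Because filter and filterfalse are complements, using both with the same predicate splits the input into two groups without losing any element. The sketch below shows this. It also shows that `filterfalse` with `None` as the predicate returns the falsy elements.

```python
import itertools

numbers = [0, 1, 2, 3]

def lt_2(n):
    return n < 2

# filter keeps elements where the predicate is True, filterfalse keeps the rest
kept = list(filter(lt_2, numbers))
dropped = list(itertools.filterfalse(lt_2, numbers))
print(kept, dropped)  # [0, 1] [2, 3]

# with None as the predicate, filterfalse returns the falsy elements
falsy = list(itertools.filterfalse(None, [0, 1, "", "a", None]))
print(falsy)  # [0, '', None]
```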


### dropwhile
* drops values from an iterable as long as the predicate function returns True
  + it drops the leading elements that evaluate as True
  + once a value evaluates as False, the predicate is no longer applied and the rest of the iterable is returned 

Code example of dropwhile
* even though the last two elements (1 and 0) are evaluated as True by lt_2, they are still returned by dropwhile

In [89]:
import itertools

numbers = [0, 1, 2, 3, 2, 1, 0]

# define a filter function
def lt_2(n):
    if n < 2:
        return True
    return False

result = itertools.dropwhile(lt_2, numbers)

for item in result:
    print(item)

2
3
2
1
0
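A typical use of dropwhile is skipping a leading block of lines, such as comment lines at the top of a file. The `lines` below are hypothetical. The later comment line is kept because dropwhile stops applying the predicate after the first False.

```python
import itertools

# hypothetical config text whose first lines are comments
lines = [
    "# generated file",
    "# do not edit",
    "name = demo",
    "# trailing comment",
    "size = 3",
]

# drop the leading comment lines only; later comments are kept
body = list(itertools.dropwhile(lambda line: line.startswith("#"), lines))
print(body)  # ['name = demo', '# trailing comment', 'size = 3']
```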


### takewhile
* takes values from an iterable as long as the predicate function returns True
  + it takes the leading elements that evaluate as True
  + once a value evaluates as False, iteration stops and the rest of the iterable is ignored 

In [90]:
import itertools

numbers = [0, 1, 2, 3, 2, 1, 0]

# define a filter function
def lt_2(n):
    if n < 2:
        return True
    return False

result = itertools.takewhile(lt_2, numbers)

for item in result:
    print(item)

0
1
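Because takewhile stops at the first False, it can safely cut an infinite iterator built on `count()`. In this sketch, squares are produced lazily and collected while they stay below an arbitrary limit of 50. Note that the first failing value (64) is still pulled from the underlying iterator.

```python
import itertools

# squares of 0, 1, 2, ... produced lazily from an infinite count()
squares = (n * n for n in itertools.count())

# takewhile stops at the first square >= 50, so the infinite source is safe
small = list(itertools.takewhile(lambda sq: sq < 50, squares))
print(small)  # [0, 1, 4, 9, 16, 25, 36, 49]
```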


### accumulate
* takes an iterable and returns the running totals of the values it sees
  + uses addition by default
  + you can pass another two-argument function instead
    - e.g. operator.mul for a running product

Code example of accumulate

In [91]:
import itertools

numbers = [0, 1, 2, 3, 2, 1, 0]

result = itertools.accumulate(numbers)

for item in result:
    print(item)

0
1
3
6
8
9
9


In [92]:
# to use a function other than addition, import operator and pass the operator function
# here we use multiplication
import itertools
import operator

# start from 1 rather than 0: a leading 0 would make every running product 0
numbers = [1, 2, 3, 2, 1, 0]

result = itertools.accumulate(numbers, operator.mul)

for item in result:
    print(item)

1
2
6
12
12
0
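Any two-argument function can be passed, not only those from `operator`. For example, the built-in `max` gives a running maximum. The `initial` keyword (Python 3.8+) seeds the accumulation, so the output gets one extra leading item. The `readings` values are illustrative.

```python
import itertools

readings = [3, 1, 4, 1, 5, 9, 2]

# any two-argument function works; max gives a running maximum
running_max = list(itertools.accumulate(readings, max))
print(running_max)  # [3, 3, 4, 4, 5, 9, 9]

# initial seeds the total, so the output is one item longer than the input
with_start = list(itertools.accumulate([1, 2, 3], initial=100))
print(with_start)  # [100, 101, 103, 106]
```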


### groupby
* goes through an iterable, groups consecutive values that share the same key, and returns a stream of tuples
  + each tuple consists of the key and an iterator over all elements grouped under that key
  + only *consecutive* elements with equal keys end up in the same group, so the iterable is usually sorted by the same key first

Example code of groupby
* the following list (people) will be used in the example code (https://github.com/CoreyMSchafer/code_snippets/blob/master/Python/Itertools/snippets.txt)
* the list consists of dictionaries, and each dictionary contains name, city and state entries
* we can group all the elements from the same state with groupby
  + first, we define a function that tells groupby which key to use
    - this function returns the key for one item of the iterable
    - in this code example, the function is get_state(person), where person refers to one item
  + we then call groupby(iterable, get_state) to group the elements
  + groupby feeds each element of the iterable to get_state
  + the result is an iterator (not a dictionary) of (key, group) pairs
    - key is the state of a group
    - group is an iterator over all elements of that group
  + in this list, people from the same state are already next to each other, so no sorting is needed

In [93]:
import itertools

def get_state(person):
    return person['state']

people = [
    {
        'name': 'John Doe',
        'city': 'Gotham',
        'state': 'NY'
    },
    {
        'name': 'Jane Doe',
        'city': 'Kings Landing',
        'state': 'NY'
    },
    {
        'name': 'Corey Schafer',
        'city': 'Boulder',
        'state': 'CO'
    },
    {
        'name': 'Al Einstein',
        'city': 'Denver',
        'state': 'CO'
    },
    {
        'name': 'John Henry',
        'city': 'Hinton',
        'state': 'WV'
    },
    {
        'name': 'Randy Moss',
        'city': 'Rand',
        'state': 'WV'
    },
    {
        'name': 'Nicole K',
        'city': 'Asheville',
        'state': 'NC'
    },
    {
        'name': 'Jim Doe',
        'city': 'Charlotte',
        'state': 'NC'
    },
    {
        'name': 'Jane Taylor',
        'city': 'Faketown',
        'state': 'NC'
    }
]

person_group = itertools.groupby(people, get_state)

for key, group in person_group:
    print(key)
    print(group)

NY
<itertools._grouper object at 0x7fc2942de7c0>
CO
<itertools._grouper object at 0x7fc2942d9460>
WV
<itertools._grouper object at 0x7fc2942d93a0>
NC
<itertools._grouper object at 0x7fc2942d9ac0>


In [94]:
person_group = itertools.groupby(people, get_state)

for key, group in person_group:
    print(key)
    for person in group:
        print(person)
    print()    

NY
{'name': 'John Doe', 'city': 'Gotham', 'state': 'NY'}
{'name': 'Jane Doe', 'city': 'Kings Landing', 'state': 'NY'}

CO
{'name': 'Corey Schafer', 'city': 'Boulder', 'state': 'CO'}
{'name': 'Al Einstein', 'city': 'Denver', 'state': 'CO'}

WV
{'name': 'John Henry', 'city': 'Hinton', 'state': 'WV'}
{'name': 'Randy Moss', 'city': 'Rand', 'state': 'WV'}

NC
{'name': 'Nicole K', 'city': 'Asheville', 'state': 'NC'}
{'name': 'Jim Doe', 'city': 'Charlotte', 'state': 'NC'}
{'name': 'Jane Taylor', 'city': 'Faketown', 'state': 'NC'}
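If the input is not already ordered by key, groupby starts a new group every time the key changes. The sketch below shows this with a hypothetical unsorted list `visitors` of (name, state) pairs. Sorting by the same key first fixes it. The group sizes are counted with `sum(1 for _ in group)`, which consumes each group iterator without building a list.

```python
import itertools

# hypothetical, unsorted list of (name, state) pairs
visitors = [("A", "NY"), ("B", "CO"), ("C", "NY"), ("D", "CO"), ("E", "NC")]

def get_state(person):
    return person[1]

# unsorted input: groupby only merges *consecutive* equal keys
unsorted_keys = [key for key, _ in itertools.groupby(visitors, get_state)]
print(unsorted_keys)  # ['NY', 'CO', 'NY', 'CO', 'NC']

# sorting by the same key first puts equal keys next to each other
sorted_visitors = sorted(visitors, key=get_state)
counts = {key: sum(1 for _ in group)
          for key, group in itertools.groupby(sorted_visitors, get_state)}
print(counts)  # {'CO': 2, 'NC': 1, 'NY': 2}
```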



### tee to replicate an iterable

* tee splits one iterator into several independent copies (two by default)
* after calling tee, do not use the original iterator any more; use only the copies
  + otherwise, advancing the original would consume items the copies have not seen yet, and those items would be skipped in the copies

In [69]:
person_group = itertools.groupby(people, get_state)
copy1, copy2 = itertools.tee(person_group)
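A minimal sketch of tee on a simple iterator shows that exhausting one copy does not affect the other. It also includes a common pattern that pairs each item with the next one by advancing the second copy by one step. `pairs` is a hypothetical helper name. Python 3.10+ also ships `itertools.pairwise` for this.

```python
import itertools

source = iter([1, 2, 3, 4])
first_copy, second_copy = itertools.tee(source)  # two independent iterators

from_first = list(first_copy)
from_second = list(second_copy)  # unaffected by exhausting first_copy
print(from_first, from_second)  # [1, 2, 3, 4] [1, 2, 3, 4]

def pairs(iterable):
    # hypothetical helper: yield each item together with the next one
    a, b = itertools.tee(iterable)
    next(b, None)  # advance the second copy by one item
    return zip(a, b)

neighbours = list(pairs([1, 2, 3, 4]))
print(neighbours)  # [(1, 2), (2, 3), (3, 4)]
```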