In [1]:
from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"

## Built-in Iteration Helper Functions

## The range function

In Python 3.X, `range` is an iterable that generates items **on demand**, so we need to wrap it in a list call to display its results all at once:

In [2]:
# Not wrapped
range(5), range(2, 5), range(0, 10, 2)

(range(0, 5), range(2, 5), range(0, 10, 2))

In [3]:
# Wrapped
list(range(5)), list(range(2, 5)), list(range(0, 10, 2))

([0, 1, 2, 3, 4], [2, 3, 4], [0, 2, 4, 6, 8])
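Wrapping in `list` is only needed for display. As a small sketch (the name `evens` and the other variable names are chosen here for illustration), a `range` object already supports `len`, indexing, slicing, and membership tests without building a list:

```python
# A range object is a lazy sequence: no list is built here
evens = range(0, 10, 2)

length = len(evens)    # number of items the range would produce
third = evens[2]       # indexing computes the item on demand
has_six = 6 in evens   # membership test
tail = evens[1:3]      # slicing returns another range object

print(length, third, has_six, list(tail))
```

Only when every item is needed at once, for example to print them, does the `list` call become necessary.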

With one argument, `range` generates integers from zero up to but not including the argument’s value. If passed two arguments, the first is taken as the lower bound. An optional third argument gives a `step`; if it is used, Python adds the `step` to each successive integer in the result (the step defaults to +1). Ranges can also include negative numbers and count downward:

In [4]:
# With one argument
list(range(5))

[0, 1, 2, 3, 4]

In [5]:
# With step
list(range(0, 20, 5))

[0, 5, 10, 15]

In [6]:
# Negative
list(range(-10, 20, 7))

[-10, -3, 4, 11, 18]

In [7]:
# Reverse step (move from right to left)
list(range(30, 10, -3))

[30, 27, 24, 21, 18, 15, 12]

A `for` loop pulls items from `range` automatically in Python 3, so there is no need to call `list()`:

In [8]:
for i in range(5):
    print(i)

0
1
2
3
4


### Sequence Shufflers: range and len

Used in an iteration context, `range` lets us step over a list of offsets into a sequence rather than over the sequence's actual items. In the following reordering example, the integers from `range` serve only as a repeat count; the variable `i` is never used to slice the string:

In [9]:
S = "yang"
for i in range(len(S)):
    # On each of the len(S) passes, join everything after the first character (S[1:])
    # with the first character (S[:1]); this moves the first character of S to the end
    S = S[1:] + S[:1]
    print(S, end=" ")

angy ngya gyan yang 

Note that this loop rebinds the original variable `S`, which we can see if we stop the count before stepping through the full `len` of `S`:

In [10]:
for i in range(2):
    # On each count from 0 to 1, join everything after the first character (S[1:])
    # with the first character (S[:1]); this moves the first character of S to the end
    S = S[1:] + S[:1]
    print(S, end=" ")

angy ngya 

In [11]:
# Now S is a different string
S

'ngya'

This is because a `for` statement does not create a new scope in any version of Python: names assigned inside the loop body are the same names that exist outside it. The name `S` is therefore bound to a new string object at each iteration (strings are immutable, so each concatenation builds a new object rather than changing the old one). Once nothing references the earlier string objects, their reference counts reach zero and they are garbage collected. Used another way, `range` can supply position offsets (now the variable `i` is used to slice the string):

In [12]:
X = "string"
for i in range(len(X)):
    X = X[i:] + X[:i]
    print(X, end=" ")

string trings ingstr string ngstri ingstr 

Notice that the name 'X' is again bound to a new object.

In [13]:
X

'ingstr'

### Nonexhaustive Traversals: range Versus Slices

The range function may be used in `for` loop contexts to skip elements:

In [14]:
import string

s = string.ascii_lowercase
# We visit every fourth element using the integer offset generated by range
for i in range(0, len(s), 4):
    print(s[i], end=" ")

a e i m q u y 

The same result can be achieved through slicing, which is usually the better practice. The potential advantage of `range` here is space: slicing copies the selected items into a new string in both 2.X and 3.X, while `range` in 3.X and `xrange` in 2.X do not create a list; for very large strings, they may save memory.

In [15]:
# We actually iterate through the iterable string s[::4]
for letter in s[::4]:
    print(letter, end=" ")

a e i m q u y 
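A third option, not covered in the text above, is `itertools.islice`, which steps through any iterable lazily without building a sliced copy. A minimal sketch, with `letters` and `picked` as illustrative names:

```python
import string
from itertools import islice

letters = string.ascii_lowercase

# islice(iterable, start, stop, step): stop=None means "run to the end"
picked = list(islice(letters, 0, None, 4))
print(" ".join(picked))
```

Unlike slicing, `islice` also works on iterables that do not support indexing, although it still has to step past the skipped items one by one.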

### Changing Lists: range Versus Comprehensions

A simple `for` loop cannot change the elements of a sequence. The iteration protocol calls the built-in `iter` on the iterable (which runs its `__iter__` method) to get an iterator, whose `__next__` method is then called repeatedly. Each result is merely bound to the loop variable, so rebinding that variable does not update the original sequence, as in the following case:

In [16]:
l = [1, 2, 3, 4]
for num in l:
    num += 1

Even though `num` is updated (it refers to the element each call of `next` pulls out), the list is not updated:

In [17]:
num
l

5

[1, 2, 3, 4]

To update the list, we must use integer offsets (indexes) so that we can assign an updated value to each position as we traverse the sequence. This loop iterates through list positions, not the actual items:

In [18]:
for i in range(len(l)):
    l[i] += 1
l

[2, 3, 4, 5]

With a `while` loop, this can be accomplished as follows:

In [19]:
i = 0
while i < len(l):
    l[i] += 1
    i += 1
l

[3, 4, 5, 6]

A list comprehension solution:

In [20]:
[i + 1 for i in l]

[4, 5, 6, 7]

Note that a list comprehension creates a new list rather than modifying the original list in place.
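If an in-place update is still wanted, one possible approach is to combine the comprehension with slice assignment, which replaces the contents of the existing list object. The sketch below uses the illustrative names `nums` and `alias` to show the difference from plain assignment:

```python
nums = [1, 2, 3, 4]
alias = nums  # a second name for the same list object

# Slice assignment replaces the contents of the existing list object
nums[:] = [n + 1 for n in nums]
same_object = alias is nums

# Plain assignment would instead bind a name to a brand-new list
rebound = [n + 1 for n in nums]

print(nums, alias, same_object, rebound)
```

Because `alias` and `nums` still refer to the same object, both names see the updated values.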

## Parallel Traversals: zip

The `zip` function allows us to use `for` loops to visit multiple sequences in parallel. In Python 3, `zip` returns an iterator, so we must call `list` to display its results all at once. It takes one or more iterables as arguments and yields a series of tuples that pair up parallel items taken from those iterables.

In [21]:
import inspect

inspect.isclass(zip)

True

In [22]:
l1 = [1, 2, 3, 4, 5]
l2 = ["a", "b", "c", "d", "e"]
zip(l1, l2)
list(zip(l1, l2))

<zip at 0x10fad83c0>

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]

In [23]:
for x, y in zip(l1, l2):
    print("The letter", y, "has position", x, "in the alphabet")

The letter a has position 1 in the alphabet
The letter b has position 2 in the alphabet
The letter c has position 3 in the alphabet
The letter d has position 4 in the alphabet
The letter e has position 5 in the alphabet


When the argument lengths differ, the built-in `zip` function stops producing tuples as soon as the shortest argument is exhausted:

In [24]:
print(l2)
t3 = (1, 2, 3)
print(t3)
list(zip(t3, l2))

['a', 'b', 'c', 'd', 'e']
(1, 2, 3)


[(1, 'a'), (2, 'b'), (3, 'c')]

To achieve the opposite, use `itertools.zip_longest(*iterables, fillvalue=None)`. If the iterables are of uneven length, missing values are filled in with `fillvalue`. Iteration continues until the longest iterable is exhausted:

In [25]:
from itertools import zip_longest

list(zip_longest(l1, l2, t3, fillvalue="missing"))

[(1, 'a', 1),
 (2, 'b', 2),
 (3, 'c', 3),
 (4, 'd', 'missing'),
 (5, 'e', 'missing')]

When we have a single iterable whose elements are themselves iterables (for instance, a list of lists), we can pass it to `zip` with unpacking:

In [26]:
# Nested list with two list elements
l = [[1, 2, 3], [4, 5, 6]]
# Zip with unpacking
list(zip(*l))

[(1, 4), (2, 5), (3, 6)]

In [27]:
# Zip without unpacking
list(zip(l))

[([1, 2, 3],), ([4, 5, 6],)]

The `*` in a function call "unpacks" a list (or other iterable), making each of its elements a separate argument to `zip`. Without the `*` operator, we are running `zip([[1, 2, 3], [4, 5, 6]])`, which means `zip` sees `l` as a single iterable and returns one 1-tuple per element, since there is no second iterable to pair with:

* `([1, 2, 3],)` and `([4, 5, 6],)`

With the `*`, we are essentially running `zip([1, 2, 3], [4, 5, 6])`, which returns the pairing of the elements of the two iterables:

* `(1, 4)`, `(2, 5)`, `(3, 6)`
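The same unpacking idiom can be used to reverse a `zip`, splitting a list of pairs back into separate sequences. A minimal sketch, where `pairs`, `numbers`, and `letters` are names chosen for illustration:

```python
pairs = [(1, "a"), (2, "b"), (3, "c")]

# zip(*pairs) is zip((1, "a"), (2, "b"), (3, "c")): it regroups items by position
numbers, letters = zip(*pairs)
print(numbers, letters)

# Zipping the two results again restores the original pairs
restored = list(zip(numbers, letters))
print(restored == pairs)
```

Note that the results come back as tuples, not lists, because `zip` always yields tuples.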

### Dictionary construction with zip

In [28]:
keys = ["name", "age", "major", "hometown", "height"]
values = ["yang", 23, "stats", "China", 177]

In [29]:
d = {}
# Using zip and for loop
for key, value in zip(keys, values):
    d[key] = value
d

{'name': 'yang',
 'age': 23,
 'major': 'stats',
 'hometown': 'China',
 'height': 177}

In [30]:
# Using the constructor directly
d1 = dict(zip(keys, values))
d1

{'name': 'yang',
 'age': 23,
 'major': 'stats',
 'hometown': 'China',
 'height': 177}

In [31]:
# Dictionary comprehension
{key: value for (key, value) in zip(keys, values)}

{'name': 'yang',
 'age': 23,
 'major': 'stats',
 'hometown': 'China',
 'height': 177}

## Generating Both Offsets and Items: enumerate

While `range` can be used to generate the offsets of items in a sequence, rather than the items at those offsets, we sometimes need both: the item to use plus its offset. The built-in `enumerate` function provides a way to achieve this:

In [32]:
s = "yang"
for offset, item in enumerate(s):
    print(item, "appears at offset", offset)

y appears at offset 0
a appears at offset 1
n appears at offset 2
g appears at offset 3


The `enumerate` function returns an iterator (an `enumerate` object), which supports the iteration protocol. Its `__next__` method, called by the built-in `next` function, returns an `(index, value)` tuple at each iteration.

In [33]:
list(enumerate(s))

[(0, 'y'), (1, 'a'), (2, 'n'), (3, 'g')]
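To make the protocol concrete, the sketch below steps an `enumerate` object by hand with `next` and also shows the optional `start` argument, which sets the first index. The variable names are illustrative:

```python
s = "yang"

# Stepping an enumerate object manually with next()
e = enumerate(s)
first = next(e)
second = next(e)

# The optional start argument changes the first index (it defaults to 0)
numbered = list(enumerate(s, start=1))
print(first, second, numbered)
```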

## Mapping Functions over Iterables: map

One of the more common tasks with lists and other sequences is applying an operation to each item and collecting the results (like R's `lapply` and the functions from the `purrr` package). The `map` function in Python applies a passed-in function to each item in an iterable object and, in Python 3, returns a `map` object (an iterator) that yields the function call results. The syntax of `map` is as follows:

* `map(func, *iterables) --> map object`

In [34]:
# A sequence
seq = [1, 2, 3, 4]


# User defined function
def increment(x):
    return x + 10


# Returns a map object
map_obj = map(increment, seq)

Because `map` returns an iterator in Python 3.X, we need to call `list` to force it to produce all its results for display.

In [35]:
from collections.abc import Iterable

isinstance(map_obj, Iterable)

True

In [36]:
list(map_obj)

[11, 12, 13, 14]
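One consequence worth illustrating is that a `map` object can be traversed only once. The sketch below rebuilds the example with illustrative variable names and shows that a second `list` call finds nothing left:

```python
def increment(x):
    return x + 10


map_obj = map(increment, [1, 2, 3, 4])

is_own_iterator = iter(map_obj) is map_obj  # True for iterators
first_pass = list(map_obj)                  # consumes every result
second_pass = list(map_obj)                 # nothing left to produce
print(is_own_iterator, first_pass, second_pass)
```

To traverse the results again, call `map` again or keep the list produced by the first pass.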

Similar to R's `lapply()`, we can use an anonymous function instead of a pre-defined function. This is a place where `lambda` commonly appears:

In [37]:
list(map((lambda x: x + 3), seq))

[4, 5, 6, 7]

The equivalent for loop is as follows:

In [38]:
updated = []
for x in seq:
    updated.append(x + 3)
updated

[4, 5, 6, 7]

The equivalent list comprehension is as follows:

In [39]:
[x + 3 for x in seq]

[4, 5, 6, 7]

### Multiple argument functions

Moreover, `map` can be used in more advanced ways. For multiple sequence arguments, `map` sends items taken from sequences in parallel as distinct arguments to the function:

In [40]:
pow(3, 4)
list(map(pow, [1, 2, 3], [2, 3, 4]))

81

[1, 8, 81]

In the example above, the two sequences are paired up as if by `zip`, producing argument tuples like:

In [41]:
list(zip([1, 2, 3], [2, 3, 4]))

[(1, 2), (2, 3), (3, 4)]

The function call at each iteration is therefore:

* `pow(1, 2)`
* `pow(2, 3)`
* `pow(3, 4)`
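The pairing step can also be written out explicitly. As a sketch, `itertools.starmap` takes an iterable of argument tuples and unpacks each one into the function call, so it gives the same results as the multi-sequence `map` (the names `bases` and `exponents` are illustrative):

```python
from itertools import starmap

bases = [1, 2, 3]
exponents = [2, 3, 4]

# map pairs the arguments itself; starmap expects them already paired
via_map = list(map(pow, bases, exponents))
via_starmap = list(starmap(pow, zip(bases, exponents)))
via_comprehension = [pow(b, e) for b, e in zip(bases, exponents)]
print(via_map, via_starmap, via_comprehension)
```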

With multiple sequences, `map` expects an N-argument function for N sequences, and iteration stops when the shortest iterable is exhausted. To pass a constant argument, as the `map` functions in R's `purrr` package allow, we can use the `repeat` function from `itertools`:

In [42]:
from itertools import repeat

repeat(4, 3)
list(repeat(4, 3))

repeat(4, 3)

[4, 4, 4]

In [43]:
from operator import add

# Without a times argument, repeat produces an infinite iterator,
# but map stops as soon as the shortest iterable, [1, 2, 3], is exhausted
list(map(add, [1, 2, 3], repeat(9)))

[10, 11, 12]

The equivalent for loop is as follows:

In [44]:
l = []
for x, y in zip([1, 2, 3], repeat(9)):
    l.append(x + y)
l

[10, 11, 12]

The equivalent list comprehension is:

In [45]:
[x + y for x, y in zip([1, 2, 3], repeat(9))]

[10, 11, 12]

## Selecting Items in Iterables: filter

The `filter` function selects an iterable’s items based on a test function (also called a predicate). The syntax is as follows:

* `filter(function or None, iterable) --> filter object`

In [46]:
list(range(-5, 5))
# Filter items in range(-5, 5) based on a predicate function returning True or False
list(filter((lambda x: x > 0), range(-5, 5)))

[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]

[1, 2, 3, 4]

The equivalent for loop is as follows:

In [47]:
l2 = []
for item in range(-5, 5):
    if item > 0:
        l2.append(item)
l2

[1, 2, 3, 4]

The equivalent list comprehension is as follows:

In [48]:
[x for x in range(-5, 5) if x > 0]

[1, 2, 3, 4]
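The signature above also allows `None` in place of the function. In that case `filter` keeps the items that are themselves true. A short sketch with made-up sample values:

```python
values = [0, 1, "", "a", None, [], [0], False, True]

# With None as the function, filter keeps only the truthy items
truthy = list(filter(None, values))

# The equivalent list comprehension tests each item's own truth value
same = [v for v in values if v]
print(truthy)
```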

## Combining Items in Iterables: reduce

The functional `reduce` call, which is a simple built-in function in 2.X but lives in the `functools` module in 3.X, accepts an iterable to process but does not return an iterable. Instead, `reduce` returns a *single* result.

* `reduce(function, sequence[, initial]) -> value`

The `reduce` function applies a function of two arguments cumulatively to the items of a sequence, from left to right, so as to reduce the sequence to a single value. If initial is present, it is placed before the items of the sequence in the calculation, and serves as a default when the sequence is empty.

In [49]:
from functools import reduce

# For reduce, the function argument is a binary function accepting two arguments
# Cumulative sum
reduce((lambda x, y: x + y), [1, 2, 3, 4])
# Cumulative product
reduce((lambda x, y: x * y), [1, 2, 3, 4])

10

24

To see how this works, the calls above are equivalent to the following:

* `reduce(lambda x, y: x + y, [1, 2, 3, 4])` calculates (((1 + 2) + 3) + 4)

* `reduce((lambda x, y: x * y), [1, 2, 3, 4])` calculates (((1 * 2) * 3) * 4)
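The optional `initial` argument described earlier can be sketched in the same way. Here `add` is a small helper defined only for this illustration:

```python
from functools import reduce


def add(x, y):
    return x + y


# initial is placed before the items: ((((100 + 1) + 2) + 3) + 4)
with_initial = reduce(add, [1, 2, 3, 4], 100)

# With an empty sequence, initial is returned unchanged
empty_with_initial = reduce(add, [], 0)

# With an empty sequence and no initial, reduce raises TypeError
try:
    reduce(add, [])
    raised = False
except TypeError:
    raised = True

print(with_initial, empty_with_initial, raised)
```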

In [50]:
# Implementing reduce
def myreduce(function, sequence):
    # Initialize value
    tally = sequence[0]
    # Fold in the remaining elements, starting with the second;
    # the loop variable is named item to avoid shadowing the built-in next
    for item in sequence[1:]:
        tally = function(tally, item)
    return tally

In [51]:
myreduce((lambda x, y: x + y), [1, 2, 3, 4, 5])
myreduce((lambda x, y: x * y), [1, 2, 3, 4, 5])

15

120
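Because `myreduce` indexes and slices its argument, it works only on sequences and has no `initial` parameter. The variant below is one possible sketch, not how `functools.reduce` is actually implemented; the name `myreduce_iter` and the `_MISSING` sentinel are hypothetical. It uses `iter` and `next` so that any iterable is accepted:

```python
_MISSING = object()  # sentinel, so that None can be a legitimate initial value


def myreduce_iter(function, iterable, initial=_MISSING):
    it = iter(iterable)
    if initial is _MISSING:
        # No initial value: start the tally from the first item
        try:
            tally = next(it)
        except StopIteration:
            raise TypeError("empty iterable with no initial value") from None
    else:
        tally = initial
    # Fold in whatever items remain in the iterator
    for item in it:
        tally = function(tally, item)
    return tally


total = myreduce_iter(lambda x, y: x + y, range(1, 6))
product = myreduce_iter(lambda x, y: x * y, (n for n in [1, 2, 3, 4, 5]))
seeded = myreduce_iter(lambda x, y: x + y, [], 0)
print(total, product, seeded)
```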

The standard library `operator` module provides functional (prefix) forms that correspond to built-in expressions, which may come in handy for some uses of functional tools:

In [52]:
import functools
import operator

functools.reduce(operator.add, [2, 4, 6])
functools.reduce((lambda x, y: x + y), [2, 4, 6])

12

12