# UNCLASSIFIED

Transcribed from FOIA Doc ID: 6689693

https://archive.org/details/comp3321

(U) Iterators, generators, sorting, and duck typing in Python. 

# (U) Introduction: List Comprehensions Revisited 

(U) We begin by reviewing the fundamentals of lists and list comprehension. 

In [None]:
melist = [ i for i in range(1, 100, 2) ]
for i in melist: # how does the loop work?
    print (i)

(U) What happens when the list construction gets more complicated?

In [None]:
noprimes = [ j for i in range(2, 20) for j in range(i*2, 500, i) ]

primes = [ x for x in range(2, 500) if x not in noprimes ]

print(sorted(primes)) 

(U) Can we do this in one shot? Yes, but... 

In [None]:
# nesting madness ! 
primes = [ x for x in range(2, 500) if x not in [ j for i in range(2, 20) for j in range(i*2, 500, i) ] ] 

In [None]:
i = 3
primes = [2]
while i < 500:
    for num in primes:
        if i % num == 0:
            break
    else:
        primes.append(i)
    i += 2
print(primes)

# (U) Iterators 

(U) To create your own iterable objects, suitable for use in `for` loops and list comprehensions, all you need to do is implement the right special methods for the class. The `__iter__` method should return the iterator object (in simple cases, `self`), and the `__next__` method returns the next value each time it is called. 

(U) Let's do an example, sticking with the theme previously introduced, of an iterator that returns numbers in order, except for multiples of the arguments used at construction time. We'll make sure that it terminates eventually by raising the `StopIteration` exception once it passes 200. (This is a good example of a common use of exceptions in Python: signaling an event that is expected but requires termination. `for` loops and list comprehensions treat the `StopIteration` exception as the signal to stop processing.) 
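
As a rough sketch of what a `for` loop does with these special methods, the loop below is written out by hand using the built-in `iter` and `next` functions. The helper name `manual_for` is made up for illustration.

```python
def manual_for(iterable):
    """Collect the items of `iterable` in the order a for loop would visit them."""
    collected = []
    iterator = iter(iterable)        # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)    # calls iterator.__next__()
        except StopIteration:        # the signal that the iterator is exhausted
            break
        collected.append(item)       # this line plays the role of the loop body
    return collected

print(manual_for(range(1, 10, 2)))
```

Any object that supports `iter` and `next` in this way can be used wherever Python expects something to loop over.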

In [None]:
class NonFactorIterable(object):
    def __init__(self, *args):             # *args will be a tuple of all arguments passed to the object constructor
        self.avoid_multiples = args        # Assign args to self.avoid_multiples
        self.x = 0                         # Initialize self.x as 0
        
    def __next__(self):                    # The next method determines how new values are produced by the iterator
        self.x += 1                        # Increment self.x by 1
        while True:
            if self.x > 200:
                raise StopIteration        # StopIteration if x grows beyond 200
            for y in self.avoid_multiples: # Loop through the tuple passed as *args
                if self.x % y == 0:        # If x is divisible by any of the *args we will break the for loop
                    self.x += 1
                    break
            else:                          # This doesn't happen if we broke so the while loop restarts
                return self.x              # Otherwise, self.x is not a multiple of any of *args and is returned

    def __iter__(self):                    # This method tells Python that this is an iterable object.
        return self 

In [None]:
silent_fizz_buzz = NonFactorIterable(3, 5)

In [None]:
[x for x in silent_fizz_buzz] 

In [None]:
mostly_prime = NonFactorIterable(2, 3, 5, 7, 11, 13, 17, 19) 

In [None]:
partial_sum = 0

In [None]:
for x in mostly_prime: 
    partial_sum += x 

In [None]:
partial_sum 

In [None]:
mostly_prime = NonFactorIterable(2, 3, 5, 7, 11, 13, 17, 19) 
print(sum(mostly_prime)) 

(U) It may seem strange that the `__iter__` method doesn't appear to do anything. This is because in some cases the **iterator** for an object should not be the same as the object itself. Covering such usage is beyond the scope of the course. 
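
For the curious, here is a minimal sketch of that situation. The class names `Countdown` and `CountdownIterator` are hypothetical: the iterable hands out a new iterator object from `__iter__` each time, so it can be looped over more than once.

```python
class CountdownIterator:
    """Holds the iteration state (hypothetical example)."""

    def __init__(self, start):
        self.n = start

    def __iter__(self):
        return self                  # an iterator is its own iterator

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1


class Countdown:
    """Iterable whose iterator is a separate object (hypothetical example)."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)   # fresh state for every loop


c = Countdown(3)
print(list(c))   # first pass
print(list(c))   # second pass starts over from the beginning
```

By contrast, an object that returns `self` from `__iter__`, like `NonFactorIterable` above, is used up after one pass, which is why `mostly_prime` had to be created again before calling `sum`.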

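(U) There is another way of making an object usable in `for` loops: the `__getitem__` method. This allows you to use the square bracket `[]` notation for getting data out of the object. If a class defines `__getitem__` but not `__iter__`, Python iterates by calling it with the indices 0, 1, 2, and so on. You still must remember to raise an exception when the data runs out (`IndexError` is the documented signal) for it to work properly in `for` loops and list comprehensions. 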
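
A minimal sketch of this index-based protocol, using a hypothetical `FirstLetters` class that defines no `__iter__` at all. Python calls `__getitem__` with 0, 1, 2, ... and treats `IndexError` as the end of the data.

```python
class FirstLetters:
    """Defines only __getitem__, yet still works in for loops (hypothetical example)."""

    def __init__(self, words):
        self.words = words

    def __getitem__(self, idx):
        # Indexing past the end of the underlying list raises IndexError,
        # which the for loop treats as the end of the sequence.
        return self.words[idx][0]


letters = FirstLetters(["iterators", "generators", "sorting"])
print(letters[1])
print([ch for ch in letters])
```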

## (U) Another iterator example 

(U) In the example below, we create an iterator that returns the squares of numbers. Note that in the `__next__` method, all we're doing is incrementing our counter (`self.x`) and returning the square of the previous counter value, as long as the counter is not greater than the pre-defined limit (`self.limit`). The `while` loop in the previous example was specific to that use-case; we don't actually need to implement any looping at all in `__next__`, as that's simply the method called for each iteration through a loop on our iterator. 

Here we're also implementing the `__getitem__` method, which allows us to retrieve a value from the iterator at a certain index location. This one resets the counter, then calls `self.__next__` until it arrives at the desired index location and returns that value. Because it changes `self.x`, indexing also disturbs any iteration that is already in progress on the same object. 

In [None]:
class Squares(object):
    
    def __init__(self, limit=200):
        self.limit = limit
        self.x = 0

    def __next__(self):
        self.x += 1
        if self.x > self.limit:
            raise StopIteration
        return (self.x - 1) ** 2

    def __getitem__(self, idx):
        # reset counter to 0
        self.x = 0
        if not isinstance(idx, int):
            raise TypeError("Only integer index arguments are accepted!")
        while self.x < idx:
            self.__next__()
        return self.x**2

    def __iter__(self):
        return self

In [None]:
my_squares = Squares(limit=20)

In [None]:
[x for x in my_squares]

In [None]:
my_squares[5]

In [None]:
# since we set a limit of 20, we can't access an index location higher than that 
my_squares[25] 

## (U) Benefits of Custom Iterators 

1. (U) Cleaner code 
2. (U) Ability to work with infinite sequences 
3. (U) Ability to use built-in functions like `sum` that work with iterables 
4. (U) Possibility of saving memory (e.g. `range`)

# (U) Generators 

(U) Generators are iterators with a much lighter syntax. Very simple generators look just like list comprehensions, except they're surrounded with parentheses `()` instead of square brackets `[]`. More complicated generators are defined like functions, with the one difference being that they use the `yield` keyword instead of the `return` keyword. A generator maintains state in between times when it is called; execution resumes starting immediately after the `yield` statement and continues until the next `yield` is encountered.
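
To make the pause-and-resume behavior concrete, the sketch below records what the generator body has executed so far. The names `two_steps` and `log` are made up for this illustration.

```python
def two_steps(log):
    log.append("started")                      # runs on the first next() call
    yield 1
    log.append("resumed after first yield")    # runs on the second next() call
    yield 2
    log.append("finished")                     # runs when the generator is exhausted


log = []
gen = two_steps(log)       # creating the generator runs none of its body
first = next(gen)          # runs up to the first yield
second = next(gen)         # resumes right after the first yield
leftover = list(gen)       # runs to the end; no more values are produced
print(first, second, leftover)
print(log)
```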

In [None]:
y = (x*x for x in range(30))
print (y) # hmm ...

In [None]:
def xsquared():
    for i in range(30):
        yield i*i

In [None]:
def xsquared_inf():
    x = 0
    while True:
        yield x*x
        x += 1

In [None]:
squares = [x for x in xsquared()]
print(squares)
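
The infinite version cannot be handed to `list` directly, since it never stops producing values. One common way to take a finite slice of it is `itertools.islice` from the standard library. The sketch below repeats the definition of `xsquared_inf` so that it runs on its own; the name `first_five` is just for illustration.

```python
from itertools import islice

def xsquared_inf():
    x = 0
    while True:
        yield x*x
        x += 1

first_five = list(islice(xsquared_inf(), 5))   # stop after five values
print(first_five)
```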

(U) Another example...days of the week! 

In [None]:
def day_of_week():
    i = 0
    days = ["Monday" , "Tuesday" , "Wednesday" , "Thursday" , "Friday" , "Saturday" , "Sunday"]
    while True:
        yield days[i%7]
        i += 1
day_of_week()

In [None]:
import random
def snowday(prob=.01):
    r = random.random()
    if r < prob:
        return "snowday!"
    else:
        return "regular day."

In [None]:
n = 0
for x in day_of_week():
    today = snowday()
    print(x + " is a " + today)
    n += 1
    if today == "snowday!":
        break

In [None]:
weekday = (day for day in day_of_week())

In [None]:
next(weekday)

## (U) Pipelining Generators

(U) One powerful use of generators is to connect them together into a _pipeline_, where each generator is used by the next. Since Python evaluates generators "lazily," i.e. as needed, each item passes through every stage before the next item is requested. This keeps memory use low, because no intermediate lists are built, and it means the first results are available before the whole input has been processed. This is especially useful if one or two steps can take a long time (e.g. a database query): later steps can start working on the first rows as soon as they arrive instead of waiting for the full result set. Note that the stages still run one at a time in a single thread; generators by themselves do not make the steps run concurrently. 

In [None]:
import random

# Get the fractional part of a string representation of a float 
def frac_part(v):
    v = str(v)
    i, f = v.split( '.' )
    return f

In [None]:
# traditional approach 
results = [] 
for i in range(20): 
    r = random.random() * 100     # generate a random number 
    r_str = str(r)                # convert it to a string 
    r_frac = frac_part(r_str)     # get the fractional part 
    r_out = float('0.' + r_frac)  # convert it back to a float 
    results.append(r_out)

results

In [None]:
# generator pipeline 
rand_gen = ( random.random() * 100 for i in range(20) ) 
str_gen = ( str(r) for r in rand_gen ) 
frac_gen = ( frac_part(r) for r in str_gen ) 
out_gen = ( float( '0.' + r) for r in frac_gen )

results = list(out_gen)
results 
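
To see the lazy, one-item-at-a-time behavior, the sketch below records the order in which two stages do their work. The names `trace`, `source`, and `doubled` are made up for this illustration.

```python
trace = []

def source(n):
    for i in range(n):
        trace.append("produce %d" % i)
        yield i

def doubled(items):
    for item in items:
        trace.append("double %d" % item)
        yield item * 2

pipeline = doubled(source(3))    # building the pipeline does no work yet
doubled_values = list(pipeline)  # pulling values drives both stages
print(doubled_values)
print(trace)
```

Each item passes through every stage before the next item is produced, so no intermediate list is ever built.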

# (U) Sorting 

(U) In Python 3, the items being sorted must be comparable with `<`; for your own classes, that means defining the `__lt__` (`lt` = less than) method explicitly.

(U) The built-in function `sorted(x)` returns a new list with the data from `x` in sorted order. The `sort` method (for `list`s only) sorts a list in-place and returns `None`. 

In [None]:
int_data = [10, 1 , 5 , 4, 2]

In [None]:
sorted(int_data)

In [None]:
int_data

In [None]:
int_data.sort()

In [None]:
int_data

(U) To specify how the sorting takes place, both `sorted` and `sort` take an optional argument called `key`. `key` specifies a _function_ of one argument that is used to extract a comparison key from each list element (e.g. `key=str.lower`). The default value is `None` (compare the elements directly). 

In [None]:
users = ['hAcker1', 'TheBoss', 'botman', 'turingTest' ] 

In [None]:
sorted(users) 

In [None]:
sorted(users, key=str.lower) 

(U) The `__lt__` method takes two arguments: `self` and `other`, which is another object, normally of the same type. 

In [None]:
class comparableCmp(complex): 
    def __lt__ (self, other): 
        return abs(self) < abs(other)

In [None]:
a = 3+4j

In [None]:
b = 5+12j

In [None]:
a < b

In [None]:
a1 = comparableCmp(a) 

In [None]:
b1 = comparableCmp(b) 

In [None]:
a1 < b1 

In [None]:
c = [b1, a1]

In [None]:
sorted(c) 

(U) Here's how it works: 

1. the argument given to `key` must be a function that takes a single argument; 
2. internally, `sorted` calls `key(item)` once on each item in the list and then 
3. sorts the items into a new list by using `__lt__` on the results of the `key(item)` calls. 
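
The steps above can be sketched in plain Python. This is only an approximation for illustration (the function name `sorted_with_key` is made up, and the real implementation is not written this way): it computes each key once, sorts by key, and breaks ties by original position so that the items themselves are never compared.

```python
def sorted_with_key(items, key):
    # Pair each item with its key and its original position
    decorated = [(key(item), position, item) for position, item in enumerate(items)]
    decorated.sort()   # tuples compare by key first, then by position
    return [item for _, _, item in decorated]

users = ['hAcker1', 'TheBoss', 'botman', 'turingTest']
print(sorted_with_key(users, key=str.lower))
```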

(U) Another way to do the comparison is to use `key`: 

In [None]:
def magnitude_key(a) : 
    return (a*a.conjugate()).real 

In [None]:
magnitude_key(3+4j)

In [None]:
sorted([5+3j, 1j, -2j, 35+0j], key=magnitude_key)

(U) In many cases, we must sort a list of dictionaries, lists, or even objects. We could define our own key function or even several key functions for different sorting methods: 

In [None]:
list_to_sort = [
    {'lname': 'Dones', 'fname': 'Sally'},
    {'lname': 'Dones', 'fname': 'Derry'},
    {'lname': 'Smith', 'fname': 'Dohn'},
    {'lname': 'Phish', 'fname': 'James'},
]

In [None]:
def lname_sorter(list_item): 
    return list_item['lname'] 

In [None]:
def fname_sorter(list_item):
    return list_item['fname'] 

In [None]:
def lname_then_fname_sorter(list_item):
    return (list_item['lname' ], list_item['fname']) 

In [None]:
sorted(list_to_sort, key=lname_sorter)

In [None]:
sorted(list_to_sort, key=fname_sorter)

In [None]:
sorted(list_to_sort, key=lname_then_fname_sorter)

(U) While it's good to know how this works, this pattern is common enough that there is a function in the standard library `operator` module to do it even more concisely.

In [None]:
import operator

In [None]:
lname_sorter = operator.itemgetter('lname') # same as previous lname_sorter 

(U) Calling `itemgetter` returns a _function_ that is equivalent to the `lname_sorter` function above. Even better, when `itemgetter` is passed multiple arguments, the function it returns produces a tuple containing those items in the given order. Moreover, we don't even need to give it a name first; it's fine to do this: 

In [None]:
sorted(list_to_sort, key=operator.itemgetter('lname'))

In [None]:
sorted(list_to_sort, key=operator.itemgetter('lname', 'fname')) # same as using lname_then_fname_sorter 

(U) To use `operator.itemgetter` with `list`s or `tuple`s, give it integer indices as arguments. The equivalent function for objects is `operator.attrgetter`. 
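
A short sketch of both cases. The data and the `Person` record type are invented for illustration.

```python
import operator
from collections import namedtuple

# Tuples: pass an integer index to itemgetter
rows = [('Dones', 'Sally', 3), ('Smith', 'Dohn', 1), ('Phish', 'James', 2)]
by_count = sorted(rows, key=operator.itemgetter(2))

# Objects: pass an attribute name to attrgetter
Person = namedtuple('Person', ['lname', 'fname'])   # hypothetical record type
people = [Person('Smith', 'Dohn'), Person('Dones', 'Sally')]
by_lname = sorted(people, key=operator.attrgetter('lname'))

print(by_count)
print(by_lname)
```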

(U) Since we know so much about Python now, it's not hard to figure out how simple `operator.itemgetter` actually is; the following function is essentially equivalent: 

In [None]:
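def itemgetter_clone(*args):
    def f(item):
        if len(args) == 1:                     # one key: return the bare value, as operator.itemgetter does
            return item[args[0]]
        return tuple(item[x] for x in args)    # several keys: return a tuple of values
    return f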

(U) Obviously, `operator.itemgetter` and `itemgetter_clone` are not actually simple--it's just that most of the complexity is hidden inside the Python internals and arises out of the fundamental data model. 

# (U) Duck Typing 

(U) All the magic methods we've discussed are examples of the fundamental Python principle of **duck typing**: "If it walks like a duck and quacks like a duck, it must be a duck." Even though Python has the `isinstance` and `type` built-ins, it's considered poor form to use them to validate input inside a function or method. If verification needs to take place, it should be restricted to verifying required behavior using `hasattr`. The benefit of this approach can be seen in the built-in `sum` function.

In [None]:
help(sum) 

(U) **Any** iterable of numbers, regardless of whether it's a `list`, `tuple`, `set`, generator, or custom iterable, can be passed to `sum`. 

(U) The following is a comparison of _bad_ and _good_ examples of how to write a `product` function:

In [None]:
def list_prod(to_multiply): 
    if isinstance(to_multiply, list): # don't do this! 
        accumulator = 1 
        for i in to_multiply: 
            accumulator *= i 
        return accumulator 
    else: 
        raise TypeError("Argument to_multiply must be a list")

Why does it have to be a `list`? The loop itself works just as well on a `tuple`; the only thing preventing that is the `isinstance` check, which raises an exception for any argument that is not an instance of `list`.

In [None]:
def generic_prod(to_multiply): 
    if hasattr(to_multiply, '__iter__') or hasattr(to_multiply, '__getitem__'): 
        accumulator = 1 
        for i in to_multiply: 
            accumulator *= i 
        return accumulator 
    else: 
        raise TypeError("Argument to_multiply must be a sequence")

In [None]:
list_prod([1,2,3])

In [None]:
list_prod((1,2,3))

In [None]:
generic_prod((1,2,3))

(U) Having given that example, testing for iterability is one of a few special cases where `isinstance` might be the right function to use, but not in the obvious way. The `collections.abc` module provides **abstract base classes** which have the express purpose of helping to determine when an object implements a common interface. 
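
As a rough sketch of that idea, the version below checks against the `Iterable` abstract base class, which lives in `collections.abc` in current Python 3. The function name `abc_prod` is made up here to keep it apart from the earlier examples.

```python
from collections.abc import Iterable

def abc_prod(to_multiply):
    if not isinstance(to_multiply, Iterable):   # checks the interface, not the concrete type
        raise TypeError("Argument to_multiply must be iterable")
    accumulator = 1
    for i in to_multiply:
        accumulator *= i
    return accumulator

print(abc_prod((1, 2, 3)))
print(abc_prod(x for x in range(1, 5)))
```

One caveat: `isinstance(obj, Iterable)` only recognizes objects that define `__iter__`. An object that supports iteration solely through `__getitem__` is not detected this way.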

(U) Finally, effective use of duck typing goes hand in hand with robust error handling, based on the principle that "it's easier to ask for forgiveness than permission." 
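
A minimal sketch of that principle applied to the product example: instead of checking the argument first, just try to use it and handle the failure. The name `eafp_prod` and the error message are made up for illustration.

```python
def eafp_prod(to_multiply):
    accumulator = 1
    try:
        for i in to_multiply:       # just try to iterate; no up-front checks
            accumulator *= i
    except TypeError as err:
        # Raised when the argument is not iterable or an element cannot be multiplied
        raise TypeError("Argument to_multiply must be an iterable of numbers") from err
    return accumulator

print(eafp_prod([1, 2, 3]))
print(eafp_prod(range(1, 5)))
```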

# (U) Exercises 

1. Add a method to your `RangedQuery` class to allow instances of the class to be sorted by `start_date`.

2. Write an iterator class `ReverseIter` that takes a list and iterates over it in reverse order. 

3. Write a generator which will iterate over every day in a year. For example, the first output would be `Monday, January 1`.

4. Modify the generator from exercise 3 so the user can specify the year and initial day of the week. 

# (U) Pipelining with Generators: Supplement to Lesson 10

(U) Defining processing pipelines with generators in Python. It's simply awesome. 

Note: This supplement was not portion marked. The only redaction was the author's name.

## Pipelining with Generators 

Imagine you're doing your laundry. Think about the stages involved. Roughly speaking, the stages are sorting, washing, drying, and folding. The beauty though is that even though these stages are sequential, they can be performed in parallel. This is called **pipelining**. 

Python generators make pipelining easy and can even clarify your code quite a bit. Unlike your washer and dryer, the stages of a generator pipeline do not literally run at the same time: the interpreter runs them in a single thread, passing one item at a time through every stage. The benefit is that work is done only when a value is requested. Memory is also conserved because values are automatically generated as needed, and discarded as soon as possible. 

A prime example of this is processing results from a database query. Often, before we can use the results of a database query, we need to clean them up by running them through a series of changes or transformations. Pipelined generators are perfect for this. 

In [None]:
from pprint import pprint 
import random 

## A Silly Example 

Here we're going to take 200 randomly generated numbers and extract their fractional parts (the part after the decimal point). There are probably more efficient ways to do this, but we're going to do it by splitting the string into two parts. Here we have a function that simply returns the integer part and the fractional part of an input float as two strings in a tuple.

In [None]:
def split_float(v):
    '''
    Takes a float or string of a float and returns a tuple containing the
    integer part and the fractional part of the number, as strings, respectively.
    '''
    v = str(v)
    i, f = v.split('.')
    return (i, '0.' + f)

## The Pipeline 

Here we have a pipeline of four generators, each feeding the one below it. We `pprint` out the final resulting list after all the stages have completed. See the comments after each line for further explanation. 

In [None]:
rand_gen = (random.random() * 100 for i in range(200)) # generate 200 random floats between 0 and 100, one at a time 
results = (split_float(r) for r in rand_gen) # call our split_float() function which will generate matching tuples 
results = (r[1] for r in results ) # we only care about the fractional part, so only keep that part of the tuple 
results = (float(r) for r in results) # convert our fractional value from a string back into a float 

pprint(list(results)) # print the final results 

## Why not a for-loop? 

We could have put all the steps of our pipeline into a single for-loop, but we get a couple of advantages by breaking the stages out into separate generators: 

- There's some clarity gained by having distinct stages specified as a pipeline. People reading the code can clearly see the transforms. 
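- In a for-loop, all of the steps are tangled together in one loop body. By breaking the stages out, each stage becomes a small, lazy piece that can be tested, swapped, or reordered on its own, and values still flow through one at a time without building intermediate lists. (The stages do not run in parallel the way your washer and dryer do; that would take explicit threads or processes.) 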

## Another (Pseudo-)Example 

Here's a pseudo-example querying a database that returns JSON that we need to convert to lists. 

```python
import json 
results = ( json.loads(result) for result in db_cursor.execute(my_query) ) 
results = ( r['results'] for r in results ) 
results = ( [ r['name'], r['type'], r['count'], r['source'] ] for r in results ) 
```

## Filters 

We can even filter our data in our generator pipeline. 

```python
results = ( r for r in results if r[2] > 0 ) # remove results with a count of zero 
foo(results) # do something else with your results 
```
