# UNCLASSIFIED

Transcribed from FOIA Doc ID: 6689695

https://archive.org/details/comp3321

(U) Any programming language has to strike a balance between the number of basic elements it exposes, like control structures, data types, and so forth, and the utility of each one. For example, Python could do without `tuple`s entirely, and could replace the `dict` with a `list` of `list`s or even a single `list` where even-numbered indices contain _keys_ and odd-numbered indices contain _values_. Some situations are common enough to warrant support in the standard library, but not common enough to justify inclusion among the **builtin**s. Such is the case with the **collections** and **itertools** modules. Many programs could be simplified with a `defaultdict`, and having one available with a single `from collections import defaultdict` is much better than reinventing the wheel every time it's needed.

# (U) Value Added Containers with **collections**

## `defaultdict`
(U) Suppose we want to build an index for a poem, so that we can look up the lines where each word occurs. To do this, we plan to construct a dictionary with the words as keys, and a list of line numbers as the value. Using a regular `dict`, we'd probably do something like this: 

In [None]:
poem = """mary had a little lamb 
its fleece was white as snow 
and everywhere that mary went 
the lamb was sure to go"""

In [None]:
index = {} 
for linenum, line in enumerate(poem.split('\n')): 
    for word in line.split(): 
        if word in index: 
            index[word].append(linenum) 
        else: 
            index[word] = [linenum]
print(index)

(U) This code would be simpler without the inner `if ... else ...` clause. That's exactly what a `defaultdict` is for; it takes a function (often a `type`, which is called as a constructor without arguments) as its first argument, and calls that function to create a _default_ value whenever the program tries to access a key that isn't currently in the dictionary. (It does this by overriding the `__missing__` method of `dict`.) In action, it looks like this: 

In [None]:
from collections import defaultdict
index = defaultdict(list)

for linenum, line in enumerate(poem.split('\n')):
    for word in line.split():
        index[word].append(linenum)
print(index)
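The `__missing__` hook mentioned above can be illustrated with a small `dict` subclass. This is only a sketch of the idea, not how `defaultdict` is actually implemented; the class name `ListDefaultDict` is made up for this example.

```python
from collections import defaultdict


class ListDefaultDict(dict):
    """Hypothetical dict subclass that mimics defaultdict(list)."""

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ only when the key is absent.
        value = []
        self[key] = value  # store the default so later lookups find it
        return value


homemade = ListDefaultDict()
homemade['lamb'].append(0)   # 'lamb' is missing, so __missing__ creates []
homemade['lamb'].append(3)   # 'lamb' now exists, so __missing__ is not called

builtin = defaultdict(list)
builtin['lamb'].append(0)
builtin['lamb'].append(3)

print(homemade)
print(dict(builtin))
```

Both objects end up holding the same single key, which suggests that the convenience of `defaultdict` comes entirely from that one hook.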

(U) Although a `defaultdict` is almost exactly like a dictionary, it can add keys unintentionally: any lookup with `[]` on a missing key inserts that key, even when the lookup is only part of a membership test on the value. The `get` method and the `in` operator do not trigger the default factory, so they can be used to avoid this.

In [None]:
'sheep' in index # False

In [None]:
1 in index.get('sheep') # TypeError: get returns None for a missing key

In [None]:
'sheep' in index # still False 

In [None]:
2 in index['sheep'] # still False, but ...

In [None]:
'sheep' in index # previous statement accidentally added 'sheep' 
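One way to test membership in a value without touching the keys is to give `get` an explicit fallback. The sketch below rebuilds a tiny index of its own, so the names and data are illustrative only.

```python
from collections import defaultdict

index = defaultdict(list)
index['lamb'].extend([0, 3])

# get() never calls the default factory; supply an explicit fallback instead.
found = 1 in index.get('sheep', [])
keys_after_get = sorted(index)

# Square-bracket lookup does call the factory and inserts the key.
index['sheep']
keys_after_lookup = sorted(index)

print(found, keys_after_get, keys_after_lookup)
```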

(U) You can do crazy things like supply a custom `default_factory`, or even change it later (it's just an attribute of the `defaultdict` object), but it's not commonly done:

In [None]:
import itertools

In [None]:
def constant_factory(value):
    return itertools.repeat(value).__next__

In [None]:
d = defaultdict(constant_factory('<missing>'))

In [None]:
d.update(name='John', action='ran')

In [None]:
'{0[name]} {0[action]} to {0[object]}'.format(d)

In [None]:
d # "object" added to d
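Because `default_factory` is an ordinary attribute, it can also be reassigned after construction. The following is a minimal sketch with made-up data; setting the attribute to `None` restores the usual `KeyError` behavior of a plain `dict`.

```python
from collections import defaultdict

counts = defaultdict(int)
for word in 'mary had a little lamb'.split():
    counts[word] += 1

# Switch the factory: missing keys now default to the string 'n/a'.
counts.default_factory = lambda: 'n/a'
placeholder = counts['sheep']      # inserts 'sheep': 'n/a'

# Setting the factory to None makes missing keys raise KeyError again.
counts.default_factory = None
try:
    counts['goat']
    raised = False
except KeyError:
    raised = True

print(placeholder, raised, 'goat' in counts)
```

Setting the factory to `None` once a dictionary has been built is one way to guard against the accidental key insertion described earlier.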

## `Counter`

(U) A `Counter` is like a `defaultdict(int)` with additional features. If given a `list` or other iterable when constructed, it will create counts of all the unique elements it sees. It can also be constructed from a dictionary with numeric values. It has a custom implementation of `update` and some specialized methods, like `most_common` and `subtract`.

In [None]:
from collections import Counter

In [None]:
word_counts = Counter(poem.split())

In [None]:
word_counts.most_common(3)

In [None]:
word_counts.update('lamb lamb lamb stew'.split())

In [None]:
word_counts.most_common(3)

In [None]:
c = Counter(a=3, b=1)

In [None]:
d = Counter(a=1, b=2)

In [None]:
c + d

In [None]:
c - d # Did you get the output you expected?

In [None]:
(c - d) + d

In [None]:
c & d

In [None]:
c | d
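The difference between the `-` operator and the `subtract` method is easy to miss, so here is a short sketch of both, along with construction from a dictionary. It reuses the same counts as above, but the variable names are local to this example.

```python
from collections import Counter

c = Counter({'a': 3, 'b': 1})   # built from a dictionary of counts
d = Counter(a=1, b=2)

difference = c - d              # the - operator drops counts that are not positive
print(difference)

c.subtract(d)                   # subtract() works in place and keeps negatives
print(c)

print(c['zebra'])               # missing keys read as 0 and are not inserted
```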

## `OrderedDict`

(U) An `OrderedDict` is a dictionary that remembers the order in which keys were originally inserted, which determines the order for its iteration. Aside from that, it has a `popitem` method that can pop from either the beginning or end of the ordering. 
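A brief sketch of the two-ended `popitem` follows. It also uses `move_to_end`, another `OrderedDict` method not discussed above; the keys and values are invented for illustration.

```python
from collections import OrderedDict

od = OrderedDict()
od['first'] = 1
od['second'] = 2
od['third'] = 3

newest = od.popitem()            # default last=True pops the most recent item
oldest = od.popitem(last=False)  # last=False pops the earliest item instead

od['fourth'] = 4
od.move_to_end('second')         # move an existing key to the end of the ordering

print(newest, oldest, list(od))
```

Since Python 3.7 a regular `dict` also preserves insertion order, so the main reasons to reach for `OrderedDict` today are these reordering operations.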

## `namedtuple`
(U) `namedtuple` is used to create lightweight objects that are somewhat like tuples, in that they are immutable and their fields can be accessed by position with `[]` notation. As the name indicates, the fields are named, and can also be accessed with the `.` notation. It is most often used as an optimization, when speed or memory requirements dictate that a `dict` or custom object isn't good enough. Construction of a `namedtuple` is somewhat indirect, as `namedtuple` takes field specifications as strings and returns a `type`, which is then used to create the named tuples. Named tuples can also enhance code readability.

In [None]:
from collections import namedtuple

In [None]:
Person = namedtuple('Person', 'name age gender')

In [None]:
bob = Person(name='Bob', age=30, gender='male')

In [None]:
print('%s is a %d year-old %s' % bob) # 2.x style string formatting

In [None]:
print('{} is a {} year-old {}'.format(*bob))

In [None]:
print('%s is a %d year-old %s' % (bob.name, bob.age, bob.gender))

In [None]:
print('{} is a {} year-old {}'.format(bob.name, bob.age, bob.gender))

In [None]:
bob[0]

In [None]:
bob['name'] # TypeError

In [None]:
bob.name

In [None]:
print('%(name)s is a %(age)d year-old %(gender)s' % bob ) # Doesn't work

In [None]:
print('{name} is a {age} year-old {gender}'.format(*bob)) # Doesn't work

In [None]:
print('{0.name} is a {0.age} year-old {0.gender}'.format(bob)) # Works!
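The two name-based formats that fail above need a mapping rather than a tuple. One possible fix, sketched here, is the `_asdict` method that every named tuple provides; `_replace` is shown as well because it is the usual way around immutability.

```python
from collections import namedtuple

Person = namedtuple('Person', 'name age gender')
bob = Person(name='Bob', age=30, gender='male')

# _asdict() returns a mapping, which is what the named formats need.
fields = bob._asdict()
old_style = '%(name)s is a %(age)d year-old %(gender)s' % fields
new_style = '{name} is a {age} year-old {gender}'.format(**fields)

# Named tuples are immutable; _replace() builds a modified copy.
older_bob = bob._replace(age=31)

print(old_style)
print(new_style)
print(bob.age, older_bob.age)
```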

## `deque`

(U) Finally, `deque` (short for double-ended queue) provides queue operations, with fast appends and pops at both ends.

In [None]:
from collections import deque

In [None]:
d = deque('ghi') # make a new deque with three items
print(d)

In [None]:
d.append('j') # add a new entry to the right side
print(d)

In [None]:
d.appendleft('f') # add a new entry to the left side
print(d)

In [None]:
print(d.popleft()) # return and remove the leftmost item
print(d)

In [None]:
d.rotate(1) # right rotation
print(d)

In [None]:
d.extendleft('abc') # extendleft() reverses the input order
print(d)
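Two common uses of these operations are a first-in, first-out queue and a bounded buffer of recent items. The sketch below assumes nothing beyond the standard `deque` API; the `maxlen` argument was not covered above, and the job names are invented.

```python
from collections import deque

# First-in, first-out queue: append on the right, pop from the left.
queue = deque()
for job in ['parse', 'index', 'report']:
    queue.append(job)
first_out = queue.popleft()

# A bounded deque silently discards items from the opposite end when full.
recent = deque(maxlen=3)
for n in range(6):
    recent.append(n)

print(first_out, list(queue), list(recent))
```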

(U) The **collections** module also provides abstract base classes for common Python interfaces, in its `collections.abc` submodule. Their purpose and use are currently beyond the scope of this course, but the documentation is reasonably good.

# (U) Slicing and Dicing with itertools 

Given one or more `list`s, `iterator`s, or other iterable objects, there are many ways to slice and dice the constituent elements. The **itertools** module exposes building-block functions to make this easy, and aims to keep them useful in a variety of situations, so the documentation contains a [cookbook of common use cases](https://docs.python.org/3/library/itertools.html#itertools-recipes). We only have time to cover a small subset of the **itertools** functionality. Functions from **itertools** usually return an iterator, which is great for use in loops and list comprehensions, but not so good for inspection; in the code blocks that follow, we often call `list` on these things to unwrap them.

(U) The `chain` function combines iterables into one super-iterable. The `groupby` function separates one iterator into groups of adjacent objects, with group membership optionally determined by a `key` function. This can be tricky, because `groupby` only compares neighboring elements: there's no look back to see if a key has been encountered previously, so the same key can start several separate groups.

In [None]:
import itertools

In [None]:
list(itertools.chain(range(5),[5,6])) == [0,1,2,3,4,5,6]

In [None]:
size_groups = itertools.groupby([1,1,2,2,2,'p','p',3,4,3,3,2])

In [None]:
[(key, list(vals)) for key, vals in size_groups]
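The usual way around the lack of look back is to sort by the grouping key first. Here is a minimal sketch using `len` as the key function; the word list is made up for the example.

```python
import itertools

words = ['lamb', 'its', 'mary', 'was', 'snow', 'as', 'go']

# Without sorting, words of equal length that are not adjacent land in separate groups.
unsorted_keys = [key for key, _ in itertools.groupby(words, key=len)]

# Sorting by the same key first makes every group contiguous.
by_length = {
    key: list(group)
    for key, group in itertools.groupby(sorted(words, key=len), key=len)
}

print(unsorted_keys)
print(by_length)
```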

(U) A deeply nested for loop or list comprehension might be better served by some of the _combinatoric generators_ like `product`, `permutations`, or `combinations`.

In [None]:
iter_product = itertools.product([1,2,3],['a','b','c'])

In [None]:
list(iter_product)

In [None]:
iter_combi = itertools.combinations("abcd",3)

In [None]:
list_combi = list(iter_combi)
list_combi

In [None]:
iter_permutations = itertools.permutations("abcd",3)

In [None]:
list(iter_permutations)
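To make the comparison with nested loops concrete, the sketch below builds the same tuples both ways and then compares the sizes of `combinations` and `permutations` for the same input. The lists `sizes`, `colors`, and `fits` are invented for illustration.

```python
import itertools

sizes = ['S', 'M']
colors = ['red', 'blue']
fits = ['slim', 'loose']

# Three nested loops...
nested = []
for size in sizes:
    for color in colors:
        for fit in fits:
            nested.append((size, color, fit))

# ...collapse into one loop over the Cartesian product.
flat = list(itertools.product(sizes, colors, fits))
print(nested == flat, len(flat))

# combinations ignores order within each selection; permutations does not.
n_combi = len(list(itertools.combinations('abcd', 3)))
n_perm = len(list(itertools.permutations('abcd', 3)))
print(n_combi, n_perm)
```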

(U) **itertools** can also be used to create iterators that generate values on demand, potentially without end:

In [None]:
counter = itertools.count(0, 5)

In [None]:
next(counter)

In [None]:
print(list(next(counter) for c in range(6)))

(U) Be careful... What's going on here?!? 

In [None]:
counter = itertools.count(0.2,0.1)
for c in counter:
    print(c)
    if c > 1.5:
        break
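The surprise in the loop above comes from floating-point arithmetic: `count` reaches each value by repeated addition, so rounding error can build up and the `c > 1.5` test may trigger on a different iteration than exact arithmetic would suggest. One common workaround, sketched here as a suggestion rather than the only fix, is to count with integers and multiply.

```python
import itertools

# Repeated addition: each step can add a little rounding error to the running total.
counter = itertools.count(0.2, 0.1)
added = [next(counter) for _ in range(14)]

# Counting with integers and multiplying avoids carrying error from step to step.
steps = itertools.count()
multiplied = [0.2 + 0.1 * next(steps) for _ in range(14)]

# Rounding for display hides the noise in either list.
print([round(x, 10) for x in added])
print([round(x, 10) for x in multiplied])
```

When a loop has to stop at a boundary value, comparing against an integer step count is usually safer than comparing floats.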

In [None]:
cycle = itertools.cycle('ABCDE')

In [None]:
for i in range(10):
    print(next(cycle))

In [None]:
repeat = itertools.repeat('again!')

In [None]:
for i in range(5):
    print(next(repeat))

In [None]:
repeat = itertools.repeat('again!', 3)
for i in range(5):
    print(next(repeat)) # StopIteration on the fourth call: only three items
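Calling `next` on an exhausted iterator raises `StopIteration`. Two ways to avoid that are sketched below: passing a default to `next`, or letting a loop or comprehension consume the iterator. The `None` default is an arbitrary choice for this example.

```python
import itertools

repeat = itertools.repeat('again!', 3)

# next() accepts a default that is returned once the iterator is exhausted.
results = [next(repeat, None) for _ in range(5)]
print(results)

# A for loop or comprehension handles StopIteration itself and simply ends.
collected = [item for item in itertools.repeat('again!', 3)]
print(collected)
```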

In [None]:
nums = range(10,0,-1)
my_zip = zip(nums, itertools.repeat('p'))
for thing in my_zip:
    print(thing)
