## `collections` Module


Specialized container types that provide alternatives to Python's built-in **dict**, **list**, **set**, and **tuple** objects.

---


In [1]:
# Import the most common 'collections' types
from collections import defaultdict, namedtuple, Counter, deque
import csv
import random
from urllib.request import urlretrieve

### `namedtuple`

- A convenient way to define a class without methods.
- Lets you store `dict`-like records whose fields are accessible as attributes.
- Conventional `tuple` objects access data by index, and the index says nothing about what the element means.

In [2]:
# Conventional tuple to define a user and a role
user = ('Tim', 'admin')

print(user)

('Tim', 'admin')


In [3]:
# tuple indices have no significant meaning to the data in each index
print(f'User {user[0]} has the role of {user[1]}.')

User Tim has the role of admin.


In [4]:
# Create a namedtuple container object for comparison
# First argument is the 'typename'
# Second argument is the names of the fields, space separated
User = namedtuple('User', 'name role')

In [5]:
# Create an instance of the namedtuple, passing the fields as keyword arguments
user = User(name='Tim', role='admin')

In [6]:
# Access data from the namedtuple with meaningful references
print(f'User {user.name} has the role of {user.role}.')

User Tim has the role of admin.
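
A `namedtuple` is still a `tuple`, so it remains immutable and supports indexing and unpacking. The sketch below illustrates this along with a few helper methods listed in the `namedtuple` help text (`_replace`, `_asdict`). The `'guest'` default and the `visitor` and `demoted` names are assumptions added for illustration only.

```python
from collections import namedtuple

# Same fields as above, plus a default for the rightmost field
# ('guest' is an illustrative default, not part of the original example)
User = namedtuple('User', 'name role', defaults=['guest'])

user = User(name='Tim', role='admin')
visitor = User('Sara')  # role falls back to the default

# Still a tuple: unpacking and indexing keep working
name, role = user

# Fields are read-only; attribute assignment raises AttributeError
try:
    user.role = 'viewer'
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# _replace() returns a new instance and leaves the original untouched
demoted = user._replace(role='viewer')

print(visitor)
print(user._fields)
print(user._asdict())
print(demoted, assignment_failed)
```

Because instances cannot be modified in place, `_replace()` is the usual way to derive an updated record.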


#### `namedtuple` documentation


In [7]:
help(namedtuple)

Help on function namedtuple in module collections:

namedtuple(typename, field_names, *, rename=False, defaults=None, module=None)
    Returns a new subclass of tuple with named fields.
    
    >>> Point = namedtuple('Point', ['x', 'y'])
    >>> Point.__doc__                   # docstring for the new class
    'Point(x, y)'
    >>> p = Point(11, y=22)             # instantiate with positional args or keywords
    >>> p[0] + p[1]                     # indexable like a plain tuple
    33
    >>> x, y = p                        # unpack like a regular tuple
    >>> x, y
    (11, 22)
    >>> p.x + p.y                       # fields also accessible by name
    33
    >>> d = p._asdict()                 # convert to a dictionary
    >>> d['x']
    11
    >>> Point(**d)                      # convert from a dictionary
    Point(x=11, y=22)
    >>> p._replace(x=100)               # _replace() is like str.replace() but targets named fields
    Point(x=100, y=22)



---
### `defaultdict`

- Useful to avoid `KeyError` exceptions when building a nested data set.
- In this example, players have multiple game scores in the data set.
- The goal is a dictionary with a single **key** for each player, where the **value** for each **key** is a `list` of that player's scores.


In [8]:
# List of tuples with player names and game scores
game_scores = [
    ('Tim', 100),
    ('Sara', 150),
    ('Lily', 130),
    ('Ella', 180),
    ('Tim', 50),
    ('Sara', 60),
    ('Lily', 100),
    ('Ella', 70)
]

print(game_scores)

[('Tim', 100), ('Sara', 150), ('Lily', 130), ('Ella', 180), ('Tim', 50), ('Sara', 60), ('Lily', 100), ('Ella', 70)]


In [9]:
# Create an empty dictionary to hold each player's scores
scores = {}

print(scores)

{}


In [10]:
# Loop over the list and unpack each tuple into two variables
# Raises a KeyError because the keys for the player names do not yet exist
for name, score in game_scores:
    scores[name].append(score)

KeyError: 'Tim'
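
For comparison, a plain `dict` can avoid this `KeyError` with `dict.setdefault()`. This is a minimal sketch of that conventional workaround, not part of the original walkthrough. The sample data is copied from the cell above so the block runs on its own.

```python
# Sample data copied from the earlier cell
game_scores = [
    ('Tim', 100), ('Sara', 150), ('Lily', 130), ('Ella', 180),
    ('Tim', 50), ('Sara', 60), ('Lily', 100), ('Ella', 70),
]

scores = {}

for name, score in game_scores:
    # setdefault() returns the list already stored under the key,
    # or stores a new empty list first and returns that
    scores.setdefault(name, []).append(score)

print(scores)
```

This works, but an empty list is built on every iteration whether or not it is needed, and the intent is less obvious than with `defaultdict`.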

In [11]:
# Create a defaultdict and set the data type to produce when a key is not present (a list in this case)
scores = defaultdict(list)

print(scores)

defaultdict(<class 'list'>, {})


In [12]:
# Loop over the data set (game_scores)
# Create a key for each name, if it doesn't already exist
# Append a score to the value for any matching name key
for name, score in game_scores:
    scores[name].append(score)

print(scores)

defaultdict(<class 'list'>, {'Tim': [100, 50], 'Sara': [150, 60], 'Lily': [130, 100], 'Ella': [180, 70]})
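
The default factory does not have to be `list`; any callable that takes no arguments should work. As a sketch, passing `int` (which returns `0` when called without arguments) accumulates a running total per player. The `totals` name is introduced here for illustration.

```python
from collections import defaultdict

# Sample data copied from the earlier cell
game_scores = [
    ('Tim', 100), ('Sara', 150), ('Lily', 130), ('Ella', 180),
    ('Tim', 50), ('Sara', 60), ('Lily', 100), ('Ella', 70),
]

# int() returns 0, so every new key starts at zero
totals = defaultdict(int)

for name, score in game_scores:
    totals[name] += score

print(dict(totals))
```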


In [13]:
# Iterating over a defaultdict yields its keys, just like a regular dict
for s in scores:
    print(s)


Tim
Sara
Lily
Ella
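
One behavior worth knowing: the default factory runs only on index-style lookups (`scores[key]`), and that lookup stores the new value under the key. The sketch below illustrates this with a hypothetical player name, `'Bob'`, that is not in the data set.

```python
from collections import defaultdict

scores = defaultdict(list, {'Tim': [100, 50]})

# Neither a membership test nor get() calls the default factory
has_bob = 'Bob' in scores
bob_scores = scores.get('Bob')
keys_before = list(scores)

# Indexing does call it, and stores the new empty list under the key
_ = scores['Bob']
keys_after = list(scores)

# Converting to a plain dict restores KeyError for missing keys
plain = dict(scores)

print(has_bob, bob_scores)
print(keys_before, keys_after)
```

Use `in` or `.get()` when a lookup should not create a key as a side effect.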


#### `defaultdict` documentation

In [14]:
help(defaultdict)

Help on class defaultdict in module collections:

class defaultdict(builtins.dict)
 |  defaultdict(default_factory[, ...]) --> dict with default factory
 |  
 |  The default factory is called without arguments to produce
 |  a new value when a key is not present, in __getitem__ only.
 |  A defaultdict compares equal to a dict with the same items.
 |  All remaining arguments are treated the same as if they were
 |  passed to the dict constructor, including keyword arguments.
 |  
 |  Method resolution order:
 |      defaultdict
 |      builtins.dict
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __copy__(...)
 |      D.copy() -> a shallow copy of D.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __missing__(...)
 |      __missing__(key) # Called by __getitem__ for missing key; pseudo-code:
 |      if self.default_facto

---
### `Counter`

- `Counter` makes it easy to count the most common words within a `string`.
- Conventionally, this process requires:
  - Creating a `dict` with a **key** representing each word in the `string`.
  - Looping over the words in the `string` and incrementing a counter **value** corresponding to the current word's **key**.
  - Sorting the words from most to least common with `sorted()` and a `lambda` key function.
  - Slicing the sorted result to display the top N entries.
- The **`Counter(iterable).most_common(N)`** method replaces all of these steps.

In [15]:
# Create a string of words
my_string = '''This is a long string of words which I use to demonstrate how to use the "Counter" Python collection object.  This object
is very helpful to keep from having to write complex code in order to count the number of occurances for each word in a string of words and
then order those words by frequency, displaying the top N results.  The "Counter" Python collection object is quite simple compared to the
old, higher-complexity method of performing this action.'''.split()

# Show the first 5 words with a slice (split() returned a list of words)
my_string[:5]


['This', 'is', 'a', 'long', 'string']

In [16]:
# Capture and display word frequency without using the "Counter" object
# Create a blank dictionary
common_words = {}

# Loop over the words in the split string and track occurrences in the common_words dictionary
for word in my_string:
    if word not in common_words:
        common_words[word] = 0
    common_words[word] += 1

print(common_words)

{'This': 2, 'is': 3, 'a': 2, 'long': 1, 'string': 2, 'of': 4, 'words': 3, 'which': 1, 'I': 1, 'use': 2, 'to': 6, 'demonstrate': 1, 'how': 1, 'the': 4, '"Counter"': 2, 'Python': 2, 'collection': 2, 'object.': 1, 'object': 2, 'very': 1, 'helpful': 1, 'keep': 1, 'from': 1, 'having': 1, 'write': 1, 'complex': 1, 'code': 1, 'in': 2, 'order': 2, 'count': 1, 'number': 1, 'occurances': 1, 'for': 1, 'each': 1, 'word': 1, 'and': 1, 'then': 1, 'those': 1, 'by': 1, 'frequency,': 1, 'displaying': 1, 'top': 1, 'N': 1, 'results.': 1, 'The': 1, 'quite': 1, 'simple': 1, 'compared': 1, 'old,': 1, 'higher-complexity': 1, 'method': 1, 'performing': 1, 'this': 1, 'action.': 1}


In [17]:
# Sort the words by number of occurrences, use a slice ([:10], for example) to display the top 10 results
for k, v in sorted(common_words.items(),
                   key=lambda x: x[1],
                   reverse=True)[:10]:

    print(k, v)

to 6
of 4
the 4
is 3
words 3
This 2
a 2
string 2
use 2
"Counter" 2


In [18]:
# Produce the same result with the "Counter" collection object's "most_common" method
common_words = Counter(my_string).most_common(10)

print(common_words)

[('to', 6), ('of', 4), ('the', 4), ('is', 3), ('words', 3), ('This', 2), ('a', 2), ('string', 2), ('use', 2), ('"Counter"', 2)]
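
Note that splitting on whitespace treats `'object'` and `'object.'`, or `'This'` and `'this'`, as different words. A rough sketch of one way to normalize tokens before counting follows. The sample `text` is a made-up sentence for illustration, and stripping with `string.punctuation` is only one of several possible approaches.

```python
from collections import Counter
import string

# Hypothetical sample sentence, not the string used above
text = 'The object counts words. This object, the "Counter" object, counts them.'

# Lowercase each token and strip leading and trailing punctuation
words = [w.strip(string.punctuation).lower() for w in text.split()]
words = [w for w in words if w]  # drop tokens that were only punctuation

counts = Counter(words)

print(counts.most_common(3))

# A missing key returns 0 instead of raising KeyError, and is not stored
print(counts['missing'], 'missing' in counts)
```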


#### `Counter` documentation

In [19]:
help(Counter)

Help on class Counter in module collections:

class Counter(builtins.dict)
 |  Counter(iterable=None, /, **kwds)
 |  
 |  Dict subclass for counting hashable items.  Sometimes called a bag
 |  or multiset.  Elements are stored as dictionary keys and their counts
 |  are stored as dictionary values.
 |  
 |  >>> c = Counter('abcdeabcdabcaba')  # count elements from a string
 |  
 |  >>> c.most_common(3)                # three most common elements
 |  [('a', 5), ('b', 4), ('c', 3)]
 |  >>> sorted(c)                       # list all unique elements
 |  ['a', 'b', 'c', 'd', 'e']
 |  >>> ''.join(sorted(c.elements()))   # list elements with repetitions
 |  'aaaaabbbbcccdde'
 |  >>> sum(c.values())                 # total of all counts
 |  15
 |  
 |  >>> c['a']                          # count of letter 'a'
 |  5
 |  >>> for elem in 'shazam':           # update counts from an iterable
 |  ...     c[elem] += 1                # by adding 1 to each element's count
 |  >>> c['a']                

---
### `deque`

- Notes