# The __`collections`__ module
* the __`collections`__ module contains a bunch of useful container types, several of which are derived from (read: inherited from) built-in types we're already familiar with (e.g., `defaultdict`, `OrderedDict`, and `Counter` are all `dict` subclasses)

## Ordered Dictionaries
* since Python 3.7, all dictionaries are guaranteed to retain their insertion order, i.e., you iterate through the items in the same order in which you inserted them (CPython 3.6 already behaved this way, but only as an implementation detail)
* because of this, `from collections import OrderedDict` is no longer used as much, unless you need one of its additional features, such as `move_to_end()`, `popitem(last=False)`, or order-sensitive equality checks
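A minimal sketch of the extra `OrderedDict` features mentioned above; the keys and values are arbitrary examples. Plain dicts have no `move_to_end()`, their `popitem()` takes no `last` argument, and their equality ignores order.

```python
from collections import OrderedDict

od = OrderedDict(one=1, two=2, three=3)

# move_to_end() moves an existing key to the right end (or the left end with last=False)
od.move_to_end('one')
print(list(od))  # ['two', 'three', 'one']

# popitem(last=False) removes and returns the first (oldest) item, FIFO style
print(od.popitem(last=False))  # ('two', 2)

# equality between two OrderedDicts is order-sensitive; between plain dicts it is not
print(OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))  # False
print(dict(a=1, b=2) == dict(b=2, a=1))                # True
```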

In [2]:
d = {}
d['one'] = 10
d['two'] = 20
d['three'] = 30
print(d)

{'one': 10, 'two': 20, 'three': 30}


In [4]:
# d['five'] would raise a KeyError because 'five' is not a key
d.get('five', 'nothing to see here, keep moving!')

'nothing to see here, keep moving!'

# The __`collections`__ module: Default Dictionaries

## Default Dictionaries
* suppose we need a default value for any key which does not exist in the dictionary
 * we can use the __`get()`__ method, or __`setdefault()`__ (or the __`in`__ operator), or we can use a `defaultdict` (default dictionary)

In [5]:
def count_letters(word):
  """Return a dict of letters and how many times the letter appeared in the word"""

  count = {}

  for letter in word:
    count[letter] = count.get(letter, 0) + 1

  return count

count_letters('FnlwjdkhngSSSwoldfjnwkdjfvne')

{'F': 1,
 'n': 4,
 'l': 2,
 'w': 3,
 'j': 3,
 'd': 3,
 'k': 2,
 'h': 1,
 'g': 1,
 'S': 3,
 'o': 1,
 'f': 2,
 'v': 1,
 'e': 1}
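The same letter count can also be written with the other two options listed above, `setdefault()` and the `in` operator. This is a sketch for comparison; the function names `count_letters_setdefault` and `count_letters_in` are made up for illustration.

```python
def count_letters_setdefault(word):
  """Count letters using dict.setdefault()"""
  count = {}
  for letter in word:
    # setdefault() inserts 0 for a new key and returns the current value
    count[letter] = count.setdefault(letter, 0) + 1
  return count

def count_letters_in(word):
  """Count letters by testing membership with the in operator"""
  count = {}
  for letter in word:
    if letter in count:
      count[letter] += 1
    else:
      count[letter] = 1
  return count

print(count_letters_setdefault('banana'))  # {'b': 1, 'a': 3, 'n': 2}
print(count_letters_in('banana'))          # {'b': 1, 'a': 3, 'n': 2}
```

All three versions work; `get()` is usually the most compact of the plain-dict approaches.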

In [6]:
from collections import defaultdict

def count_letters(word):
  """Return a dict of letters and how many times the letter appeared in the word"""

  # When creating a defaultdict, the argument is a
  # zero-argument callable (the default_factory) that is
  # called to create the value for a missing key:
  # int() -> 0, str() -> '', list() -> []
  count = defaultdict(int)

  for letter in word:
    count[letter] += 1

  return count

count_letters('FnlwjdkhngSSSwoldfjnwkdjfvne')

defaultdict(int,
            {'F': 1,
             'n': 4,
             'l': 2,
             'w': 3,
             'j': 3,
             'd': 3,
             'k': 2,
             'h': 1,
             'g': 1,
             'S': 3,
             'o': 1,
             'f': 2,
             'v': 1,
             'e': 1})

## Lab: Default Dictionaries
* read from a file where each line is a word followed by a count, e.g.,
<pre>
    apple 2
    pear 3
    cherry 5
    apple 3
    pear 6
    apple 1
</pre>
(as shown above, words may be duplicated)
* generate a __`defaultdict`__ where the keys are the words and the values are a _list_ of all the counts for that word, e.g.,
<pre>
defaultdict(&lt;class 'list'>, {'apple': ['2', '3', '1'], 'pear': ['3', '6'], 'cherry': ['5']})
</pre>
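One possible sketch of the lab, assuming every non-blank line holds exactly one word and one count separated by whitespace. The function name `group_counts` and the file name `counts.txt` are hypothetical; a list of strings stands in for the file so the example runs on its own.

```python
from collections import defaultdict

def group_counts(lines):
  """Group the counts (kept as strings) under each word"""
  groups = defaultdict(list)  # a missing word starts with an empty list
  for line in lines:
    parts = line.split()
    if len(parts) != 2:
      continue  # skip blank or malformed lines (an assumption about the input)
    word, count = parts
    groups[word].append(count)
  return groups

# In the lab you would pass an open file instead, e.g.:
# with open('counts.txt') as f:
#   print(group_counts(f))
sample = ['apple 2', 'pear 3', 'cherry 5', 'apple 3', 'pear 6', 'apple 1']
print(group_counts(sample))
```

Because iterating over a file yields one line at a time, the same function works unchanged on a file object.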

## Now, for more fun, let's implement a default dictionary without using the __`collections`__ module
* In other words, make your own class (e.g., MyDefaultDict)
* What class or classes should it inherit from?
* You will need to create the method __`__getitem__(self, key)`__, which is what Python calls under the hood when you retrieve an item with `d[key]`
 * if the key in question is not currently in the dict, what should you return?

In [11]:
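class CustomDefaultDict(dict):
  def __init__(self, default_factory=None, *args, **kwargs):
    # default_factory is a zero-argument callable (or None), as with defaultdict
    self.default_factory = default_factory
    super().__init__(*args, **kwargs)

  def __getitem__(self, key):
    # d[key] calls __getitem__; create and store a default only for missing keys
    if key not in self:
      if self.default_factory is None:
        raise KeyError(key)  # like defaultdict: no factory means no default
      self[key] = self.default_factory()

    return super().__getitem__(key)

# equivalent to {'name': 'Usman Bashir', 'age': 9000}
# or dict(name='Usman Bashir', age=9000)
CustomDefaultDict(None, name='Usman Bashir', age=9000)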

{'name': 'Usman Bashir', 'age': 9000}

In [14]:
# each factory called with no arguments returns its "empty" value
int(), str(), list()

(0, '', [])

In [21]:
cd = CustomDefaultDict(int)

print(cd)

print(cd['missing_key'])

cd['missing_key'] = 42

print(cd['missing_key'])

{}
0
42
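A different approach, shown here as an alternative sketch rather than as the lab's intended answer: `dict` calls a `__missing__(self, key)` method (if the subclass defines one) from its own `__getitem__` whenever a key is absent, and `collections.defaultdict` is built on that hook. The class name `MissingDefaultDict` is made up for illustration.

```python
class MissingDefaultDict(dict):
  """A default dict built on dict's __missing__ hook instead of overriding __getitem__"""

  def __init__(self, default_factory=None, *args, **kwargs):
    self.default_factory = default_factory
    super().__init__(*args, **kwargs)

  def __missing__(self, key):
    # dict.__getitem__ calls this only when key is absent
    if self.default_factory is None:
      raise KeyError(key)
    value = self.default_factory()
    self[key] = value  # store the default so later lookups find it
    return value

md = MissingDefaultDict(list)
md['fruits'].append('apple')
md['fruits'].append('pear')
print(md)                 # {'fruits': ['apple', 'pear']}
print(md.get('veggies'))  # None, because get() does not call __missing__
```

The design advantage is that lookups of existing keys go straight through the built-in `dict.__getitem__`, and only misses run Python code.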


# The __`collections`__ module: Deque

## Deque
* pronounced "deck"
* like a list, but optimized for fast appends and pops at both ends
* double-ended queue, meaning you can add or remove elements at both the front and the back
* O(1) time complexity for appends and pops at either end
  * constant amount of time regardless of the input size

* what is it good for?
  * fast and efficient appends and pops
  * implementing queues and stacks
  * when you need a fixed maximum size (with `maxlen`, adding to a full deque discards an item from the opposite end)
* what's the catch?
  * uses a bit more memory than lists
  * is slow when accessing items in the middle by index, O(n) compared to O(1) for lists
    * an amount of time that grows linearly with the input size
* but what can I use it for?
  * first-in, first-out (FIFO) queue
  * last-in, first-out (LIFO) stack
  * LRU (least recently used) cache
    * though an `OrderedDict` might be a more efficient option
    * and the `functools.lru_cache` decorator already provides this functionality for caching function results
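A short sketch of the queue, stack, and fixed-size uses listed above; the variable names and values are arbitrary.

```python
from collections import deque

# FIFO queue: add on the right, remove from the left
queue = deque()
queue.append('first')
queue.append('second')
queue.append('third')
print(queue.popleft())  # first

# LIFO stack: add and remove on the same (right) end
stack = deque()
stack.append('first')
stack.append('second')
stack.append('third')
print(stack.pop())  # third

# bounded deque: once maxlen is reached, appending on one end drops an item from the other
recent = deque(maxlen=3)
for n in range(5):
  recent.append(n)
print(recent)  # deque([2, 3, 4], maxlen=3)
```

With a plain list, `pop(0)` would also give FIFO behavior, but it has to shift every remaining element, which is O(n); `popleft()` on a deque is O(1).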

In [22]:
from collections import deque

dq = deque(range(10), maxlen=10) # maxlen is optional

print(dq)

deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)


In [23]:
dq.rotate(3)  # shift every item 3 steps to the right; items pushed off the right end wrap around to the left

print(dq)

deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)


In [24]:
dq.rotate(-4)  # a negative value rotates to the left

print(dq)

deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], maxlen=10)


In [25]:
dq.appendleft('a')  # the deque is full (maxlen=10), so 0 is dropped from the right end

print(dq)

deque(['a', 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)


In [26]:
dq.extend('bcd')  # adds 'b', 'c', 'd' on the right, so 'a', 1, 2 are dropped from the left

In [27]:
print(dq)

deque([3, 4, 5, 6, 7, 8, 9, 'b', 'c', 'd'], maxlen=10)


In [28]:
dq.extendleft((-1, -2, -3))  # items are added one at a time on the left, so they end up in reverse order

print(dq)

deque([-3, -2, -1, 3, 4, 5, 6, 7, 8, 9], maxlen=10)


In [29]:
dq.pop()

9

In [30]:
dq.popleft()

-3

In [31]:
print(dq)

deque([-2, -1, 3, 4, 5, 6, 7, 8], maxlen=10)


In [32]:
dq.remove(3)

print(dq)

deque([-2, -1, 4, 5, 6, 7, 8], maxlen=10)


In [33]:
dq.reverse()

print(dq)

deque([8, 7, 6, 5, 4, -1, -2], maxlen=10)


In [34]:
dq.append(0)

In [35]:
print(dq)

deque([8, 7, 6, 5, 4, -1, -2, 0], maxlen=10)


## Lab: Deque
* use a deque to print the last *n* lines of a file, much like __`tail`__ in Linux
* remember that you can iterate through a file a line at a time
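One possible sketch of the lab: a deque with `maxlen=n` keeps only the *n* most recent lines while the file is read once from top to bottom. The function name `tail` and the file name `some_file.txt` are hypothetical; `io.StringIO` stands in for a real file so the example is self-contained.

```python
import io
from collections import deque

def tail(lines, n=10):
  """Return the last n lines of an iterable of lines, like the Linux tail command"""
  # a deque with maxlen=n keeps only the n most recent lines;
  # older lines fall off the left end as new ones are appended
  return deque(lines, maxlen=n)

# With a real file:
# with open('some_file.txt') as f:
#   for line in tail(f, 5):
#     print(line, end='')

fake_file = io.StringIO('line 1\nline 2\nline 3\nline 4\nline 5\n')
for line in tail(fake_file, 2):
  print(line, end='')
```

Memory use stays proportional to *n*, not to the size of the file.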

# The __`collections`__ module: Named Tuples

## Named Tuples
* tuples are quite handy, but they are missing a key feature when used as records: sometimes we want to name the fields
 * named tuples are also more memory-efficient than dictionaries, because each instance stores only the values, not the keys
* __`namedtuple()`__ returns not an individual object but a new class, customized for the given names
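A small sketch of what the generated class looks like, reusing the `Point` example from the next cell: it is a `tuple` subclass, its instances are immutable, and, because `namedtuple` defines `__slots__ = ()`, instances carry no per-instance `__dict__` (the field names are stored once, on the class, in `_fields`). The exact `AttributeError` message varies between Python versions, so it is not shown.

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')
p = Point(1, 2)

# the generated class is a subclass of tuple
print(issubclass(Point, tuple))  # True

# instances are immutable, just like tuples
try:
  p.x = 10
except AttributeError as e:
  print('AttributeError:', e)

# no per-instance __dict__; the field names live on the class
print(hasattr(p, '__dict__'))  # False
print(Point._fields)           # ('x', 'y')
```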

In [36]:
from collections import namedtuple

# The first argument is the name of the new tuple class itself.
# The second argument is the field names, as an iterable of strings
# or as a single space- or comma-delimited string.
Point = namedtuple('Point', 'x y')

point1 = Point(1, 2)

print(point1, type(point1), sep='\n')

Point(x=1, y=2)
<class '__main__.Point'>


In [37]:
point2 = Point(-3, -2)

print(point2)

Point(x=-3, y=-2)


In [38]:
print(point1[0], point1[1])

1 2


In [39]:
print(point1.x, point1.y)

1 2


In [40]:
City = namedtuple('City', 'name country population coordinates')

tokyo = City('Tokyo', 'Japan', 36.99, (35.689, 139.691))

print(tokyo)

City(name='Tokyo', country='Japan', population=36.99, coordinates=(35.689, 139.691))


In [41]:
print(tokyo.population)
print(tokyo.coordinates)
print(tokyo[1])

36.99
(35.689, 139.691)
Japan


In [42]:
type(City), type(tokyo)

(type, __main__.City)

In [43]:
for field in City._fields:
  print(field)

name
country
population
coordinates


In [44]:
LatLong = namedtuple('LatLong', 'lat long')

riyadh_data = ('Riyadh', 'Saudi Arabia', 7.09, LatLong(24.633, 46.716))

In [45]:
riyadh = City._make(riyadh_data)

riyadh

City(name='Riyadh', country='Saudi Arabia', population=7.09, coordinates=LatLong(lat=24.633, long=46.716))

In [46]:
ri = riyadh._asdict()

print(ri)

{'name': 'Riyadh', 'country': 'Saudi Arabia', 'population': 7.09, 'coordinates': LatLong(lat=24.633, long=46.716)}


## Lab: Named Tuples
1. Create a named tuple called __`Card`__ (representing a playing card) which has two fields, __`rank`__ and __`suit`__
2. Create a list of __`Card`__s, which, when initialized, contains all 52 cards in a deck
3. In other words, the list (or deck) should contain  

`[Card(rank='2', suit='clubs'), Card(rank='3', suit='clubs'), Card(rank='4', suit='clubs'), ..., Card(rank='Q', suit='spades'), Card(rank='K', suit='spades'), Card(rank='A', suit='spades')]`
* ranks = 2, 3, 4, ..., 10, J, Q, K, A (strings)
* suits = clubs, hearts, diamonds, spades (strings)
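One possible sketch of the lab, with ranks and suits as strings in the order given above. Putting the suit loop first in the comprehension makes all the clubs come first, then hearts, and so on.

```python
from collections import namedtuple

Card = namedtuple('Card', 'rank suit')

# ranks '2' through '10', then the face cards and ace
ranks = [str(n) for n in range(2, 11)] + list('JQKA')
suits = ['clubs', 'hearts', 'diamonds', 'spades']

# the outer loop (suit) changes slowest, so cards are grouped by suit
deck = [Card(rank, suit) for suit in suits for rank in ranks]

print(len(deck))  # 52
print(deck[0])    # Card(rank='2', suit='clubs')
print(deck[-1])   # Card(rank='A', suit='spades')
```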

# The __`collections`__ module: Counters

## Counters
* __`dict`__ subclass for counting things
* a collection where the elements being counted (items from a list, characters of a string, etc.) are stored as `dict` keys and their counts are stored as `dict` values
* __`Counter`__ counts can be zero or negative

* what can I use it for?
  * count word frequencies in a document or letter frequencies in a string
  * keep track of inventory items and their quantities
  * analyze the frequency of events or items in a dataset
  * etc...
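To illustrate the negative counts and the inventory use case above, here is a small sketch; the `inventory` name and its items are made up. `subtract()` keeps zero and negative results, unary `+` drops them, and looking up a missing item returns 0 rather than raising `KeyError`.

```python
from collections import Counter

inventory = Counter(apples=3, pears=1)

# subtract() removes counts in place and keeps zero or negative results
inventory.subtract(apples=1, pears=2)
print(inventory)  # Counter({'apples': 2, 'pears': -1})

# unary + returns a new Counter containing only the positive counts
print(+inventory)  # Counter({'apples': 2})

# a missing item counts as 0 (and is not added to the Counter)
print(inventory['bananas'])  # 0
```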

In [47]:
from collections import Counter

c = Counter()

c

Counter()

In [48]:
c = Counter(['apple', 'banana', 'apple', 'orange', 'banana', 'apple'])
c

Counter({'apple': 3, 'banana': 2, 'orange': 1})

In [49]:
c = Counter('FnlwjdkhngSSSwoldfjnwkdjfvne')
c

Counter({'n': 4,
         'w': 3,
         'j': 3,
         'd': 3,
         'S': 3,
         'l': 2,
         'k': 2,
         'f': 2,
         'F': 1,
         'h': 1,
         'g': 1,
         'o': 1,
         'v': 1,
         'e': 1})

In [50]:
"test" * 3

'testtesttest'

In [51]:
c.update('establish' * 10)
c

Counter({'s': 20,
         'l': 12,
         'h': 11,
         'e': 11,
         't': 10,
         'a': 10,
         'b': 10,
         'i': 10,
         'n': 4,
         'w': 3,
         'j': 3,
         'd': 3,
         'S': 3,
         'k': 2,
         'f': 2,
         'F': 1,
         'g': 1,
         'o': 1,
         'v': 1})

In [52]:
c = Counter({'red': 5, 'blue': -1})
c

Counter({'red': 5, 'blue': -1})

In [54]:
c = Counter(foo=1, bar=2)
c

Counter({'bar': 2, 'foo': 1})

In [55]:
c = Counter(red=6, blue=5, green=3, pink=1, yellow=-3)
c.elements()  # returns an iterator, not a list

<itertools.chain at 0x74f7acb98970>

In [56]:
# elements() repeats each key as many times as its count and skips counts below 1 (like yellow)
for color in c.elements():
  print(color, end=' ')

red red red red red red blue blue blue blue blue green green green pink 

In [57]:
c.most_common(3)

[('red', 6), ('blue', 5), ('green', 3)]

In [58]:
c.items()

dict_items([('red', 6), ('blue', 5), ('green', 3), ('pink', 1), ('yellow', -3)])

## Lab: Counters
* Use a __`Counter`__ to count the words in a file
* That is, read in a file, separate it into words, and use a __`Counter`__ to count the number of occurrences of each word in the file.
* Print out the 10 most common words in the file
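One possible sketch of the lab, assuming words are separated by whitespace and compared case-insensitively; punctuation is not stripped, so `'cat.'` and `'cat'` would count as different words. The function name `most_common_words` and the file name `some_file.txt` are hypothetical, and `io.StringIO` stands in for a real file.

```python
import io
from collections import Counter

def most_common_words(lines, n=10):
  """Count the words in an iterable of lines and return the n most common"""
  counts = Counter()
  for line in lines:
    # lowercase, then split on whitespace; punctuation stays attached to words
    counts.update(line.lower().split())
  return counts.most_common(n)

# With a real file:
# with open('some_file.txt') as f:
#   for word, count in most_common_words(f):
#     print(word, count)

text = io.StringIO('the cat and the hat\nthe cat sat\n')
print(most_common_words(text, 2))  # [('the', 3), ('cat', 2)]
```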