# Collections Module

The collections module is a built-in module that implements specialized container data types, providing alternatives to Python's general-purpose built-in containers. We've already gone over the basics: dict, list, set, and tuple.

Now we'll learn about the alternatives that the collections module provides.

## Counter

*Counter* is a *dict* subclass for counting hashable objects. Elements are stored as dictionary keys, and their counts are stored as the corresponding values.

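(Hashable objects: an object is hashable if it has a hash value that never changes during its lifetime and it can be compared to other objects. Python's immutable built-ins, such as integers, booleans, strings, and tuples whose elements are all hashable, are hashable; mutable containers such as lists and dicts are not.)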
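
To make the hashable requirement concrete, here is a minimal sketch. The sample data is made up for illustration: tuples of integers can be counted, while lists cannot be used as keys and raise a TypeError.

```python
from collections import Counter

# Tuples of immutable values are hashable, so they can be counted.
points = [(0, 1), (0, 1), (2, 3)]
point_counts = Counter(points)
print(point_counts)

# Lists are mutable and therefore unhashable: Counter cannot use them as keys.
try:
    Counter([[0, 1], [0, 1]])
    list_error = None
except TypeError as exc:
    list_error = exc
    print("TypeError:", exc)
```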

Let's see how it can be used:

In [2]:
from collections import Counter

**Counter() with lists**

In [3]:
lst = [1,2,2,2,2,3,3,3,1,2,1,12,3,2,32,1,21,1,223,1]

Counter(lst)

Counter({1: 6, 2: 6, 3: 4, 12: 1, 32: 1, 21: 1, 223: 1})

**Counter with strings**

In [4]:
Counter('aabsbsbsbhshhbbsbs')

Counter({'b': 7, 's': 6, 'h': 3, 'a': 2})

**Counter with words in a sentence**

In [5]:
s = 'How how many times does each word show up in this sentence word times each each word .'
# 'How' and 'how' are counted separately because case matters. To treat them as the same word, use Counter(s.lower().split())
words = s.split()

Counter(words)

Counter({'each': 3,
         'word': 3,
         'times': 2,
         'How': 1,
         'how': 1,
         'many': 1,
         'does': 1,
         'show': 1,
         'up': 1,
         'in': 1,
         'this': 1,
         'sentence': 1,
         '.': 1})
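
The case-insensitive variant mentioned in the comment above can be sketched as follows. It reuses the same sentence; the name `lower_counts` is chosen here for illustration only.

```python
from collections import Counter

s = 'How how many times does each word show up in this sentence word times each each word .'

# Lowercase first so that 'How' and 'how' are counted under the same key.
lower_counts = Counter(s.lower().split())

print(lower_counts['how'])
# A Counter returns 0 for a missing key instead of raising KeyError.
print(lower_counts['How'])
```

Note that looking up a key that was never counted does not add it to the Counter.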

In [6]:
# Methods with Counter()
c = Counter(words)

c.most_common() # Returns list of tuples

[('each', 3),
 ('word', 3),
 ('times', 2),
 ('How', 1),
 ('how', 1),
 ('many', 1),
 ('does', 1),
 ('show', 1),
 ('up', 1),
 ('in', 1),
 ('this', 1),
 ('sentence', 1),
 ('.', 1)]

In [7]:
# To get the top two most common elements
c.most_common(2)

[('each', 3), ('word', 3)]

## Common patterns when using the Counter() object

    sum(c.values())                 # total of all counts
    c.clear()                       # reset all counts
    list(c)                         # list unique elements
    set(c)                          # convert to a set
    dict(c)                         # convert to a regular dictionary 
    c.items()                       # view of the (elem, cnt) pairs
    Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
    c.most_common()[:-n-1:-1]       # n least common elements
    c += Counter()                  # remove zero and negative counts

**Counter is a dict subclass, so it converts to and from a regular dict**

In [8]:
list_of_pairs = c.items()

In [9]:
list_of_pairs

dict_items([('How', 1), ('how', 1), ('many', 1), ('times', 2), ('does', 1), ('each', 3), ('word', 3), ('show', 1), ('up', 1), ('in', 1), ('this', 1), ('sentence', 1), ('.', 1)])

In [10]:
dict(list_of_pairs)

{'How': 1,
 'how': 1,
 'many': 1,
 'times': 2,
 'does': 1,
 'each': 3,
 'word': 3,
 'show': 1,
 'up': 1,
 'in': 1,
 'this': 1,
 'sentence': 1,
 '.': 1}

In [11]:
Counter(dict(list_of_pairs))

Counter({'each': 3,
         'word': 3,
         'times': 2,
         'How': 1,
         'how': 1,
         'many': 1,
         'does': 1,
         'show': 1,
         'up': 1,
         'in': 1,
         'this': 1,
         'sentence': 1,
         '.': 1})

In [12]:
c.most_common()

[('each', 3),
 ('word', 3),
 ('times', 2),
 ('How', 1),
 ('how', 1),
 ('many', 1),
 ('does', 1),
 ('show', 1),
 ('up', 1),
 ('in', 1),
 ('this', 1),
 ('sentence', 1),
 ('.', 1)]

In [29]:
# Want to find 3 least common elements so n = 3.
c.most_common()[:-3-1:-1]

[('.', 1), ('sentence', 1), ('this', 1)]

In [36]:
# Initializing Counter 
d = Counter({'apple': 2, 'banana': 0, 'cherry': -1})

In [37]:
# To remove zero and negative counts
d = d + Counter()
d

Counter({'apple': 2})

## defaultdict

defaultdict is a dict subclass that supports all the usual dictionary methods. The difference is that looking up a missing key with d[key] does not raise a KeyError when a default factory has been supplied. Instead, the key is created and assigned the value returned by that factory. We choose the actual default by passing a callable that takes no arguments to defaultdict, such as int, list, or a lambda expression.

Syntax: defaultdict(default_factory) \
default_factory: a callable that takes no arguments and returns the default value for missing keys. If this argument is absent (or None), looking up a missing key raises a KeyError, just as with a regular dictionary.

Using defaultdict is simpler and generally faster than achieving the same result with the dict.setdefault() method.
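
To illustrate the comparison, the sketch below groups some hypothetical (breed, name) pairs in two ways: once with a plain dict and setdefault(), and once with defaultdict(list). The sample data and variable names are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample data: (breed, name) pairs to group by breed.
pairs = [('Lab', 'Sammy'), ('Shepherd', 'Frankie'), ('Lab', 'Rex')]

# Plain dict: setdefault inserts an empty list the first time a key is seen.
by_breed_plain = {}
for breed, name in pairs:
    by_breed_plain.setdefault(breed, []).append(name)

# defaultdict: the list factory is called automatically for missing keys.
by_breed = defaultdict(list)
for breed, name in pairs:
    by_breed[breed].append(name)

print(dict(by_breed))
```

Both approaches build the same mapping; the defaultdict version simply avoids repeating the default at every call site.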


In [38]:
from collections import defaultdict

In [39]:
d = {}

In [40]:
d['one'] 

KeyError: 'one'

In [49]:
d  = defaultdict(object)
# Passing object as the factory means that accessing a missing key creates that key with a new, empty instance of the built-in object class (the base class of all Python classes) as its value.

In [50]:
d['one'] # It shows the empty object stored for key 'one'

<object at 0x26482a69740>

In [65]:
# It shows all the keys in the defaultdict
for item in d:
    print(item)

one


You can also choose a specific default value by passing a lambda:

In [59]:
d = defaultdict(lambda: 0)

In [60]:
d['one']

0
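
As a further sketch, the built-in int can serve as the factory, since calling int() returns 0 just like `lambda: 0`. The example also shows two behaviours worth knowing: membership tests and .get() do not create keys, and a defaultdict built without a factory still raises KeyError. The variable names are illustrative.

```python
from collections import defaultdict

# int() returns 0, so defaultdict(int) behaves like defaultdict(lambda: 0).
letter_counts = defaultdict(int)
for ch in 'aabsb':
    letter_counts[ch] += 1
print(dict(letter_counts))

# Membership tests and .get() do not call the factory or create keys.
print('z' in letter_counts)
print(letter_counts.get('z'))

# Without a factory, a missing key raises KeyError like a plain dict.
no_factory = defaultdict()
try:
    no_factory['missing']
    raised = False
except KeyError:
    raised = True
print(raised)
```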

## namedtuple

The standard tuple uses numerical indexes to access its members, for example:

In [66]:
t = (12,13,14)

In [15]:
t[0]

12

For simple use cases, this is usually enough. On the other hand, remembering which index should be used for each value can lead to errors, especially if the tuple has a lot of fields and is constructed far from where it is used. A namedtuple assigns names, as well as the numerical index, to each member. 

Each kind of namedtuple is represented by its own class, created by using the namedtuple() factory function. The arguments are the name of the new class and a list containing the names of the attributes.

You can basically think of namedtuples as a very quick way of creating a new object/class type with some attribute fields.
For example:

In [67]:
from collections import namedtuple

In [68]:
Dog = namedtuple('Dog',['age','breed','name'])

sam = Dog(age=2,breed='Lab',name='Sammy')

frank = Dog(age=2,breed='Shepherd',name="Frankie")

We construct the namedtuple class by first passing the type name (Dog) and then a list of attribute names. We can then access the attributes of an instance by name or by index:

In [69]:
sam

Dog(age=2, breed='Lab', name='Sammy')

In [70]:
sam.age

2

In [71]:
sam.breed

'Lab'

In [72]:
sam[0]

2
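
Beyond attribute and index access, a namedtuple also offers a few helpers. The following sketch redefines the same Dog type and shows tuple unpacking, `_fields`, `_asdict()`, and `_replace()`; the name `older_sam` is introduced here for illustration.

```python
from collections import namedtuple

Dog = namedtuple('Dog', ['age', 'breed', 'name'])
sam = Dog(age=2, breed='Lab', name='Sammy')

# Unpack like a regular tuple.
age, breed, name = sam

# _fields lists the attribute names; _asdict() converts to a dictionary.
print(Dog._fields)
print(sam._asdict())

# Instances are immutable; _replace() returns a modified copy instead.
older_sam = sam._replace(age=3)
print(older_sam)
print(sam.age)
```

Because `_replace()` returns a new instance, the original `sam` is left unchanged.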

## Conclusion

Hopefully you now see how useful the collections module is in Python. Counter, defaultdict, and namedtuple should become your go-to tools for a variety of common tasks!