# Collections module in Python

In [12]:
bio = """
Hi, I'm Zibran Zarif Amio. I specialize in Machine Learning with Python.
I built ReplyMind Chrome Extension from scratch. Follow me on GitHub and
YouTube to get updates on my latest projects.
"""

Say we have to count how many times each word occurs in this bio. We can use a dictionary like this:

In [13]:
word_counts = {}

for word in bio.split():
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1

print(word_counts)

{'Hi,': 1, "I'm": 1, 'Zibran': 1, 'Zarif': 1, 'Amio.': 1, 'I': 2, 'specialize': 1, 'in': 1, 'Machine': 1, 'Learning': 1, 'with': 1, 'Python.': 1, 'built': 1, 'ReplyMind': 1, 'Chrome': 1, 'Extension': 1, 'from': 1, 'scratch.': 1, 'Follow': 1, 'me': 1, 'on': 2, 'GitHub': 1, 'and': 1, 'YouTube': 1, 'to': 1, 'get': 1, 'updates': 1, 'my': 1, 'latest': 1, 'projects.': 1}


An alternative approach is to use `dict.get()` with a default of 0:

In [14]:
word_counts = {}

for word in bio.split():
    prev_count = word_counts.get(word, 0)
    word_counts[word] = prev_count + 1

print(word_counts)

{'Hi,': 1, "I'm": 1, 'Zibran': 1, 'Zarif': 1, 'Amio.': 1, 'I': 2, 'specialize': 1, 'in': 1, 'Machine': 1, 'Learning': 1, 'with': 1, 'Python.': 1, 'built': 1, 'ReplyMind': 1, 'Chrome': 1, 'Extension': 1, 'from': 1, 'scratch.': 1, 'Follow': 1, 'me': 1, 'on': 2, 'GitHub': 1, 'and': 1, 'YouTube': 1, 'to': 1, 'get': 1, 'updates': 1, 'my': 1, 'latest': 1, 'projects.': 1}


In both of these approaches, we have to handle the first occurrence of each word ourselves: either by setting its count to 1 explicitly, or by falling back to 0 with `get()` before incrementing. What if we could provide a default value for every key without handling that case every time?

#### `defaultdict`

In [15]:
from collections import defaultdict

word_counts = defaultdict(int)          # int() produces 0

for word in bio.split():
    word_counts[word] += 1
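
The cell above prints nothing, so here is a minimal sketch of what `defaultdict(int)` does when a key is missing. The `sample` sentence is made up for illustration and is not the bio above.

```python
from collections import defaultdict

# Hypothetical sample text, used only for this illustration
sample = "to be or not to be"

word_counts = defaultdict(int)          # int() produces 0

for word in sample.split():
    word_counts[word] += 1              # a missing key starts at int() == 0

print(dict(word_counts))                # {'to': 2, 'be': 2, 'or': 1, 'not': 1}

# Caveat: merely reading a missing key inserts it with the default value
print("question" in word_counts)        # False
word_counts["question"]
print("question" in word_counts)        # True
```

One thing to keep in mind: because a lookup creates the key, checking membership with `in` is safer than indexing when you only want to test whether a word was seen.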

#### `Counter`
We could actually do the whole job with a one-liner :)

In [16]:
from collections import Counter

word_counts = Counter(bio.split())

In [17]:
print(word_counts)

Counter({'I': 2, 'on': 2, 'Hi,': 1, "I'm": 1, 'Zibran': 1, 'Zarif': 1, 'Amio.': 1, 'specialize': 1, 'in': 1, 'Machine': 1, 'Learning': 1, 'with': 1, 'Python.': 1, 'built': 1, 'ReplyMind': 1, 'Chrome': 1, 'Extension': 1, 'from': 1, 'scratch.': 1, 'Follow': 1, 'me': 1, 'GitHub': 1, 'and': 1, 'YouTube': 1, 'to': 1, 'get': 1, 'updates': 1, 'my': 1, 'latest': 1, 'projects.': 1})
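
A `Counter` also handles missing keys, but a little differently from `defaultdict`. The sketch below uses a made-up `sample` sentence (not the bio above) to show that looking up an unseen word returns 0 without adding it, and that `update()` adds to existing counts.

```python
from collections import Counter

# Hypothetical sample text, used only for this illustration
sample = "to be or not to be"
word_counts = Counter(sample.split())

print(word_counts["to"])                # 2
print(word_counts["question"])          # 0, no KeyError
print("question" in word_counts)        # False, the lookup did not add the key

# update() adds counts from another iterable instead of replacing them
word_counts.update(["to", "question"])
print(word_counts["to"])                # 3
print(word_counts["question"])          # 1
```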


#### `Counter().most_common()`
Get the most frequent tokens. Tokens with equal counts come back in the order they were first encountered.

In [20]:
print(word_counts.most_common(5)) # returns 5 most frequent tokens

[('I', 2), ('on', 2), ('Hi,', 1), ("I'm", 1), ('Zibran', 1)]
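
To make the ordering easier to see, here is a small sketch on a made-up sentence (an assumption for illustration, not the bio above). It also shows that calling `most_common()` with no argument returns every entry, from most to least frequent.

```python
from collections import Counter

# Hypothetical sample text, used only for this illustration
sample = "to be or not to be that is the question"
word_counts = Counter(sample.split())

# Ties keep first-seen order: 'to' appears before 'be' in the text
print(word_counts.most_common(2))       # [('to', 2), ('be', 2)]

# With no argument, every (token, count) pair is returned
all_pairs = word_counts.most_common()
print(len(all_pairs))                   # 8
print(all_pairs[-1])                    # ('question', 1)
```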
