### Handling Missing Keys with `setdefault`

This notebook contains example code from [*Fluent Python*](http://shop.oreilly.com/product/0636920032519.do), by Luciano Ramalho.

Code by Luciano Ramalho, modified by Allen Downey.

MIT License: https://opensource.org/licenses/MIT

In this notebook, we show two ways to create a mapping of words and their occurrences. For each word in a text file, we create a list of tuples, one for each occurrence. The tuple values represent the position (line and column) of the word in the file. (Note that the line and column positions are indexed starting at one.)
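As a minimal sketch of the tuples described above, the snippet below finds the words in a single line and pairs each with its one-based `(line, column)` position. The sample string and the name `locations` are illustrative assumptions; the pattern is the same `\w+` regular expression the notebook uses later.

```python
import re

# Same word pattern the notebook uses, written as a raw string
WORD_RE = re.compile(r'\w+')

line = 'Fluent Python notebooks'  # illustrative sample line
line_no = 1                       # lines are counted from one

# match.start() is zero-based, so add 1 to get a one-based column
locations = [(match.group(), (line_no, match.start() + 1))
             for match in WORD_RE.finditer(line)]

print(locations)
# [('Fluent', (1, 1)), ('Python', (1, 8)), ('notebooks', (1, 15))]
```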

In [1]:
import os
import sys
import re

In [2]:
file_name = 'text.txt'

In [3]:
# temporary text file
with open(file_name, 'w') as f:
    f.write('Fluent Python notebooks\nJupyter notebooks')

In [4]:
WORD_RE = re.compile(r'\w+')

In [5]:
# using `.get()` method
index = {}
with open(file_name) as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start()+1
            location = (line_no, column_no)
            # this is ugly; coded like this to make a point
            occurrences = index.get(word, [])  # list for word, or a new empty list
            occurrences.append(location)       # add the new location to that list
            index[word] = occurrences          # store the list back under word

In [6]:
# print in alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])

Fluent [(1, 1)]
Jupyter [(2, 1)]
notebooks [(1, 15), (2, 9)]
Python [(1, 8)]


In [7]:
# better solution
# using `.setdefault()` method
index = {}
with open(file_name) as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start()+1
            location = (line_no, column_no)
            index.setdefault(word, []).append(location)  # get or create the list, then append

In [8]:
# print in alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])

Fluent [(1, 1)]
Jupyter [(2, 1)]
notebooks [(1, 15), (2, 9)]
Python [(1, 8)]


Both blocks of code build the same index. The `.setdefault()` version takes fewer lines of code, and it is also more efficient: it needs a single key lookup per word, whereas the `.get()` version needs two (one for `.get()` and one for the assignment `index[word] = occurrences`).
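One rough way to check the lookup counts is to use a key whose `__hash__` counts its own calls, since a dictionary hashes the key once per lookup. The sketch below is an illustration under that assumption: `CountingKey`, `count_hashes`, `with_get`, and `with_setdefault` are hypothetical helpers, not part of the original notebook, and the counts reflect CPython's `dict` implementation.

```python
class CountingKey:
    """Hypothetical key type that counts how often it is hashed."""
    hash_calls = 0

    def __init__(self, text):
        self.text = text

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.text)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.text == other.text


def with_get(index, key, location):
    # same three steps as the `.get()` cell
    occurrences = index.get(key, [])
    occurrences.append(location)
    index[key] = occurrences


def with_setdefault(index, key, location):
    # same single step as the `.setdefault()` cell
    index.setdefault(key, []).append(location)


def count_hashes(update):
    """Apply update once to an empty dict and return the number of hash calls."""
    CountingKey.hash_calls = 0
    update({}, CountingKey('notebooks'), (1, 15))
    return CountingKey.hash_calls


get_hashes = count_hashes(with_get)
setdefault_hashes = count_hashes(with_setdefault)
print(get_hashes, setdefault_hashes)
# 2 1
```

With ordinary `str` keys the hash value is cached on the string, so the saving in practice comes from probing the hash table once instead of twice.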

In [9]:
os.remove(file_name)