# Dictionaries, Sets, and Their Friends

## Morning Objectives:

 * Create dictionaries and sets and use their methods
 * Explain the advantages of dictionaries and sets
 * Motivate the use of `defaultdict` and `Counter`

## 1. Dictionaries

A dictionary stores key-value pairs. Dictionaries are used throughout Python as a simple lookup table or data structure.

### 1.1. Dictionary initialization
For example, here's a dictionary:

In [147]:
prices = {}
prices['banana'] = 1
prices['steak'] = 10
prices['ice cream'] = 5

Now my dictionary will look like this:

In [148]:
prices

{'banana': 1, 'ice cream': 5, 'steak': 10}

We can also define a dictionary using `{key : value}` syntax:

In [149]:
prices_v2 = {'steak': 10, 'banana': 1, 'ice cream': 5}

We can also use the `dict` builtin function, which accepts a list of `(key, value)` tuples.

In [150]:
prices_v3 = dict([('steak', 10), ('banana', 1), ('ice cream', 5)])

The result is the same either way:

In [151]:
prices == prices_v2 == prices_v3

True

***Notes:***
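* Don't rely on order when comparing dictionaries: two dictionaries with the same key-value pairs are equal no matter what order the pairs were added in. Since Python 3.7, dictionaries preserve insertion order when you iterate over them; older versions made no such guarantee.
* Dictionary keys must be *hashable*, which in practice means immutable built-in types such as strings, numbers, and tuples of immutables.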
* Dictionary keys and values are not type checked.
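
The notes above can be checked directly. The sketch below uses a hypothetical `locations` dictionary (example data, not part of the lesson) to show that a tuple works as a key while a list does not, and that equality ignores insertion order.

```python
# Hypothetical example data, not from the lesson.
locations = {}
locations[(37.77, -122.42)] = 'San Francisco'   # a tuple is hashable: OK

try:
    locations[[37.77, -122.42]] = 'San Francisco'  # a list is not hashable
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# Keys and values are not type checked, so mixed types are allowed.
mixed = {1: 'one', 'two': 2, (3, 4): [3, 4]}

# Equality ignores the order in which pairs were inserted.
same_pairs = {'a': 1, 'b': 2} == {'b': 2, 'a': 1}
print(same_pairs)
```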

### 1.2. Dictionary methods

Dictionaries have many methods; some of the most common are
* `keys`
* `values`
* `items`
* `get`

You can investigate them from a notebook by typing `dict.` and then the tab key. Doing `dict.get?` will give you information about the method.

In [152]:
print(prices.keys())
print(prices.values())
print(prices.items())

dict_keys(['banana', 'steak', 'ice cream'])
dict_values([1, 10, 5])
dict_items([('banana', 1), ('steak', 10), ('ice cream', 5)])


We can use the keys and values to do all sorts of things, for example:

In [153]:
prices.update({'steak': 5})
{v: k for k, v in prices.items()}

{1: 'banana', 5: 'ice cream'}

In [154]:
reconstruction_1 = dict([(key, value) for key, value in zip(prices.keys(), prices.values())])
reconstruction_2 = {key: value for key, value in prices.items()}
prices == reconstruction_1 == reconstruction_2

True

Notice our usage of dictionary comprehensions above. More on this soon...

In [155]:
# dictionary comprehensions!
{x: x**2 for x in (2, 4, 6)}

{2: 4, 4: 16, 6: 36}

But what happens if we index into a dictionary looking for a key that does not exist?

In [156]:
try:
    prices['mangos']
except KeyError:
    print('KeyError')

KeyError


`dict.get()` to the rescue! It returns `None`, rather than raising a `KeyError` exception.

In [158]:
return_value = prices.get('mangos')
type(return_value)

NoneType

In [159]:
prices.get("mangos", "sorry, we're out of mangos")

"sorry, we're out of mangos"

We can add mangos to our dictionary of prices with `dict.update`. You can also use this method to update the value of an existing key.

In [175]:
prices.update({'mangos': 6})
prices.update({'steak': 18})
prices

{'banana': 1, 'ice cream': 5, 'mangos': 6, 'steak': 18}

In [178]:
# prices.pop('mangos')
dir(prices)

['__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

In [174]:
# list(prices.keys())[0]
prices['steak']

18
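
The commented-out `prices.pop('mangos')` above hints at how entries are removed. Here is a minimal sketch of the common options, using a fresh copy of the prices so that it runs on its own:

```python
prices = {'banana': 1, 'ice cream': 5, 'mangos': 6, 'steak': 18}

# pop removes a key and returns its value.
mango_price = prices.pop('mangos')

# Popping a missing key raises KeyError unless a default is supplied.
missing = prices.pop('mangos', None)

# del removes a key without returning anything.
del prices['steak']

# `in` tests for a key without raising an exception.
has_banana = 'banana' in prices

print(mango_price, missing, has_banana, prices)
```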

**EXERCISE** Create a dictionary of favorite movies, including everyone sitting at your table.

## 2. Sets

You can think of sets as mathematical sets. You can also think of them as value-less dictionaries.

### 2.1. Set initialization

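Sets can be created in much the same ways as dictionaries, either from the `set` builtin or with `{}` around the elements. The one exception is the empty set: `{}` on its own creates an empty dictionary, so use `set()` instead.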

In [63]:
groceries = set(['carrots', 'figs', 'popcorn'])
groceries

{'carrots', 'figs', 'popcorn'}

or

In [64]:
groceries = {'carrots', 'figs', 'popcorn'}

or

In [65]:
groceries = set()
groceries.add('carrots')
groceries.add('figs')
groceries.add('popcorn')
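
Two details of set creation are easy to trip over, and the short sketch below illustrates both: empty braces make a dictionary rather than a set, and duplicate elements are silently dropped. The variable names are only for illustration.

```python
empty_braces = {}
empty_set = set()
print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# Duplicates are dropped when a set is built.
letters = set(['a', 'b', 'a', 'c', 'b'])
print(len(letters))        # 3
```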

### 2.2. Set methods

Some common set methods are
 * `add`
 * `union`
 * `intersection`
 * `difference`
 * `update`
 * `issubset`
 * `issuperset`
 * `copy`

In [190]:
whole_foods = {'kale', 'squash' ,'kombucha', 'granola'}
safeway = {'lettuce', 'carrot', 'seltzer', 'granola'}
bodega = {'lettuce', 'seltzer'}

In [191]:
whole_foods.union(safeway)

{'carrot', 'granola', 'kale', 'kombucha', 'lettuce', 'seltzer', 'squash'}

In [192]:
whole_foods.intersection(safeway)

{'granola'}

In [193]:
whole_foods.isdisjoint(safeway)

False

In [194]:
whole_foods.remove('granola')
whole_foods.add('hippy granola')
whole_foods.isdisjoint(safeway)

True

In [196]:
whole_foods.update(['rice'])
whole_foods

{'hippy granola', 'kale', 'kombucha', 'rice', 'squash'}

In [186]:
safeway.issuperset(bodega)
bodega.issubset(safeway)

True
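
The `difference` method from the list above has not been shown yet. The sketch below demonstrates it along with the operator forms of the same operations, using fresh copies of two of the stores. One difference worth knowing: the operators require both sides to be sets, while the methods accept any iterable.

```python
whole_foods = {'kale', 'squash', 'kombucha', 'granola'}
safeway = {'lettuce', 'carrot', 'seltzer', 'granola'}

# Items sold at whole_foods but not at safeway.
only_whole_foods = whole_foods.difference(safeway)

# Operator forms of the same operations.
same_union = (whole_foods | safeway) == whole_foods.union(safeway)
same_intersection = (whole_foods & safeway) == whole_foods.intersection(safeway)
same_difference = (whole_foods - safeway) == only_whole_foods

# Symmetric difference: items found in exactly one of the two sets.
in_one_store = whole_foods ^ safeway

print(only_whole_foods)
print(in_one_store)
```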

## 3. Comprehensions

Note the parallels between some of the types we've learned:

| type     | builtin        | result  | comprehension |
|----------|----------------|---------|---------|
| `str`    | `str(1.0)`     | `'1.0'` | |
| `tuple`  | `tuple('abc')` | `('a','b','c')` | `(x for x in 'abc')` |
| `list`   | `list('abc')`  | `['a','b','c']` | `[x for x in 'abc']` |
| `set`    | `set('abc')`   | `{'a','b','c'}` | `{x for x in 'abc'}` |
| `dict`   | `dict([('a',1),('b',2)])`   | `{'a':1,'b':2}` | `{x:i for i,x in enumerate('ab')}`|

In [232]:
abc_tuple = (x for x in 'abc')
abc_list = [x for x in 'abc']
abc_set = {x for x in 'abc'}

In [233]:
print(type(abc_tuple))
print(type(abc_list))
print(type(abc_set))

<class 'generator'>
<class 'list'>
<class 'set'>


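### 3.1. An aside about generators

Python has no tuple comprehension: the parenthesized form `(x for x in 'abc')` creates a generator, not an actual tuple (yet)! A generator produces its items lazily, one at a time, as they are requested.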

In [230]:
# next(abc_tuple)

You can also collect whatever items a generator has left into a concrete object by passing it to a constructor such as `tuple` or `list`.

In [231]:
print(tuple(abc_tuple))

('a', 'b', 'c')

A generator can only be consumed once. If `next(abc_tuple)` is called on a fresh generator first, the conversion picks up only the items that remain:


In [211]:
tuple(abc_tuple)

('b', 'c')
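
The behavior in the cells above can be reproduced in one place. This is a small sketch with a fresh generator (named `abc_gen` here to avoid clashing with the notebook's variable) showing that items are consumed as they are read and that an exhausted generator yields nothing further.

```python
abc_gen = (x for x in 'abc')

first = next(abc_gen)      # consumes 'a'
rest = tuple(abc_gen)      # collects what is left: 'b' and 'c'
leftover = tuple(abc_gen)  # the generator is now exhausted

print(first, rest, leftover)
```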

In [212]:
def thankyou_for_calling(name):
    while True:
        yield print("Thank you for calling, {}.".format(name))
        yield print("{}, your call is very important to us.".format(name))
        yield print("{} -- don't worry -- a customer care representative will be with you shortly".format(name))

In [213]:
response = thankyou_for_calling('Elliot')

Each call to `next(response)` runs the function up to its next `yield`, so repeated calls cycle through the three messages. The cell below was run more than once, which is why the output shown is the second message.

In [221]:
next(response)

Elliot, your call is very important to us.


### 3.2. Back to comprehensions

Comprehensions are wonderfully useful for compact iteration.

In [222]:
def square(x): return x*x

[square(val) for val in [1,2,3,4,5]]

[1, 4, 9, 16, 25]
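
Comprehensions can also filter with a trailing `if`, and the same pattern carries over to dictionaries and sets. A brief sketch on the same values:

```python
values = [1, 2, 3, 4, 5]

# List comprehension with a filter: squares of the even values only.
even_squares = [val * val for val in values if val % 2 == 0]

# Dictionary comprehension: map each value to its square.
square_lookup = {val: val * val for val in values}

# Set comprehension: the distinct remainders modulo 3.
remainders = {val % 3 for val in values}

print(even_squares)
print(square_lookup)
print(remainders)
```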

## 3. `dict` and `set` are REALLY FAST!

These datatypes can be really convenient, but they are also super fast.

Say I stored all my data in a list. To find out whether something is in there, I would have to check each element one by one. However, if I know the index, looking up the item is really fast. Dictionaries and sets get this same speed by using a *hash function* to map keys to integers that serve as indices, so the time to find an item is, on average, independent of the size of the dictionary or set. This is also why dictionary keys and set elements must be hashable.
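
One rough way to see the difference is to time a membership test on a list and on a set holding the same items. The sketch below uses the standard library's `timeit` module; the size `n` and the number of repetitions are arbitrary choices, and the exact timings will vary by machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # worst case for the list: the last element

# Each call repeats the membership test 100 times and returns total seconds.
list_seconds = timeit.timeit(lambda: target in as_list, number=100)
set_seconds = timeit.timeit(lambda: target in as_set, number=100)

print('list: {:.6f} s, set: {:.6f} s'.format(list_seconds, set_seconds))
```

The list time should grow roughly in proportion to `n`, while the set time should stay about the same.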


## 4. `defaultdict` and `Counter`

The python standard library contains a huge number of additional types. Two common ones based on dictionaries are `defaultdict` and `Counter`. To motivate them let's look at an example.

### 4.1. `defaultdict` example

Say I want to know all the indices where an item appears in a list. Consider the following list:

In [234]:
alphabet_soup = ['a','b','b','b','c','c','a','a']

Here's how we could do it using a standard dictionary:

In [235]:
def item_indices(lst):
    '''
    INPUT: list
    OUTPUT: dict

    Return a dictionary whose keys are the items in the list and associated
    value is a list of all the indices where that key appears.
    '''
    d = {}
    for i, item in enumerate(lst):
        if item in d:
            d[item].append(i)
        else:
            d[item] = [i]
    return d

Here are the results of running this function on an example:

In [236]:
item_indices(alphabet_soup)

{'a': [0, 6, 7], 'b': [1, 2, 3], 'c': [4, 5]}

Note that I have to do something different the first time I see a new item. This is a really common situation to be in. Instead of an `if` block, you can use `defaultdict`.

In a `defaultdict`, if we try to access a key we've never seen before, there is a default value that it gets. We just need to tell it what our default is.

We will need to import it first:

In [122]:
from collections import defaultdict

In [237]:
def item_indices2(lst):
    d = defaultdict(list)
    for i, item in enumerate(lst):
        d[item].append(i)
    return d

In [238]:
item_indices2(alphabet_soup)

defaultdict(list, {'a': [0, 6, 7], 'b': [1, 2, 3], 'c': [4, 5]})

In [239]:
item_indices2(alphabet_soup) == item_indices(alphabet_soup)

True

Ahhh... that's so much nicer :)
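
Two related points are worth a quick sketch. A plain dictionary can get the same effect with `setdefault`, and a `defaultdict` inserts a key whenever a missing key is read with square brackets, which can be surprising. The key names `'z'` and `'y'` below are arbitrary.

```python
from collections import defaultdict

alphabet_soup = ['a', 'b', 'b', 'b', 'c', 'c', 'a', 'a']

# Plain dict with setdefault: insert an empty list the first time a key is seen.
plain = {}
for i, item in enumerate(alphabet_soup):
    plain.setdefault(item, []).append(i)
print(plain)

# Caveat: reading a missing key from a defaultdict inserts it.
d = defaultdict(list)
before = len(d)
d['z']                   # no KeyError; this creates d['z'] = []
after = len(d)

# .get does not call the default factory and does not insert the key.
got = d.get('y')
print(before, after, got, 'y' in d)
```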

### 4.2. Counter example

A very common default value is an integer, used for counting. Python has a dictionary subclass just for that: `Counter`. Suppose we want to count the occurrences of each element in a sequence. Consider the following (very long) string of random letters:

In [240]:
import string
import random

N = 10**6
alphabet_soup = "".join(random.choice(string.ascii_lowercase) for i in range(N))
alphabet_soup[:100]

'hplgdunkgwplnedxueffdatsqonjokoadbjwxpjnsfgqzrwrrbmmujrtjbckipynmjnjctxvaklmcacwuzoxbldgelbbnxpxjqzv'

In [127]:
note = "The expected number of times each letter will appear \
is N/26 = {}".format(N//26)
print(note)

The expected number of times each letter will appear is N/26 = 38461


Here's what we would need to do with a standard dictionary:

In [128]:
def count_with_if_block(lst):
    '''
    INPUT: list
    OUTPUT: dict

    Return a dictionary whose keys are the items in the list and 
    value is the count of the number of times that item occurred in the list.
    '''
    d = {}
    for item in lst:
        if item in d:
            d[item] += 1
        else:
            d[item] = 1
    return d

In [129]:
count_with_if_block(alphabet_soup)

{'a': 38586,
 'b': 38381,
 'c': 38215,
 'd': 38501,
 'e': 38097,
 'f': 38453,
 'g': 38637,
 'h': 38232,
 'i': 38340,
 'j': 38318,
 'k': 38830,
 'l': 38589,
 'm': 38504,
 'n': 38568,
 'o': 38239,
 'p': 38257,
 'q': 38653,
 'r': 38335,
 's': 38072,
 't': 38404,
 'u': 38819,
 'v': 38495,
 'w': 38452,
 'x': 38456,
 'y': 38653,
 'z': 38914}

The default factory passed to `defaultdict` can be any callable that takes no arguments, for example a `lambda`:

In [249]:
ice_cream = defaultdict(lambda: 'vanilla')
ice_cream['Elliot']

'vanilla'

In [130]:
def count_with_defualt_dict(lst):
    d = defaultdict(int)
    for item in lst:
        d[item] += 1
    return d

count_with_defualt_dict(alphabet_soup)

defaultdict(int,
            {'a': 38586,
             'b': 38381,
             'c': 38215,
             'd': 38501,
             'e': 38097,
             'f': 38453,
             'g': 38637,
             'h': 38232,
             'i': 38340,
             'j': 38318,
             'k': 38830,
             'l': 38589,
             'm': 38504,
             'n': 38568,
             'o': 38239,
             'p': 38257,
             'q': 38653,
             'r': 38335,
             's': 38072,
             't': 38404,
             'u': 38819,
             'v': 38495,
             'w': 38452,
             'x': 38456,
             'y': 38653,
             'z': 38914})

Counters are even faster! We can do the following:

In [131]:
from collections import Counter
Counter(alphabet_soup)

Counter({'a': 38586,
         'b': 38381,
         'c': 38215,
         'd': 38501,
         'e': 38097,
         'f': 38453,
         'g': 38637,
         'h': 38232,
         'i': 38340,
         'j': 38318,
         'k': 38830,
         'l': 38589,
         'm': 38504,
         'n': 38568,
         'o': 38239,
         'p': 38257,
         'q': 38653,
         'r': 38335,
         's': 38072,
         't': 38404,
         'u': 38819,
         'v': 38495,
         'w': 38452,
         'x': 38456,
         'y': 38653,
         'z': 38914})

In [250]:
%%timeit
count_with_if_block(alphabet_soup)

178 ms ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [251]:
%%timeit
count_with_defualt_dict(alphabet_soup)

149 ms ± 2.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [252]:
%%timeit
Counter(alphabet_soup)

62.5 ms ± 424 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


Counters behave much like a `defaultdict(int)`: a missing key has a count of 0. They also have a method called `most_common` that returns the elements with the highest counts.
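
A short sketch of `most_common` on a small example (the word `'abracadabra'` is just illustrative input). It also shows that looking up a missing key returns 0 without adding the key, unlike a `defaultdict`.

```python
from collections import Counter

counts = Counter('abracadabra')   # a: 5, b: 2, r: 2, c: 1, d: 1

# most_common(n) returns a list of (element, count) pairs, highest count first.
top_one = counts.most_common(1)
print(top_one)                    # [('a', 5)]

# A missing key counts as 0 and is not inserted.
missing = counts['z']
print(missing, 'z' in counts)     # 0 False
```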

## 5. Performance example

Let's take another look at one of the examples we did yesterday, the divisors problem. First, yesterday's code:

In [135]:
def get_divisors(numbers, divisors):
    '''
    INPUT: list of ints, list of ints
    OUTPUT: list of ints

    Return a list of the values from the second list that are proper divisors
    of elements in the first list.
    '''
    result = []
    for d in divisors:
        for n in numbers:
            if n % d == 0:
                result.append(d)
                break
    return result

We can improve on it using a set.

In [136]:
from math import sqrt

def _all_divisors(num):
    '''
    INPUT: int
    OUTPUT: list of ints

    Given an integer, return a list of all the divisors of that number.
    '''
    result = [num]
    for i in range(1, int(sqrt(num)) + 1):
        if num % i == 0:
            result.append(i)
            result.append(num // i)  # integer division keeps the divisors as ints
    return result

def get_divisors_v2(numbers, divisors):
    '''
    INPUT: list of ints, list of ints
    OUTPUT: list of ints

    Return a list of the values from the second list that are proper divisors
    of elements in the first list.
    '''
    s = set()
    for num in numbers:
        s.update(_all_divisors(num))
    return [divisor for divisor in divisors if divisor in s]

Let's use `ipython`'s magic command `%time` to see what the improvement is.

First we create two large lists:

In [137]:
numbers = random.sample(range(1000000), 10000)
divisors = random.sample(range(1000000), 10000)

Here's how we can time running it in `ipython`:

In [138]:
%time result = get_divisors(numbers, divisors)
%time result2 = get_divisors_v2(numbers, divisors)

CPU times: user 6.57 s, sys: 5.67 ms, total: 6.57 s
Wall time: 6.58 s
CPU times: user 519 ms, sys: 1.16 ms, total: 520 ms
Wall time: 520 ms


In [139]:
result == result2

True

These are the times I got on an earlier run; the speedup is about 12x, in line with the output above:

| Version | Time     |
| ------- | -------- |
| old     | 7.32 sec |
| new     | 0.61 sec |

If instead I wanted only the divisors which divide at least 2 values, this would be a good place for a `Counter`.

In [140]:
get_divisors_v2([225], [1, 5, 7, 9, 15])

[1, 5, 9, 15]

In [143]:
from collections import Counter

def top_divisors(numbers, divisors):
    '''
    INPUT: list of ints, list of ints
    OUTPUT: list

    Return a list of the divisors which divide at least 2 of the numbers.
    '''
    d = Counter()
    for num in numbers:
        # _all_divisors can list a divisor twice, so count each once per number
        for divisor in set(_all_divisors(num)):
            d[divisor] += 1
    return [divisor for divisor in divisors if d[divisor] >= 2]


In [144]:
%time result = top_divisors(numbers, divisors)

CPU times: user 620 ms, sys: 2.49 ms, total: 622 ms
Wall time: 621 ms
