# Dictionaries, Sets, and Their Friends

## Morning Objectives:

 * Create and use methods of dictionaries and sets
 * Explain the advantages of dictionaries and sets
 
## Afternoon Objective:

 * Create and use `defaultdict` and `Counter`

## 1. Dictionaries

A dictionary is a collection of key-value pairs. Dictionaries are used throughout Python as a simple table or data structure.

### 1.1. Dictionary initialization
For example, here's a dictionary:

In [None]:
prices = {}

In [None]:
prices['banana'] = 1
prices['steak'] = 10
prices['ice cream'] = 5

Now my dictionary will look like this:

In [None]:
prices

I can also define a dictionary in one step using literal syntax, e.g.

In [None]:
prices = {'steak': 10, 'banana': 1, 'ice cream': 5}

Dictionaries can also be created with the `dict` builtin function, for example from a list of (key, value) tuples.

In [None]:
prices = dict([('steak', 10), ('banana', 1), ('ice cream', 5)])

In [None]:
prices

***Notes:***
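* In Python 3.7 and later, dictionaries preserve insertion order. In older versions (including Python 2.7) they are unordered: the key-value pairs may not come back to you in the order you put them in, so do not rely on ordering there.
* Dictionary keys can be of any *hashable* type. In practice this mostly means immutable types such as strings, numbers, and tuples of hashable items.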
* Dictionary keys and values are not type checked.
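
A small sketch of the last two notes. The `inventory` dictionary and its contents are made up for illustration.

```python
# Keys of different hashable types can live in the same dictionary,
# and nothing checks that keys or values share a type.
inventory = {}
inventory['banana'] = 1
inventory[('aisle', 3)] = 'steak'
inventory[42] = [1, 2, 3]          # values may be mutable

# A list is mutable and unhashable, so it cannot be used as a key.
try:
    inventory[['ice', 'cream']] = 5
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(list_key_allowed)   # False
print(len(inventory))     # 3
```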

### 1.2. Dictionary methods

Dictionaries have many methods. The most commonly used ones are listed below for each Python version.

#### Python 2.7 methods

 * `has_key`
 * `get`
 * `iterkeys`
 * `itervalues`
 * `iteritems`
 * `copy`
 * `update`

#### Python 3.x

 * `items`
 * `keys`
 * `copy`
 * `values`
 * `get`
 * `update`

The looping methods `iterkeys`, `itervalues`, and `iteritems` have been replaced by `keys`, `values`, and `items`, which in Python 3 return lightweight views rather than lists. The method `has_key` has been replaced by the `in` operator.
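
Here is a short sketch of the Python 3 methods in use, reusing the `prices` dictionary from earlier. The `sale` variable and the `'tofu'` entry are only for illustration.

```python
prices = {'steak': 10, 'banana': 1, 'ice cream': 5}

# `in` replaces Python 2's has_key
has_steak = 'steak' in prices            # True

# get returns a fallback value instead of raising KeyError
tofu_price = prices.get('tofu', 0)       # 0

# keys(), values(), and items() can be looped over directly
total = sum(prices.values())             # 16
expensive = [name for name, price in prices.items() if price >= 5]

# update merges another dictionary in place, overwriting existing keys
prices.update({'banana': 2, 'tofu': 4})

# copy makes a shallow copy, so changing it leaves the original alone
sale = prices.copy()
sale['steak'] = 8
```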

## Generators / iterators

Python 2 relied heavily on building complete lists before moving on to the next step. A generator instead produces each value only when the next iteration asks for it, so the whole sequence never has to sit in memory at once.

Think about the `read` and `readline` methods for reading a file: `read` gives you everything at once, while `readline` gives you one line at a time.
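
To make the file analogy concrete, here is a minimal sketch. It uses `io.StringIO` as a stand-in for an open text file (an assumption made so the example needs no file on disk).

```python
import io

text = "first\nsecond\nthird\n"

# read() returns the whole content as one string
f = io.StringIO(text)
everything = f.read()

# readline() returns one line per call, picking up where the last call stopped
f = io.StringIO(text)
first_line = f.readline()      # 'first\n'
second_line = f.readline()     # 'second\n'

# A file object is also an iterator that hands out one line at a time
f = io.StringIO(text)
lines_seen = []
for line in f:
    lines_seen.append(line.strip())
```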

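Let's look at how the `range` function works. The first version below builds a full list, as `range` did in Python 2. The second yields values one at a time, which is closer in spirit to the lazy `range` of Python 3.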

In [None]:
def our_range(num):
    count = 0
    ret = []
    while count < num:
        ret.append(count)
        count += 1
    
    return ret

In [None]:
def our_range2(num):
    count = 0
    while count < num:
        yield count
        count += 1


In [None]:
our_range(5)

In [None]:
lst = our_range(5)
for num in lst:
    print(num)

In [None]:
our_range2(5)

In [None]:
iterator = our_range2(5)
for num in iterator:
    print(num)

In [None]:
iterator = our_range2(5)
next(iterator)

In [None]:
next(iterator)
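
One consequence worth seeing is that a generator can only be consumed once. In this sketch, `count_up` is a stand-in with the same shape as `our_range2` above.

```python
def count_up(num):
    # Same shape as our_range2: yield one value at a time
    count = 0
    while count < num:
        yield count
        count += 1

gen = count_up(3)
first = next(gen)            # 0
rest = list(gen)             # [1, 2], the values not yet consumed
leftover = list(gen)         # [], the generator is now exhausted

# next() on an exhausted generator raises StopIteration,
# unless a default is supplied as the second argument
fallback = next(gen, None)   # None
```

If the values are needed more than once, either call the generator function again or store the results in a list.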

## 2. Sets

You can think of sets as mathematical sets. You can also think of them as value-less dictionaries.

### 2.1. Set initialization

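Sets can be created in much the same ways as dictionaries, either from the `set` builtin or with a `{}` literal. One caveat: empty braces `{}` create an empty dictionary, so an empty set must be written as `set()`.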

In [None]:
groceries = set(['carrots', 'figs', 'popcorn'])
groceries

or

In [None]:
groceries = {'carrots', 'figs', 'popcorn'}

or

In [None]:
groceries = set()
groceries.add('carrots')
groceries.add('figs')
groceries.add('popcorn')

Note the parallels between some of the types we've learned:

| type     | builtin        | literal | comprehension |
|----------|----------------|---------|---------|
| `str`    | `str(1.0)`     | `'1.0'` | |
| `tuple`  | `tuple('abc')` | `('a','b','c')` | `tuple(x for x in 'abc')` |
| `list`   | `list('abc')`  | `['a','b','c']` | `[x for x in 'abc']` |
| `set`    | `set('abc')`   | `{'a','b','c'}` | `{x for x in 'abc'}` |
| `dict`   | `dict([('a',1),('b',2)])`   | `{'a':1,'b':2}` | `{x:i for i,x in enumerate('ab')}`|

There is no tuple comprehension: `(x for x in 'abc')` on its own is a generator expression, so it is wrapped in `tuple(...)` here.

### 2.2. Set methods

Some common set methods are:
 * `add`
 * `union`
 * `intersection`
 * `difference`
 * `update`
 * `issubset`
 * `issuperset`
 * `copy`
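
A quick sketch of these methods, reusing the `groceries` set from above. The `pantry` and `snack` sets are made up for illustration.

```python
groceries = {'carrots', 'figs', 'popcorn'}
pantry = {'figs', 'rice'}

either = groceries.union(pantry)           # items in at least one set
both = groceries.intersection(pantry)      # {'figs'}
to_buy = groceries.difference(pantry)      # {'carrots', 'popcorn'}

# issubset / issuperset compare whole sets
snack = {'figs', 'popcorn'}
is_sub = snack.issubset(groceries)         # True
is_super = groceries.issuperset(snack)     # True

# add and update modify the set in place; copy protects the original
backup = groceries.copy()
groceries.add('figs')                      # already present, so no change
groceries.update(pantry)                   # adds 'rice'
```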

## 3. `dict` and `set` are REALLY FAST!

These datatypes can be really convenient, but they are also super fast.

Say I stored all my data in a list. To find out whether something is in there, I would have to check each element one by one. However, if I know the index, looking up the item is really fast. Dictionaries and sets get this same speed by using a *hash function* to map each key to an integer that is used as an index, so the time to find an item is, on average, independent of the size of the dictionary or set.
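
To illustrate the idea, here is a toy set built on the builtin `hash` function. `ToyHashSet` is a hypothetical class written for this sketch. It is not how CPython implements `set`, which uses a different collision strategy and grows its table as items are added.

```python
class ToyHashSet:
    """Toy set that uses hash() to pick a bucket. For illustration only."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, item):
        # hash() maps the item to an integer; the remainder is a list index
        return self.buckets[hash(item) % len(self.buckets)]

    def add(self, item):
        bucket = self._bucket(item)
        if item not in bucket:
            bucket.append(item)

    def __contains__(self, item):
        # Only one short bucket is searched, not every stored item
        return item in self._bucket(item)


s = ToyHashSet()
for word in ['carrots', 'figs', 'popcorn']:
    s.add(word)

has_figs = 'figs' in s      # True
has_kale = 'kale' in s      # False
```

With a fixed number of buckets the buckets get longer as more items are added, which is why a real implementation resizes its table to keep lookups fast.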


## 4. `defaultdict` and `Counter`

The Python standard library contains a large number of additional types. Two common ones based on dictionaries are `defaultdict` and `Counter`, both found in the `collections` module. To motivate them, let's look at an example.

### 4.1. `defaultdict` example

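Say I want to know all the indices at which each item in a list appears. A first attempt with a standard dictionary looks like the first cell below, but it raises a `KeyError` because `d[item]` does not exist the first time an item is seen. The second cell fixes this by checking for the key first: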

In [None]:
def item_indices(lst):
    '''
    INPUT: list
    OUTPUT: dict

    Return a dictionary whose keys are the items in the list and associated
    value is a list of all the indices where that key appears.
    '''
    d = {}
    for i, item in enumerate(lst):
        d[item].append(i)
    return d

In [None]:
def item_indices(lst):
    '''
    INPUT: list
    OUTPUT: dict

    Return a dictionary whose keys are the items in the list and associated
    value is a list of all the indices where that key appears.
    '''
    d = {}
    for i, item in enumerate(lst):
        if item in d:
            d[item].append(i)
        else:
            d[item] = [i]
    return d

Here are the results of running this function on an example:

In [None]:
item_indices(['a', 'b', 'b', 'c', 'a'])


We can simplify this with a `defaultdict`. In a `defaultdict`, if we try to access a key we've never seen before, the key is created with a default value. We just need to tell it how to make that default, by passing a function such as `list`.

We will need to import it first:

In [None]:
from collections import defaultdict

In [None]:
def item_indices2(lst):
    d = defaultdict(list)
    for i, item in enumerate(lst):
        d[item].append(i)
    return d

In [None]:
item_indices2(['a', 'b', 'b', 'c', 'a'])

Ahhh... that's so much nicer :)
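
One subtlety is worth a quick sketch: simply reading a missing key from a `defaultdict` creates it. The key names here are made up for illustration.

```python
from collections import defaultdict

d = defaultdict(list)
d['a'].append(0)

# Merely reading a missing key calls list() and stores the result
value = d['never seen']          # []
keys_after_read = list(d)        # ['a', 'never seen']

# Membership tests and get() do not create entries
is_present = 'other' in d        # False
got = d.get('other')             # None
n_keys = len(d)                  # still 2

# The default factory can be any zero-argument callable
zeros = defaultdict(int)
zeros['x'] += 1                  # starts from int() == 0
```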

### 4.2. Counter example

A very common default value is an integer, so Python has a special dictionary type just for counting: `Counter`. Let's say we just want to count the occurrences of each element in the list.

Here's what we would need to do with a standard dictionary:

In [None]:
def count_items(lst):
    '''
    INPUT: list
    OUTPUT: dict

    Return a dictionary whose keys are the items in the list and associated
    value is the count of the number of times that item occurred in the list.
    '''
    d = {}
    for item in lst:
        if item in d:
            d[item] += 1
        else:
            d[item] = 1
    return d

We can also use the `get` method here, or we could use a `defaultdict` with type `int`, but let's use a `Counter`!

Again, we will need to import it:

In [None]:
from collections import Counter
def count_items2(lst):
    d = Counter()
    for item in lst:
        d[item] += 1
    return d
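
For comparison, the `get` and `defaultdict` alternatives mentioned before the `Counter` version could be sketched as follows. The function names `count_items_get` and `count_items_default` are hypothetical.

```python
from collections import defaultdict

def count_items_get(lst):
    # get supplies 0 when the item has not been seen yet
    d = {}
    for item in lst:
        d[item] = d.get(item, 0) + 1
    return d

def count_items_default(lst):
    # int() returns 0, so missing keys start at zero
    d = defaultdict(int)
    for item in lst:
        d[item] += 1
    return d

letters = ['a', 'b', 'b', 'c', 'a']
counts_get = count_items_get(letters)
counts_default = count_items_default(letters)
```

Both produce the same counts as the `Counter` loop in `count_items2`.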

Counters are even better than that! We can do the following:

In [None]:
d = Counter(['a', 'b', 'b', 'c', 'a'])
d

Counters behave much like a `defaultdict(int)`, in that a missing key counts as 0. They also have a special method called `most_common`, which returns the elements with the highest counts.
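
A brief sketch of `most_common`, using a slightly longer list than before so that there is a single winner:

```python
from collections import Counter

d = Counter(['a', 'b', 'b', 'c', 'a', 'b'])

top_one = d.most_common(1)     # [('b', 3)]
ranked = d.most_common()       # all (element, count) pairs, highest count first

# Reading a missing key returns 0 and does not store the key
missing = d['z']               # 0
stored = 'z' in d              # False
```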

## 5. Performance example

Let's take another look at one of the examples we did yesterday, the divisors problem. First, yesterday's code:

In [None]:
def get_divisors(numbers, divisors):
    '''
    INPUT: list of ints, list of ints
    OUTPUT: list of ints

    Return a list of the values from the second list that are divisors
    of at least one element in the first list.
    '''
    result = []
    for d in divisors:
        for n in numbers:
            if n % d == 0:
                result.append(d)
                break
    return result

We can improve on it using a set.

In [None]:
from math import sqrt

def all_divisors(num):
    '''
    INPUT: int
    OUTPUT: list of ints

    Given an integer, return a list of all the divisors of that number.
    '''
    result = [num]
    for i in range(1, int(sqrt(num)) + 1):
        if num % i == 0:
            result.append(i)
            result.append(num // i)  # integer division keeps the divisor an int
    return result


def get_divisors2(numbers, divisors):
    '''
    INPUT: list of ints, list of ints
    OUTPUT: list of ints

    Return a list of the values from the second list that are divisors
    of at least one element in the first list.
    '''
    s = set()
    for num in numbers:
        s.update(all_divisors(num))
    return [divisor for divisor in divisors if divisor in s]

Let's use IPython's magic command `%time` to see what the improvement is.

First we create two large lists:

In [None]:
import random

# Start at 1: a 0 among the divisors would raise ZeroDivisionError in get_divisors
numbers = random.sample(range(1, 1000000), 10000)
divisors = random.sample(range(1, 1000000), 10000)

Here's how we can time the two versions in IPython:

In [None]:
%time result = get_divisors(numbers, divisors)
%time result2 = get_divisors2(numbers, divisors)

In [None]:
result == result2

These are the times I got:

| Version | Time     |
| ------- | -------- |
| old     | 20.7 sec |
| new     | 1.33 sec |
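
Outside IPython, the standard library's `timeit` module can do a similar comparison. This is a minimal sketch that times membership tests against a list and a set. The sizes and the `count_hits` helper are arbitrary choices for illustration, and the timings will vary by machine.

```python
import random
import timeit

random.seed(0)
values = random.sample(range(1, 100000), 5000)
as_list = list(values)
as_set = set(values)
probes = random.sample(range(1, 100000), 200)

def count_hits(container):
    # Number of probes found in the container
    return sum(1 for p in probes if p in container)

# timeit.timeit calls the function `number` times and returns total seconds
list_seconds = timeit.timeit(lambda: count_hits(as_list), number=5)
set_seconds = timeit.timeit(lambda: count_hits(as_set), number=5)

print(f"list: {list_seconds:.4f} s, set: {set_seconds:.4f} s")
```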

If instead I wanted only the divisors which divide at least 2 values, this would be a good place for a `Counter` (see `top_divisors` at the end of this section).

In [None]:
# Can two numbers drawn (with replacement) from a list sum to a given value?

list_ = [3, 5, 7, 9]

# make_sum(list_, 4) ==> False
# make_sum(list_, 8) ==> True


In [None]:
# Method 1

from itertools import combinations_with_replacement

def make_sum1(numbers, target):
    combinations = combinations_with_replacement(numbers, 2)
    for combo in combinations:
        if sum(combo) == target:
            return True
    return False


In [None]:
# Method 2

def make_sum2(numbers, target):
    for number in numbers:
        if target - number in numbers:
            return True
    return False


In [None]:
# Method 3

def make_sum3(numbers, target):
    numbers = set(numbers)
    for number in numbers:
        if target - number in numbers:
            return True
    return False


In [None]:
import random

number_of_samples = 1000
list_range = (1, 1000)
list_length = 100

samples = []

for s in range(number_of_samples):
    list_ = [random.randint(*list_range) for i in range(list_length)]
    target = random.randint(*list_range)
    samples.append((list_, target))


In [None]:
def test_make_sum(samples, make_sum):
    for numbers, target in samples:
        make_sum(numbers, target)

In [None]:
%time test_make_sum(samples, make_sum1)

In [None]:
%time test_make_sum(samples, make_sum2)

In [None]:
%time test_make_sum(samples, make_sum3)

In [None]:
from collections import Counter

def top_divisors(numbers, divisors):
    '''
    INPUT: list of ints, list of ints
    OUTPUT: list

    Return a list of the divisors which divide at least 2 of the numbers.
    '''
    d = Counter()
    for num in numbers:
        for divisor in all_divisors(num):
            d[divisor] += 1
    return [divisor for divisor in divisors if d[divisor] >= 2]


In [None]:
%time result = top_divisors(numbers, divisors)