# Sets

Python includes a built-in type called a *set*, which generalizes a list to the situation where order no longer matters and repetition is not allowed.

We define sets using curly brackets, or we can convert them from lists:

In [2]:
set1 = {1, 10, 100, 1000}
set1

{1, 10, 100, 1000}

In [3]:
set2 = set( [x for x in dir(set) if '_' != x[0] ] ) # Makes a set from a list comprehension
set2

{'add',
 'clear',
 'copy',
 'difference',
 'difference_update',
 'discard',
 'intersection',
 'intersection_update',
 'isdisjoint',
 'issubset',
 'issuperset',
 'pop',
 'remove',
 'symmetric_difference',
 'symmetric_difference_update',
 'union',
 'update'}

Likewise, we can convert sets into lists:

In [4]:
a = list( set1 )

Note, however, that because the order of a set does not matter, the order of the list we get is arbitrary: it depends on how Python stores the elements internally, and it can differ between runs and between machines.

In [5]:
# Order does not matter

{ 'dog', 'cat', 'bird' } == {'cat', 'dog', 'bird'}

True

Sets are mutable, and some of the methods will modify a set in place:

In [6]:
x = {2, 3, 4}
x.discard(4)
x

{2, 3}
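As a small sketch of a few other in-place methods from the listing above (the variable name s is only for illustration): add inserts an element, remove deletes one but raises a KeyError if it is missing, and discard quietly does nothing in that case.

```python
s = {2, 3, 4}
s.add(5)         # s is now {2, 3, 4, 5}
s.add(5)         # adding an element that is already present changes nothing
s.discard(10)    # 10 is not in the set: no error, no change

try:
    s.remove(10)     # remove raises KeyError when the element is missing
    removed = True
except KeyError:
    removed = False

s.remove(2)      # 2 is present, so this succeeds
print(s)
```

If you are not sure whether an element is present, discard is usually the safer choice.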

The main thing I use sets for is getting the unique elements of a list:


In [7]:
pets = ['dog', 'cat', 'bird', 'cat', 'cat', 'dog', 'cow', 'chicken']
pets

['dog', 'cat', 'bird', 'cat', 'cat', 'dog', 'cow', 'chicken']

In [8]:
unique_pets = set(pets)
unique_pets

{'bird', 'cat', 'chicken', 'cow', 'dog'}

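Sets are iterable, so you can use a for loop to do something to each element of a set. Because a set has no defined order, though, the elements arrive in an arbitrary order. What a set does not support is indexing (set1[0] raises a TypeError), so if you need positions or a fixed order you would first convert it to a list.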
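A quick sketch, reusing the pet names from above: a for loop can run directly over a set, visiting each element exactly once in no particular order, and sorted gives a list in a reproducible order when the order matters. The names total_letters and ordered are just for illustration.

```python
unique_pets = {'bird', 'cat', 'chicken', 'cow', 'dog'}

# The loop visits every element once; the visiting order is arbitrary
total_letters = 0
for pet in unique_pets:
    total_letters += len(pet)

# sorted() returns a list, here in alphabetical order
ordered = sorted(unique_pets)

print(total_letters)
print(ordered)
```

The total does not depend on the order in which the elements are visited, which is why looping directly over the set is fine here.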

# Dictionaries

A very useful type in Python is a dictionary. To see how useful it is, consider the following problem:

Suppose we want to write a function that takes a string and counts how many times each letter in the alphabet is used. What would we do with the tools we have now?

Lists only let us look up values by index:

In [9]:
x = [10, 30, 60, 180, 90, 5]
x[3] # we can only ask what is the value at a given position

180

This is useful for a lot of problems, but it is not always the most useful thing we could have. For example, in counting the number of times each letter is used, we could define position 0 to correspond to 'a', 1 to 'b', ..., 25 to 'z'.
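To see what that translation involves, here is a rough sketch of the list-based approach. The function name count_letters_list is made up for illustration, and the sketch assumes we only care about the letters 'a' to 'z', ignoring case.

```python
def count_letters_list(text):
    # One counter per letter: position 0 is 'a', ..., position 25 is 'z'
    counts = [0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            # ord() turns a character into an integer code,
            # so subtracting ord('a') gives a position from 0 to 25
            counts[ord(ch) - ord('a')] += 1
    return counts

counts = count_letters_list("banana")
print(counts[0], counts[1], counts[13])  # counters for 'a', 'b', 'n'
```

It works, but every lookup has to go through the ord arithmetic, and the result is a list of 26 numbers with no letters attached.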

But wouldn't it be easier to just use 'a' as the key instead of having to *translate* it to an integer?

A dictionary does exactly that for us: it generalizes a list to something that can use almost any value as a key (any hashable value, such as a string, a number, or a tuple).


In [10]:
x_dict = dict() # start with an empty dictionary
x_dict['a'] = 1 # add a key 'a' with value 1
x_dict['b'] = 3 
x_dict['c'] = 10

In [11]:
x_dict

{'a': 1, 'b': 3, 'c': 10}

Dictionaries are displayed as a set of key:value pairs. Lookup uses the same syntax as indexing a list, except that the index must now be one of the keys:

In [12]:
x_dict['a']

1

In [13]:
# if you use a key that is not yet included, you get an error:
x_dict['z']

KeyError: 'z'
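To avoid the error, we can test whether a key is present with in, or use the get method, which returns a default value when the key is missing. A minimal sketch (the variable names on the left are only for illustration):

```python
x_dict = {'a': 1, 'b': 3, 'c': 10}

has_z = 'z' in x_dict          # membership test on the keys
z_value = x_dict.get('z', 0)   # 'z' is missing, so the default 0 is returned
a_value = x_dict.get('a', 0)   # 'a' is present, so its stored value is returned

print(has_z, z_value, a_value)
```

Neither in nor get adds the missing key to the dictionary.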

Dictionaries have the following methods:

In [14]:
[x for x in dir(x_dict) if '_' != x[0] ]

['clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

The keys method lists the keys, and the values method lists the values:

In [16]:
x_dict.keys()

dict_keys(['a', 'b', 'c'])

In [17]:
x_dict.values()

dict_values([1, 3, 10])
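The items method, also in the listing above, gives the keys and values together as pairs. A short sketch, wrapping the result in list so it can be printed as an ordinary list of tuples:

```python
x_dict = {'a': 1, 'b': 3, 'c': 10}

# Each item is a (key, value) tuple
pairs = list(x_dict.items())
print(pairs)
```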

# Iterating Over a Dictionary

Iterating over a dictionary with for iterates over its keys:

{'a': 2, 'b': 4, 'c': 11}
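The cell that produced this output is not shown here. A minimal sketch that would give the same result, assuming the loop simply added 1 to each value of x_dict:

```python
x_dict = {'a': 1, 'b': 3, 'c': 10}

# The loop variable takes each key in turn
for key in x_dict:
    x_dict[key] = x_dict[key] + 1

print(x_dict)
```

Changing the values inside the loop is fine; what is not allowed is adding or removing keys while iterating.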

Note also that this example shows that dictionaries, like lists, are *mutable*.

# Counting Letters

Let's write a function that updates a dictionary with how many times a letter has been used in a string. It will take an empty dictionary and only add a letter to it if it needs to:
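One possible version, as a sketch rather than the definitive solution. The names count_letters and letter_counts are assumptions, and for simplicity it counts every character in the string, not only letters.

```python
def count_letters(text, counts):
    # Update the dictionary counts in place
    for ch in text:
        if ch not in counts:
            counts[ch] = 0   # first time we see this character
        counts[ch] += 1

letter_counts = dict()
count_letters("banana", letter_counts)
print(letter_counts)
```

Because the dictionary is modified in place, calling count_letters again with the same dictionary keeps adding to the existing counts.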


## Inverting a dictionary

You could imagine that a useful thing to do would be to invert a dictionary. For example, we could build a dictionary that takes the frequencies 1, 2, 3, ... and returns the set of letters that had that frequency.
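A sketch of one way to do this. The function name invert_counts and the sample counts are made up for illustration. Several letters can share a frequency, so each value in the inverted dictionary is a set.

```python
def invert_counts(counts):
    inverse = dict()
    for letter, freq in counts.items():
        if freq not in inverse:
            inverse[freq] = set()    # first letter seen with this frequency
        inverse[freq].add(letter)
    return inverse

inverse = invert_counts({'b': 1, 'a': 3, 'n': 2, 's': 1})
print(inverse[1])
```

Here both 'b' and 's' occur once, so they end up in the same set under the key 1.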

# Memos

You might recall that our first attempt at computing the Fibonacci sequence did not work very well:

In [49]:
def fibonacci(n):
    # Naive recursion: every call with n > 1 makes two more calls
    if n > 1:
        return fibonacci(n-1) + fibonacci(n-2)
    elif n == 1:
        return 1
    else:
        return 0


In [58]:
fibonacci(35)

9227465

It performs poorly because of the tree structure of the recursion: the function ends up calling itself for the same value of n over and over, which is very inefficient.

What would work better is if the function keeps track of the values it has computed and remembers them. This is called a *memo* and converting functions to do this is called *memoization*. 

By the way, this is a subtle issue that we almost never worry about in mathematics, where a recursive definition says nothing about how much work it takes to evaluate it.

We will use a dictionary, fibonacci.known, to record the values of the Fibonacci function we have already computed. It is stored as an attribute of the function itself, so it persists between calls much like a global variable. When the function is called on some $n$, it will first check whether this $n$ is in the dictionary and, if so, return the stored value; only when it is not will it do the recursion step. Before it *returns* a newly computed value, it will also record it in the dictionary for future calls to use:
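The cell with the memoized function is not shown here, so the following is a sketch of how it could look. It assumes n is a non-negative integer and that the dictionary is seeded with the two base cases, which is one choice among several.

```python
def fibonacci(n):
    # Return the stored value if this n has been computed before
    if n in fibonacci.known:
        return fibonacci.known[n]

    # Otherwise do the recursion step and record the result before returning
    value = fibonacci(n-1) + fibonacci(n-2)
    fibonacci.known[n] = value
    return value

# The memo lives on the function object; seed it with the base cases
fibonacci.known = {0: 0, 1: 1}

print(fibonacci(35))
```

Each value of n is now computed only once, so the number of recursive calls grows linearly with n instead of exponentially.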

Test it; you should see that it is now substantially faster.