Dictionaries
============

We've alluded to dictionaries in some of the other lectures.  If you're familiar with the computer science "hash" or "map" data structures, a dictionary is essentially the Python equivalent of those.

For those unfamiliar with Python dictionaries, we can use an actual dictionary as a mental model.  In a dictionary you have words, and those words have definitions that are associated with them. You might have multiple definitions, but they are all associated with one word's entry in the dictionary.

This maps to the data structure very well: each entry is a key-value pair, where the keys are the words, and the values are the definitions.  If there are multiple definitions, the value might be a list of definitions rather than a single definition, but the idea is the same.

So in Python we can create an empty dictionary with a pair of braces (curly brackets):

In [None]:
woobsters = {}

We can then inject entries into the dictionary using indexing notation to set the value for a key.  So for example, if the word is 'mutable' and the definition is 'liable to change', then we create the entry like this:

In [None]:
woobsters['mutable'] = 'liable to change'

We can add another entry, and print the result:

In [None]:
woobsters['immutable'] = 'unchanging over time or unable to be changed'
woobsters

Note the repeated structure of `key: value` in the dictionary.  When you use IPython to inspect a dictionary, IPython may print the keys in alphabetical order, but this is a feature of IPython, not an intrinsic characteristic of a dictionary: a dictionary never sorts its keys.  Older versions of Python gave no guarantee about key order at all; since Python 3.7 a dictionary preserves the order in which the keys were inserted.  Either way, it doesn't matter whether the keys can be sorted alphabetically, or anything like that.

For example, if we print the dictionary explicitly, so that IPython doesn't have a chance to sort things, you _may_ get a different order:

In [None]:
print(woobsters)

The central guarantee that a dictionary makes is that each key appears exactly once and is associated with a single value.  It is a mapping of unique keys to values, not a sequence with positions.
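As a small illustration of key uniqueness, the sketch below assigns to the same key twice.  The dictionary name `glossary` and the second definition are made up for this example; the point is only that the second assignment replaces the value rather than creating a second entry.

```python
# `glossary` is a hypothetical stand-alone dictionary for this sketch.
glossary = {}
glossary['mutable'] = 'liable to change'
glossary['mutable'] = 'able to be changed'  # same key: the old value is replaced

print(len(glossary))  # still a single entry
print(glossary)
```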

To look up the value of a key in a dictionary, use indexing notation with the key, so to ask for `'mutable'` you will get the corresponding value like this:

In [None]:
woobsters['mutable']

and similarly for `'immutable'`:

In [None]:
woobsters['immutable']

Because a dictionary is indexed by key rather than by position, you can't use a positional index 0 to ask for the "0th" item in the dictionary.  If you try, you get a `KeyError` exception:

In [None]:
woobsters[0]

In fact, what's happening is that Python is checking to see if you stored something associated with the key 0 in the dictionary. The keys of a dictionary don't have to be strings: you can use integers or any other immutable object.
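The following sketch makes this concrete.  It uses a separate, hypothetical dictionary called `lookup` so that it can be run on its own: the first lookup with `0` fails because no such key exists, and once a value is stored under the integer key `0`, the same lookup succeeds.

```python
# `lookup` is a hypothetical dictionary used only for this sketch.
lookup = {'mutable': 'liable to change'}

try:
    lookup[0]                  # 0 is treated as a key, not as a position
except KeyError as err:
    missing_key = err.args[0]  # the key that could not be found
    print('no entry for key', missing_key)

# Integers are perfectly valid keys, so we can store something under 0.
lookup[0] = 'stored under the integer key 0'
print(lookup[0])
```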

Examples
--------

Let's consider how you might create a data structure to store synonyms of words, rather than definitions.  A dictionary still works well in this case if you use the word as the key, and the value is the full list of synonyms you might have for that word.

So you might get something like this:

In [None]:
synonyms = {}
synonyms['mutable'] = ['changeable', 'variable', 'varying', 'fluctuating', 'shifting', 'inconsistent',
                       'unpredictable', 'inconstant', 'fickle', 'uneven', 'unstable', 'protean']
synonyms['immutable'] = ['fixed', 'set', 'rigid', 'inflexible', 'permanent', 'established', 'carved in stone']
synonyms

An advanced example might involve a large time-series which has events only occasionally.

In this example, you might have 4 events, each of which has a magnitude and a width:

In [None]:
e1 = {'mag': 0.05, 'width': 20}
e2 = {'mag': 0.04, 'width': 25}
e3 = {'mag': 0.05, 'width': 80}
e4 = {'mag': 0.03, 'width': 30}

And then each event occurs at a particular time (represented as an integer index in the original array):

In [None]:
events = {500: e1, 760: e2, 3001: e3, 4180: e4}
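One way such a structure might be used is sketched below.  The event definitions are repeated so that the block runs on its own; the names `width_at_760` and `strong` and the 0.05 threshold are assumptions made for this illustration.

```python
e1 = {'mag': 0.05, 'width': 20}
e2 = {'mag': 0.04, 'width': 25}
e3 = {'mag': 0.05, 'width': 80}
e4 = {'mag': 0.03, 'width': 30}
events = {500: e1, 760: e2, 3001: e3, 4180: e4}

# Look up one event by its time index, then one of its fields.
width_at_760 = events[760]['width']
print(width_at_760)

# Visit the events in time order by sorting the keys explicitly.
for t in sorted(events):
    print(t, events[t]['mag'], events[t]['width'])

# Times of the events whose magnitude reaches a (hypothetical) threshold.
strong = sorted(t for t, ev in events.items() if ev['mag'] >= 0.05)
print(strong)
```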

Another example might be a database with records holding the fields first name, last name and age.  The result of a query to this database might produce a list of dictionaries which looks something like this:

In [None]:
people = [
    {'first': 'Sam', 'last': 'Malone', 'age': 35},
    {'first': 'Woody', 'last': 'Boyd', 'age': 21},
    {'first': 'Norm', 'last': 'Peterson', 'age': 34},
    {'first': 'Diane', 'last': 'Chambers', 'age': 33}
]
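A list of records like this can be processed with ordinary loops and comprehensions.  The sketch below assumes that the third field is stored under the key `'age'`; the names `full_names` and `youngest` are made up for this illustration.

```python
# Records repeated here so the block runs on its own;
# the 'age' key is an assumption of this sketch.
people = [
    {'first': 'Sam', 'last': 'Malone', 'age': 35},
    {'first': 'Woody', 'last': 'Boyd', 'age': 21},
    {'first': 'Norm', 'last': 'Peterson', 'age': 34},
    {'first': 'Diane', 'last': 'Chambers', 'age': 33},
]

# Build a full name from two fields of each record.
full_names = [p['first'] + ' ' + p['last'] for p in people]
print(full_names)

# Pick the record with the smallest value in the 'age' field.
youngest = min(people, key=lambda p: p['age'])
print(youngest['first'], youngest['age'])
```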

Another common pattern simply maps a string key to a value.  If you were keeping an inventory of parts in a warehouse you might create a dictionary that maps part names to the number of parts in the warehouse:

In [None]:
inventory = dict([('foozelator', 123), ('frombicator', 18), ('spatzleblock', 34), ('snitzelhogen', 23)])
inventory

Note that this shows another way of creating a dictionary: calling the `dict` function with a list of `(key, value)` tuples.

If a shipment comes in that gives you another frombicator, then you can add one to the frombicator entry:

In [None]:
inventory['frombicator'] += 1
inventory
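The `+=` update only works for a key that is already present; for a part that has never been stocked it raises a `KeyError`.  One possible way to handle both cases, sketched here with a made-up part name `'gizmotron'`, is the dictionary's `get` method, which returns a default when the key is missing.

```python
# A shortened copy of the inventory so the block runs on its own.
inventory = {'foozelator': 123, 'frombicator': 18}

# 'gizmotron' is a hypothetical part that is not yet in the inventory.
for part in ['frombicator', 'gizmotron']:
    # get() returns the current count, or 0 if the part is unknown.
    inventory[part] = inventory.get(part, 0) + 1

print(inventory)
```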

What Can be Keys?
-----------------

Values in dictionaries can be anything.  Keys, on the other hand, must be immutable (technically, they must be _hashable_, but to be hashable there must be something immutable that can be used to generate the hash).

* Integers and strings are very common keys
* Floats (or even complex) can be used, but aren't recommended because of round-off errors
* Tuples and frozensets are also allowed for keys
* Lists and dictionaries are *not* allowed for keys because they are mutable.
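A quick way to check these rules is to try a few kinds of key.  In the sketch below, `table` is a hypothetical dictionary: the tuple and frozenset keys are accepted, while the list key is rejected with a `TypeError`.

```python
# `table` is a hypothetical dictionary used only for this sketch.
table = {}
table[(1, 2)] = 'tuple key'
table[frozenset(['a', 'b'])] = 'frozenset key'

try:
    table[[1, 2]] = 'list key'   # lists are mutable, so not hashable
except TypeError as err:
    list_error = str(err)
    print('rejected:', list_error)

# A frozenset key matches regardless of the order its elements were given in.
print(table[frozenset(['b', 'a'])])
```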

To understand why floating point numbers are not good dictionary keys, consider the following example (you may want to refer back to the numbers lecture to understand what is happening here):

In [None]:
data = {}
data[1.1 + 2.2] = 6.6
data[3.3]

In [None]:
data
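The lookup fails because `1.1 + 2.2` is not exactly equal to `3.3` in floating point, so the two values are different keys.  The sketch below shows the mismatch, followed by one possible workaround: rounding keys to a fixed number of decimal places before storing and looking up.  The choice of 6 places and the name `rounded` are assumptions made for illustration, and rounding has edge cases of its own.

```python
total = 1.1 + 2.2
print(total == 3.3)   # the two floats are not equal
print(repr(total))    # shows the round-off error in the stored key

data = {total: 6.6}
print(3.3 in data)    # so 3.3 is not a key of the dictionary

# Possible workaround: round every key the same way on the way in and out.
rounded = {round(total, 6): 6.6}
print(rounded[round(3.3, 6)])
```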

Tuples as Keys
--------------

Using tuples as keys in a dictionary is a common strategy when you want to represent a directed graph with some sort of weighting on the edges.  

For example, suppose the nodes are cities (say Austin, New York and Seattle) and the edges are weighted by the number of flights from one city to another.  This is a directed graph, because you will probably have a different value for the number of flights from Austin to New York compared to the number of flights from New York to Austin.

One way of storing this data is with a dictionary of connections:

In [None]:
connections = {}

If we have a connection, say, _from_ New York _to_ Seattle with 100 flights, then we create an entry for that in the dictionary with key `('New York', 'Seattle')` and value 100:

In [None]:
connections[('New York', 'Seattle')] = 100

We might also have 200 flights from Austin to New York, but 400 flights from New York to Austin:

In [None]:
connections[('Austin', 'New York')] = 200
connections[('New York', 'Austin')] = 400

So you can query the graph for New York to Austin:

In [None]:
connections[('New York', 'Austin')]

Or from Austin to New York:

In [None]:
connections[('Austin', 'New York')]

And they hold different values.
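Because each key is an `(origin, destination)` tuple, you can also unpack the keys while looping over the dictionary.  The sketch below rebuilds the same connections and collects every flight count leaving New York; the names `from_new_york` and `total_out` are made up for this illustration.

```python
connections = {
    ('New York', 'Seattle'): 100,
    ('Austin', 'New York'): 200,
    ('New York', 'Austin'): 400,
}

# Unpack each tuple key into origin and destination while iterating.
from_new_york = {dest: n
                 for (origin, dest), n in connections.items()
                 if origin == 'New York'}
print(from_new_york)

total_out = sum(from_new_york.values())
print(total_out)

# Direction matters: there is no entry for Seattle to New York.
print(('Seattle', 'New York') in connections)
```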

Note that if you are seriously into graph theory you should look into the [NetworkX](http://networkx.github.io/) package, which is available from the Canopy package manager.

Copyright 2008-2016, Enthought, Inc.<br>Use only permitted under license.  Copying, sharing, redistributing or other unauthorized use strictly prohibited.<br>http://www.enthought.com