![Py4Eng](img/logo.png)

# Dictionaries

## Reminder: Lists
Lists are a data structure used to store **ordered** collections of elements (`int`, `float`, `str`, etc.).

In [None]:
organisms = ['Pan troglodytes', 'Gallus gallus', 'Xenopus laevis', 'Vipera palaestinae']

We access elements of lists by using their _index_:

In [None]:
print(organisms[0])
print(organisms[2])

## Dictionaries

__Dictionaries__ are _hash tables_: data structures that store collections of elements, each accessed with a _key_.

A dictionary is a mapping of (key : value) pairs:

* **Keys can be of any _hashable_ type**, which in practice means immutable types: strings, integers, floats, tuples of immutable elements, etc. Each key refers to a _value_.
* values can be of **any** type!
* a key cannot appear more than once in a dictionary.
* a value can appear several times in a dictionary.
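
The key rules above can be tried out with a minimal sketch. The `grid` dictionary and its contents are made up for illustration: a tuple works as a key, whereas a list, being mutable, is rejected with a `TypeError`.

```python
# Hypothetical example data: map (row, column) positions to labels.
grid = {(0, 0): 'origin', (0, 1): 'right'}
grid[(1, 0)] = 'up'              # a tuple is hashable, so it can be a key
print(grid[(0, 1)])

try:
    grid[[1, 1]] = 'diagonal'    # a list is mutable and cannot be a key
except TypeError as error:
    print('TypeError:', error)
```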

### Defining dictionaries:

In [None]:
taxonomy = {
    'Pan troglodytes': 'Mammalia', 
    'Gallus gallus': 'Aves', 
    'Xenopus laevis': 'Amphibia', 
    'Vipera palaestinae': 'Reptilia'
}

In this dictionary, the _keys_ are the organisms and the _values_ are the taxonomic classification of each organism. Both are of type `str`.

Another example would be a dictionary representing the number of observations of various species:

In [None]:
observations = {
    'Equus zebra': 143,
    'Hippopotamus amphibius': 27,
    'Giraffa camelopardalis': 71,
    'Panthera leo': 112
}

Here, the keys are of type `str` and the values are of type `int`. Any other combination could be used.

### Accessing dictionary records
Accessing a dictionary record is similar to what we did with lists, only this time we'll use a __key__ instead of an __index__:

In [None]:
print(taxonomy['Pan troglodytes'])
print(taxonomy['Gallus gallus'])

Trying to access a missing key raises a `KeyError`:

In [None]:
print(taxonomy['Ailuropoda melanoleuca'])

The `get` method avoids the exception when the key is not in the dictionary: it returns a default value instead (`None`, unless a second argument is given):

In [None]:
print(taxonomy.get('Pan troglodytes'))

print(taxonomy.get('Ailuropoda melanoleuca'))

print(taxonomy.get('Ailuropoda melanoleuca', 0))

### Changing and adding records
We can change the dictionary by simply assigning a new value to a key.

In [None]:
taxonomy['Pan troglodytes'] = 'Mammals'
print(taxonomy['Pan troglodytes'])

Similarly, we can use this syntax to add new records: 

In [None]:
taxonomy['Danio rerio'] = 'Actinopterygii'
print(taxonomy['Danio rerio'])
print(taxonomy)

__Note 1__: The fact that we can change elements of the dictionary and dynamically add more elements means that `dict` is a **mutable** type.

__Note 2__: A dictionary may not contain multiple records with the same __key__, but it may contain many keys with the same __value__.
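
A small sketch of Note 2, using made-up data (the `classes` dictionary is hypothetical). When a key is repeated in a dictionary literal, the later value replaces the earlier one, while the same value may sit under several keys.

```python
# Hypothetical data, for illustration only.
classes = {
    'Pan troglodytes': 'Mammalia',
    'Panthera leo': 'Mammalia',      # same value under a different key: fine
    'Pan troglodytes': 'Primates',   # repeated key: this later value replaces the first
}
print(len(classes))
print(classes['Pan troglodytes'])
```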

### Looping over dictionary items

By default, `for` loops over the dictionary keys:

In [None]:
for organism in taxonomy:
    print('{} is of class {}'.format(organism, taxonomy[organism]))

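**Note**: dictionaries preserve the order in which items were inserted. The language guarantees this since Python 3.7 (in CPython 3.6 it was an implementation detail). If you need to rearrange the order, [OrderedDict](https://docs.python.org/3/library/collections.html#collections.OrderedDict) offers extra methods such as `move_to_end`.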
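
As a rough illustration of ordering, the sketch below builds a small dictionary (the `counts` data is made up) and then reorders a copy with `OrderedDict.move_to_end`. It assumes Python 3.7 or later.

```python
from collections import OrderedDict

# Hypothetical data, inserted one key at a time.
counts = {}
counts['zebra'] = 143
counts['hippo'] = 27
counts['lion'] = 112
print(list(counts))            # keys come back in insertion order

ordered = OrderedDict(counts)
ordered.move_to_end('zebra')   # move an existing key to the end
print(list(ordered))
```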

We can even change values while looping, because this doesn't change the collection of keys (adding or removing keys while looping over the dictionary is dangerous!):

In [None]:
observations = {
    'Equus zebra': 143,
    'Hippopotamus amphibius': 27,
    'Giraffa camelopardalis': 71,
    'Panthera leo': 112
}
for animal in observations:
    observations[animal] = observations[animal] > 50
print(observations)
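
To see why changing the keys during a loop is dangerous, here is a minimal sketch with a shortened, made-up version of `observations`. Adding a key while iterating raises a `RuntimeError`, whereas looping over a snapshot made with `list()` is safe.

```python
observations = {'Equus zebra': 143, 'Hippopotamus amphibius': 27}

try:
    for animal in observations:
        observations[animal + ' (copy)'] = 0   # adds a key while iterating
except RuntimeError as error:
    print('RuntimeError:', error)

# Iterating over a snapshot of the keys lets us delete safely.
for animal in list(observations):
    if observations[animal] < 50:
        del observations[animal]
print(observations)
```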

We can iterate over the keys (`keys()`), the values (`values()`), or tuples of the format (key, value) (`items()`):

In [None]:
for elem in taxonomy.keys(): # similar to iterating over taxonomy
    print(elem, end=' ')
print('\n')

for elem in taxonomy.values():
    print(elem, end=' ')
print('\n')

for elem in taxonomy.items():
    print(elem, end=' ')
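
Since `items()` yields (key, value) tuples, each tuple can be unpacked directly in the `for` statement. A short sketch, with a trimmed copy of `taxonomy` so that it runs on its own:

```python
taxonomy = {'Pan troglodytes': 'Mammalia', 'Gallus gallus': 'Aves'}

lines = []
for organism, cls in taxonomy.items():   # unpack each (key, value) tuple
    lines.append('{} is of class {}'.format(organism, cls))
print('\n'.join(lines))
```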

We can generate lists of keys / values / items:

In [None]:
print(list(taxonomy.keys()), end='\n\n')
print(list(taxonomy.values()), end='\n\n')
print(list(taxonomy.items()))

In [None]:
sorted(taxonomy.keys())
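
`sorted` returns a new list and leaves the dictionary untouched. As a sketch, it can also order the keys by their values by passing the dictionary's `get` method as the `key` argument; the data below is a shortened copy of `observations`.

```python
observations = {'Equus zebra': 143, 'Hippopotamus amphibius': 27, 'Panthera leo': 112}

by_name = sorted(observations)   # alphabetical order of the keys
by_count = sorted(observations, key=observations.get, reverse=True)   # largest count first
print(by_name)
print(by_count)
```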

We can check if a dictionary contains a *key* using the `in` operator:

In [None]:
'Vipera palaestinae' in taxonomy

In [None]:
'Bos taurus' in taxonomy

In [None]:
for organism in ('Vipera palaestinae', 'Bos taurus', 'Drosophila melanogaster'):
    if organism in taxonomy:
        print('{} is of class {}'.format(organism, taxonomy[organism]))
    else:
        print('{} not found'.format(organism))

The above code uses an idiom called __look before you leap__ (LBYL): checking if a key is in the dictionary before getting its value, to avoid a `KeyError`.

Another way to do it, which is usually preferred, is the __easier to ask forgiveness than permission__ (EAFP) idiom, which uses exceptions:

In [None]:
for organism in ('Vipera palaestinae', 'Bos taurus', 'Drosophila melanogaster'):
    try:
        print('{} is of class {}'.format(organism, taxonomy[organism]))
    except KeyError:
        print('{} not found'.format(organism))

Although raising and catching an exception is somewhat slower than an `if` check, the `try` version performs only a single lookup (no `in`). Moreover, it is safer in multi-threaded applications: in the `if` version, a different thread could in principle change the dictionary between the check (`in`) and the access (`[..]`).
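
A third option that also performs a single lookup is `get`. The sketch below wraps it in a hypothetical helper named `describe`; it assumes that `None` is never stored as a value, since otherwise a stored `None` could not be told apart from a missing key.

```python
taxonomy = {
    'Pan troglodytes': 'Mammalia',
    'Vipera palaestinae': 'Reptilia',
}

def describe(organism):
    """Describe an organism using one dictionary lookup (hypothetical helper)."""
    cls = taxonomy.get(organism)   # None when the key is missing
    if cls is None:
        return '{} not found'.format(organism)
    return '{} is of class {}'.format(organism, cls)

for organism in ('Vipera palaestinae', 'Bos taurus'):
    print(describe(organism))
```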

Let's write code that runs over a string and creates a histogram: a dictionary where each key is a letter and each value is the number of times that letter appears.

In [None]:
seq = 'ATGCCCAGTTAGCAGTACGTGCGGGGTCAAGATCAGGTGTGA'
hist = {}
for letter in seq:
    if letter in hist:
        hist[letter] = hist[letter] + 1
    else:
        hist[letter] = 1
print(hist)

And now a more elegant version, using `get`:

In [None]:
seq = 'ATGCCCAGTTAGCAGTACGTGCGGGGTCAAGATCAGGTGTGA'
hist = {}
for letter in seq:
    hist[letter] = hist.get(letter, 0) + 1
print(hist)
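
The standard library also has ready-made tools for counting. As a sketch, `collections.Counter` builds the histogram in one call, and `collections.defaultdict(int)` supplies a count of 0 for unseen letters; both behave like dictionaries.

```python
from collections import Counter, defaultdict

seq = 'ATGCCCAGTTAGCAGTACGTGCGGGGTCAAGATCAGGTGTGA'

hist = Counter(seq)             # counts every letter in one call
print(hist)

hist_dd = defaultdict(int)      # missing keys start at int() == 0
for letter in seq:
    hist_dd[letter] += 1
print(dict(hist_dd))
```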

## Exercise: secret

The code below defines a dictionary (named `code`) where the keys represent encrypted characters and the values are the corresponding decrypted characters. Use the dictionary to decrypt an encrypted message (named `secret`) and print out the resulting cleartext message.

In [None]:
secret = """Mq osakk le eh ue usq qhp, mq osakk xzlsu zh Xcahgq,
mq osakk xzlsu eh usq oqao ahp egqaho,
mq osakk xzlsu mzus lcemzhl gehxzpqhgq ahp lcemzhl oucqhlus zh usq azc, mq osakk pqxqhp ebc Zokahp, msauqjqc usq geou dat rq,
mq osakk xzlsu eh usq rqagsqo,
mq osakk xzlsu eh usq kahpzhl lcebhpo,
mq osakk xzlsu zh usq xzqkpo ahp zh usq oucqquo,
mq osakk xzlsu zh usq szkko;
mq osakk hqjqc obccqhpqc, ahp qjqh zx, mszgs Z pe heu xec a dedqhu rqkzqjq, uszo Zokahp ec a kaclq iacu ex zu mqcq obrfblauqp ahp ouacjzhl, usqh ebc Qdizcq rqtehp usq oqao, acdqp ahp lbacpqp rt usq Rczuzos Xkqqu, mebkp gacct eh usq oucbllkq, bhuzk, zh Lep’o leep uzdq, usq Hqm Meckp, mzus akk zuo iemqc ahp dzlsu, ouqio xecus ue usq cqogbq ahp usq kzrqcauzeh ex usq ekp."""

code = {'w': 'x', 'L': 'G', 'c': 'r', 'x': 'f', 'G': 'C', 'E': 'O', 'h': 'n', 'O': 'S', 'y': 'q', 'R': 'B', 'd': 'm', 'f': 'j', 'i': 'p', 'o': 's', 'g': 'c', 'a': 'a', 'u': 't', 'k': 'l', 'q': 'e', 'r': 'b', 'V': 'Z', 'X': 'F', 'N': 'K', 'B': 'U', 'T': 'Y', 'M': 'W', 'U': 'T', 'm': 'w', 'C': 'R', 'J': 'V', 't': 'y', 'S': 'H', 'v': 'z', 'e': 'o', 'D': 'M', 'p': 'd', 'K': 'L', 'A': 'A', 'P': 'D', 'l': 'g', 's': 'h', 'W': 'X', 'H': 'N', 'j': 'v', 'z': 'i', 'I': 'P', 'b': 'u', 'Z': 'I', 'F': 'J', 'Y': 'Q', 'Q': 'E', 'n': 'k'}

In [None]:
# your code here

# Sets

A [set](https://docs.python.org/3.5/tutorial/datastructures.html#sets) is an **unordered collection** with **unique elements**, similar to the mathematical concept of a [set](https://en.wikipedia.org/wiki/Set_%28mathematics%29) (קבוצה). 

Curly braces (`{}`) or the `set()` function can be used to create sets. 

In [None]:
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
print(basket) # duplicates have been removed
type(basket)

Basic uses include eliminating duplicate entries (as above, one apple and one orange were eliminated), and fast membership testing:

In [None]:
print('orange' in basket)
print('crabgrass' in basket)

In [None]:
a = list(range(1000))
b = {i for i in range(1000)}

%timeit 1000 in a
%timeit 1000 in b
print()
%timeit 503 in a
%timeit 503 in b
print()
%timeit 1 in a
%timeit 1 in b
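
`%timeit` is an IPython magic command, so it only works in a notebook or an IPython shell. A rough equivalent in plain Python, using the standard `timeit` module, might look like the sketch below; the variable names are made up and the measured times will vary between machines.

```python
import timeit

numbers_list = list(range(1000))
numbers_set = set(range(1000))

# 1000 is in neither collection: the list must be scanned to its end,
# while the set answers with a single hash lookup.
list_time = timeit.timeit(lambda: 1000 in numbers_list, number=1000)
set_time = timeit.timeit(lambda: 1000 in numbers_set, number=1000)
print('list: {:.6f} s, set: {:.6f} s'.format(list_time, set_time))
```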

Set objects also support set-theoretical operations like union, intersection, difference, and symmetric difference.

In [None]:
a = set('abracadabra')
b = set('alacazam')
print(a)
print(b)
type(b)

Letters in `a` but not in `b`:

In [None]:
a - b

Letters in either `a` or `b`:

In [None]:
a | b

Letters in both `a` and `b`:

In [None]:
a & b

Letters in `a` or `b` but not both (xor):

In [None]:
a ^ b
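
Each of these operators has an equivalent method, and sets can also be compared for inclusion. A short sketch using the same two sets:

```python
a = set('abracadabra')
b = set('alacazam')

# Each operator has an equivalent method.
print(a.difference(b) == a - b)
print(a.union(b) == a | b)
print(a.intersection(b) == a & b)
print(a.symmetric_difference(b) == a ^ b)

# sorted() gives a stable order for printing.
print(sorted(a & b))

# Subset test: is every element of the left set also in the right set?
print({'a', 'c'} <= a)
```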

To create an empty set you have to use `set()`, not `{}`; the latter creates an empty dictionary.

In [None]:
Ø = set()
print(Ø)
type(Ø)
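
For comparison, a quick check of what empty curly braces produce (the variable names are illustrative only):

```python
empty_dict = {}      # empty curly braces create a dictionary
empty_set = set()    # an empty set needs the set() function

print(type(empty_dict))
print(type(empty_set))
```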

Note that a `set` is mutable:

In [None]:
print(a)
a.add('z')
print(a)

There is also an immutable set, called `frozenset`:

In [None]:
a = frozenset('abracadabra')
print(type(a))
print(a)
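
Because a `frozenset` is immutable, it has no `add` method, but it is hashable and can therefore serve as a dictionary key. A minimal sketch, with made-up names and data:

```python
vowels = frozenset('aeiou')

try:
    vowels.add('y')                  # frozenset has no add method
except AttributeError as error:
    print('AttributeError:', error)

groups = {vowels: 'vowels'}          # a frozenset is hashable, so it can be a key
print(groups[frozenset('uoiea')])    # equal sets find the same record
```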

# Solutions
## Solution: secret

In [None]:
cleartext = ""
for c in secret:
    try:
        cleartext += code[c]
    except KeyError:
        cleartext += c
print(cleartext)
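
An alternative sketch of the same idea uses `get` with the character itself as the default, and `str.join` to build the result. To keep the example self-contained it uses a tiny made-up cipher (`toy_code` and `toy_secret` are hypothetical) rather than the exercise data.

```python
# Hypothetical three-letter cipher, for illustration only.
toy_code = {'b': 'a', 'c': 'b', 'd': 'c'}
toy_secret = 'dbc, bcd!'

# Characters missing from the dictionary (spaces, punctuation) are kept as they are.
cleartext = ''.join(toy_code.get(ch, ch) for ch in toy_secret)
print(cleartext)
```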

## Colophon
This notebook was written by [Yoav Ram](http://python.yoavram.com).

The notebook was written using [Python](http://python.org/) 3.7.
Dependencies listed in [environment.yml](../environment.yml).

This work is licensed under a CC BY-NC-SA 4.0 International License.

![Python logo](https://www.python.org/static/community_logos/python-logo.png)