# 8. Dictionaries
*Set relation between x and y*

## 8.1 Introduction

So far we've seen variables that store one value or a series of values (see [section 5](5_Lists_Tuples_Sets.ipynb): lists, tuples and sets). There is another way of storing information, in which you associate one value with another; in Python this is called a dictionary. Dictionaries provide a very useful way of quickly looking up one piece of information by means of another.


## 8.2 Dictionary creation & usage

It is best to think of a dictionary as a set of *key:value* pairs, with the requirement that the keys are unique (within one dictionary). Dictionaries are created with curly brackets `{}`; placing a comma-separated list of *key:value* pairs between the brackets adds those pairs to the dictionary. This is what a dictionary looks like:


![Gentle-hands-on-introduction-to-Python-Programming Python Dictionary](images/myDictionary-cropped.PNG)




In [2]:
myDictionary = {'A': 'Ala', 'C': 'Cys', 'D': 'Asp'}
myDictionary

{'A': 'Ala', 'C': 'Cys', 'D': 'Asp'}
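A dictionary does not have to be filled in one go. The sketch below shows two other common ways of building the same kind of dictionary: starting from an empty `{}` and adding pairs one at a time, and passing a list of (key, value) tuples to the built-in `dict()`. The variable names `codes` and `same_codes` are made up for this illustration.

```python
# Start with an empty dictionary and add pairs one at a time
codes = {}
codes['A'] = 'Ala'
codes['C'] = 'Cys'

# The built-in dict() also accepts a list of (key, value) tuples
same_codes = dict([('A', 'Ala'), ('C', 'Cys')])

print(codes == same_codes)  # True: both hold the same key:value pairs
print(len(codes))           # 2: len() counts the pairs
```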

You can retrieve or add values by using square brackets `[ ]` with the name of the key. To retrieve a value you can also use the `get()` method.

In [5]:
myDictionary['A']

'Ala'

In [None]:
myDictionary.get('C')

To add a new *key:value* pair, assign a value to the new key:

In [7]:
myDictionary['E'] = 'Glu'
myDictionary

{'A': 'Ala', 'C': 'Cys', 'D': 'Asp', 'E': 'Glu'}

Note, however, that keys are unique: if you assign a different value to a key that already exists, the old value is overwritten.

In [8]:
myDictionary['A'] = 'Glu'
myDictionary

{'A': 'Glu', 'C': 'Cys', 'D': 'Asp', 'E': 'Glu'}

So keys are unique, values are not!

Dictionaries, like lists, have several useful built-in methods. The most frequently used are listed below:
- `keys()` to list the dictionary's keys
- `values()` to list the values in the dictionary
- `get()` to retrieve the value of a specified key
- `pop()` to remove the specified key and its value

In [9]:
myDictionary = {'A': 'Ala', 'C': 'Cys', 'D': 'Asp', 'E': 'Glu'}
list(myDictionary.keys())

['A', 'C', 'D', 'E']

In [10]:
list(myDictionary.values())

['Ala', 'Cys', 'Asp', 'Glu']

In [11]:
myDictionary.pop('E')
myDictionary

{'A': 'Ala', 'C': 'Cys', 'D': 'Asp'}
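Two details of these methods are easy to miss, so here is a small sketch of them. `pop()` returns the value it removes, and it accepts an optional second argument that is returned when the key is absent. The sketch also uses `items()`, a dictionary method not listed above, which gives the keys and values together. The name `codes` and the fallback text are made up for this illustration.

```python
codes = {'A': 'Ala', 'C': 'Cys', 'D': 'Asp', 'E': 'Glu'}

# pop() removes the key and hands back the value that was stored under it
removed = codes.pop('E')
print(removed)  # Glu

# With a second argument, pop() returns that instead of raising an error
# when the key is not in the dictionary
missing = codes.pop('Z', 'not present')
print(missing)  # not present

# items() yields (key, value) pairs, which is convenient in a for loop
for one_letter, three_letter in codes.items():
    print(one_letter, '->', three_letter)
```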

If you try to access a key that doesn't exist with square brackets, Python will raise a `KeyError`:

In [None]:
myDictionary = {'A': 'Ala', 'C': 'Cys', 'D': 'Asp', 'E': 'Glu'}
 
print(myDictionary['B'])

When you are not sure that a key is present, you should therefore check whether it exists before accessing it:


In [12]:
# Newlines don't matter when initialising a dictionary...
myDictionary = {
     'A': 'Ala',
     'C': 'Cys',
     'D': 'Asp',
     'E': 'Glu',
     'F': 'Phe',
     'G': 'Gly',
     'H': 'His',
     'I': 'Ile',
     'K': 'Lys',
     'L': 'Leu',
     'M': 'Met',
     'N': 'Asn',
     'P': 'Pro',
     'Q': 'Gln',
     'R': 'Arg',
     'S': 'Ser',
     'T': 'Thr',
     'V': 'Val',
     'W': 'Trp',
     'Y': 'Tyr'}

if 'B' in myDictionary:  # 'in' tests the keys directly, no need for .keys()
    print(myDictionary['B'])
else:
    print("myDictionary doesn't have key 'B'!")

myDictionary doesn't have key 'B'!
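As an alternative to the explicit check, the `get()` method never raises an error for a missing key: it returns `None`, or a fallback value if you pass one as the second argument. A minimal sketch, in which the name `codes` and the fallback string `'unknown'` are arbitrary choices for this illustration:

```python
codes = {'A': 'Ala', 'C': 'Cys', 'D': 'Asp'}

# get() returns None for a missing key instead of raising KeyError
print(codes.get('B'))  # None

# An optional second argument replaces None as the fallback value
label = codes.get('B', 'unknown')
print(label)  # unknown

# For a key that does exist, get() behaves like square brackets
print(codes.get('C', 'unknown'))  # Cys
```

Square brackets remain the better choice when a missing key indicates a real mistake, because the error then points you straight at the problem.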


---
### 8.2.1 Exercise

Use a dictionary to track how many times each amino acid code appears in the following sequence:
```
SFTMHGTPVVNQVKVLTESNRISHHKILAIVGTAESNSEHPLGTAITKYCKQELDTETLGTCIDFQVVPGCGISCKVTNIEGLLHKNNWNIED  
NNIKNASLVQIDASNEQSSTSSSMIIDAQISNALNAQQYKVLIGNREWMIRNGLVINNDVNDFMTEHERKGRTAVLVAVDDELCGLIAIADT
```
Tip: use the one-letter code as key in the dictionary, and the count as value.

---

## 8.3 A practical example of dictionaries
A practical example of dictionaries can be found in Biopython. Imagine that we want to extract some information from a GenBank file ([NC_005816](https://www.ncbi.nlm.nih.gov/nuccore/NC_005816/)):

In [34]:
# Import the SeqIO module from Biopython
from Bio import SeqIO

# Read in the GenBank file (SeqIO.read expects exactly one record in the file)
record = SeqIO.read("data/NC_005816.gb", "genbank")
print(record)

ID: NC_005816.1
Name: NC_005816
Description: Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence
Database cross-references: Project:58037
Number of features: 41
/molecule_type=DNA
/topology=circular
/data_file_division=BCT
/date=21-JUL-2008
/accessions=['NC_005816']
/sequence_version=1
/gi=45478711
/keywords=['']
/source=Yersinia pestis biovar Microtus str. 91001
/organism=Yersinia pestis biovar Microtus str. 91001
/taxonomy=['Bacteria', 'Proteobacteria', 'Gammaproteobacteria', 'Enterobacteriales', 'Enterobacteriaceae', 'Yersinia']
/references=[Reference(title='Genetics of metabolic variations between Yersinia pestis biovars and the proposal of a new biovar, microtus', ...), Reference(title='Complete genome sequence of Yersinia pestis strain 91001, an isolate avirulent to humans', ...), Reference(title='Direct Submission', ...), Reference(title='Direct Submission', ...)]
/comment=PROVISIONAL REFSEQ: This record has not yet been subject to final
NCBI review. The 

The SeqRecord object (which we see here) has an id, name and description as well as a sequence. For other (miscellaneous) annotations, the SeqRecord object has a dictionary attribute, *annotations*, in which most of the annotation information is recorded.

In [26]:
print(record.id)
print(record.name)
print(record.description)
#print(record.seq)

NC_005816.1
NC_005816
Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence


In [21]:
record.annotations

{'molecule_type': 'DNA',
 'topology': 'circular',
 'data_file_division': 'BCT',
 'date': '21-JUL-2008',
 'accessions': ['NC_005816'],
 'sequence_version': 1,
 'gi': '45478711',
 'keywords': [''],
 'source': 'Yersinia pestis biovar Microtus str. 91001',
 'organism': 'Yersinia pestis biovar Microtus str. 91001',
 'taxonomy': ['Bacteria',
  'Proteobacteria',
  'Gammaproteobacteria',
  'Enterobacteriales',
  'Enterobacteriaceae',
  'Yersinia'],
 'references': [Reference(title='Genetics of metabolic variations between Yersinia pestis biovars and the proposal of a new biovar, microtus', ...),
  Reference(title='Complete genome sequence of Yersinia pestis strain 91001, an isolate avirulent to humans', ...),
  Reference(title='Direct Submission', ...),
  Reference(title='Direct Submission', ...)],
 'comment': 'PROVISIONAL REFSEQ: This record has not yet been subject to final\nNCBI review. The reference sequence was derived from AE017046.\nCOMPLETENESS: full length.'}

In [24]:
record.annotations['organism']

'Yersinia pestis biovar Microtus str. 91001'

In [33]:
record.annotations['source']

'Yersinia pestis biovar Microtus str. 91001'

(In general, `organism` is used for the scientific name (in Latin, e.g. *Arabidopsis thaliana*), while `source`
will often be the common name (e.g. thale cress). In this example, as is often the case, the two fields are
identical.)

In [32]:
record.annotations['accessions'] # There could be several values, hence the list

['NC_005816']
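Because `annotations` is an ordinary dictionary, everything from section 8.2 applies to it. The sketch below does not need Biopython or the GenBank file: it uses a plain dictionary, `annotations`, as a stand-in holding a few of the values printed above, and loops over it to build one line of text per entry. The list `lines` is a made-up name for this illustration.

```python
# Stand-in for record.annotations, with a few values copied from the output above
annotations = {
    'molecule_type': 'DNA',
    'topology': 'circular',
    'accessions': ['NC_005816'],
    'organism': 'Yersinia pestis biovar Microtus str. 91001',
}

lines = []
for key, value in annotations.items():
    # Some values are lists of strings; join them so every entry prints as text
    if isinstance(value, list):
        value = ', '.join(value)
    lines.append(f"{key}: {value}")

for line in lines:
    print(line)
```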

## 8.4 More with dictionaries
As mentioned above, a dictionary can hold a list of values for one key:

In [46]:
TriplicateExp1 = {'name': 'experiment 1', 'pH': 5.6, 'temperature': 288.0, 'volume': 200, 'calibration':'cal1', 'date':['01-01-2020','02-01-2020']}
TriplicateExp1

{'name': 'experiment 1',
 'pH': 5.6,
 'temperature': 288.0,
 'volume': 200,
 'calibration': 'cal1',
 'date': ['01-01-2020', '02-01-2020']}

As keys, however, you can only use objects that cannot change (so strings, numbers and tuples are OK, lists are not). Keys also have to be unique: if you add a key that already exists, the old entry will be overwritten. In the example below a tuple is used as a key:

In [47]:
dates = ('date1','date2') # tuple

TriplicateExp1[dates] = ['01-01-2020','02-01-2020']
TriplicateExp1

{'name': 'experiment 1',
 'pH': 5.6,
 'temperature': 288.0,
 'volume': 200,
 'calibration': 'cal1',
 'date': ['01-01-2020', '02-01-2020'],
 ('date1', 'date2'): ['01-01-2020', '02-01-2020']}
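To see the restriction on keys in action, the sketch below tries both a tuple and a list as a key. The list is rejected with a `TypeError`, which is caught here so that the example runs to the end. The names `experiment` and `list_key_accepted` are made up for this illustration.

```python
experiment = {'name': 'experiment 1'}

# A tuple cannot be changed after creation, so it is accepted as a key
experiment[('date1', 'date2')] = ['01-01-2020', '02-01-2020']

# A list can be changed in place, so Python refuses it as a key
try:
    experiment[['date1', 'date2']] = ['01-01-2020', '02-01-2020']
    list_key_accepted = True
except TypeError as error:
    list_key_accepted = False
    print('TypeError:', error)

print(list_key_accepted)  # False
```

Note that the restriction only applies to keys: a list is perfectly fine as a value.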

It is also possible to have a so-called nested dictionary, in which there is a dictionary within a dictionary. 

In [43]:
TriplicateExp2 = {'name': 'experiment 2', 'pH': 5.8, 'temperature': 286.0, 'volume': 200, 'calibration':'cal1', 'date':'03-01-2020'}
TriplicateExp3 = {'name': 'experiment 3', 'pH': 5.4, 'temperature': 287.0, 'volume': 200, 'calibration':'cal1', 'date':'04-01-2020'}
Triplicate = {
    'exp1':TriplicateExp1,
    'exp2':TriplicateExp2,
    'exp3':TriplicateExp3
}
Triplicate

{'exp1': {'name': 'experiment 1',
  'pH': 5.6,
  'temperature': 288.0,
  'volume': 200,
  'calibration': 'cal1',
  'date': ['01-01-2020', '02-01-2020']},
 'exp2': {'name': 'experiment 2',
  'pH': 5.8,
  'temperature': 286.0,
  'volume': 200,
  'calibration': 'cal1',
  'date': '03-01-2020'},
 'exp3': {'name': 'experiment 3',
  'pH': 5.4,
  'temperature': 287.0,
  'volume': 200,
  'calibration': 'cal1',
  'date': '04-01-2020'}}
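Values in a nested dictionary are reached by chaining square brackets: the first pair selects the inner dictionary, the second pair selects a value inside it. The sketch below uses a shortened copy of the triplicate data (only name, pH and temperature, under the made-up name `triplicate`) so that it runs on its own.

```python
triplicate = {
    'exp1': {'name': 'experiment 1', 'pH': 5.6, 'temperature': 288.0},
    'exp2': {'name': 'experiment 2', 'pH': 5.8, 'temperature': 286.0},
    'exp3': {'name': 'experiment 3', 'pH': 5.4, 'temperature': 287.0},
}

# Outer key first, then inner key
print(triplicate['exp2']['pH'])  # 5.8

# Loop over the outer dictionary and read one field from each inner dictionary
temperatures = []
for exp_key, experiment in triplicate.items():
    temperatures.append(experiment['temperature'])

mean_temperature = sum(temperatures) / len(temperatures)
print(mean_temperature)  # 287.0
```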

## 8.5 Next session

Go to our [next chapter](9_Files.ipynb).