# Programming in Python for data collection and analysis

Student: Demidenko A. A., group IS/b-18-1-z


## Dictionaries

Consider the following problem: we have information about students' grades in a certain subject and we want to be able to work with this information, for example, to look up a student's grade by name. We could try to solve this problem by creating two lists, one with the names of students and the other with grades:

In [2]:
students = ["Vassa", "Koal", "Peter", "Ann"]
grades = [5, 4, 2, 3]
# Vassa got 5, Koal 4 and so on
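
With two parallel lists, looking up a grade by name takes two steps. A minimal sketch of such a lookup (the variable names `position` and `peter_grade` are only illustrative):

```python
students = ["Vassa", "Koal", "Peter", "Ann"]
grades = [5, 4, 2, 3]

# Step 1: find the position of the student in the first list
position = students.index("Peter")
# Step 2: use the same position in the second list
peter_grade = grades[position]
print(peter_grade)  # 2
```

This works, but the two lists must always be kept in sync, and the name has to be searched for in the list on every lookup.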

It would be nice if we could have a data type in which the elements are numbered not by natural numbers but by arbitrary objects. Such a data type exists, and its name is *dictionary*.

This is how we can create a dictionary in Python:

In [3]:
gradebook = {"Vassa": 5, "Koal": 4, "Peter":2, "Ann": 3}

This is similar to creating a list, but there are a number of differences. First, we used curly brackets instead of square brackets to show that we are creating a dictionary. Second, the dictionary consists of *entries*, and each entry consists of two parts: a *key* and a *value*. The key and the value are separated by a colon. For example, we have an entry `"Ann": 3` with the key `"Ann"` and the value `3`. In total, our `gradebook` dictionary now contains four entries, whose keys are the names of students and whose values are their grades.

In [4]:
gradebook

{'Vassa': 5, 'Koal': 4, 'Peter': 2, 'Ann': 3}

Note that in current versions of Python (3.7 and later) a dictionary remembers the order in which entries were added, while in older versions the output order could be arbitrary. Either way, entries are not numbered, so we cannot access, for example, the first entry by an index, but we can access an entry by its key:

In [5]:
gradebook['Ann']

3

In [6]:
gradebook['Vassa']

5

We can change the value of an entry in the same way that we can change a list item.

In [7]:
gradebook['Ann'] = 5

In [8]:
gradebook

{'Vassa': 5, 'Koal': 4, 'Peter': 2, 'Ann': 5}

We can add a new record:

In [9]:
gradebook['Kesha'] = 4

In [10]:
gradebook

{'Vassa': 5, 'Koal': 4, 'Peter': 2, 'Ann': 5, 'Kesha': 4}

If we try to access a record that doesn't exist, we will get an error:

In [11]:
gradebook['Alice']

KeyError: 'Alice'

Often we want to request a record and, if it is not there, get some default value instead of an error. For this we use the `get()` method instead of square brackets:

In [12]:
gradebook.get('Alice')

Here `None` was returned, which the notebook does not display. We can see it with `print()`:

In [13]:
print(gradebook.get('Alice'))

None


In [14]:
gradebook.get('Vassa')

5

We can pass a second argument to `get()`; if there is no such key in the dictionary, that second argument is returned instead of `None`:

In [15]:
gradebook.get('Alice', 'No such student')

'No such student'

In [16]:
gradebook.get('Vassa', 'No such student')

5

We can get a list of all the dictionary keys:

In [17]:
gradebook.keys()

dict_keys(['Vassa', 'Koal', 'Peter', 'Ann', 'Kesha'])

It's not really a list, but this object behaves almost like a list, and we can make a list out of it with `list()`. In a similar way we can get all the dictionary values:

In [18]:
gradebook.values()

dict_values([5, 4, 2, 5, 4])
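
The objects returned by `keys()` and `values()` are "views" of the dictionary. The sketch below shows how a view differs from a list made out of it; it uses its own dictionary, named `book` here, so that `gradebook` itself is not changed.

```python
book = {"Vassa": 5, "Koal": 4, "Peter": 2, "Ann": 5, "Kesha": 4}

keys_view = book.keys()
names = list(keys_view)    # a real list: a snapshot of the keys
print(names[0])            # lists support indexing: Vassa

book["Alice"] = 3          # add a new entry afterwards
print(len(keys_view))      # the view sees the new key: 6
print(len(names))          # the list copy does not: 5
```

A view cannot be indexed like `keys_view[0]`, which is one reason to convert it with `list()` when a real list is needed.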

Dictionary keys do not have to be strings. Let's say we want to create a dictionary in which the keys are numbers. Nothing could be easier:

In [19]:
squares = { 1: 1, 2: 4, 3: 9 }

In [20]:
squares

{1: 1, 2: 4, 3: 9}

In [21]:
squares[1]

1

In [22]:
squares[2]

4

In the previous two cells `squares` behaves roughly like a list, but if we look closely we can see that it is not a list but still a dictionary.

In [23]:
squares

{1: 1, 2: 4, 3: 9}

For example, any non-empty list has an element with index 0, but `squares` does not:

In [24]:
squares[0]

KeyError: 0

## Iterating through entries in the dictionary

How can we process the information in a dictionary? To iterate through all the elements of a list we can use the `for` loop. What happens if we feed it a dictionary instead of a list? Let's try:

In [25]:
for k in gradebook:
    print(k)

Vassa
Koal
Peter
Ann
Kesha


It is clear! The `for` loop in this case iterates through all the *keys* of our dictionary. And knowing the key we can get the value:

In [27]:
for k in gradebook:
    print("Student", k, "has grade", gradebook[k])

Student Vassa has grade 5
Student Koal has grade 4
Student Peter has grade 2
Student Ann has grade 5
Student Kesha has grade 4


However, there is a more elegant way to get the key and value of the next record at once using `items()`:

In [28]:
for k, v in gradebook.items():
    print("Student", k, "has grade", v)

Student Vassa has grade 5
Student Koal has grade 4
Student Peter has grade 2
Student Ann has grade 5
Student Kesha has grade 4


How does this code work? Here we use the `items()` method, which returns an iterable object (not a real list, but one that can be turned into a list) consisting of tuples of the form `(key, value)`:

In [29]:
list(gradebook.items())

[('Vassa', 5), ('Koal', 4), ('Peter', 2), ('Ann', 5), ('Kesha', 4)]

On each pass of the loop, `for` takes the next tuple and assigns its first element (the key) to the variable `k` and its second element (the value) to the variable `v` (of course, these variables could be named differently). We have already met similar behavior when discussing `enumerate`.
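
The assignment of a tuple to several variables at once is called tuple unpacking. A small sketch of it outside and inside a loop (the names `pair`, `name`, `grade` are illustrative):

```python
pair = ("Ann", 5)
name, grade = pair      # unpacking: name is "Ann", grade is 5
print(name, grade)      # Ann 5

# enumerate() yields (index, item) tuples, which are unpacked the same way
for i, student in enumerate(["Vassa", "Koal"]):
    print(i, student)
```

The number of variables on the left must match the number of elements in the tuple, otherwise Python raises a `ValueError`.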

This is how we can find all entries with a given value, for example, all students who received a grade of 4:

In [30]:
for k, v in gradebook.items():
    if v == 4:
        print(k)

Koal
Kesha
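
If such a search is needed often, the loop can be wrapped in a function. The sketch below assumes a hypothetical helper name, `students_with_grade`, and collects the names in a list instead of printing them:

```python
def students_with_grade(book, grade):
    """Return the names of all students in `book` who have `grade`."""
    found = []
    for name, value in book.items():
        if value == grade:
            found.append(name)
    return found


gradebook = {"Vassa": 5, "Koal": 4, "Peter": 2, "Ann": 5, "Kesha": 4}
print(students_with_grade(gradebook, 4))  # ['Koal', 'Kesha']
print(students_with_grade(gradebook, 1))  # []
```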


Note that such a "search by value" requires going through all entries in the dictionary, so it takes a long time if the dictionary is large, whereas a "search by key" is still performed quickly. By the way, we can quickly check whether the dictionary has an entry with a given key:

In [31]:
"Koal" in gradebook

True

In [32]:
"Alice" in gradebook

False

If we wanted to search among the values we would have to explicitly specify this using the method `values()`:

In [33]:
1 in gradebook.values()

False

In [34]:
4 in gradebook.values()

True

In general, the `in` operator is not limited to dictionaries; it can also be used, for example, with lists and ranges:

In [35]:
5 in [1, 2, 3, 5, 8]

True

In [36]:
6 in range(1, 5)

False

## Creating dictionaries and the zip() function

There are different ways to create dictionaries. For example we can create an empty dictionary and gradually fill it with elements:

In [37]:
my_dict = {}

In [38]:
my_dict[1] = 1
my_dict['hello'] = 'world'

In [39]:
my_dict

{1: 1, 'hello': 'world'}

Note that the same dictionary perfectly combines elements of different types (in this case strings and integers).

We can also create a dictionary by passing the `dict()` function a list of key-value pairs (in some sense this is the reverse of the `items()` method):

In [40]:
dict([('hello', 'world'), ('one', 'two')])

{'hello': 'world', 'one': 'two'}
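
The "reverse operation" remark can be checked directly. In this sketch (variable names are illustrative) a dictionary is turned into a list of pairs and then rebuilt from it:

```python
original = {"hello": "world", "one": "two"}

pairs = list(original.items())   # [('hello', 'world'), ('one', 'two')]
rebuilt = dict(pairs)            # back to a dictionary

print(rebuilt == original)       # True
```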

Let's say we have two lists: one contains the names of students and the other their grades. How can we create a dictionary from these lists in which the names are keys and the grades are values?

Here is how:

In [41]:
students = ["Vasa", "Koal", "Peter", "Ann"]
grades = [5, 4, 2, 3]
new_gradebook = dict(zip(students, grades))
new_gradebook

{'Vasa': 5, 'Koal': 4, 'Peter': 2, 'Ann': 3}

This uses the convenient function `zip()`, whose use is not limited to creating dictionaries. Like a zipper (hence the name), it fastens several lists together. On its own, `zip()` turns a pair of lists into a sequence of pairs (wrapped in `list()` here to display them):

In [42]:
list(zip([1,2,3],['a','b','c']))

[(1, 'a'), (2, 'b'), (3, 'c')]

This construction can be used when we need to iterate through the elements of two related lists. For example, this is how we can output information about which student has what grade without using dictionaries:

In [43]:
for student, grade in zip(students, grades):
    print(student, "has grade", grade)

Vasa has grade 5
Koal has grade 4
Peter has grade 2
Ann has grade 3


The `zip()` function can also be used with more than two lists:

In [44]:
list(zip([1, 2, 3, 4], [5, 6, 7, 8], ['a', 'b', 'c', 'd']))
# there are three lists, so we get a list of triples

[(1, 5, 'a'), (2, 6, 'b'), (3, 7, 'c'), (4, 8, 'd')]

If one of the lists is shorter, `zip()` stops at its end and ignores the remaining items of the longer lists.

In [45]:
list(zip([1, 2, 3], ['a', 'b']))

[(1, 'a'), (2, 'b')]
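
When the extra items should not be dropped, the standard library offers `itertools.zip_longest()`, which pads the shorter inputs instead. A short sketch (the fill value `"?"` is an arbitrary choice for illustration):

```python
from itertools import zip_longest

numbers = [1, 2, 3]
letters = ["a", "b"]

# zip() stops at the shortest input
print(list(zip(numbers, letters)))   # [(1, 'a'), (2, 'b')]

# zip_longest() pads the shorter input with fillvalue
padded = list(zip_longest(numbers, letters, fillvalue="?"))
print(padded)                        # [(1, 'a'), (2, 'b'), (3, '?')]
```

In Python 3.10 and later, `zip(numbers, letters, strict=True)` raises a `ValueError` instead of silently trimming, which helps to catch lists of unequal length.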

## Which objects can be dictionary keys

So far, we have considered dictionaries whose keys are strings and numbers. In fact, the keys can also be more complex objects. For example, consider a fragment of an addition table implemented as a dictionary:

In [46]:
sums = {(2, 3): 5, (4, 1): 5, (5, 7): 12}

In [47]:
sums

{(2, 3): 5, (4, 1): 5, (5, 7): 12}

Here the keys are tuples of two numbers, and the values are their sums:

In [48]:
sums[(2, 3)]

5

In [49]:
sums[(4, 1)]

5

In [50]:
sums[(5, 7)]

12

Here is an important difference between tuples and lists: lists cannot be dictionary keys because they are mutable (they can change) and therefore are not hashable:

In [51]:
sums = { [1,2]: 3 } # error

TypeError: unhashable type: 'list'
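
If the data we want to use as a key is stored in a list, one possible workaround is to convert it to a tuple first. A sketch (the flag `list_is_hashable` is only there to show the result of the check):

```python
pair = [1, 2]

sums = {tuple(pair): 3}   # tuple(pair) is (1, 2), which is hashable
print(sums[(1, 2)])       # 3

# hash() works for tuples of numbers but raises TypeError for a list
try:
    hash(pair)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
print(list_is_hashable)   # False
```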

This concludes our brief introduction to dictionaries; we now move on to the next topic.

## List comprehensions

We have often encountered the following problem: we are given a list of numbers written as strings, and we want to create a new list in which the numbers are actual numbers. We could solve this problem using a loop:

In [52]:
str_list = ["1", "5", "12", "7"]

int_list = []
for s in str_list:
    int_list.append(int(s))
    
print(int_list)

[1, 5, 12, 7]


Three lines are responsible for creating the new list. Writing them every time is quite boring, so the creators of Python came up with (or rather borrowed from functional programming languages, which in turn borrowed it from mathematicians) a much more elegant syntax. It works like this:

In [53]:
int_list = [int(s) for s in str_list]

The square brackets around the expression should suggest that we are creating a list (because when we need to create a list we usually enclose its elements in square brackets). The expression inside the brackets should be read literally:

*a list consisting of `int(s)` `for` each element `s` `in` the list `str_list`*

In [54]:
int_list

[1, 5, 12, 7]

See? The quotation marks have disappeared: we now have a list consisting of numbers. Magic! The original list `str_list` has not changed:

In [55]:
str_list

['1', '5', '12', '7']

Similarly we can apply any operation to the list elements. For example, we will square all the elements from `int_list`:

In [56]:
[x**2 for x in int_list]

[1, 25, 144, 49]

or double all the list items:

In [58]:
double_list = [x * 2 for x in int_list]

In [59]:
double_list

[2, 10, 24, 14]

or add 1 to them:

In [60]:
[x + 1 for x in int_list]

[2, 6, 13, 8]

or turn them into floating point numbers:

In [61]:
[float(x) for x in int_list]

[1.0, 5.0, 12.0, 7.0]

As we can see, we can do anything with the list items. But that's not all: the list comprehension syntax also lets us filter. For example, suppose we only need the items that are larger than 6. We can select them this way:

In [62]:
[x for x in int_list if x > 6]

[12, 7]

When we write `x for x` here, we mean that the elements of the old list go into the new list unchanged; we only select the ones we need. But we can also modify them at the same time:

In [63]:
[x**2 for x in int_list if x > 6]

[144, 49]
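
To see what the filtering comprehension does, it may help to write the same computation as an explicit loop. This is only an equivalent sketch, with `result` as an illustrative name:

```python
int_list = [1, 5, 12, 7]

# [x**2 for x in int_list if x > 6] written as an explicit loop
result = []
for x in int_list:
    if x > 6:                 # the filter part: "if x > 6"
        result.append(x**2)   # the expression part: "x**2"

print(result)                                        # [144, 49]
print(result == [x**2 for x in int_list if x > 6])   # True
```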

Now let's solve the following problem: there are two lists of numbers, and we want to find their element-wise sum:

In [64]:
X = [2, 5, 8]
Y = [1, 3, 100]

It can be solved in this way: to iterate through the elements of two lists at the same time, we use the `zip()` function discussed above:

In [65]:
Z = []
for x, y in zip(X, Y):
    Z.append(x + y)
print(Z)

[3, 8, 108]


But with list comprehension, the same code looks much nicer:

In [66]:
[x + y for x, y in zip(X, Y)]

[3, 8, 108]

By the way we can use a syntax similar to list comprehension to create dictionaries:

In [67]:
squared = {i: i**2 for i in range(10)} 
squared

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
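
A dictionary comprehension can be combined with `zip()` and with filtering, just like a list comprehension. As a sketch, here is a dictionary of only those students whose grade is at least 4 (the name `good` and the threshold are chosen for illustration):

```python
students = ["Vassa", "Koal", "Peter", "Ann"]
grades = [5, 4, 2, 3]

# key: value pairs come from zip(); the if-clause filters them
good = {name: grade for name, grade in zip(students, grades) if grade >= 4}
print(good)   # {'Vassa': 5, 'Koal': 4}
```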

## The map() function

List comprehensions have an analog that is now considered less convenient but is still sometimes encountered: the `map()` function.

In [68]:
str_list

['1', '5', '12', '7']

This is how the problem of converting these strings to integers is solved using `map()`:

In [69]:
int_list = list(map(int, str_list))

In [70]:
int_list

[1, 5, 12, 7]

The `map()` function takes two arguments: a function and a list (or any other iterable). It applies the function to each of the items. In Python 3 the result is a lazy iterator rather than a list, which is why we wrap it in `list()`. In general, `list(map(int, str_list))` and `[int(x) for x in str_list]` are almost equivalent.
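
One difference is that in Python 3 `map()` does not build a list right away: it returns an iterator that produces items on demand. A small sketch of this behavior (variable names are illustrative):

```python
str_list = ["1", "5", "12", "7"]

mapped = map(int, str_list)    # nothing is converted yet
print(type(mapped).__name__)   # map

first = next(mapped)           # convert and fetch the first item
rest = list(mapped)            # convert the remaining items
print(first, rest)             # 1 [5, 12, 7]

print(list(mapped))            # [] because the iterator is exhausted
```

This is why the result of `map()` can be traversed only once, while a list built by a comprehension can be reused freely.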

When the action to be applied already exists as a function (as in the case of `int`), the construction with `map()` looks even more concise than list comprehension, but if we need to do something less trivial, list comprehension is clearly easier.

In [71]:
[int(x) + 1 for x in str_list]

[2, 6, 13, 8]

To implement this with `map()`, we need to declare a new function that will return the value of the expression `int(x) + 1` and pass it to `map()`:

In [72]:
def my_func(x): 
    return int(x) + 1

In [73]:
list(map(my_func, str_list))

[2, 6, 13, 8]

For brevity we can use `lambda` functions, but this approach is much less transparent than a list comprehension and is not recommended nowadays.
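
For comparison, this is roughly what the `lambda` variant looks like next to the list comprehension; both give the same result:

```python
str_list = ["1", "5", "12", "7"]

# an anonymous function passed directly to map()
with_lambda = list(map(lambda x: int(x) + 1, str_list))

# the same computation as a list comprehension
with_comprehension = [int(x) + 1 for x in str_list]

print(with_lambda)                        # [2, 6, 13, 8]
print(with_lambda == with_comprehension)  # True
```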

## Two words about efficiency

Using list comprehensions is not only pleasant but also useful: they usually work more efficiently than the equivalent code with a loop.

In [74]:
from random import random
from math import sqrt
N = 10000
mylist = [random() for _ in range(N)]

In [75]:
%%timeit
newlist = []
for x in mylist:
    newlist.append(sqrt(x))

1.43 ms ± 219 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [76]:
%%timeit
newlist = [sqrt(x) for x in mylist]

919 µs ± 52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [77]:
%%timeit
newlist = list(map(sqrt, mylist))

559 µs ± 154 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


As we can see from this data (the magic command `%%timeit` measures how much time a cell takes to run), list comprehensions are faster than a normal loop. In this particular run `map()` was faster still, but the spread of the measurements is large; in general, `map()` works at about the same speed as a list comprehension (sometimes a little slower, sometimes a little faster).
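
`%%timeit` only works inside IPython or Jupyter. Outside a notebook, a similar measurement can be sketched with the standard `timeit` module. The function names below are illustrative, and the timings will differ from machine to machine, so no expected output is shown:

```python
import timeit
from math import sqrt
from random import random

N = 10000
mylist = [random() for _ in range(N)]


def with_loop():
    newlist = []
    for x in mylist:
        newlist.append(sqrt(x))
    return newlist


def with_comprehension():
    return [sqrt(x) for x in mylist]


def with_map():
    return list(map(sqrt, mylist))


for func in (with_loop, with_comprehension, with_map):
    # timeit.timeit returns the total time in seconds for `number` calls
    seconds = timeit.timeit(func, number=100)
    print(func.__name__, round(seconds / 100 * 1e6), "µs per call")
```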

## Complex data structures

Lists allow us to store a certain number of values, but often we need to work with more complex structures, for example, tables. Some programming languages have two-dimensional arrays. The analog of a two-dimensional array in Python is a "list of lists", that is, a list whose elements are other lists. We have already met something similar.

Let's take as an example a table that records the results of several homework assignments for several students. Let's say we have assigned numbers to the students, so we do not need to know their names. The table can be written as a list of lists, for example, row by row:

In [80]:
table = [ ["HW1", "HW2", "HW3", "HW4"], 
          [4, 3, 4, 4], 
          [3, 4, 3, 4], 
          [4, 5, 5, 4]
        ]

Here each element of the `table` list is a row of our table, which is itself a list. For example, this is how we can find out what is written in the third row and fourth column of our table:

In [81]:
table[2][3]

4

What happened here? We first selected the third row of the table with:

In [83]:
table[2]

[3, 4, 3, 4]

And then from this third row we selected the fourth element with `[3]`. We could write this in more detail:

In [84]:
row = table[2]
print(row[3])
# row[3] is the same as table[2][3]

4


This is how we can print all the elements of the table row by row:

In [85]:
for row in table: 
    print(*row)

HW1 HW2 HW3 HW4
4 3 4 4
3 4 3 4
4 5 5 4
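
Rows are easy to get from a list of lists, but columns need a little more work. Two possible ways are sketched below, using a list comprehension and `zip()` (the names `hw2` and `columns` are illustrative):

```python
table = [["HW1", "HW2", "HW3", "HW4"],
         [4, 3, 4, 4],
         [3, 4, 3, 4],
         [4, 5, 5, 4]]

# second column: take the element with index 1 from every row
hw2 = [row[1] for row in table]
print(hw2)           # ['HW2', 3, 4, 5]

# zip(*table) passes the rows as separate arguments and yields the columns
columns = list(zip(*table))
print(columns[0])    # ('HW1', 4, 3, 4)
```

Note that `zip()` produces tuples, so each column here is a tuple rather than a list.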


Now let's say that we still want to know which student received which grade. Then, instead of a list of lists, we could use a dictionary whose values are lists:

In [86]:
gradebook = {'Bill': [4, 3, 2], 'Alice': [3, 4, 5], 'Bob': [5, 5, 4]}

This is how you can see what grade Bob received on the second homework:

In [87]:
gradebook['Bob'][1]

5
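
Such a dictionary of lists combines well with the tools from the previous sections. As a sketch, this is one way to compute each student's average grade with a dictionary comprehension (`averages` is an illustrative name):

```python
gradebook = {'Bill': [4, 3, 2], 'Alice': [3, 4, 5], 'Bob': [5, 5, 4]}

# for every student, divide the sum of the grades by their number
averages = {name: sum(marks) / len(marks) for name, marks in gradebook.items()}

for name, average in averages.items():
    print(name, round(average, 2))
```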