# Lists, sets and dictionaries

In this class we will look at working with collections of data. You will be introduced to lists, sets and dictionaries as tools for this. We will give an overview of lambda functions, which give you more flexibility when transforming data. We will also introduce generators, which are a way to produce data that you can iterate over.

After this class you will know how to:
 * work with lists
 * use sets
 * work with dictionaries
 * create lambda functions
 * create a generator

## Recap

Last week we looked at input and output for our programs, and at importing functionality from files and modules.

### Import

Using the `import` keyword we can import functions, variables and classes from other files or modules. For example, the `math` module has a definition of the `pi` variable.

In [None]:
import math

print(math.pi)

We also saw how we could import only specific variables and functions:

In [None]:
from math import pi
print(pi)

### Input/Output

We saw that the `print` function writes something to the terminal, which is not the same as returning a value from a function.

#### Command line arguments

Command line arguments that are passed to a script are available through the `sys` module, which has an `argv` variable that is a list of `str`. Remember that the first element in this list is the name of the script.

```python
import sys

print(sys.argv[0]) # prints the name of the file
```

#### Reading and writing to files

Using the `open` function we can open a file for reading or writing. This function returns a file object that has functions for reading and writing. Remember that you need to pass `"w"` as the second argument to `open` if you want to write to the file.

```python
# read all the contents from a file
f = open("data.txt")
data = f.read()
f.close()
```

```python
# write a text to a file
f = open("output.txt", "w")
output = """Here is some
text that we want to 
write to a file
"""
f.write(output)
f.close()
```

**ALWAYS** remember to `close` the file when you are done.
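
One way to make sure the file is always closed is Python's `with` statement, which closes the file for you when the block ends. The sketch below is only an illustration: the file name `with_example.txt` and the temporary folder are assumptions, chosen so that the example does not leave any files behind.

```python
import os
import tempfile

# a temporary folder that is deleted again when the outer with-block ends
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "with_example.txt")  # hypothetical file name

    # the file is closed automatically at the end of this block
    with open(path, "w") as f:
        f.write("Here is some text\n")

    with open(path) as f:
        data = f.read()

print(data)
print(f.closed)  # the last file we opened has been closed for us
```

With this pattern there is no `close` call to forget, even if an error happens inside the block.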



## 1. Lists

Lists are the basic data structure for storing multiple values using one variable.

The syntax is quite similar to C:

In [38]:
li = [1,2,3,4,5,6] # create a new list

Lists are however not restricted to a certain type in Python, you can freely mix the types for each element:

In [57]:
[[1,2,3], "Hello", 5.6, (True, False)]

[[1, 2, 3], 'Hello', 5.6, (True, False)]

### 1.1 Initialization

Writing out all the initial values of a list by hand can be cumbersome, and in some cases it is not even feasible.

Say that you want to create a list that has the values 0 to 999. One way of achieving this would be to create an empty list and use a `for`-loop and `range` where we append in each iteration:

```python
li = []
for i in range(1000):
    li.append(i)
```

Python has a nice way of doing this in one line, using the following syntax:

`[` *expression* `for` *element* `in` *sequence* `]`

This is called **list comprehension**.

In [63]:
li = [i for i in range(1000)] # create a list of values 0-999
print(li[:5], "...", li[-5:])

[0, 1, 2, 3, 4] ... [995, 996, 997, 998, 999]
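
The *expression* part does not have to be the element itself. The small sketch below shows two variations: applying a calculation to each element, and adding an optional `if` at the end to keep only some elements. The `if` part is not covered in the text above and the variable names are made up for this example.

```python
# apply an expression to every element
squares = [i**2 for i in range(10)]

# keep only the elements that pass a test
evens = [i for i in range(10) if i % 2 == 0]

print(squares)
print(evens)
```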


### 1.2 Indexing

List indexing in Python is done using brackets `[i]`, where `0` is the first element.

In [87]:
li = [1,2,3,4,5,6] # create a new list

In [39]:
print(li[0])

1


You can also use negative indexing in Python to get the `i`th value starting from the end of the list. So at index `-1` you have the last element, `-2` the second from last, and so on...

In [40]:
print(li[-2])

5


It is also possible to get a subset of a list, called a `slice`, using a range of indices. This is done using the syntax *first*`:`*last*. This takes all the elements starting at and including *first*, up to but **excluding** *last*.

In [41]:
print(li[1:3]) # get elements li[1], li[2]

[2, 3]


You can also omit one of the values. If you omit *first* the range will start from `0`. If *last* is omitted the range ends at the last element of the list (including it).

In [42]:
print(li[2:]) # get all the elements after the two first ones

[3, 4, 5, 6]


In [43]:
print(li[:3]) # get the three first elements

[1, 2, 3]


Lastly you can also specify a step within your range using the *first*`:`*last*`:`*step* syntax:

In [44]:
print(li[0:4:2]) # get every second element among the first four

[1, 3]


In [45]:
print(li[::2]) # get every second element in the list

[1, 3, 5]
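
Two more slicing patterns that build on the same syntax are sketched below: a negative step, and omitting both ends. The list `numbers` is a hypothetical one used only for this example.

```python
numbers = [1, 2, 3, 4, 5, 6]  # hypothetical list for this example

backwards = numbers[::-1]  # a negative step walks the list from the end
last_two = numbers[-2:]    # negative indices also work inside a slice
copy = numbers[:]          # omitting both ends gives a copy of the whole list

print(backwards)
print(last_two)
print(copy == numbers, copy is numbers)
```

The last line shows that the copy has the same values as the original but is a separate list object.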


### 1.3 Re-assigning values

The indexing syntax can also be used to re-assign elements within a list:

In [55]:
li[2] = 8 # re-assign the third element to "8"
print(li)

[1, 2, 8, 4, 5, 6]


In [54]:
li[:3] = [10, 11, 12] # re-assign the first three values to [10, 11, 12]
print(li)

[10, 11, 12, 4, 5, 6]
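
When assigning to a slice, the new values do not need to have the same length as the slice they replace. The sketch below uses a hypothetical list `numbers` to show the list shrinking and growing as a result.

```python
numbers = [1, 2, 3, 4, 5, 6]  # hypothetical list for this example

numbers[1:3] = [20]           # two elements replaced by one, the list shrinks
print(numbers)

numbers[-1:] = [60, 70, 80]   # one element replaced by three, the list grows
print(numbers)
```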


### 1.4 Iterating over a list

We saw before how we can do this using a `for`-loop.



In [47]:
for e in li:
    print(e)

10
11
12
4
5
6


If you need to keep track of the index in each iteration of the loop, Python's `enumerate` function comes in handy:

In [48]:
for i, e in enumerate(li):
    print("li[", i, "] = ", e)

li[ 0 ] =  10
li[ 1 ] =  11
li[ 2 ] =  12
li[ 3 ] =  4
li[ 4 ] =  5
li[ 5 ] =  6


The builtin `len` function will give you the length of a list. This can be used in case you want to use the `range` function instead: 

In [49]:
for i in range(len(li)):
    print("li[", i, "] = ", li[i])

li[ 0 ] =  10
li[ 1 ] =  11
li[ 2 ] =  12
li[ 3 ] =  4
li[ 4 ] =  5
li[ 5 ] =  6


With the range function you can use a negative step to start at the end and iterate in reverse order instead.

In [50]:
for i in range(len(li)-1, -1, -1):
    print("li[", i, "] = ", li[i])

li[ 5 ] =  6
li[ 4 ] =  5
li[ 3 ] =  4
li[ 2 ] =  12
li[ 1 ] =  11
li[ 0 ] =  10


But an easier way would be to use Python's builtin `reversed` function.

In [51]:
for e in reversed(li):
    print(e)

6
5
4
12
11
10


### 1.5 Adding and removing elements

#### Inserting

Lists can be concatenated with each other. If we want to add more elements to a list we can use the `+` operator, which creates a new list that contains the elements of both.

In [20]:
li = [1, 2, 3, 4, 5, 6] # start again from a fresh list
print(li + [7, 8, 9])

[1, 2, 3, 4, 5, 6, 7, 8, 9]


Note that `+` only works between two lists. If you just want to add one element, you need to wrap it in a new list:

In [21]:
print(li + [7])

[1, 2, 3, 4, 5, 6, 7]


The `+` operator takes two lists and returns a new one, without modifying the original ones. If you want your variable to hold the updated list, you need to re-assign it, for example by using `+=` instead.

List objects also have some functions that modify the original list directly. For adding just one element, lists have the `append` function:

In [22]:
li.append(7)
print(li)

[1, 2, 3, 4, 5, 6, 7]
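
The difference between creating a new list and modifying an existing one becomes visible when two variables refer to the same list. The sketch below uses hypothetical variables `a`, `b` and `c`, and also uses the list function `extend`, which is not covered above and adds several elements in place.

```python
a = [1, 2, 3]      # hypothetical list for this example
b = a              # b is a second name for the same list object

c = a + [4]        # + builds a new list, a and b are untouched
a += [5]           # += extends the existing list in place, so b sees it too
a.extend([6, 7])   # extend also adds several elements in place

print(a)
print(b)
print(c)
```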


You can also `insert` an element at a specific index, pushing the elements after the given index one step back. `insert` takes the index as its first parameter and the element to insert as its second.

In [24]:
li.insert(4, 8)
print(li)

[1, 2, 3, 4, 8, 5, 6, 7]


#### Removing

Removing elements can be done by extracting a slice of the list as we saw earlier:

In [28]:
print("remove two first elements,", li[2:])
print("remove the last three elements", li[:-3])
print("note that li is unmodified:", li)

remove two first elements, [3, 4, 8, 5, 6, 7]
remove the last three elements [1, 2, 3, 4, 8]
note that li is unmodified: [1, 2, 3, 4, 8, 5, 6, 7]


Just as when using the `+` operator, we need to remember to re-assign the variable if we want to modify the actual list.

You can use the `del` statement if you want to do this without re-assigning:

In [27]:
del li[4] # remove the element at index 4 (the 8 we inserted earlier)
print(li)

[1, 2, 3, 4, 5, 6, 7]


The `del` statement can also be used with a range:

In [31]:
del li[-2:] # remove the last two elements
print(li)

[1, 2, 3, 4, 5]


You can achieve the same behavior using the list's `pop` function, which takes one optional parameter. When called without parameters it removes the last element. Optionally you can specify the index of the element you want to remove.

In [59]:
li.pop() # remove last element
li.pop(0) # remove first element
print(li)

[2, 3, 4]


To remove a specific value from a list you can use the list's `remove` function, which deletes the *first* occurrence of the value within the list:

In [32]:
li.remove(3) # remove the value "3" from the list
print(li)

[2, 4]


In [36]:
# create a new list of ones and test to remove the value "1"
li = [1,1,1,1] 
li.remove(1)
print(li)

[1, 1, 1]
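
Two details about `pop` and `remove` are worth seeing in code: `pop` gives back the element it removed, and `remove` raises a `ValueError` if the value is not in the list. The list `letters` below is a hypothetical one used only for this sketch.

```python
letters = ["a", "b", "c", "d"]  # hypothetical list for this example

last = letters.pop()     # pop returns the element it removed
first = letters.pop(0)
print(last, first, letters)

try:
    letters.remove("z")  # "z" is not in the list
except ValueError:
    print("remove raises a ValueError when the value is missing")

if "b" in letters:       # checking with "in" first avoids the error
    letters.remove("b")
print(letters)
```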


## 2. Dictionaries

Dictionaries are mappings from hashable values to arbitrary objects. They define a set of keys that each point to an object. Below is an example of a mapping (not Python) that has the keys 'a', 'b' and 'c', which point to the values 1, 2 and 3 respectively.

```
{
    'a' => 1
    'b' => 2
    'c' => 3
}
```

### 2.1 Initialization 

In Python, dictionaries can be initialized using the `dict` constructor, with keyword arguments as start values:

In [64]:
d = dict(a=1, b=2, c=3)
print(d)

{'a': 1, 'b': 2, 'c': 3}


We can also do this by writing the keys and their values within `{}`:
```python
{
    key1: value1,
    key2: value2,
}
```

In [65]:
d = {
    'a': 1,
    'b': 2,
    'c': 3,
}
print(d)

{'a': 1, 'b': 2, 'c': 3}


Notice how in the first example the keys default to the `str` type (which is also the most common scenario), while in the second version we wrote the strings out ourselves. Using the second version we can create dictionaries with key types other than strings, and the keys do not all need to have the same type:

In [69]:
# here is a quite wild dictionary that contains a little of everything
d = {
    'a': 1,
    39: [[1,2,3],[4,5,6],[7,8,9]],
    "Hello,": "World",
    False: 0,
}
print(d)

{'a': 1, 39: [[1, 2, 3], [4, 5, 6], [7, 8, 9]], 'Hello,': 'World', False: 0}
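
Keys have to be hashable, which in practice rules out mutable types such as lists. The sketch below uses a hypothetical dictionary `points` to show a tuple working as a key and a list being rejected.

```python
points = {
    (0, 0): "origin",   # a tuple is hashable, so it can be used as a key
    (1, 0): "unit x",
}
print(points[(1, 0)])

try:
    points[[0, 1]] = "unit y"  # a list is mutable and therefore not hashable
except TypeError as error:
    print("TypeError:", error)
```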


### 2.2 Indexing and mutating objects

The syntax is the same as with lists, except that instead of strictly using integers for indexing you use the keys of the dictionary, whatever their type.

In [70]:
print(d['a'])
print(d[False])

1
0


To re-assign a value you use the assignment operator `=`, or one of the other assignment operators such as `+=`.

In [71]:
d["Hello,"] = "class"
print(d)

{'a': 1, 39: [[1, 2, 3], [4, 5, 6], [7, 8, 9]], 'Hello,': 'class', False: 0}


In [72]:
d[False] += 10
print(d)

{'a': 1, 39: [[1, 2, 3], [4, 5, 6], [7, 8, 9]], 'Hello,': 'class', False: 10}


If you want to add a new key and value pair it is the exact same syntax.

In [76]:
d['b'] = 2
print(d)

{'a': 1, 39: [[1, 2, 3], [4, 5, 6], [7, 8, 9]], 'Hello,': 'class', False: 10, 'b': 2}
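
Indexing with a key that does not exist raises a `KeyError`. The sketch below shows a few common ways of dealing with that, using a hypothetical dictionary `ages`. The `in` operator, the `get` function and `del` on a key are not covered in the text above.

```python
ages = {"ada": 36, "alan": 41}  # hypothetical dictionary for this example

print("ada" in ages)          # membership tests look at the keys
print(ages.get("grace"))      # get returns None when the key is missing
print(ages.get("grace", 0))   # or a default value that we choose

try:
    ages["grace"]             # indexing a missing key raises an error
except KeyError:
    print("no such key")

del ages["alan"]              # del removes a key together with its value
print(ages)
```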


### 2.3 Iterating

There are many ways you can iterate over dictionaries. You can use a `for`-loop directly, in which case we iterate over the keys, which we can then use to access the values.

In [77]:
for key in d:
    print(key)

a
39
Hello,
False
b


If you would like both the key and the value directly in each step of the loop, you can use the dictionary's `items` function. (The `enumerate` function would give you a counter and the key here, not the value.)

In [78]:
for key, value in d.items():
    print(key, "=>", value)

a => 1
39 => [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Hello, => class
False => 10
b => 2
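
Dictionaries also have functions that give explicit control over what a loop sees. The sketch below uses a hypothetical dictionary `prices` to loop over only the values, over keys and values together, and over the keys in sorted order.

```python
prices = {"apple": 3, "pear": 5, "kiwi": 2}  # hypothetical dictionary

total = 0
for price in prices.values():        # only the values
    total += price
print(total)

for name, price in prices.items():   # key and value together
    print(name, "=>", price)

names = sorted(prices)               # the keys as a sorted list
for name in names:
    print(name)
```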


## 3. Generators [advanced]

We have looked at ways to store and work with larger amounts of static data. This section will introduce you to some ideas on how we can work with and iterate over more dynamic data.

There might be cases where you have an enormous amount of data to process that you cannot load into memory all at once. Instead you want to load chunks that you work with one at a time. One way of doing this is using generators.

Generators are basically functions that use a `yield` statement to return a new chunk. The function pauses at the `yield` and continues from there when the next chunk is requested.

In [79]:
# create a generator that yields the values 0-99 in chunks (lists) of 10
def gen():
    chunk = []
    for i in range(100):
        chunk.append(i)
        if len(chunk) == 10: # when chunk has a length of 10 we yield it
            yield chunk
            chunk = [] # empty it for next time

In [86]:
print(type(gen()))

<class 'generator'>


In [84]:
for chunk in gen():
    print(chunk)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89]
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
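
To see the pausing behavior more directly, you can request values one at a time with the builtin `next` function instead of a `for`-loop. The generator `countdown` below is a hypothetical example written only for this sketch.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1 (hypothetical example generator)."""
    while start > 0:
        print("about to yield", start)
        yield start
        start -= 1

g = countdown(3)   # nothing is printed yet, the body has not started running
first = next(g)    # runs the function until the first yield
second = next(g)   # resumes after the yield and runs until the next one
rest = list(g)     # consume whatever is left
print(first, second, rest)
```

The "about to yield" messages appear only when a value is actually requested, which shows that the function body runs step by step rather than all at once.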


## 4. Lambda functions

Lambda functions can be seen as one-line functions that take an input and return a new value. They are created using the `lambda` keyword:

`lambda` *input*`:`*output*

In [88]:
la = lambda x: x + 1

In [89]:
print(type(la))

<class 'function'>


In [90]:
la(10)

11

Lambda functions can also take several inputs:

In [93]:
la = lambda x, y: x + y # create an add function as lambda 

In [92]:
la(10, 10)

20

Note how we do not need a return statement in `lambda` functions. They consist of a single expression, and the value of that expression is what is returned. They are very useful when you are calling a function that takes another function as an input parameter.
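
As a small illustration of passing a function as a parameter, the builtin `sorted` and `max` functions accept a `key` function that decides what to compare. The list `words` below is made-up data for this sketch.

```python
words = ["banana", "fig", "cherry", "kiwi"]  # hypothetical data

by_length = sorted(words, key=lambda w: len(w))  # sort by word length
by_last = sorted(words, key=lambda w: w[-1])     # sort by the last letter
longest = max(words, key=lambda w: len(w))       # first word with the greatest length

print(by_length)
print(by_last)
print(longest)
```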

#### map

One function where defining a `lambda` function comes in handy is the `map` function. `map` takes two parameters:
 * a function
 * an iterable (e.g. list)

`map` iterates over the iterable and returns a `map` object, which you can convert to a new list, where each value is the output of calling its first argument (the function) with the current element. Here is an example to illustrate this.

In [98]:
li = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] # create a list 0 - 9

In [99]:
new_list = list(map(lambda x: x**2, li)) # note how we convert to a list
print(new_list)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
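
The reason for the `list(...)` conversion is that a `map` object produces its values only when they are requested, and only once. The sketch below shows this with a hypothetical list `numbers`.

```python
numbers = [1, 2, 3]  # hypothetical list for this example

doubled = map(lambda x: x * 2, numbers)  # nothing is computed yet

first_pass = list(doubled)   # the values are produced here
second_pass = list(doubled)  # the map object is now used up

print(first_pass)
print(second_pass)
```

If you need the results more than once, store them in a list as done above.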


#### filter

`filter` is another useful function. It returns a `filter` object, which you can convert to a new list, containing all the elements from the original list for which the input function gives `True`.

In [100]:
new_list = list(filter(lambda x: x > 4, li))
print(new_list)

[5, 6, 7, 8, 9]
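
Both `map` and `filter` can also be written as list comprehensions, and it is mostly a matter of taste which one you use. The sketch below puts the two styles side by side, using a hypothetical list `numbers` and a comprehension with an `if` at the end to do the filtering.

```python
numbers = list(range(10))  # hypothetical list with the values 0 - 9

squares_map = list(map(lambda x: x**2, numbers))
squares_comp = [x**2 for x in numbers]

large_filter = list(filter(lambda x: x > 4, numbers))
large_comp = [x for x in numbers if x > 4]

# both steps at once: square only the values greater than 4
combined = [x**2 for x in numbers if x > 4]

print(squares_map == squares_comp)
print(large_filter == large_comp)
print(combined)
```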
