# Python Crash Course 02 - Data Structures & for-Loops

## Lists, tuples, and sets
- [Video tutorial (29 min)](https://www.youtube.com/watch?v=W8KRzm-HUcc)
- Library reference:
    - [lists](https://docs.python.org/3/library/stdtypes.html#lists)
    - [tuples](https://docs.python.org/3/library/stdtypes.html#tuples)
    - [sets](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset)

### Lists

A list can be used to store a sequence of arbitrary items. The elements don't need to be of the same datatype. Its length is variable and every item can be replaced or deleted after assignment, i.e. a list is _mutable_.

**List-Properties:**
* stores data from any datatype (can be mixed as well)
* iterable
* mutable
* can contain mutable items

**Python-Syntax:** `list_name = [item0, item1, item2, ...]`

In [None]:
list_of_arbitrary_items = [1, 2, 3.5, 4, 4, "I am a list item (*-*)"]

Every list item can be accessed by its index. Keep in mind that **indices start at `0`**! Usually, you go through a list from 'left' to 'right', but you can also count from the end using negative indices. In this case, **the last item is at index `-1`**. But what happens if you use an index that is higher than the index of the last item? Python forces you to correct this mistake by raising an `IndexError`.

**Python-Syntax:** `list_name[index]`

In [None]:
print(list_of_arbitrary_items[0])    # access first list-item
print(list_of_arbitrary_items[3])    # access fourth list-item
print(list_of_arbitrary_items[-1])   # access the last list-item
print(list_of_arbitrary_items[6])    # IndexError: the list only contains 6 items, so the last valid index is 5
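
Besides single indices, lists also support slices, which the comparison table later in this notebook mentions. The following is a minimal sketch, assuming a hypothetical list called `letters`: a slice `list_name[start:stop]` returns a new list from index `start` up to, but not including, index `stop`. Unlike a single out-of-range index, an out-of-range slice does not raise an `IndexError`.

```python
letters = ["a", "b", "c", "d", "e"]   # hypothetical example list

first_two = letters[0:2]    # items at index 0 and 1
last_two = letters[-2:]     # the last two items
too_far = letters[3:10]     # slices are clipped to the length of the list

print(first_two)
print(last_two)
print(too_far)
```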

In some cases you don't know the values you want to store in the first place. It is common practice to define an empty list which will be filled with data at a later time. 

**Python-Syntax:** `list_name = []` 

In [None]:
numbers = []     # this list will be filled later
print(numbers)

As we already know, lists are mutable datatypes. In the next steps, we will overwrite (i.e. replace) an already existing item using its index and add completely new items. With `append`, the new item is added to the end of the already existing list.

**Python-Syntax:** 
* replace an item: `list_name[index] = new_value`
* add an item to the end: `list_name.append(new_value)`
* add an item before a given index: `list_name.insert(index, new_value)`
    * add an item to the beginning: `list_name.insert(0, new_value)`


In [None]:
# replace an item:
print(list_of_arbitrary_items)     # before the replacement
list_of_arbitrary_items[2] = 9
print(list_of_arbitrary_items)     # after the replacement

# add items to the end:
list_of_arbitrary_items.append(1)
list_of_arbitrary_items.append(2)
print(list_of_arbitrary_items)

# insert items at arbitrary positions:
list_of_arbitrary_items.insert(0, "I'm new")
print(list_of_arbitrary_items)

With `append`, you can **only add one item at a time**. 

In [None]:
numbers.append(1, 2)        # This won't work: append() takes exactly one argument (TypeError)
#numbers.append([1, 2, 3])   # This works, but creates a nested list
print(numbers)

But behold! There is a way to add several items at once with the power of mathematics: the `+=` operator. As you can see in the code below, it adds all items from one list to another list.

In [None]:
numbers += [4,3,2,1] 
# numbers.extend([4,3,2,1])   # .extend() works too!
# numbers.append([4,3,2,1])   # .append() adds the whole list as a single item instead of adding its elements! Be careful there!
print(numbers)
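
To make the difference between `.append()` and `.extend()` visible side by side, here is a small sketch. The list names `appended` and `extended` are made up for this illustration.

```python
appended = [1, 2]
appended.append([3, 4])     # the whole list becomes ONE new item

extended = [1, 2]
extended.extend([3, 4])     # each element is added separately

print(appended, len(appended))
print(extended, len(extended))
```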

If you want to know how many items are stored in your list (e.g. to avoid an `IndexError`), you can use the same function you would use for strings:

In [None]:
len(numbers)

Sometimes you know the value of an item in the list and want to find its index. For this kind of problem we can use the `index` method. This method **returns the index of the item with the given value** (searching from the 'left'). If the list contains no item with that value, Python will raise a `ValueError`. One important thing to know: it only returns the index of the first occurrence! This means that if the list contains equal elements, you only get the index of the first one.

**Python-Syntax:** `list_name.index(element_value)`

In [None]:
integers = [10, 20, 30, 40, 25, 20, 100]

In [None]:
print(integers.index(25))
print(integers.index(20))    # The list contains two items with the value 20, but only the index of the first one is returned!
print(integers.index(420))   # This raises a ValueError since no item with the value 420 is stored in the list.
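
One possible way to avoid the `ValueError` is to check with the `in` operator whether the value is part of the list before asking for its index. This is only a sketch: the variable `wanted` and the choice of `None` as a "not found" marker are assumptions made for this example.

```python
integers = [10, 20, 30, 40, 25, 20, 100]
wanted = 420   # hypothetical value we are looking for

if wanted in integers:
    position = integers.index(wanted)
else:
    position = None   # assumption: None signals "not found"

print(position)
```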

If you want to count certain items in a list you can use `count`:

**Python-Syntax:** `list_name.count(value_to_count)`

In [None]:
print(integers.count(10))
print(integers.count(20))
print(integers.count(420))   # This will return 0 since no item with value 420 is stored in the list. 

Lists can also be sorted, which is quite convenient e.g. for ranking tasks. This will only work if all items can be meaningfully compared to each other. So sorting a list of numbers (integers and/or floats) works, as does sorting a list of strings, but sorting a list containing both numbers and strings will fail with a `TypeError`. Numbers are sorted ascending by value, strings are sorted alphabetically (with all uppercase letters coming before lowercase ones).

**Python-Syntax:** `sorted(list_name)`

In [None]:
print(integers)
sorted_integers = sorted(integers)
print(sorted_integers)

# descending sorting is possible by passing an optional argument:
sorted_integers = sorted(integers, reverse=True)
print(sorted_integers)

list_of_letters = ["p","y","t","h","o","n"]
print(list_of_letters)
sorted_list_of_letters = sorted(list_of_letters)
print(sorted_list_of_letters)
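
Note that `sorted()` returns a new list and leaves the original untouched. Lists also have a `.sort()` method, which is not covered above; it sorts the list itself and returns `None`. The sketch below contrasts the two, using the made-up lists `scores` and `words`.

```python
scores = [30, 10, 20]           # hypothetical example list

ranked = sorted(scores)         # new list, `scores` stays as it was
print(scores, ranked)

scores.sort()                   # sorts the list itself and returns None
print(scores)

words = ["python", "Zebra", "apple"]
sorted_words = sorted(words)    # uppercase letters sort before lowercase ones
print(sorted_words)
```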

### Tuples

Tuples appear similar to lists, but have an important difference - once defined, a tuple cannot be changed! 

**Tuple-Properties:**
* stores data from any datatype (can be mixed as well)
* iterable
* **immutable**
* can contain mutable objects (like lists), which can still be modified

**Python-Syntax:** 
* `tuple_name = (item0, item1, item2, ...)`
* `tuple_name = item0, item1, item2, ...`  


In [None]:
some_words_and_numbers = ('hi', 'bye', 15)  # this way is preferred as it's more explicit
newspapers = 'Die_Presse', 'Der_Standard', 'FAZ'

# Special tuples
one_element_tuple = ('singleton',)
empty_tuple = ()
print(newspapers)
print(one_element_tuple)

Attempting to replace an item in a tuple will fail:

In [None]:
newspapers[2] = 'Die_Zeit'  #  <- Not a valid action for tuples! This raises a TypeError.

However, modifying something mutable which is part of a tuple is allowed:

In [None]:
magic_square = (
    [9, 2, 4],
    [5, 7, 3],
    [1, 6, 7],
)

magic_square[2][2] = 8
print(magic_square)

### Sets

Like lists and tuples, sets are data structures to work with a collection of data. The main difference is that sets are unordered and therefore unindexed. Their most exciting property is that they *don't allow duplicates*.

**Set properties:**
* Resemble mathematical sets   
* Sets are written with curly brackets: `{element1, element2}`  
* Unordered collection with no duplicate elements  
* Elements have to be immutable but a set itself is mutable  
* Support mathematical operations, e.g. union, intersection, difference  

**Python-Syntax:** `set_name = {item0, item1, item2, ...}`

There are two different ways to declare a set. One is with the function `set()`. Like in the example below, you can create a set from another iterable (like a list or tuple). You can tell that the result is a set by the curly brackets in the output.

In [None]:
a = [1, 2, 3, 4, 4, 5] # list
b = (1, 3, 4, 5, 6, 6, 6) # tuple
c = set(a)
d = set(b)
print(c)
print(d)

But you can also declare a set directly with curly brackets (`{}`) like in the example below.

In [None]:
backpack = {"notebook", "phone", "key", "gum", "pen"}

print(backpack)

**Be cautious when you want to create an empty set - you can't create it with `empty_set = {}`!**  
This would be another data structure named dict, which we'll get to soon.

In [None]:
empty_set = set()
print(empty_set,  type(empty_set))

not_an_empty_set = {} # this would be an empty dict!
print(not_an_empty_set, type(not_an_empty_set))

#### Add Elements `.add()`

But what do you need to do when you want to fill the set? `empty_set += {"green"}` won't work. Use the method `empty_set.add("green")` instead.

In [None]:
backpack = {"notebook", "phone", "key", "gum", "wallet"}
print(backpack)

In [None]:
# backpack += {"phone charger"} # This does not work...
backpack.add("phone charger")
print(backpack)

#### Remove Elements `.remove()`

One element should not be in the set? No problem - you can easily remove it with `.remove()`.

In [None]:
backpack = {"notebook", "phone", "key", "gum", "wallet"}
print(backpack)

In [None]:
backpack.remove("gum")
print(backpack)

Try to execute the cell above again! A `KeyError` is raised because `"gum"` now isn't part of the set anymore.
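
If a missing element should not be treated as an error, sets also offer the `.discard()` method, which is not used elsewhere in this notebook. A small sketch, with a made-up set called `small_backpack`:

```python
small_backpack = {"notebook", "phone", "key"}   # hypothetical example set

small_backpack.discard("gum")     # "gum" is not in the set: nothing happens, no error
small_backpack.discard("key")     # "key" is in the set: it gets removed
print(small_backpack)

if "phone" in small_backpack:     # alternatively, check membership before removing
    small_backpack.remove("phone")
print(small_backpack)
```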

#### Mixed Elements:
You can mix the data types in a set as you like.

In [None]:
{1, 2, 3, 3, 5, 6, 4, 3, 3, 3.1, "A string", False}

Note that equally valued integers and floats are considered equal, so a set keeps only one of them. The element occurring first is retained:

In [None]:
{1, 1.0}

In [None]:
{1.0, 1}

What happens if you put a list into a set, for example?

In [None]:
{1, 2, 3, [1,2,3]}

#### Unhashable?
So apparently sets cannot contain "unhashable" things... but what does "hashable" even mean?

A hash function is any function that can be used to map data of arbitrary size to fixed-size values. This makes it easy to compare arbitrary elements to each other. However, Python requires an object's hash to be constant over its lifetime. For something mutable (like a list) this can't be guaranteed, so Python's built-in mutable containers (lists, sets, and dictionaries) are unhashable, while immutable built-in types such as numbers, strings, and tuples of hashable items are hashable.
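
Hashability can be checked with the built-in function `hash()`. The following sketch uses `try`/`except`, which has not been introduced in this notebook, only so that the cell keeps running after the expected `TypeError`; the variable names are made up for this example.

```python
print(hash("text") == hash("text"))         # equal objects have equal hashes
print(hash((1, 2, 3)) == hash((1, 2, 3)))

try:
    hash([1, 2, 3])                  # lists are mutable ...
    list_is_hashable = True
except TypeError as error:
    list_is_hashable = False         # ... so hashing them fails
    print("TypeError:", error)

try:
    hash((1, [2, 3]))                # a tuple that contains a list is not hashable either
    nested_is_hashable = True
except TypeError:
    nested_is_hashable = False

print(list_is_hashable, nested_is_hashable)
```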

More information on hashing?
- https://en.wikipedia.org/wiki/Hash_function
- https://runestone.academy/runestone/books/published/pythonds/SortSearch/Hashing.html
- https://docs.python.org/3/glossary.html#hashable

More information about mutability?
- https://towardsdatascience.com/immutable-vs-mutable-data-types-in-python-e8a9a6fcfbdc
- https://docs.python.org/3/glossary.html

#### One string-element in a set?

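Be careful when creating a set with a single string as its only element, because funny things can happen: `set()` iterates over its argument, and iterating over a string yields its individual characters.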

In [None]:
one_element_set = set('singleton')
print(one_element_set)

In [None]:
one_element_set = {'singleton'}
print(one_element_set)

In [None]:
one_element_set = set(['singleton'])
print(one_element_set)

In [None]:
one_element_set = {['singleton']}
print(one_element_set)

#### Set operations: `&`, `|`, `^`, `-`

Sets have very handy operations for computing intersections, unions, or differences:

In [None]:
backpack = {'notebook', 'phone', 'key', 'gum', 'wallet'}
other_bag = {'key', 'sandwich', 'bottle', 'phone'}

print("Intersection (AND): ", backpack & other_bag)          # Items in both bags
print("Union (OR): ", backpack | other_bag)                  # Combine both bags
print("Symmetric Difference (XOR): ", backpack ^ other_bag)  # Items in only one of the bags
print("Difference: ", backpack - other_bag)                  # Items in the first, but not the second bag
print("Difference: ", other_bag - backpack)                  # Items in the second, but not the first bag
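
Each of these operators also has an equivalent, more verbose method on the set itself. The sketch below repeats the bag example with those methods; the result names (`common`, `combined`, `only_one`, `only_first`) are made up for this illustration.

```python
backpack = {'notebook', 'phone', 'key', 'gum', 'wallet'}
other_bag = {'key', 'sandwich', 'bottle', 'phone'}

common = backpack.intersection(other_bag)            # same as backpack & other_bag
combined = backpack.union(other_bag)                 # same as backpack | other_bag
only_one = backpack.symmetric_difference(other_bag)  # same as backpack ^ other_bag
only_first = backpack.difference(other_bag)          # same as backpack - other_bag

# sets are unordered, so sorted() is used here to get a predictable printout
print(sorted(common))
print(sorted(only_first))
```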

Combining these operators with the assignment operator (`=`) gives a way to add or remove one or more elements to or from a set:

In [None]:
backpack = {"mobile", "pen", "paper"}

backpack |= {"pencil"}
print(backpack)
backpack -= {"pen", "mobile"}
print(backpack)

## `for`-Loops
`for`-loops iterate over iterables, i.e. objects that can hand out their items one at a time (like lists, tuples, sets, or strings). Often in programming, we want to perform the same action using different input data. Instead of writing this down X times, we define a loop which does this for us.

The `for`-loop in Python is defined as follows:
```python
for variable in sequence:
    # do stuff
```

So we start with the keyword `for` followed by a variable name. This variable is created here and contains one element of the sequence, which changes every iteration of the loop until the whole sequence is finished, or we as programmers stop the loop. Next comes the `in` keyword followed by the sequence. Finally, the part which should happen at each iteration is _indented_ below the `for`-statement (i.e. the line starts 4 spaces further to the right).

Let's have a look at a for loop, looping over this new amazing datatype we just learned, a `list`:

In [None]:
data = [0, 1, 2, 3, 4, 5, 6]

for element in data:
    print(f"Element in list: {element}")

`for`-loops can iterate over arbitrary sequences:

In [None]:
# iterate over a string
hello_world = "Hello World!"
for letter in hello_world:
    print(f"Letter in string: {letter}")

In [None]:
# iterate over a tuple   
# note that the iterable does not need to be stored in a variable, but can
# also be written after `in` as a 'literal':
for element in (0, 1, 2, 3, 4, 5, 6):
    print(f"Element in tuple: {element}")
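
A loop becomes more useful when its body does something with each item. As a small sketch, the following loop fills an initially empty list with the length of every word; the names `words` and `word_lengths` are made up for this example.

```python
words = ["list", "tuple", "set"]    # hypothetical example data
word_lengths = []                   # empty list, filled inside the loop

for word in words:
    # this indented block runs once per item; `word` holds the current item
    word_lengths.append(len(word))

# this line is not indented, so it runs once, after the loop has finished
print(word_lengths)
print(word)   # the loop variable still holds the last item
```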

## Dictionaries
- [Video tutorial (10 min)](https://www.youtube.com/watch?v=daefaLgNkw0)
- [Library reference](https://docs.python.org/3/library/stdtypes.html#mapping-types-dict)

Dictionaries map _keys_ to the _values_ that are associated with them.  
This means that they are a collection of `<key>: <value>` pairs called _items_.  
A dictionary begins and ends with curly brackets: 

In [None]:
empty_dict = {} # just an empty dictionary
# empty_dict = dict()
empty_dict

Each _item_ consists of a _key_ and a _value_. Individual items are separated with commas.   

**Keys**: Must be hashable. Typical keys are strings, numbers, and tuples (tuples only if all their elements are hashable).  
**Values**: Can be hashable or unhashable, i.e. totally arbitrary. So they can be strings, numbers, lists, sets or even other dictionaries.  

In [1]:
contact_info = {'name': 'Anna', 'family_name': ['Smith', 'Williams'], 'phone': 12345678, 'city': 'Graz'}
contact_info

{'city': 'Graz',
 'family_name': ['Smith', 'Williams'],
 'name': 'Anna',
 'phone': 12345678}

What happens if we try to use a `list` as a key?

In [2]:
contact_info = {['name', 'surname']: 'Anna Smith'}

TypeError: unhashable type: 'list'

This produces a `TypeError`! Since a `list` can be changed, Python cannot guarantee that its hash value stays the same over its lifetime, so a `list` is unhashable.
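
If a key really needs to consist of several parts, a tuple can be used instead of a list, because a tuple of hashable items is hashable itself. A minimal sketch; the dictionaries `full_names` and `grid` are made up for this example.

```python
full_names = {('name', 'surname'): 'Anna Smith'}   # a tuple works as a key
print(full_names[('name', 'surname')])

# hypothetical example: grid coordinates as keys
grid = {(0, 0): 'origin', (1, 2): 'treasure'}
print(grid[(1, 2)])
```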

If we want to look at one specific item in the dictionary we don't access it with indices or slices (as we do with lists), but with its keys:

In [3]:
key = "name"
# key = "wrong key" # this will produce a KeyError
contact_info[key]

'Anna'
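
When it is not certain that a key exists, the dictionary method `.get()` can be used instead of square brackets. It is not covered elsewhere in this notebook, so take this as a short sketch; `short_contact` is a made-up, shortened example dictionary.

```python
short_contact = {'name': 'Anna', 'city': 'Graz'}   # hypothetical example dictionary

print(short_contact.get('name'))               # key exists: returns its value
print(short_contact.get('wrong key'))          # key is missing: returns None, no KeyError
print(short_contact.get('wrong key', 'n/a'))   # optional second argument: a default value
```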

Dictionaries also have built-in methods that give access to all keys, all values, or all items:

In [None]:
print("Dictionary keys:", contact_info.keys())
print("Dictionary values:", contact_info.values())
print("Dictionary Items:", contact_info.items())

A single `key: value` pair can be added to a dictionary using the following syntax:
    
```python
dictionary[key] = value
```


In [4]:
contact_info['mail'] = "something@somewhere.at"
print(contact_info)

{'name': 'Anna', 'family_name': ['Smith', 'Williams'], 'phone': 12345678, 'city': 'Graz', 'mail': 'something@somewhere.at'}


If we want to add **more than one** `key: value` pair, we can use the method  `update`:

In [5]:
additional_data = {"country": "Austria", "street": "Bahnhofstrasse"}
contact_info.update(additional_data)
print(contact_info)

{'name': 'Anna', 'family_name': ['Smith', 'Williams'], 'phone': 12345678, 'city': 'Graz', 'mail': 'something@somewhere.at', 'country': 'Austria', 'street': 'Bahnhofstrasse'}


What else can one do with dictionaries? Here are some common use cases:

In [None]:
# overwrite values using a key
contact_info['phone'] = 98765

In [6]:
# iterate over a dictionary's keys:
for key in contact_info:
    print(key)

name
family_name
phone
city
mail
country
street


In [None]:
# this does the same thing, but the above version is preferred:
for key in contact_info.keys():
    print(key)

In [7]:
# iterate over a dictionary's values:
for value in contact_info.values():
    print(value)

Anna
['Smith', 'Williams']
12345678
Graz
something@somewhere.at
Austria
Bahnhofstrasse


In [8]:
# iterate over both the keys and the values
for key, value in contact_info.items():
    print(key, value)

name Anna
family_name ['Smith', 'Williams']
phone 12345678
city Graz
mail something@somewhere.at
country Austria
street Bahnhofstrasse


In [9]:
print("name" in contact_info) # check whether a key is in the dictionary
print("wrong key" in contact_info)

True
False


In [10]:
del contact_info["mail"] # deletes a specific key-value pair
print(contact_info)

{'name': 'Anna', 'family_name': ['Smith', 'Williams'], 'phone': 12345678, 'city': 'Graz', 'country': 'Austria', 'street': 'Bahnhofstrasse'}


## Combination of data structures

Are you confused by all the different data types to store data? Maybe this helps:

|    | **List**  | **Tuple**  | **Set**    | **Dict**   |
|-|-|-|-|-|
| **Declaration** | `a_list = [1, "two", True]` | `a_tuple = (1, "two", True)` | `a_set = {1, "two", False}` | `a_dict = {1: "one", "two": 2, "three": True}` |
| **Type casting** | `list()` | `tuple()` | `set()` | `dict()` |
| **Duplicates?** | yes | yes | no | keys: no, values: yes |
| **Ordered?** | yes | yes | no | yes (insertion order) |
| **Mutable?** | yes | no | yes | yes |
| **Subscriptable by** | integers or slices | integers or slices | not subscriptable | keys |
| **Empty declaration** | `empty_list = []` | `empty_tuple = ()` | `empty_set = set()` | `empty_dict = {}` |
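
The type casting functions from the table can be used to convert one data structure into another. The following sketch uses made-up example data to show a few such conversions.

```python
a_list = [3, 1, 3, 2]            # hypothetical example data

a_tuple = tuple(a_list)          # same items in the same order, now immutable
a_set = set(a_list)              # duplicates are dropped
back_to_list = sorted(a_set)     # sorted() always returns a list

pairs = [("one", 1), ("two", 2)]
a_dict = dict(pairs)             # dict() accepts an iterable of (key, value) pairs

print(a_tuple)
print(a_set)
print(back_to_list)
print(a_dict)
```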


Of course, data structures can also be combined. We can create:
* List of lists
* Tuple of lists
* Dictionary with lists as values
* A dictionary within a dictionary
* and so on ...

In [11]:
# Simple matrix as list of lists
first_row =  [1, 0, 0]
second_row = [0, 1, 0]
third_row =  [0, 0, 1]

identity_matrix = [first_row, second_row, third_row]
print(identity_matrix)

[[1, 0, 0], [0, 1, 0], [0, 0, 1]]


In [12]:
# Declaring a list with dictionaries as elements
list_of_grades = [
    {'name': 'Paul', 'grade': 4},
    {'name': 'Lina', 'grade': 1}
]

# Key accesses with square brackets can be chained:
print(list_of_grades)
#print(list_of_grades[0])
#print(list_of_grades[0]['name'])

[{'name': 'Paul', 'grade': 4}, {'name': 'Lina', 'grade': 1}]


In [13]:
# Declaring two nested dictionaries
first_person_info = {'name': 'Kem', 'age': 27, 'sex': 'non-binary'}
second_person_info = {'name': 'Marie', 'age': 22, 'sex': 'female'}
patients_registry = {3739: first_person_info, 7021: second_person_info}

print(patients_registry)
# print(patients_registry[7021]['age'])
# print(patients_registry[3739]['sex'])

{3739: {'name': 'Kem', 'age': 27, 'sex': 'non-binary'}, 7021: {'name': 'Marie', 'age': 22, 'sex': 'female'}}
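
Nested structures combine nicely with `for`-loops. As a sketch, the loop below walks over a shortened, made-up version of the registry (called `registry` here) and collects all ages in a list.

```python
registry = {
    3739: {'name': 'Kem', 'age': 27},
    7021: {'name': 'Marie', 'age': 22},
}

ages = []
for patient_id, info in registry.items():
    # `info` is the inner dictionary, so it can be accessed by key again
    print(patient_id, info['name'], info['age'])
    ages.append(info['age'])

average_age = sum(ages) / len(ages)
print(average_age)
```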


## Common mistakes and how to fix them

### Basic syntax error

In [None]:
# this raises a SyntaxError hinting at a missing comma:
total = 0
for number in (4, 8 15, 16, 23, 42)
total += number
print(total)

In [None]:
# still a SyntaxError: a for statement must end with a colon
total = 0
for number in (4, 8, 15, 16, 23, 42)
total += number
print(total)

In [None]:
# IndentationError: the lines after a colon (:) require indentation
total = 0
for number in (4, 8, 15, 16, 23, 42):
total += number
print(total)

In [None]:
# no error message this time, but the total should only be printed at the end:
total = 0
for number in (4, 8, 15, 16, 23, 42):
    total += number
    print(total)

In [None]:
# done :)
total = 0
for number in (4, 8, 15, 16, 23, 42):
    total += number
print(total)

### Trying to iterate over dictionary items, but forgetting something:

In [None]:
phonebook = {
    "Anna Smith": 123456789,
    "Leeroy Jenkins": 987654321,
}

for name, number in phonebook: 
    print(name, number)

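Iterating over a dictionary directly yields only its keys, so Python tries to unpack each key (a string) into `name` and `number` and raises a `ValueError`. To get both the key and the corresponding value on each iteration, `.items()` is required: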

In [None]:
phonebook = {
    "Anna Smith": 123456789,
    "Leeroy Jenkins": 987654321,
}

for name, number in phonebook.items(): 
    print(name, number)

## Best practices


### Formatting
If you have a long literal (e.g. a list, tuple, or dictionary), put each item on its own line. This makes it more readable and easier to add or remove something:

##### _Don't_
```python
countries = ["Afghanistan", "Albania", "Algeria", "American Samoa", "Andorra", "Angola", "Anguilla", "Antarctica", "Antigua And Barbuda", "Argentina", "Armenia", "Aruba", "Australia", "Austria", "Azerbaijan"]
```

##### _Do_
```python
countries = [
    "Afghanistan", 
    "Albania", 
    "Algeria", 
    "American Samoa", 
    "Andorra", 
    "Angola", 
    "Anguilla", 
    "Antarctica", 
    "Antigua And Barbuda", 
    "Argentina", 
    "Armenia", 
    "Aruba", 
    "Australia", 
    "Austria", 
    "Azerbaijan", 
]
```

### Adding new dictionary entries
To add a new entry to a dictionary, use item assignment instead of `.update` for less clutter and improved readability:

##### _Don't_
```python
contact_info.update({"mail": "something@somewhere.at"})
```

##### _Do_
```python
contact_info["mail"] = "something@somewhere.at"
```

### Indentation
As shown in the section about for loops, Python uses indentation to group statements.
Any number of spaces can be used, as long as it's consistent within a block.
However, most Python projects use 4 spaces per indentation level.
This convention makes it easier to read code written by someone else.

##### _Don't_
```python
for country in countries:
 print(country)
```

##### _Do_
```python
for country in countries:
    print(country)
```

### Descriptive variable names
While writing code may seem difficult for now, reading (and understanding) code is actually a far greater challenge. To make things easier for others (including our future selves), use descriptive variable names:

##### _Don't_
```python
for x in countries:
    print(x)
```

##### _Do_
```python
for country in countries:
    print(country)
```