# Python Programming

In this course, you will learn basic programming skills in Python.

## Part 4: Collection data types

This part introduces collection data types such as `list`, `tuple`, `dict` and `set`. These data types are fundamental to solving many programming problems and to organizing data in structures with different properties.

### List

A `list` is an ordered collection of elements. The elements' data types can vary. A list is created using **brackets**.

In [None]:
list1 = [1.0, "name", 15, True]

print("Type:  ", type(list1))
print("Values:", list1)
print("The list contents look like %s. Some more text" % (str(list1)))

#### `list` - Indexing

The elements of a list are accessed by typing the name of the variable, followed by the index in brackets. Note that **the first element has the index 0**, as in most programming languages (but unlike MATLAB). Python supports negative indices, where -1 gives the last element of the list, -2 the second to last, and so on.

In [None]:
#     0    1    2    3    4    5    <- indices
L = ["a", "b", "c", "d", "e", "f"]

# access first element of L
print("first: ", L[0])
# access second element of L
print("second:", L[1])
# access last element of L (in this case equiv. to L[5])
print("last:  ", L[-1])

Trying to access any element after the last one will raise an `IndexError`.
For example, the list created above has 6 elements, so the last element has the index 5. Trying to access the element at index 6 will fail:

In [None]:
# print(L[6])       # this would raise an IndexError
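
As a minimal sketch of how such an error could be handled, the following cell wraps the access in `try`/`except`. The variable names `last_valid_index` and `result` are only illustrative.

```python
L = ["a", "b", "c", "d", "e", "f"]

# valid non-negative indices run from 0 to len(L) - 1
last_valid_index = len(L) - 1
print("last valid index:", last_valid_index)

try:
    result = L[6]  # index 6 does not exist
except IndexError as error:
    result = None
    print("IndexError:", error)
```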

Lists can be used together with for loops in the obvious way:

In [None]:
L = ["a", "b", "c", "d", "e", "f"]
for character in L:
    print("The character is %s" % (character))

#### `list` - Slicing

Python supports slicing, similar to MATLAB. Slicing means accessing a part of a list. The syntax is

`List[start:end:step]`

where `start` is the start of the slice (included), `end` the end of the slice (excluded) and `step` the step width in between. When `start` is omitted, the slice will start from the first element. Omitting `end` will make the slice end with the last element. Omitting `step` will result in a step width of 1.

In [None]:
#     0    1    2    3    4    5    <- indices
L = ["a", "b", "c", "d", "e", "f"]

# access elements from index 1 up to (but excluding) index 3
print(L[1:3])
# access every second element from index 1 up to (but excluding) index 5
print(L[1:5:2])

In [None]:
# access all elements up to (but excluding) index 3
print(L[:3])
# access all elements from index 3 onwards
print(L[3:])

In [None]:
# access all elements (creates a copy of L)
print(L[:])
# access all elements with reversed order
print(L[::-1])
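
Negative indices also work inside slices, and a slice whose `end` lies beyond the list is simply clipped instead of raising an `IndexError`. The following sketch illustrates both; the variable names are chosen for this example only.

```python
L = ["a", "b", "c", "d", "e", "f"]

last_two = L[-2:]        # the last two elements
without_ends = L[1:-1]   # everything except the first and the last element
clipped = L[4:100]       # the end is clipped to the length of the list

print(last_two)
print(without_ends)
print(clipped)
```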

Indexing and slicing can be applied to strings.

In [None]:
s = "xyz" * 5
print(s)
print(s[:5])
print(s[-1])

#### `list` - Editing

Editing means changing values of a list without changing its length.

In [None]:
list2 = [1, 1, 1, 1]
print(list2)

# assign a new value to the first element
list2[0] = 0
print(list2)

# slices can be edited as well
list2[2:4] = [2, 3]
print(list2)
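
Because editing changes the list in place, every variable that refers to the same list sees the change. The following sketch (with illustrative variable names) contrasts a second name for a list with a copy made by slicing:

```python
original = [1, 2, 3]
alias = original            # a second name for the same list object
shallow_copy = original[:]  # a new list holding the same elements

original[0] = 99

print(alias)         # the alias sees the change
print(shallow_copy)  # the copy is unaffected
```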

#### `list` - Manipulation

A list can be manipulated using `append`, `extend`, `insert`, `del` or `remove`.

In [None]:
# create an empty list
list2 = []
print(list2)

# append one element
list2.append(1)
print(list2)

# append a sequence
list2.extend([2, 3, 4])
print(list2)

# insert 0 at index 2
list2.insert(2, 0)
print(list2)

# remove the element at index 2
del list2[2]
print(list2)

# remove the first element that is equal to the argument value
# raises an error if the value is not in the list!
list2.remove(1)
print(list2)
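
Since `remove` raises an error for missing values, one possible safeguard is to test membership with `in` first. The sketch below also uses `pop`, a standard list method not covered in the cell above, which removes the last element and returns it. The list `numbers` is only an example.

```python
numbers = [2, 3, 4]

# only remove the value if it is actually in the list
if 5 in numbers:
    numbers.remove(5)

# pop() removes the last element and returns it
last = numbers.pop()

print(last)
print(numbers)
```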

#### `list` - Comprehension

The following example demonstrates list comprehension, a powerful Python feature that uses the for-in syntax to build a list. More information is available [here](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions).

In [None]:
# 'for' inside brackets used to create a list
lst_sq = [x**2 for x in range(1, 11)]
print(lst_sq)
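
To make the syntax more tangible, the following sketch builds the same list with an ordinary loop and then shows a comprehension with an additional `if` condition. The variable names are illustrative.

```python
# loop version of the comprehension above
squares_loop = []
for x in range(1, 11):
    squares_loop.append(x**2)

# comprehension with a condition: squares of the even numbers only
even_squares = [x**2 for x in range(1, 11) if x % 2 == 0]

print(squares_loop)
print(even_squares)
```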

In [None]:
# A not too nice example of a nested list comprehension; beware, it is difficult to read

L = range(100)

[z for z in [y for y in [x for x in L if x > 1] if y % 2 == 1] if z % 5 == 0]
# if nesting cannot be avoided, always explain it in a comment:
# the innermost list expression selects the numbers > 1
# refined to odd numbers
# refined to numbers divisible by 5

# Better: unnest list comprehensions for better readability
Q = [x for x in L if x > 1]  # all greater than 1
Q = [y for y in Q if y % 2 == 1]  # all odd
Q = [z for z in Q if z % 5 == 0]  # all divisible by 5
print(Q)

#### `list` - Enumeration

The `enumerate()` function is very helpful when iterating over sequences and keeping track of the index.

In [None]:
L = ["a", "b", "c"]
for index, elem in enumerate(L):
    print("Element {}: {}".format(index, elem))
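
By default the counter starts at 0. As a small sketch, the optional `start` argument of `enumerate()` can be used when counting should begin at 1; the list `lines` only collects the output for this example.

```python
L = ["a", "b", "c"]

lines = []
for position, elem in enumerate(L, start=1):
    lines.append("Element {}: {}".format(position, elem))

print(lines)
```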

In Python, a single statement can assign values to multiple variables when an iterable such as a list or tuple is on the right-hand side of the assignment. This is often referred to as *multiple assignment* or *unpacking*. The number of variables on the left side must match the length of the iterable on the right side; otherwise a `ValueError` is raised.

In [None]:
a, b, c = [3, 5, 8]
print(a)
print(b)
print(c)
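
The following sketch shows the `ValueError` for a length mismatch. It also shows a starred name, a standard Python feature not discussed above, which collects all remaining values in a list.

```python
try:
    a, b = [3, 5, 8]  # three values, but only two variables
except ValueError as error:
    print("ValueError:", error)

# the starred name takes whatever is left over
first, *rest = [3, 5, 8]
print(first)
print(rest)
```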

#### `list` - Looping over two lists

When it is necessary to iterate over two or more lists simultaneously, one can use the built-in `zip()` function. It takes multiple lists as arguments and returns an iterator of tuples, each holding the corresponding elements of all lists. This is demonstrated in the next cell:

In [None]:
L1 = ["a", "b", "c"]
L2 = [1, 2, 3]
L3 = ["I", "II", "III"]

for x, y, z in zip(L1, L2, L3):
    print(x, y, z)
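
If the lists differ in length, `zip()` stops as soon as the shortest one is exhausted. A short sketch with two example lists, where `list()` is used to make the tuples visible:

```python
letters = ["a", "b", "c"]
numbers = [1, 2]

# "c" has no partner and is silently dropped
pairs = list(zip(letters, numbers))
print(pairs)
```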

#### `list` - Others

The following cells give an overview of the usage and capabilities of lists. In general it is better to learn what is possible first and then focus on the details (because the details can be quickly looked up).

In [None]:
# repeats the given list 4 times
L = [1, 2] * 4
print(L)

In [None]:
# the + operator can be used to concatenate lists
L = [1, 2] + [3, 4] + [5, 6]
print(L)

In [None]:
L = [1, 2, 3]
# reverse the elements in the list
L.reverse()
print(L)

In [None]:
L = [1, 2, 1, 3]
# count occurrences of a value
n = L.count(1)
print(n)

In [None]:
L = ["a", "b", "c"]
# x in L returns True if x is equal to at least one element in L
b = "a" in L
print(b)

In [None]:
L = [5, 8, 2, 6, 4, 2.5, 1]
# sort list (only possible if values are comparable)
L.sort()
print(L)
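
`L.sort()` changes the list itself. When the original order should be kept, the built-in `sorted()` function is an alternative that returns a new list, as this sketch shows. The names `ascending` and `descending` are illustrative.

```python
L = [5, 8, 2, 6, 4, 2.5, 1]

ascending = sorted(L)                 # new list, L stays unchanged
descending = sorted(L, reverse=True)  # new list in descending order

print(L)
print(ascending)
print(descending)
```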

In [None]:
L = [5, 8, 2, 6, 4, 2.5, 1]
# len(L) will return the number of elements in L
print(" length =", len(L))
# min(L) will return the minimum value in L
print("minimum =", min(L))
# max(L) will return maximum value in L
print("maximum =", max(L))

##### Excursus: Tricky list augmentation
Although the following four implementations produce the same result, the first one can be much slower than the others when it is repeated many times, for example inside a loop, and the slowdown grows with the length of the list. The reason is that `L = L + [42]` builds a completely new list and copies all existing elements into it, whereas the other three modify the existing list in place. As a high-level programming language, Python hides low-level details such as memory allocation from the user to provide a simple interface. Nevertheless, when performance matters, one prefers more control and transparency over such low-level details. System programming languages like C/C++ provide this control, along with a large increase in performance, among other benefits and features. The trade-off is the cognitive overhead of dealing with the added complexity that comes with more features, control and freedom.

In [None]:
L = [1, 2]

L = L + [42]  # impl 1   creates a new list, slow when repeated often
L += [42]  # impl 2
L.append(42)  # impl 3
L.extend([42])  # impl 4

print(L)
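
The difference between the implementations can be made visible without any timing. The sketch below keeps a second name for the list (`other_name`, chosen for this example) and checks with `is` whether `L` still refers to the same object after each operation.

```python
L = [1, 2]
other_name = L

L += [42]  # modifies the existing list in place
same_after_iadd = L is other_name

L = L + [42]  # builds a new list and rebinds the name L
same_after_add = L is other_name

print(same_after_iadd, same_after_add)
print(other_name)
print(L)
```

Only the second operation leaves `other_name` behind with the old contents, because `L` then points to a freshly created list.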

More information about lists is available [here](https://docs.python.org/3/tutorial/datastructures.html).

### Tuple

Tuples are very similar to lists, with the essential difference that they are **not mutable**. A tuple is created with the values in parentheses:

In [None]:
# creation of a tuple
T = (False, 1.0, "2", 3)
print("Type:  ", type(T))
print("Values:", T)

Once a tuple is initialized, it is neither valid to change its values nor to add/remove elements. E.g. trying to assign a new value to the first element in `T` will raise a `TypeError`:

In [None]:
# accessing elements is valid
print(T[0])
# T[0] = 1.0  # will raise a TypeError
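
As a minimal sketch, the error can be caught with `try`/`except`. To obtain a "changed" tuple, a new one has to be built, here by concatenating a one-element tuple with a slice; the name `T_new` is illustrative.

```python
T = (False, 1.0, "2", 3)

try:
    T[0] = 1.0  # item assignment is not supported by tuples
except TypeError as error:
    print("TypeError:", error)

# build a new tuple instead of modifying the old one
T_new = (1.0,) + T[1:]
print(T_new)
```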

Any list functionality that does not modify the entries can also be applied to tuples, e.g.

- indexing / slicing
- the functions `max()` / `min()` / `len()`
- the `in` operator.

Note: In many cases the parentheses are not required to create a tuple, but it is best practice to **use parentheses**. Nevertheless, the following cell is valid.

In [None]:
# no parentheses used to create the tuple
T = False, 1.0, "2", 3
print(T, type(T))
print("Values:", T)
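
This also explains a common pitfall: it is the comma, not the parentheses, that makes a tuple. The following sketch with illustrative names shows that a tuple with a single element needs a trailing comma.

```python
not_a_tuple = (5)  # just the integer 5 in parentheses
one_tuple = (5,)   # a tuple with one element
empty_tuple = ()   # the empty tuple is the exception and needs no comma

print(type(not_a_tuple), type(one_tuple), type(empty_tuple))
```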

More information about tuples is available [here](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences).

### Dictionary

A dictionary is basically a **collection of key-value pairs** where every key is unique and points to exactly one value. As with lists and tuples, the data types of both the keys and the values can vary, although keys must be hashable (for example numbers or strings, but not lists). A dictionary is constructed using **braces**. Inside the braces the keys and values are separated by a colon, and the key-value pairs are separated by commas:

In [None]:
# a dictionary with 3 key-value pairs
D = {"a": 1, "b": 2.0, 15: "x"}
print("Type:  ", type(D))
print("Values:", D)

The value for a specific key in a dictionary `D` is accessed using the syntax `D[k]`, where `k` is the key. A `KeyError` will be raised if the key is not in the dictionary.

In [None]:
D = {"a": 1, "b": 2}

# access the value that corresponds to the key 'b'
value = D["b"]
print("Value by index lookup:", value)

# accessing a value by indexing with a key that does not exist raises a KeyError
# value = D["z"]

# accessing the value with `get` will instead return `None`
maybe_value = D.get("z")
print("Value of missing key with `get`:", maybe_value)

# alternatively a default value can be specified for `get`
value_or_default = D.get("z", 99)
print("Default value of missing key with `get`:", value_or_default)
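
Besides `get`, the `in` operator can be used to test whether a key exists before accessing it. The sketch below shows this together with the error raised by a direct lookup; `has_z` and `value` are illustrative names.

```python
D = {"a": 1, "b": 2}

# for dictionaries, `in` checks the keys (not the values)
has_z = "z" in D
print("Is 'z' a key of D?", has_z)

try:
    value = D["z"]
except KeyError as error:
    value = None
    print("KeyError:", error)
```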

Modifying values of existing keys and adding new key-value pairs works in the same way:

In [None]:
D = {"a": 1, "b": 2}

# add a new key 'c' with the value 3
D["c"] = 3
print(D)

# change the value of key 'c' to 5
D["c"] = 5
print(D)
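
To complete the picture, key-value pairs can also be removed again. This sketch uses the `del` statement and the standard `dict.pop()` method, neither of which appears in the cell above; the variable names are illustrative.

```python
D = {"a": 1, "b": 2, "c": 5}

del D["a"]                  # raises a KeyError if the key is missing
removed = D.pop("b")        # removes the key and returns its value
missing = D.pop("z", None)  # a default value avoids the KeyError

print(D)
print(removed, missing)
```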

The `items()` method of dictionaries can be used to iterate over each key-value pair. Alternatively, it can be used to convert a dictionary to a list of tuples.

In [None]:
print(D)
D_as_list = list(D.items())
print(D_as_list)
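
The conversion also works in the other direction: the `dict()` constructor accepts a list of `(key, value)` tuples. A short round-trip sketch with illustrative names:

```python
D = {"a": 1, "b": 2, "c": 5}

pairs = list(D.items())  # dictionary -> list of tuples
rebuilt = dict(pairs)    # list of tuples -> dictionary

print(pairs)
print(rebuilt == D)
```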

We can also use the `dict.keys()` method to iterate over the keys of a dictionary.

In [None]:
D = {"a": 1, "b": 2.0, 15: "x"}  # a dictionary with 3 key-value pairs
for key in D.keys():
    print(key, D[key])

Alternatively, you can use the `dict.items()` method to access each key-value pair within a dictionary.

In [None]:
# iterate over (key, value) pairs
for key, value in D.items():
    print(key, value)

More information about dictionaries is available [here](https://docs.python.org/3/tutorial/datastructures.html#dictionaries).

### Set

A set contains **unique** elements in an **unordered** fashion (varying data types allowed). Unique means that no element occurs more than once in the set. Adding an element that the set already contains does not modify the set. Unordered means that the elements of a set don’t have indices, and thus indexing or slicing is not possible. Sets are created similarly to dictionaries, but without colons. Note that `{}` creates an empty dictionary; an empty set is created with `set()`.

In [None]:
# the element 1 will be in S only once, because sets do not allow duplicate entries
S = {1, 2, 3, 1}
print("Type:  ", type(S))
print("Values:", S)
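
The following sketch shows the behavior described above when elements are added with the standard `add` method, and how converting a list to a set drops its duplicates. The variable names are illustrative.

```python
S = {1, 2, 3}

S.add(2)  # 2 is already present, so the set stays unchanged
S.add(4)  # 4 is new and gets added

# converting a list to a set removes duplicate values
unique_values = set([2, 5, 1, 5, 1])

print(S)
print(unique_values)
```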

Although sets don’t support indexing, they allow some useful operations such as computing the intersection, the union or the symmetric difference:

In [None]:
S1 = {"a", "b", "d", 1, 2, 4}

# alternative constructor (same as casting a tuple to a set)
S2 = set((1, 3, 4, "a", "c"))

print("Elements in S1:               ", S1)
print("Elements in S2:               ", S2)

S_intersect = S1 & S2  # = S1.intersection(S2)
print("Elements in S1 and S2:        ", S_intersect)

S_union = S1 | S2  # = S1.union(S2)
print("Elements in S1 or S2:         ", S_union)

S_xor = S1 ^ S2  # = S1.difference(S2).union(S2.difference(S1))
print("El. in S1 or S2, not in both: ", S_xor)
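
A few more standard set operations, sketched with the same example sets: the difference, the named method for the symmetric difference, and a subset test. The result names are illustrative.

```python
S1 = {"a", "b", "d", 1, 2, 4}
S2 = {1, 3, 4, "a", "c"}

only_in_S1 = S1 - S2  # = S1.difference(S2)
in_exactly_one = S1.symmetric_difference(S2)  # = S1 ^ S2
is_subset = {1, 4} <= S1  # = {1, 4}.issubset(S1)

print("Elements only in S1:    ", only_in_S1)
print("Elements in exactly one:", in_exactly_one)
print("{1, 4} is a subset of S1:", is_subset)
```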

More information about sets is available [here](https://docs.python.org/3/tutorial/datastructures.html#sets).

### Tasks

There is one task for each kind of data type we learnt in this part.

#### Task 1: List - intersection without duplicates

For two lists `a` and `b`, print their intersection without duplicate elements.

In [None]:
a = [2, 5, 1, 5, 1, 7, 4, 2]
b = [4, 6, 2, 1, 1, 1, 6, 7]
# expected output: 2 1 7 4
# your code here

#### Task 2: Tuple - tuple as dictionary keys

Create a dictionary where the keys are tuples representing (name, age) pairs, and the values are corresponding addresses.

In [None]:
names = ["Alice", "Bob", "Charlie"]
ages = [25, 30, 22]
addresses = ["123 Main Str.", "456 Oak Ave.", "789 Pine Rd."]
# expected output:
# ('Alice', 25) 123 Main Str.
# ('Bob', 30) 456 Oak Ave.
# ('Charlie', 22) 789 Pine Rd.

# your code here

#### Task 3: Dict - word frequency counter

Given a text, split it into words and create a dictionary that maps each word to its frequency in the text.

In [None]:
sample_text = "This is a sample text. The text is a collection of words. Words make up the sentences."
# Hint1: use word.strip(string.punctuation) (after `import string`) to remove leading and trailing punctuation.
# Hint2: use str.lower() to convert all letters to lower case.
# expected output:
# this: 1
# is: 2
# a: 2
# sample: 1
# text: 2
# the: 2
# collection: 1
# of: 1
# words: 2
# make: 1
# up: 1
# sentences: 1

# your code here

#### Task 4: Set - redo Task 1 using `set` methods

Use `set` methods to do Task 1 again.

In [None]:
# Hint: s1.intersection(s2) or s1 & s2
a = [2, 5, 1, 5, 1, 7, 4, 2]
b = [4, 6, 2, 1, 1, 1, 6, 7]
# expected output (the order of the elements may differ): 2 1 7 4
# your code here