<center><img src=img/MScAI_brand.png width=80%></center>

# Compound data types


We've seen the basic data types -- integers, floating-point numbers, strings, Booleans. There are also *compound* data types, sometimes called *containers*.

The most basic is the `list`. A list is an ordered collection of values. You can create a list by enclosing values in square brackets, separated by commas.

In [1]:
import math

primes = [2, 3, 5, 7, 11]
names = ["John", "Paul", "George", "Ringo"]
sines = [math.sin(0.1), math.sin(0.2), math.sin(0.3)] # not just constants
examples = [primes, names, sines] # list of lists also ok
mixed = ["a", "b", 3, sines] # no restriction on types

You can add items to a list, delete items, find items, sort the items, etc.

In [2]:
names.append("Fred")
print(names)
names.remove("Fred")
print(names)
print(names.index("Paul"))
names.sort()
print(names)
del names[2]
print(names)

['John', 'Paul', 'George', 'Ringo', 'Fred']
['John', 'Paul', 'George', 'Ringo']
1
['George', 'John', 'Paul', 'Ringo']
['George', 'John', 'Ringo']


In [4]:
names = ["John", "Paul", "George", "Ringo"] # reset to original order

We can join two lists together, and we can calculate the length of a list with `len()`.

In [None]:
x = [3, 4, 5] + [9, 10, 11]
print(x)
print(len(x))

Iteration with `for`
---

The `for` keyword introduces a different type of loop. It's always `for x in L`. 

`x` becomes the name of a *new* variable with a different value at each iteration. `L` is a list (or something which, like a list, is a sequence of multiple values).

In [5]:
for x in names:
    print(x) 

John
Paul
George
Ringo


`range` is a useful function for use with `for`:

In [6]:
for x in range(5):
    print(x)

0
1
2
3
4
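
`range` can also take a start value and a step, as `range(start, stop, step)`. The stop value itself is never included. A short sketch (the variable names are only for illustration; wrapping in `list()` lets us see all the values at once):

```python
# range(stop) counts from 0 up to, but not including, stop
print(list(range(5)))          # [0, 1, 2, 3, 4]

# range(start, stop) begins at start instead of 0
middle = list(range(2, 6))
print(middle)                  # [2, 3, 4, 5]

# range(start, stop, step) jumps by step each time
odds = list(range(1, 10, 2))
print(odds)                    # [1, 3, 5, 7, 9]
```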


**Exercise** (Think Python 10.3): *Write a function that takes a list of numbers and returns the cumulative sum; that is, a new list where the ith element is the sum of the first i + 1 elements from the original list. For example, the cumulative sum of* `[1, 2, 3]` *is* `[1, 3, 6]`.

In [None]:
def cumulative_sum(L):
    s = 0
    result = []
    for x in L:
        s += x
        result.append(s)
    return result

cumulative_sum([1, 2, 3])

Notice a common pattern here: we created an *empty list* `[]`, then gradually `append`ed to it. 

**Exercise**: what would happen if we tried to iterate over the empty list? Guess, then try it.
```python
L = []
for i in L:
    print("Hello")
```

**Exercise** (Think Python Section 10.13 Part 2). Suppose `L` is a list, and you want to append an item `x`.

In [None]:
L = [5, 6, 7]
x = 9

There are two ways to do it:

In [None]:
L.append(x) # preferred

In [None]:
L = L + [x] # less preferred

And four ways not to:
```python
L.append([x]) # WRONG!
L = L.append(x) # WRONG!
L + [x] # WRONG!
L = L + x # WRONG!
```

Notice that in some cases you will see an error (a crash), and in others there will be no crash, but it won't do what you wanted/expected.
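
To make the difference concrete, here is a sketch of what each of the four wrong attempts actually does. Each one starts from a fresh list so that the results don't interfere. The names `a`, `b`, `c` and `d` are just for illustration, and `try`/`except` is used only so that the one crash doesn't stop the cell.

```python
x = 9

a = [5, 6, 7]
a.append([x])         # no crash, but appends a *list* as a single item
print(a)              # [5, 6, 7, [9]]

b = [5, 6, 7]
b = b.append(x)       # no crash, but append returns None, so b is now None
print(b)              # None

c = [5, 6, 7]
c + [x]               # no crash, but the new list is thrown away
print(c)              # [5, 6, 7]

d = [5, 6, 7]
try:
    d = d + x         # crash: a list can't be added to an int
except TypeError as err:
    print("TypeError:", err)
```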

Useful things to use with `for`
---

`zip` takes two or more lists (or other sequences) and pairs up their items: the first item of each, then the second item of each, and so on.

We can *destructure* the pair using `for x, y in zip(...)` instead of `for x in zip(...)`:

In [7]:
for x, y in zip(names, ["guitar", "bass", "guitar", "drums"]):
    print(x, y)

John guitar
Paul bass
George guitar
Ringo drums
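
In Python 3, `zip` doesn't build the pairs until they are asked for, so to look at them all at once we can wrap it in `list()`. A small sketch, which also shows that `zip` stops at the end of the shorter input (the lists here are made up for illustration):

```python
letters = ["a", "b", "c"]
numbers = [1, 2, 3, 4]        # one item longer than letters

pairs = list(zip(letters, numbers))
print(pairs)                  # [('a', 1), ('b', 2), ('c', 3)] -- the 4 is dropped
```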


`enumerate` takes a single list and produces index-item pairs, counting from 0. Again, we can destructure each pair.

In [10]:
names

['John', 'Paul', 'George', 'Ringo']

In [12]:

for i, x in enumerate(names):
    if i % 2:
        print(i, x)

1 Paul
3 Ringo


**Exercise**: Given the following lists, combine them using `enumerate` and a **three-way** `zip` to print this:
```
1: John Lennon, guitar
2: Paul McCartney, bass
3: George Harrison, guitar
4: Ringo Starr, drums
```

In [None]:
n1 = ["John", "Paul", "George", "Ringo"]
n2 = ["Lennon", "McCartney", "Harrison", "Starr"]
inst = ["guitar", "bass", "guitar", "drums"]

In [None]:
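# destructuring: enumerate gives (index, item), and each item is a triple from zip
# start=1 makes the count begin at 1; "instrument" avoids overwriting the list inst
for i, (a, b, instrument) in enumerate(zip(n1, n2, inst), start=1):
    print("%d: %s %s, %s" % (i, a, b, instrument))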

Tuples
---

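A tuple is like a list, but enclosed in round brackets, and it is *immutable*: you can't change it! If we want a modified version, we have to build a new tuple, for example by converting to a list and back: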

In [16]:
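t = (4, 5, 6)
t = list(t)   # we can't change a tuple, so make a list with the same items
t[1] = 17     # lists are mutable, so this is allowed
t = tuple(t)  # build a *new* tuple from the modified list
print(t)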

(4, 17, 6)


Usually we use tuples to mimic rows in databases -- each "field" has potentially different semantics and/or a different type -- whereas we use lists where all the data is of the same type -- more like a column in a database.
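
The conversion to a list above is needed because assigning directly to an element of a tuple is an error. A minimal sketch of both points: the failed assignment, and a tuple used as a database-style row. The fields in `row` are invented for illustration, and `try`/`except` is used only so the error doesn't stop the cell.

```python
t = (4, 5, 6)
try:
    t[1] = 17                 # tuples don't support item assignment
except TypeError as err:
    print("TypeError:", err)
print(t)                      # (4, 5, 6) -- unchanged

# A tuple as a "row": each position has its own meaning and type
row = ("Bob", 37, "Galway")
name, age, city = row         # destructuring works on tuples too
print(name, age, city)        # Bob 37 Galway
```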

Indexing and slicing
---

Strings, lists, and tuples all have some very useful operations in common. You can get the element at index $n$ by putting $n$ in square brackets:

In [18]:
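L = [5, 6, 7]
L[1]             # 6, but the notebook only displays the last expression in a cell
[10, 12, 12][2]  # indexing also works directly on a list literal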

12

Indices start at 0, so the first element is `L[0]` and `L[1]` is the second.

For convenience, we can count backwards in indices. The last element can be indexed as -1:

In [19]:
L[-1]

7

In [20]:
L[-2]

6

In [21]:
L[1] == L[-2]

True

We can use list indices on the left hand side of an assignment as well:

In [22]:
L[1] = 12
print(L)

[5, 12, 7]


### Slices

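A *slice* is a sublist (or substring, or sub-tuple). We get a *slice* using the colon `:`, with two indices indicating the `start` and `end` of the sublist, e.g. `L[1:3]`. The element at `start` is included but the element at `end` is not, so `L[1:3]` contains `L[1]` and `L[2]`.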

![Slicing in Python](img/python-slicing.png)

Credit: http://infohost.nmt.edu/tcc/help/lang/python/docs.html

We can omit the `start` index, and it defaults to 0, or the `end` index, and it defaults to `len(L)`, or even both!

With slices, it helps to think of an index as pointing "just before" the corresponding position, as shown.
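
A few slices of a small list, as a sketch of the rules above (the list `letters` is just an example):

```python
letters = ["a", "b", "c", "d", "e"]

print(letters[1:3])       # ['b', 'c'] -- indices 1 and 2, but not 3
print(letters[:2])        # ['a', 'b'] -- start defaults to 0
print(letters[3:])        # ['d', 'e'] -- end defaults to len(letters)
print(letters[:])         # ['a', 'b', 'c', 'd', 'e'] -- a copy of the whole list
print("xylophone"[2:5])   # lop -- slicing works on strings too
```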

**Exercises** 
* What if you try an index too large for the list?
* Think Python exercise 6.6: write a recursive function `is_palindrome` which works by checking whether the first and last elements of a string are equal, and if so, calls itself on the remainder.

`in`
---

The `in` operator checks whether some compound data structure contains a given object. It also works on strings, where it checks whether one string appears inside another.

In [23]:
5 in L

True

In [24]:
s = "xylophone"
if "ph" in s:
    print("I found ph")

I found ph


Sets
---

A set is an unordered collection of items with no duplicates. (Contrast this with a list, which is ordered and can contain duplicates.)

In [25]:
s = {6, 5, 10}
s.add(6)
print(s)
print(5 in s) # "in" works

{10, 5, 6}
True
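
Because a set can't hold duplicates, converting a list to a set is a quick way to find its distinct items. A small sketch (the list `votes` is made up for illustration):

```python
votes = ["tea", "coffee", "tea", "tea", "coffee"]
distinct = set(votes)                   # duplicates collapse to a single item

print(len(votes))                       # 5
print(len(distinct))                    # 2
print(distinct == {"coffee", "tea"})    # True -- order doesn't matter for sets
```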


Dictionaries
---

A dictionary is like a set of keys each paired with a value. It's really useful! 

In [26]:
d = {"name": "Bob"}
d["name"] = "Fred"
d["age"] = 37
print(d)
print(d["age"])
print("name" in d) # "in" looks in the *keys*
print("Fred" in d) # it does not look in the *values*
print(d["job"])    # the key "job" doesn't exist, so we'll see a KeyError

{'name': 'Fred', 'age': 37}
37
True
False


KeyError: 'job'
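
If we're not sure whether a key exists, we can check with `in` before indexing, or use the dictionary's `get` method, which returns `None` (or a default that we supply) instead of raising a `KeyError`. A short sketch:

```python
d = {"name": "Fred", "age": 37}

if "job" in d:                             # check the keys first
    print(d["job"])
else:
    print("no job recorded")

job = d.get("job")                         # None, because the key is missing
job_or_default = d.get("job", "unknown")   # supply our own default instead
print(job, job_or_default)                 # None unknown
```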

Notice the regularity in Python syntax. Lists use square brackets; sets use curly brackets; tuples use round brackets. In all cases the items are separated by commas. A dictionary is a *set* of key-value pairs, so it uses curly brackets like a set -- but now the items have to be pairs, each pair having a colon `:` in the middle.
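
Here are the four kinds of literal side by side, as a quick sketch (the variable names are just for illustration). One exception to the pattern is worth knowing: empty curly brackets `{}` make an empty dictionary, not an empty set, so an empty set has to be written `set()`.

```python
a_list = [1, 2, 3]
a_tuple = (1, 2, 3)
a_set = {1, 2, 3}
a_dict = {"one": 1, "two": 2}

# Print the type name next to each value, including the two "empty" cases
for value in [a_list, a_tuple, a_set, a_dict, {}, set()]:
    print(type(value).__name__, value)
```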

**Exercises**

* Can we use indexing and slicing with strings? With sets? With dictionaries? Why/why not? Try them and observe the result.
* Can we use `len` with lists? With sets? With dictionaries? Why/why not?
* What happens if we say `for x in d`, a `dict`? Try it.

**Exercise**: Consider this complex nested data structure. Describe its structure: "it is a `list` of ... of ... of ... of dicts where each key is a ... and each value is a ..."

In [None]:
students = [
    {
        "name": "Bruce Wayne", 
         "age": 34,
         "ID": "1234",
         "modules": {
                "CT5123": {
                    "grades": [55, 68],
                    "attendance": [False, True, True, True, True],
                },
                "CT5234": {
                    "grades": [45, 90],
                    "attendance": [True, False, False, False, True]
                }
        }
    },
    {
        "name": "Peter Parker",
        "age": 21,
        "ID": "0126",
        "modules": {
                "CT5123": {
                    "grades": [90, 90, 90],
                    "attendance": [False, True, True, True, True],
                },
                "CT5234": {
                    "grades": [60, 74],
                    "attendance": [False, True, True, True, True]
                }
        }
    }
]

Observe that we can access any element using an expression like `students[0]["name"]`, which gets the first student's name. Write a similar expression that picks out Parker's attendance record in CT5123, and then another to get Wayne's most recent grade in CT5234.
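
One way to build such an expression is one step at a time, checking what we have at each level. A sketch using a smaller, made-up structure (`bands` is hypothetical and is not part of the exercise):

```python
bands = [
    {
        "name": "The Beatles",
        "members": {"John": ["guitar"], "Ringo": ["drums", "vocals"]},
    },
]

band = bands[0]              # a dict: the first band
members = band["members"]    # a dict: member name -> list of roles
roles = members["Ringo"]     # a list of strings
print(roles[-1])             # vocals

# The same lookup written as a single expression
print(bands[0]["members"]["Ringo"][-1])   # vocals
```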

**Exercise**: Why did I write "ID" as a string, not an integer?